VPGATHERDD/VPGATHERDQ—Gather Packed Dword, Packed Qword with Signed Dword Indices
INSTRUCTION SET REFERENCE, V-Z
5-250 Vol. 2C
instruction is repeatable - given the same input values and architectural state, the same set of elements to the
left of the faulting one will be gathered.
•
This instruction does not perform AC checks, and so will never deliver an AC fault.
•
Not valid with 16-bit effective addresses. Will deliver a #UD fault.
•
These instructions do not accept zeroing-masking since the 0 values in k1 are used to determine completion.
Note that the presence of VSIB byte is enforced in this instruction. Hence, the instruction will #UD fault if
ModRM.rm is different than 100b.
This instruction has the same disp8*N and alignment rules as for scalar instructions (Tuple 1).
The instruction will #UD fault if the destination vector zmm1 is the same as index vector VINDEX. The instruction
will #UD fault if the k0 mask register is specified.
The scaled index may require more bits to represent than the address bits used by the processor (e.g., in 32-bit
mode, if the scale is greater than one). In this case, the most significant bits beyond the number of address bits are
ignored.
Operation
BASE_ADDR stands for the memory operand base address (a GPR); may not exist
VINDEX stands for the memory operand vector of indices (a ZMM register)
SCALE stands for the memory operand scalar (1, 2, 4 or 8)
DISP is the optional 1, 2 or 4 byte displacement
VPGATHERDD (EVEX encoded version)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j 0 TO KL-1
i j * 32
IF k1[j]
THEN DEST[i+31:i] MEM[BASE_ADDR +
SignExtend(VINDEX[i+31:i]) * SCALE + DISP]), 1)
k1[j] 0
ELSE *DEST[i+31:i] remains unchanged*
; Only merging masking is allowed
FI;
ENDFOR
k1[MAX_KL-1:KL] 0
DEST[MAX_VL-1:VL] 0
VPGATHERDQ (EVEX encoded version)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j 0 TO KL-1
i j * 64
k j * 32
IF k1[j]
THEN DEST[i+63:i]
MEM[BASE_ADDR + SignExtend(VINDEX[k+31:k]) * SCALE + DISP])
k1[j] 0
ELSE *DEST[i+63:i] remains unchanged*
; Only merging masking is allowed
FI;
ENDFOR
k1[MAX_KL-1:KL] 0
DEST[MAX_VL-1:VL] 0