where EVEX-encoded instructions are classified using the tupletype attribute. The scale factor N of each tupletype
is listed based on the vector length (VL) and other factors affecting it.
Table 2-34 covers EVEX-encoded instructions that have load semantics in conjunction with an additional
computational or data element movement operation, operating either on the full vector or on half of the vector
(due to conversion of numerical precision from a wider format to a narrower format). EVEX.b is supported for such
instructions for data element sizes that are either dword or qword (see Section 2.6.11).
EVEX-encoded instructions that are pure load/store, and instructions with “Load+op” semantics that operate on
data element sizes less than dword, do not support broadcasting using EVEX.b. These are listed in Table 2-35.
Table 2-35 also includes many broadcast instructions that perform broadcast using a subset of data elements
without using EVEX.b, as well as a few data element size conversion instructions. Instructions classified in
Table 2-35 do not use EVEX.b; EVEX.b must be 0, otherwise #UD will occur.
The tupletype abbreviation is referenced in the instruction operand encoding table on the reference page of
each instruction, providing the cross-reference to the scaling factor N used to encode the memory addressing
operand. Note that the disp8*N rules still apply when using 16-bit addressing.
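The role of the scaling factor N can be illustrated with a minimal sketch. It assumes the usual compressed-displacement rule: a displacement can use the one-byte form only if it is an exact multiple of N and the quotient fits in a signed byte; otherwise the four-byte form is used. The function name `encode_displacement` and its return format are hypothetical and exist only for illustration.

```python
def encode_displacement(disp: int, n: int):
    """Pick a displacement form under the disp8*N rule (illustrative only).

    disp: byte displacement requested by the assembler.
    n:    scaling factor N taken from the instruction's tupletype.
    Returns ("disp8", stored_value) when the compressed form applies,
    otherwise ("disp32", disp).
    """
    if n <= 0:
        raise ValueError("N must be a positive integer")
    # The compressed form stores disp / N, so disp must be a multiple of N
    # and the quotient must fit in a signed 8-bit field.
    if disp % n == 0 and -128 <= disp // n <= 127:
        return ("disp8", disp // n)
    return ("disp32", disp)
```

For example, with N = 64 a displacement of 64 bytes is stored as the byte value 1, while a displacement of 4 bytes is not a multiple of N and falls back to the four-byte form.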
Table 2-34. Compressed Displacement (DISP8*N) Affected by Embedded Broadcast

TupleType         EVEX.b  InputSize  EVEX.W  Broadcast  N (VL=128)  N (VL=256)  N (VL=512)  Comment
Full Vector (FV)  0       32bit      0       none       16          32          64          Load+Op (Full Vector Dword/Qword)
Full Vector (FV)  1       32bit      0       {1tox}     4           4           4           Load+Op (Full Vector Dword/Qword)
Full Vector (FV)  0       64bit      1       none       16          32          64          Load+Op (Full Vector Dword/Qword)
Full Vector (FV)  1       64bit      1       {1tox}     8           8           8           Load+Op (Full Vector Dword/Qword)
Half Vector (HV)  0       32bit      0       none       8           16          32          Load+Op (Half Vector)
Half Vector (HV)  1       32bit      0       {1tox}     4           4           4           Load+Op (Half Vector)
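As an illustration of how Table 2-34 might be consulted by an assembler or disassembler, the following sketch transcribes its rows into a lookup keyed by tupletype, EVEX.b, and EVEX.W. The names `TABLE_2_34` and `n_with_broadcast` are hypothetical; only the numeric values come from the table.

```python
# Rows of Table 2-34: (tupletype, EVEX.b, EVEX.W) -> N for VL = 128, 256, 512.
TABLE_2_34 = {
    ("FV", 0, 0): (16, 32, 64),  # 32-bit elements, no broadcast
    ("FV", 1, 0): (4, 4, 4),     # 32-bit elements, {1tox} broadcast
    ("FV", 0, 1): (16, 32, 64),  # 64-bit elements, no broadcast
    ("FV", 1, 1): (8, 8, 8),     # 64-bit elements, {1tox} broadcast
    ("HV", 0, 0): (8, 16, 32),   # 32-bit elements, no broadcast
    ("HV", 1, 0): (4, 4, 4),     # 32-bit elements, {1tox} broadcast
}

_VL_COLUMN = {128: 0, 256: 1, 512: 2}


def n_with_broadcast(tupletype: str, evex_b: int, evex_w: int, vl: int) -> int:
    """Return the disp8*N scaling factor for an FV or HV instruction."""
    try:
        row = TABLE_2_34[(tupletype, evex_b, evex_w)]
        return row[_VL_COLUMN[vl]]
    except KeyError:
        raise ValueError("combination not listed in Table 2-34") from None
```

Note that when EVEX.b is 1 the factor equals the element size and no longer depends on VL, since only a single element is read from memory.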
Table 2-35. EVEX DISP8*N for Instructions Not Affected by Embedded Broadcast

TupleType              InputSize  EVEX.W  N (VL=128)  N (VL=256)  N (VL=512)  Comment
Full Vector Mem (FVM)  N/A        N/A     16          32          64          Load/store or subDword full vector
Tuple1 Scalar (T1S)    8bit       N/A     1           1           1           1Tuple less than Full Vector
Tuple1 Scalar (T1S)    16bit      N/A     2           2           2           1Tuple less than Full Vector
Tuple1 Scalar (T1S)    32bit      0       4           4           4           1Tuple less than Full Vector
Tuple1 Scalar (T1S)    64bit      1       8           8           8           1Tuple less than Full Vector
Tuple1 Fixed (T1F)     32bit      N/A     4           4           4           1 Tuple memsize not affected by EVEX.W
Tuple1 Fixed (T1F)     64bit      N/A     8           8           8           1 Tuple memsize not affected by EVEX.W
Tuple2 (T2)            32bit      0       8           8           8           Broadcast (2 elements)
Tuple2 (T2)            64bit      1       NA          16          16          Broadcast (2 elements)
Tuple4 (T4)            32bit      0       NA          16          16          Broadcast (4 elements)
Tuple4 (T4)            64bit      1       NA          NA          32          Broadcast (4 elements)
Tuple8 (T8)            32bit      0       NA          NA          32          Broadcast (8 elements)
Half Mem (HVM)         N/A        N/A     8           16          32          SubQword Conversion
QuarterMem (QVM)       N/A        N/A     4           8           16          SubDword Conversion
OctMem (OVM)           N/A        N/A     2           4           8           SubWord Conversion
Mem128 (M128)          N/A        N/A     16          16          16          Shift count from memory
MOVDDUP (DUP)          N/A        N/A     8           32          64          VMOVDDUP
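Table 2-35 can be transcribed in the same way. The sketch below is illustrative only: the names `TABLE_2_35` and `n_without_broadcast` are hypothetical, `None` stands for an "NA" cell, and the `ValueError` raised for EVEX.b = 1 merely models the #UD condition stated for instructions classified in this table.

```python
# Rows of Table 2-35: (tupletype, input size in bits or None for N/A)
# -> N for VL = 128, 256, 512. None marks an "NA" cell.
TABLE_2_35 = {
    ("FVM", None): (16, 32, 64),
    ("T1S", 8): (1, 1, 1),
    ("T1S", 16): (2, 2, 2),
    ("T1S", 32): (4, 4, 4),
    ("T1S", 64): (8, 8, 8),
    ("T1F", 32): (4, 4, 4),
    ("T1F", 64): (8, 8, 8),
    ("T2", 32): (8, 8, 8),
    ("T2", 64): (None, 16, 16),
    ("T4", 32): (None, 16, 16),
    ("T4", 64): (None, None, 32),
    ("T8", 32): (None, None, 32),
    ("HVM", None): (8, 16, 32),
    ("QVM", None): (4, 8, 16),
    ("OVM", None): (2, 4, 8),
    ("M128", None): (16, 16, 16),
    ("DUP", None): (8, 32, 64),
}

_VL_INDEX = {128: 0, 256: 1, 512: 2}


def n_without_broadcast(tupletype, vl, input_bits=None, evex_b=0):
    """Return N for a Table 2-35 tupletype, or None for an "NA" cell."""
    if evex_b != 0:
        # Models the rule that EVEX.b must be 0 for these instructions (#UD).
        raise ValueError("EVEX.b must be 0 for Table 2-35 tupletypes")
    try:
        return TABLE_2_35[(tupletype, input_bits)][_VL_INDEX[vl]]
    except KeyError:
        raise ValueError("combination not listed in Table 2-35") from None
```

A caller would pass `input_bits` only for the tupletypes whose InputSize column is not N/A, for example `n_without_broadcast("T1S", 512, input_bits=32)`.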