Vol. 1 14-13
PROGRAMMING WITH AVX, FMA AND AVX2
Description of Column “Reason not promoted?”
MMX: Instructions referencing MMX registers do not support VEX
Scalar: Scalar instructions are not promoted to 256-bit
integer: integer instructions are not promoted.
VI: “Vector Integer” instructions are not promoted to 256-bit
14.2.4
Non-Arithmetic Primitives for 128-bit Vector and Scalar Processing
Intel AVX provides a full complement of data processing instructions that employ VEX-prefix encoding. These VEX-
encoded instructions generally provide the same functionality over instructions operating on XMM register that are
encoded using SIMD prefixes.
A subset of new functionalities listed in Table 14-4 is also extended via VEX.128 encoding. These enhancements in
AVX on 128-bit data processing primitives include 11 new instructions (see Table 14-6) with the following capabil-
ities:
•
Non-unit-strided fetching of SIMD data. AVX provides several flexible SIMD floating-point data fetching
primitives:
— broadcast of single data element into a 128-bit destination,
— masked move primitives to load or store SIMD data elements conditionally,
•
Intra-register manipulation of SIMD data elements. AVX provides several flexible SIMD floating-point data
manipulation primitives:
— permute primitives to facilitate efficient manipulation of floating-point data elements in 128-bit SIMD
registers
•
Branch handling. AVX provides several primitives to enable handling of branches in SIMD programming:
— new variable blend instructions supports four-operand syntax with non-destructive source syntax.
Branching conditions dependent on floating-point data or integer data can benefit from Intel AVX. This is
more flexible than non-VEX encoded instruction syntax that uses the XMM0 register as implied mask for
blend selection. While variable blend with implied XMM0 syntax is supported in SSE4 using SIMD prefix
encoding, VEX-encoded 128-bit variable blend instructions only support the more flexible four-operand
syntax.
— Packed TEST instructions for floating-point data.
no
yes
AESDEC, AESDECLAST
VI
no
yes
AESENC, AESENCLAST
VI
no
yes
AESIMX, AESKEYGENASSIST
VI
Table 14-6. 128-bit AVX Instruction Enhancement
Instruction
Description
VBROADCASTSS xmm1, m32
Broadcast single-precision floating-point element in mem to four locations in xmm1.
VMASKMOVPS xmm1, xmm2, m128
Load packed single-precision values from mem using mask in xmm2 and store in xmm1
VMASKMOVPD xmm1, xmm2, m128
Load packed double-precision values from mem using mask in xmm2 and store in xmm1
VMASKMOVPS m128, xmm1, xmm2
Store packed single-precision values from xmm2 using mask in xmm1
VMASKMOVPD m128, xmm1, xmm2
Store packed double-precision values from xmm2 using mask in xmm1
Table 14-5. Promotion of Legacy SIMD ISA to 128-bit Arithmetic AVX instruction
VEX.256
Encoding
VEX.128
Encoding
Instruction
Reason Not Promoted