

PROGRAMMING WITH AVX, FMA AND AVX2

Description of the “Reason Not Promoted” column in Table 14-5:
MMX: Instructions referencing MMX registers do not support VEX.
Scalar: Scalar instructions are not promoted to 256-bit.
Integer: Integer instructions are not promoted.
VI: “Vector Integer” instructions are not promoted to 256-bit.

14.2.4 Non-Arithmetic Primitives for 128-bit Vector and Scalar Processing

Intel AVX provides a full complement of data processing instructions that employ VEX-prefix encoding. These VEX-encoded instructions generally provide the same functionality as the instructions that operate on XMM registers and are encoded using SIMD prefixes.
A subset of the new functionality listed in Table 14-4 is also available through VEX.128 encoding. These AVX enhancements to 128-bit data processing primitives comprise 11 new instructions (see Table 14-6) with the following capabilities:

• Non-unit-strided fetching of SIMD data. AVX provides several flexible SIMD floating-point data fetching primitives:
  — broadcast of a single data element into a 128-bit destination,
  — masked move primitives to load or store SIMD data elements conditionally.

• Intra-register manipulation of SIMD data elements. AVX provides several flexible SIMD floating-point data manipulation primitives:
  — permute primitives to facilitate efficient manipulation of floating-point data elements in 128-bit SIMD registers.

• Branch handling. AVX provides several primitives to enable handling of branches in SIMD programming:
  — New variable blend instructions support a four-operand syntax with non-destructive sources. Branching conditions dependent on floating-point data or integer data can benefit from Intel AVX. This is more flexible than the non-VEX encoded instruction syntax, which uses the XMM0 register as an implied mask for blend selection. While variable blend with the implied XMM0 syntax is supported in SSE4 using SIMD prefix encoding, VEX-encoded 128-bit variable blend instructions support only the more flexible four-operand syntax.
  — Packed TEST instructions for floating-point data.
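As a rough illustration of the branch-handling primitives listed above, the following C sketch replaces a per-element `if (x < 0) x = 0;` with a compare, a packed test, and a variable blend. The function name `zero_negative_lanes4` is hypothetical and not taken from this manual. The sketch assumes a compiler that provides `<immintrin.h>`. When it is built with AVX enabled (for example with `-mavx` on GCC or Clang), the `_mm_testz_ps` and `_mm_blendv_ps` intrinsics are expected to map to the VEX-encoded VTESTPS and VBLENDVPS forms. A scalar fallback with the same results is used otherwise.

```c
#ifdef __AVX__
#include <immintrin.h>
#endif

/* Hypothetical helper: copies x[0..3] to out[0..3], replacing negative
 * elements with 0.0f. Returns 1 if any element was replaced, else 0. */
static int zero_negative_lanes4(const float *x, float *out)
{
#ifdef __AVX__
    __m128 v    = _mm_loadu_ps(x);
    __m128 zero = _mm_setzero_ps();

    /* Each mask lane is all ones (sign bit set) where x[i] < 0. */
    __m128 mask = _mm_cmplt_ps(v, zero);

    /* Packed test: returns 1 when no lane of (mask AND mask) has its
     * sign bit set, i.e. no element is negative. */
    if (_mm_testz_ps(mask, mask)) {
        _mm_storeu_ps(out, v);
        return 0;
    }

    /* Variable blend with an explicit mask operand: take 'zero' where
     * the mask sign bit is set, otherwise keep 'v'. */
    _mm_storeu_ps(out, _mm_blendv_ps(v, zero, mask));
    return 1;
#else
    /* Scalar fallback used when AVX is not enabled at compile time. */
    int any = 0;
    for (int i = 0; i < 4; ++i) {
        int neg = x[i] < 0.0f;
        out[i] = neg ? 0.0f : x[i];
        any |= neg;
    }
    return any;
#endif
}
```

Because the mask is an ordinary register operand rather than the implied XMM0, the compiler does not have to move the comparison result into a fixed register before the blend.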

Table 14-5.  Promotion of Legacy SIMD ISA to 128-bit Arithmetic AVX instruction

VEX.256 Encoding   VEX.128 Encoding   Instruction                 Reason Not Promoted
no                 yes                AESDEC, AESDECLAST          VI
no                 yes                AESENC, AESENCLAST          VI
no                 yes                AESIMC, AESKEYGENASSIST     VI

Table 14-6.  128-bit AVX Instruction Enhancement

Instruction                     Description
VBROADCASTSS xmm1, m32          Broadcast single-precision floating-point element in mem to four locations in xmm1.
VMASKMOVPS xmm1, xmm2, m128     Load packed single-precision values from mem using mask in xmm2 and store in xmm1.
VMASKMOVPD xmm1, xmm2, m128     Load packed double-precision values from mem using mask in xmm2 and store in xmm1.
VMASKMOVPS m128, xmm1, xmm2     Store packed single-precision values from xmm2 using mask in xmm1.
VMASKMOVPD m128, xmm1, xmm2     Store packed double-precision values from xmm2 using mask in xmm1.
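The VBROADCASTSS and VMASKMOVPS entries of Table 14-6 can be combined to process a partial vector, such as the last few elements of an array. The C sketch below is one possible arrangement, not a prescribed method. The helper name `scale_tail` and its parameters are hypothetical. It assumes `<immintrin.h>` is available and that AVX is enabled at compile time (for example with `-mavx`), and it falls back to a scalar loop otherwise.

```c
#ifdef __AVX__
#include <immintrin.h>
#endif

/* Hypothetical helper: multiplies the first n elements of data (n is
 * clamped to the range 0..4) by *scale and leaves the rest untouched. */
static void scale_tail(float *data, int n, const float *scale)
{
#ifdef __AVX__
    /* Lane i is active (all ones, so sign bit set) when i < n. */
    __m128i mask = _mm_cmplt_epi32(_mm_setr_epi32(0, 1, 2, 3),
                                   _mm_set1_epi32(n));

    /* Broadcast one float from memory to all four lanes. */
    __m128 s = _mm_broadcast_ss(scale);

    /* Masked load: inactive lanes are read as zero. */
    __m128 v = _mm_maskload_ps(data, mask);

    /* Masked store: only active lanes are written back. */
    _mm_maskstore_ps(data, mask, _mm_mul_ps(v, s));
#else
    /* Scalar fallback used when AVX is not enabled at compile time. */
    for (int i = 0; i < n && i < 4; ++i)
        data[i] *= *scale;
#endif
}
```

For example, calling `scale_tail(d, 3, &k)` with `d = {1, 2, 3, 4}` and `k = 2` leaves `d` as `{2, 4, 6, 4}`: the fourth element is not selected by the mask, so it is not overwritten.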