background image

Vol. 1 14-11

PROGRAMMING WITH AVX, FMA AND AVX2

14.2.3 

Arithmetic Primitives for 128-bit Vector and Scalar processing

Intel AVX provides a full complement of 128-bit numeric processing instructions that employ VEX-prefix encoding. 
These VEX-encoded instructions generally provide the same functionality over instructions operating on XMM 
register that are encoded using SIMD prefixes. The 128-bit numeric processing instructions in AVX cover floating-
point and integer data processing; across 128-bit vector and scalar processing. Table 14-5 lists the state of promo-
tion of legacy SIMD arithmetic ISA to VEX-128 encoding. Legacy SIMD floating-point arithmetic ISA promoted to 
VEX-256 encoding also support VEX-128 encoding (see Table 14-2).
The enhancement in AVX on 128-bit floating-point compare operation provides 32 conditional predicates to 
improve programming flexibility in evaluating conditional expressions. This contrasts with floating-point SIMD 
compare instructions in SSE and SSE2 supporting only 8 conditional predicates. 

VPERMILPD ymm1, ymm2/m256 imm8

Permute Double-Precision Floating-Point values in ymm2/mem using controls from imm8 

and store result in ymm1

VPERMILPS ymm1, ymm2, ymm/m256

Permute Single-Precision Floating-Point values in ymm2 using controls from ymm3/mem 

and store result in ymm1

VPERMILPS ymm1, ymm2/m256, imm8

Permute Single-Precision Floating-Point values in ymm2/mem using controls from imm8 

and store result in ymm1

VPERM2F128 ymm1, ymm2, 

ymm3/m256, imm8

Permute 128-bit floating-point fields in ymm2 and ymm3/mem using controls from imm8 

and store result in ymm1

VTESTPS ymm1, ymm2/m256

Set ZF if ymm2/mem AND ymm1 result is all 0s in packed single-precision sign bits. Set CF 

if ymm2/mem AND NOT ymm1 result is all 0s in packed single-precision sign bits.

VTESTPD ymm1, ymm2/m256

Set ZF if ymm2/mem AND ymm1 result is all 0s in packed double-precision sign bits. Set 

CF if ymm2/mem AND NOT ymm1 result is all 0s in packed double-precision sign bits.

VZEROALL

Zero all YMM registers

VZEROUPPER

Zero upper 128 bits of all YMM registers

Table 14-5.  Promotion of Legacy SIMD ISA to 128-bit Arithmetic AVX instruction 

VEX.256 

Encoding

VEX.128 

Encoding

Instruction

Reason Not Promoted

no

no

CVTPI2PS, CVTPI2PD, CVTPD2PI

MMX

no

no

CVTTPS2PI, CVTTPD2PI, CVTPS2PI

MMX

no

yes

CVTSI2SS, CVTSI2SD, CVTSD2SI

scalar

no

yes

CVTTSS2SI, CVTTSD2SI, CVTSS2SI

scalar

no

yes

COMISD, RSQRTSS, RCPSS

scalar

no

yes

UCOMISS, UCOMISD, COMISS, 

scalar

no

yes

ADDSS, ADDSD, SUBSS, SUBSD

scalar

no

yes

MULSS, MULSD, DIVSS, DIVSD

scalar

no

yes

SQRTSS, SQRTSD

scalar

no

yes

CVTSS2SD, CVTSD2SS

scalar

no

yes

MINSS, MINSD, MAXSS, MAXSD

scalar

no

yes

PAND, PANDN, POR, PXOR

VI

no

yes

PCMPGTB, PCMPGTW, PCMPGTD

VI

Table 14-4.  256-bit AVX Instruction Enhancement

Instruction

Description