Vol. 1 14-11
PROGRAMMING WITH AVX, FMA AND AVX2
14.2.3
Arithmetic Primitives for 128-bit Vector and Scalar processing
Intel AVX provides a full complement of 128-bit numeric processing instructions that employ VEX-prefix encoding.
These VEX-encoded instructions generally provide the same functionality over instructions operating on XMM
register that are encoded using SIMD prefixes. The 128-bit numeric processing instructions in AVX cover floating-
point and integer data processing; across 128-bit vector and scalar processing. Table 14-5 lists the state of promo-
tion of legacy SIMD arithmetic ISA to VEX-128 encoding. Legacy SIMD floating-point arithmetic ISA promoted to
VEX-256 encoding also support VEX-128 encoding (see Table 14-2).
The enhancement in AVX on 128-bit floating-point compare operation provides 32 conditional predicates to
improve programming flexibility in evaluating conditional expressions. This contrasts with floating-point SIMD
compare instructions in SSE and SSE2 supporting only 8 conditional predicates.
VPERMILPD ymm1, ymm2/m256 imm8
Permute Double-Precision Floating-Point values in ymm2/mem using controls from imm8
and store result in ymm1
VPERMILPS ymm1, ymm2, ymm/m256
Permute Single-Precision Floating-Point values in ymm2 using controls from ymm3/mem
and store result in ymm1
VPERMILPS ymm1, ymm2/m256, imm8
Permute Single-Precision Floating-Point values in ymm2/mem using controls from imm8
and store result in ymm1
VPERM2F128 ymm1, ymm2,
ymm3/m256, imm8
Permute 128-bit floating-point fields in ymm2 and ymm3/mem using controls from imm8
and store result in ymm1
VTESTPS ymm1, ymm2/m256
Set ZF if ymm2/mem AND ymm1 result is all 0s in packed single-precision sign bits. Set CF
if ymm2/mem AND NOT ymm1 result is all 0s in packed single-precision sign bits.
VTESTPD ymm1, ymm2/m256
Set ZF if ymm2/mem AND ymm1 result is all 0s in packed double-precision sign bits. Set
CF if ymm2/mem AND NOT ymm1 result is all 0s in packed double-precision sign bits.
VZEROALL
Zero all YMM registers
VZEROUPPER
Zero upper 128 bits of all YMM registers
Table 14-5. Promotion of Legacy SIMD ISA to 128-bit Arithmetic AVX instruction
VEX.256
Encoding
VEX.128
Encoding
Instruction
Reason Not Promoted
no
no
CVTPI2PS, CVTPI2PD, CVTPD2PI
MMX
no
no
CVTTPS2PI, CVTTPD2PI, CVTPS2PI
MMX
no
yes
CVTSI2SS, CVTSI2SD, CVTSD2SI
scalar
no
yes
CVTTSS2SI, CVTTSD2SI, CVTSS2SI
scalar
no
yes
COMISD, RSQRTSS, RCPSS
scalar
no
yes
UCOMISS, UCOMISD, COMISS,
scalar
no
yes
ADDSS, ADDSD, SUBSS, SUBSD
scalar
no
yes
MULSS, MULSD, DIVSS, DIVSD
scalar
no
yes
SQRTSS, SQRTSD
scalar
no
yes
CVTSS2SD, CVTSD2SS
scalar
no
yes
MINSS, MINSD, MAXSS, MAXSD
scalar
no
yes
PAND, PANDN, POR, PXOR
VI
no
yes
PCMPGTB, PCMPGTW, PCMPGTD
VI
Table 14-4. 256-bit AVX Instruction Enhancement
Instruction
Description