PROGRAMMING WITH AVX, FMA AND AVX2

14.2.3 Arithmetic Primitives for 128-bit Vector and Scalar processing

Intel AVX provides a full complement of 128-bit numeric processing instructions that employ VEX-prefix encoding.
These VEX-encoded instructions generally provide the same functionality as the instructions that operate on XMM
registers and are encoded using SIMD prefixes. The 128-bit numeric processing instructions in AVX cover both
floating-point and integer data processing, in 128-bit vector and scalar forms. Table 14-5 lists the state of
promotion of legacy SIMD arithmetic ISA to VEX-128 encoding. Legacy SIMD floating-point arithmetic ISA promoted to
VEX-256 encoding also supports VEX-128 encoding (see Table 14-2).
AVX enhances the 128-bit floating-point compare operations to provide 32 conditional predicates, which improves
programming flexibility in evaluating conditional expressions. In contrast, the floating-point SIMD compare
instructions in SSE and SSE2 support only 8 conditional predicates.
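
To make the predicate count concrete, the following C sketch models the result of a single compare lane for each predicate value. It is a scalar reference model, not the instruction itself. The function names model_cmp_lane and is_legacy_predicate are hypothetical. The predicate numbering is an assumption based on the commonly documented VCMPPS immediate encoding, in which values 0 through 7 correspond to the SSE/SSE2 predicates, and should be checked against the instruction reference. The model also assumes that bit 4 of the predicate changes only the exception-signaling behavior on quiet NaNs, not the comparison result.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: assumes predicates 0..7 are the ones SSE/SSE2 offer. */
static int is_legacy_predicate(unsigned predicate) {
    return predicate < 8;
}

/* Scalar model of one compare lane. Returns 1 where the hardware would
   write an all-ones mask and 0 where it would write all zeros.
   Assumption: bit 4 only alters signaling, so the low four bits decide
   the result. */
static int model_cmp_lane(float a, float b, unsigned predicate) {
    assert(predicate < 32);                    /* 32 predicates in total */
    int unordered = isnan(a) || isnan(b);      /* at least one NaN operand */
    switch (predicate & 0x0F) {
    case 0x0: return !unordered && a == b;     /* equal, ordered */
    case 0x1: return !unordered && a <  b;     /* less than, ordered */
    case 0x2: return !unordered && a <= b;     /* less or equal, ordered */
    case 0x3: return unordered;                /* unordered */
    case 0x4: return unordered || a != b;      /* not equal, unordered */
    case 0x5: return unordered || !(a <  b);   /* not less than, unordered */
    case 0x6: return unordered || !(a <= b);   /* not less or equal, unordered */
    case 0x7: return !unordered;               /* ordered */
    case 0x8: return unordered || a == b;      /* equal, unordered */
    case 0x9: return unordered || !(a >= b);   /* not greater or equal, unordered */
    case 0xA: return unordered || !(a >  b);   /* not greater than, unordered */
    case 0xB: return 0;                        /* always false */
    case 0xC: return !unordered && a != b;     /* not equal, ordered */
    case 0xD: return !unordered && a >= b;     /* greater or equal, ordered */
    case 0xE: return !unordered && a >  b;     /* greater than, ordered */
    default:  return 1;                        /* 0xF: always true */
    }
}
```

Under these assumptions, model_cmp_lane(NAN, 2.0f, 0x00) yields 0 while model_cmp_lane(NAN, 2.0f, 0x08) yields 1, because the second predicate treats unordered operands as a match. That distinction is not available among the first eight predicates. In C source, the instruction itself is normally reached through the _mm_cmp_ps intrinsic declared in <immintrin.h>.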

Table 14-4. 256-bit AVX Instruction Enhancement (continued)

Instruction                              Description
---------------------------------------  ------------------------------------------------------------
VPERMILPD ymm1, ymm2/m256, imm8          Permute Double-Precision Floating-Point values in ymm2/mem
                                         using controls from imm8 and store result in ymm1
VPERMILPS ymm1, ymm2, ymm3/m256          Permute Single-Precision Floating-Point values in ymm2
                                         using controls from ymm3/mem and store result in ymm1
VPERMILPS ymm1, ymm2/m256, imm8          Permute Single-Precision Floating-Point values in ymm2/mem
                                         using controls from imm8 and store result in ymm1
VPERM2F128 ymm1, ymm2, ymm3/m256, imm8   Permute 128-bit floating-point fields in ymm2 and ymm3/mem
                                         using controls from imm8 and store result in ymm1
VTESTPS ymm1, ymm2/m256                  Set ZF if ymm2/mem AND ymm1 result is all 0s in packed
                                         single-precision sign bits. Set CF if ymm2/mem AND NOT ymm1
                                         result is all 0s in packed single-precision sign bits.
VTESTPD ymm1, ymm2/m256                  Set ZF if ymm2/mem AND ymm1 result is all 0s in packed
                                         double-precision sign bits. Set CF if ymm2/mem AND NOT ymm1
                                         result is all 0s in packed double-precision sign bits.
VZEROALL                                 Zero all YMM registers
VZEROUPPER                               Zero upper 128 bits of all YMM registers
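
The permute and test rows above can be illustrated with a small scalar model in C. This is a sketch under stated assumptions, not a description of how the hardware is implemented. The names model_vpermilps256_imm8, model_vtestps256 and testps_flags are hypothetical. The model assumes that the imm8 form of VPERMILPS uses four 2-bit selectors that pick an element within the same 128-bit lane and that both lanes reuse the same selectors. The VTESTPS part follows the ZF and CF rules stated in the table, looking only at sign bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Only ZF and CF are modelled; other flags are out of scope here. */
typedef struct { int zf; int cf; } testps_flags;

/* Return the sign bit (bit 31) of a single-precision value. */
static uint32_t sign_bit(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits >> 31;
}

/* Scalar model of VPERMILPS ymm1, ymm2/m256, imm8 over 8 elements.
   Assumption: selector k (bits 2k+1:2k of imm8) chooses the source element
   for position k of each 128-bit lane, and never crosses lanes. */
static void model_vpermilps256_imm8(float dst[8], const float src[8],
                                    uint8_t imm8) {
    assert(dst != NULL && src != NULL);
    float tmp[8];
    for (int i = 0; i < 8; i++) {
        int lane_base = (i / 4) * 4;              /* 0 for low lane, 4 for high */
        int sel = (imm8 >> (2 * (i % 4))) & 3;    /* 2-bit selector for slot i%4 */
        tmp[i] = src[lane_base + sel];
    }
    memcpy(dst, tmp, sizeof tmp);                 /* copy last so dst may alias src */
}

/* Scalar model of VTESTPS ymm1, ymm2/m256.
   src1 stands for ymm1 and src2 for ymm2/mem. */
static testps_flags model_vtestps256(const float src1[8], const float src2[8]) {
    assert(src1 != NULL && src2 != NULL);
    uint32_t and_bits = 0, andn_bits = 0;
    for (int i = 0; i < 8; i++) {
        uint32_t s1 = sign_bit(src1[i]);
        uint32_t s2 = sign_bit(src2[i]);
        and_bits  |= s2 & s1;                     /* ymm2/mem AND ymm1 */
        andn_bits |= s2 & (s1 ^ 1u);              /* ymm2/mem AND NOT ymm1 */
    }
    testps_flags flags = { and_bits == 0, andn_bits == 0 };
    return flags;
}
```

With these assumptions, an imm8 of 0x1B reverses the four elements inside each lane, and testing a register against itself sets CF regardless of its contents, because no sign bit can be set in the source while clear in the same source. In C source, the corresponding operations are normally reached through intrinsics such as _mm256_permute_ps and _mm256_testz_ps from <immintrin.h>.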

Table 14-5. Promotion of Legacy SIMD ISA to 128-bit Arithmetic AVX instruction

VEX.256    VEX.128    Instruction                        Reason Not Promoted
Encoding   Encoding
---------  ---------  ---------------------------------  -------------------
                      CVTPI2PS, CVTPI2PD, CVTPD2PI       MMX
                      CVTTPS2PI, CVTTPD2PI, CVTPS2PI     MMX
           yes        CVTSI2SS, CVTSI2SD, CVTSD2SI       scalar
           yes        CVTTSS2SI, CVTTSD2SI, CVTSS2SI     scalar
           yes        COMISD, RSQRTSS, RCPSS             scalar
           yes        UCOMISS, UCOMISD, COMISS           scalar
           yes        ADDSS, ADDSD, SUBSS, SUBSD         scalar
           yes        MULSS, MULSD, DIVSS, DIVSD         scalar
           yes        SQRTSS, SQRTSD                     scalar
           yes        CVTSS2SD, CVTSD2SS                 scalar
           yes        MINSS, MINSD, MAXSS, MAXSD         scalar
           yes        PAND, PANDN, POR, PXOR
           yes        PCMPGTB, PCMPGTW, PCMPGTD
