
15-8 Vol. 1

PROGRAMMING WITH INTEL® AVX-512

15.5 ACCESSING XMM, YMM AND ZMM REGISTERS

The lower 128 bits of a YMM register are aliased to the corresponding XMM register. Legacy SSE instructions (i.e., SIMD instructions operating on XMM state but not using the VEX prefix, also referred to as non-VEX-encoded SIMD instructions) do not access the upper bits (MAX_VL-1:128) of the YMM registers. AVX and FMA instructions with a VEX prefix and a vector length of 128 bits zero the upper 128 bits of the YMM register.
The upper bits of YMM registers (255:128) can be read and written by many instructions with a VEX.256 prefix.
XSAVE and XRSTOR may be used to save and restore the upper bits of the YMM registers.
The lower 256 bits of a ZMM register are aliased to the corresponding YMM register. Legacy SSE (non-VEX-encoded) instructions do not access the upper bits (MAX_VL-1:128) of the ZMM registers, where MAX_VL is the maximum vector length (currently 512 bits). AVX and FMA instructions with a VEX prefix and a vector length of 128 bits zero the upper 384 bits of the ZMM register, while those with a VEX prefix and a vector length of 256 bits zero the upper 256 bits of the ZMM register.
The upper bits of ZMM registers (511:256) can be read and written by instructions with an EVEX.512 prefix.
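The aliasing and upper-bit rules above can be summarized in a small model. This is a minimal sketch, assuming MAX_VL = 512 and representing one architectural register as a Python integer; the class `VectorReg` and the encoding names (`"legacy_sse"`, `"vex128"`, `"vex256"`, `"evex512"`) are hypothetical labels for illustration, not an emulator of any real instruction.

```python
# Hypothetical model of one vector register under the rules above (MAX_VL = 512).
# XMM, YMM and ZMM are treated as views of the same underlying bits.

MAX_VL = 512  # maximum vector length in bits, as stated in the text


def low_bits(n: int) -> int:
    """Return an integer with the low n bits set."""
    return (1 << n) - 1


class VectorReg:
    """One architectural register; reads select the XMM/YMM/ZMM view."""

    def __init__(self, value: int = 0) -> None:
        self.bits = value & low_bits(MAX_VL)

    def xmm(self) -> int:
        return self.bits & low_bits(128)  # XMM view = bits 127:0

    def ymm(self) -> int:
        return self.bits & low_bits(256)  # YMM view = bits 255:0

    def zmm(self) -> int:
        return self.bits & low_bits(512)  # ZMM view = bits 511:0

    def write(self, result: int, encoding: str) -> None:
        """Update the register as an instruction with `encoding` would."""
        if encoding == "legacy_sse":
            # Non-VEX SSE: bits 127:0 written, bits MAX_VL-1:128 left untouched.
            self.bits = (self.bits & ~low_bits(128)) | (result & low_bits(128))
        elif encoding == "vex128":
            # VEX.128: bits 127:0 written, bits MAX_VL-1:128 zeroed.
            self.bits = result & low_bits(128)
        elif encoding == "vex256":
            # VEX.256: bits 255:0 written, bits MAX_VL-1:256 zeroed.
            self.bits = result & low_bits(256)
        elif encoding == "evex512":
            # EVEX.512: all 512 bits written.
            self.bits = result & low_bits(512)
        else:
            raise ValueError(f"unknown encoding: {encoding!r}")


reg = VectorReg(low_bits(512))              # start with every bit set
reg.write(0x1234, "legacy_sse")
print(reg.zmm() >> 128 == low_bits(384))    # True: upper 384 bits preserved
reg.write(0x1234, "vex128")
print(reg.zmm() >> 128)                     # 0: upper 384 bits zeroed
```

The contrast between the two writes is the practical point: a legacy SSE write leaves stale upper bits in place, whereas a VEX.128 write clears them.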

15.6 ENHANCED VECTOR PROGRAMMING ENVIRONMENT USING EVEX ENCODING

EVEX-encoded AVX-512 instructions support an enhanced vector programming environment. The enhanced vector
programming environment uses the combination of EVEX bit-field encodings and a set of eight opmask registers to
provide the following capabilities:

• Conditional vector processing of an EVEX-encoded instruction. Opmask registers k1 through k7 can be used to conditionally govern the per-data-element computational operation and the per-element updates to the destination operand of an AVX-512 Foundation instruction. Each bit of the opmask register governs one vector element operation (a vector element can be 8 bits, 16 bits, 32 bits or 64 bits).

• In addition to providing predication control on vector instructions via EVEX bit-field encoding, the opmask registers can also be used similarly to general-purpose registers, as source/destination operands using ModR/M encoding for non-mask-related instructions. In this case, any opmask register k0 through k7 can be selected.

• In 64-bit mode, 32 vector registers can be encoded using the EVEX prefix.

• For some instructions, broadcast may be supported on the operand that can be encoded as a memory vector. The data elements of a memory vector may be conditionally fetched or written, and the vector size depends on the data transformation function.

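• Flexible rounding control for the register-to-register flavor of EVEX-encoded 512-bit and scalar instructions. Four rounding modes (round to nearest even, round toward negative infinity, round toward positive infinity, and round toward zero) are supported by direct encoding within the EVEX prefix, overriding MXCSR settings.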

• Broadcast of one element to the rest of the destination vector register.

• Compressed 8-bit displacement encoding scheme to increase the instruction encoding density for instructions that normally require disp32 syntax.
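Three of the capabilities listed above (opmask predication, memory broadcast, and compressed displacement) can be sketched as plain functions on Python lists and integers. The helpers `masked_op`, `broadcast` and `compress_disp` are hypothetical names used only for illustration. The `zeroing` flag models the merging-versus-zeroing choice that EVEX selects with its z bit. For `compress_disp` the sketch assumes a full-vector memory operand, where the scale factor N is the vector size in bytes (64 for 512 bits) without broadcast, or the element size when broadcast is used.

```python
# Hypothetical models of EVEX opmask predication, broadcast and disp8*N;
# these are sketches of the semantics, not emulators of real instructions.
import operator
from typing import Callable, List, Tuple


def masked_op(dst: List[int], src1: List[int], src2: List[int],
              op: Callable[[int, int], int], k: int,
              zeroing: bool = False) -> List[int]:
    """Apply op element-wise under opmask k (bit i governs element i).

    Elements whose mask bit is 0 are not computed: they keep the old
    destination value (merging) or are set to 0 (zeroing).
    """
    result = []
    for i, (old, a, b) in enumerate(zip(dst, src1, src2)):
        if (k >> i) & 1:
            result.append(op(a, b))
        else:
            result.append(0 if zeroing else old)
    return result


def broadcast(element: int, count: int) -> List[int]:
    """Replicate one memory element into every lane of the operand."""
    return [element] * count


def compress_disp(disp: int, n: int) -> Tuple[str, int]:
    """Use a disp8 scaled by n when possible, otherwise fall back to disp32.

    disp8*N applies only when disp is a multiple of n and disp // n fits
    in a signed byte (-128..127).
    """
    if disp % n == 0 and -128 <= disp // n <= 127:
        return ("disp8", disp // n)
    return ("disp32", disp)


dst = [9, 9, 9, 9]
a = [1, 2, 3, 4]
b = broadcast(10, 4)  # one memory element feeds all four lanes
print(masked_op(dst, a, b, operator.add, k=0b0101))                # [11, 9, 13, 9]
print(masked_op(dst, a, b, operator.add, k=0b0101, zeroing=True))  # [11, 0, 13, 0]
print(compress_disp(256, 64))   # ('disp8', 4)
print(compress_disp(8192, 64))  # ('disp32', 8192)
```

The disp8*N example shows where the density gain comes from: a displacement of 256 bytes over a 64-byte vector fits in one byte as 4, whereas 8192 bytes (128 vectors) exceeds the signed-byte range and still needs disp32.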

Table 15-3. Instruction Mnemonics That Do Not Support EVEX.128 Encoding

Instruction Group | Instruction Mnemonics Supporting EVEX.256 Only Using AVX512VL
AVX512F | VBROADCASTSD, VBROADCASTF32X4, VEXTRACTI32X4, VINSERTF32X4, VINSERTI32X4, VPERMD, VPERMPD, VPERMPS, VPERMQ, VSHUFF32X4, VSHUFF64X2, VSHUFI32X4, VSHUFI64X2
AVX512CD |
AVX512DQ | VBROADCASTF32X2, VBROADCASTF64X2, VBROADCASTI32X4, VBROADCASTI64X2, VEXTRACTI64X2, VINSERTF64X2, VINSERTI64X2

AVX512BW