Vol. 1 15-9
PROGRAMMING WITH INTEL® AVX-512
15.6.1 OPMASK Register to Predicate Vector Data Processing
AVX-512 instructions using EVEX encode a predicate operand to conditionally control per-element computational
operation and updating of the result to the destination operand. The predicate operand is known as the opmask
register. The opmask is a set of eight architectural registers of size MAX_KL (64-bit). Note that from this set of eight
architectural registers, only k1 through k7 can be addressed as a predicate operand. k0 can be used as a regular
source or destination but cannot be encoded as a predicate operand. Note also that a predicate operand can be
used to enable memory fault-suppression for some instructions with a memory operand (source or destination).
As a predicate operand, the opmask registers contain one bit to govern the operation/update to each data element
of a vector register. In general, opmask registers can support instructions with all element sizes: byte (int8), word
(int16), single-precision floating-point (float32), integer doubleword (int32), double-precision floating-point
(float64), integer quadword (int64). Therefore, a ZMM vector register can hold 8, 16, 32, or 64 elements in
principle. The length of an opmask register, MAX_KL, is sufficient to handle up to 64 elements with one bit per element,
i.e., 64 bits. Masking is supported in most of the AVX-512 instructions. For a given vector length, each instruction
accesses only the number of least significant mask bits that are needed based on its data type. For example,
AVX-512 Foundation instructions operating on 64-bit data elements with a 512-bit vector length use only the 8 least
significant bits of the opmask register.
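The relationship between vector length, element size, and the number of mask bits consulted can be sketched in portable C. This is a minimal illustration, not part of the architecture definition; the helper names mask_bits_used and effective_mask are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of opmask bits an instruction consults,
 * i.e., one bit per data element in the vector. */
static unsigned mask_bits_used(unsigned vector_bits, unsigned element_bits)
{
    assert(element_bits != 0 && vector_bits % element_bits == 0);
    return vector_bits / element_bits;
}

/* Hypothetical helper: keep only the nbits least significant bits of a
 * 64-bit opmask value; higher bits are not accessed by the instruction. */
static uint64_t effective_mask(uint64_t k, unsigned nbits)
{
    assert(nbits >= 1 && nbits <= 64);
    /* Shifting a 64-bit value by 64 is undefined in C, so treat it apart. */
    return nbits == 64 ? k : (k & ((UINT64_C(1) << nbits) - 1));
}
```

Under this model, mask_bits_used(512, 64) yields 8, matching the example above, and mask_bits_used(512, 8) yields 64, the full MAX_KL width.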
An opmask register affects an AVX-512 instruction at per-element granularity. Any numeric or non-numeric
operation of each data element and per-element updates of intermediate results to the destination operand are
predicated on the corresponding bit of the opmask register.
An opmask serving as a predicate operand in AVX-512 obeys the following properties:
• The instruction’s operation is not performed for an element if the corresponding opmask bit is not set. This
implies that no exception or violation can be caused by an operation on a masked-off element. Consequently,
no MXCSR exception flag is updated as a result of a masked-off operation.
• A destination element is not updated with the result of the operation if the corresponding writemask bit is not
set. Instead, the destination element value must be preserved (merging-masking) or it must be zeroed out
(zeroing-masking).
• For some instructions with a memory operand, memory faults are suppressed for elements with a mask bit of 0.
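The first two properties above can be modeled with a scalar loop. The following is a sketch only: it emulates a masked 32-bit integer add in plain C rather than using real AVX-512 instructions, and the names masked_add_i32, MERGE_MASKING, and ZERO_MASKING are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum mask_mode { MERGE_MASKING, ZERO_MASKING };

/* Scalar model of a masked add over n 32-bit elements (n <= 64).
 * Bit i of k governs element i. */
static void masked_add_i32(int32_t *dst, const int32_t *a, const int32_t *b,
                           size_t n, uint64_t k, enum mask_mode mode)
{
    assert(n <= 64);
    for (size_t i = 0; i < n; i++) {
        if ((k >> i) & 1) {
            /* Mask bit set: perform the operation and update the element.
             * The add is done in unsigned arithmetic so it wraps. */
            dst[i] = (int32_t)((uint32_t)a[i] + (uint32_t)b[i]);
        } else if (mode == ZERO_MASKING) {
            /* Mask bit clear, zeroing-masking: element is zeroed. */
            dst[i] = 0;
        }
        /* Mask bit clear, merging-masking: dst[i] keeps its old value,
         * and a[i] and b[i] are never operated on. */
    }
}
```

With a mask of 0x5 over four elements, only elements 0 and 2 receive sums; elements 1 and 3 either keep their previous contents or become 0, depending on the mode.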
Note that this feature provides a versatile construct to implement control-flow predication, as the mask in effect
provides a merging behavior for AVX-512 vector register destinations. Alternatively, masking can be used
for zeroing instead of merging, so that the masked-out elements are updated with 0 instead of preserving the old
value. The zeroing behavior is provided to remove the implicit dependency on the old value when it is not needed.
Most instructions with masking enabled accept both forms of masking. Instructions that must have EVEX.aaa bits
different from 0 (gather and scatter) and instructions that write to memory accept only merging-masking.
It’s important to note that the per-element destination update rule also applies when the destination operand is a
memory location. Vectors are written on a per element basis, based on the opmask register used as a predicate
operand.
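For a memory destination, the per-element rule can be pictured as the scalar loop below. This is an illustrative model in plain C, not an actual masked store instruction, and masked_store_f32 is a hypothetical name. Only merging behavior is modeled, since instructions that write to memory accept only merging-masking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a masked store of n single-precision elements (n <= 64).
 * Memory locations whose mask bit is 0 are never written. */
static void masked_store_f32(float *mem, const float *src, size_t n, uint64_t k)
{
    assert(n <= 64);
    for (size_t i = 0; i < n; i++) {
        if ((k >> i) & 1) {
            mem[i] = src[i];
        }
    }
}
```

With a mask of 0x6 over four elements, only mem[1] and mem[2] are written; mem[0] and mem[3] retain their previous contents.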
The value of an opmask register can be:
• Generated as a result of a vector instruction (e.g., CMP, FPCLASS, etc.).
• Loaded from memory.
• Loaded from a general-purpose register (GPR).
• Modified by mask-to-mask operations.
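Two of these sources can be sketched in scalar C: a compare that produces one mask bit per element, and a mask-to-mask operation that combines two mask values. This is a model under stated assumptions, not the instruction definitions; cmp_lt_i32_mask and mask_and are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a vector compare that writes an opmask:
 * bit i of the result is 1 when a[i] < b[i]. */
static uint64_t cmp_lt_i32_mask(const int32_t *a, const int32_t *b, size_t n)
{
    assert(n <= 64);
    uint64_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] < b[i]) {
            k |= UINT64_C(1) << i;
        }
    }
    return k;
}

/* Scalar model of a mask-to-mask AND: a plain bitwise operation
 * on the two mask values. */
static uint64_t mask_and(uint64_t k1, uint64_t k2)
{
    return k1 & k2;
}
```

A mask produced this way could then serve as the predicate operand of a later masked operation, which is how a per-element condition is turned into predicated execution.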
Opmask registers can be used for purposes outside of predication. For example, they can be used to manipulate
sparse sets of elements from a vector, or used to set the EFLAGS based on the 0/0xFFFFFFFFFFFFFFFF/other status
of the OR of two opmask registers.
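The three-way status test on the OR of two opmask registers can be modeled as follows. The mapping to specific flags (ZF for an all-zeros result, CF for an all-ones result) is an assumption taken from the behavior of the 64-bit KORTEST instruction and is not stated in this section; kortest_model and struct kortest_flags are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Flags of interest after the test; other EFLAGS bits are not modeled. */
struct kortest_flags {
    int zf; /* 1 when the OR of the two masks is all zeros */
    int cf; /* 1 when the OR of the two masks is all ones  */
};

/* Scalar model of a 64-bit OR-and-test of two opmask values. */
static struct kortest_flags kortest_model(uint64_t k1, uint64_t k2)
{
    uint64_t r = k1 | k2;
    struct kortest_flags f;
    f.zf = (r == 0);
    f.cf = (r == UINT64_MAX);
    return f;
}
```

A branch on zf then answers "is no element selected by either mask?", while a branch on cf answers "is every element selected by at least one mask?"; any other value of the OR leaves both flags clear.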
15.6.1.1 Opmask Register K0
The only exception to the opmask rules described above is that opmask k0 cannot be used as a predicate operand.
Opmask k0 cannot be encoded as a predicate operand for a vector operation; the encoding value that would select
opmask k0 will instead select an implicit opmask value of 0xFFFFFFFFFFFFFFFF, thereby effectively disabling