15-14 Vol. 1
PROGRAMMING WITH INTEL® AVX-512
• Explicitly unaligned SIMD load and store instructions that access 64 bytes or fewer of data in memory (e.g., VMOVUPD, VMOVUPS, VMOVDQU, VMOVQ, VMOVD). These instructions do not require the memory address to be aligned on a natural vector-length byte boundary.
• Most arithmetic and data-processing instructions encoded using EVEX support memory operands. When these instructions access memory, there are no alignment restrictions.
Software may incur performance penalties when unaligned accesses cross cache-line boundaries or naturally aligned vector-length boundaries, so software should continue to make reasonable efforts to align frequently used data sets.
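To make the cache-line penalty concrete, the following C sketch checks whether an access of a given size straddles two cache lines, and computes how many leading elements a loop could handle separately before its pointer reaches a vector-length boundary (a technique commonly called loop peeling). It assumes 64-byte cache lines, which is typical of current Intel 64 processors but is not stated in this section; the function names are hypothetical and the code only does address arithmetic, so it runs on any C11 compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines (not specified by this section). */
#define CACHE_LINE 64u

/* Hypothetical helper: returns 1 if an access of `size` bytes starting at
 * `addr` touches two cache lines (a "split" access), 0 otherwise. */
static int crosses_cache_line(uintptr_t addr, size_t size)
{
    assert(size > 0 && size <= CACHE_LINE);
    return (addr / CACHE_LINE) != ((addr + size - 1) / CACHE_LINE);
}

/* Hypothetical helper: number of leading elements to process before the
 * remaining elements start on a `vec_bytes` boundary (loop peeling).
 * Requires vec_bytes to be a power of two and a multiple of elem_size,
 * and addr to be a multiple of elem_size. */
static size_t peel_count(uintptr_t addr, size_t elem_size, size_t vec_bytes)
{
    assert(vec_bytes != 0 && (vec_bytes & (vec_bytes - 1)) == 0);
    assert(elem_size != 0 && vec_bytes % elem_size == 0);
    assert(addr % elem_size == 0);
    size_t misalign = (size_t)(addr & (vec_bytes - 1));
    return misalign == 0 ? 0 : (vec_bytes - misalign) / elem_size;
}
```

For example, a 64-byte load from an address 8 bytes past a 64-byte boundary always touches two cache lines, while processing seven 8-byte doubles first brings such a pointer to the next 64-byte boundary.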
Atomicity of memory operations in the Intel 64 and IA-32 architectures is guaranteed only for a subset of memory operand sizes and alignment scenarios. The guaranteed atomic operations are described in the section “Guaranteed Atomic Operations” of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A. AVX and FMA instructions do not introduce any new guaranteed atomic memory operations.
AVX-512 instructions may generate an #AC(0) fault on misaligned 4- or 8-byte memory references in ring 3 when CR0.AM = 1 (alignment checking also requires EFLAGS.AC = 1). Memory references of 16, 32, or 64 bytes do not generate an #AC(0) fault. See Table 15-7 for details.
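The #AC(0) rule above can be summarized as a small decision function. This is a hypothetical C model written for illustration, not an Intel API; it assumes the general x86 alignment-check conditions (CPL 3, CR0.AM = 1, EFLAGS.AC = 1) and treats a 4- or 8-byte reference as misaligned when its address is not a multiple of its size. It returns "may fault", matching the hedged wording of the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: may an AVX-512 memory reference of `size` bytes at
 * `addr` raise #AC(0) under the given CPL and control-flag settings? */
static bool avx512_may_raise_ac(uintptr_t addr, size_t size, unsigned cpl,
                                bool cr0_am, bool eflags_ac)
{
    assert(size > 0);
    if (cpl != 3 || !cr0_am || !eflags_ac)
        return false;          /* alignment checking is not active */
    if (size != 4 && size != 8)
        return false;          /* 16-, 32-, 64-byte references never #AC(0) */
    return (addr % size) != 0; /* 4- or 8-byte reference not naturally aligned */
}
```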
Certain AVX-512 Foundation instructions always require 64-byte alignment (see Table 15-6 for the complete list of VEX- and EVEX-encoded instructions that require explicit alignment). These instructions generate a #GP(0) exception if the memory operand is not aligned on a 64-byte boundary.
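One way to satisfy the 64-byte requirement is to allocate buffers with that alignment from the start. The following is a minimal sketch using C11 `aligned_alloc`; the helper names are hypothetical. Because C11 expects the requested size to be a multiple of the alignment, the byte count is rounded up to a multiple of 64. The caller releases the buffer with `free`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: storage for n floats whose base address is 64-byte
 * aligned, suitable for aligned 512-bit moves such as VMOVAPS zmm, m512. */
static float *alloc_zmm_aligned_floats(size_t n)
{
    if (n > (SIZE_MAX - 63) / sizeof(float))
        return NULL;                      /* size computation would overflow */
    size_t bytes = n * sizeof(float);
    bytes = (bytes + 63) & ~(size_t)63;   /* round up to a multiple of 64 */
    if (bytes == 0)
        bytes = 64;                       /* avoid a zero-size request */
    return aligned_alloc(64, bytes);
}

/* Hypothetical helper: nonzero if p is a multiple of `alignment`. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0);
    return ((uintptr_t)p % alignment) == 0;
}
```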
15.8 SIMD FLOATING-POINT EXCEPTIONS
AVX-512 instructions can generate SIMD floating-point exceptions (#XM) if the embedded “suppress all exceptions” (SAE) control in the EVEX prefix is not set. When SAE is not set, these instructions respond to the exception mask bits in MXCSR in the same way as VEX-encoded AVX instructions. When CR4.OSXMMEXCPT = 0, any unmasked floating-point exception generates an invalid-opcode exception (#UD).
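The three controls described above (SAE, the MXCSR mask bit, and CR4.OSXMMEXCPT) combine into a simple decision. The C function below is a hypothetical model of that decision for illustration only; it does not touch real hardware state, and it treats a masked exception simply as "no fault", as for VEX-encoded AVX instructions.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { FP_NO_FAULT, FP_XM, FP_UD } fp_outcome;

/* Hypothetical model of what happens when an EVEX-encoded instruction
 * detects a floating-point exception condition.
 *   sae        - embedded "suppress all exceptions" control is set
 *   masked     - the matching MXCSR exception mask bit is set
 *   osxmmexcpt - value of CR4.OSXMMEXCPT */
static fp_outcome simd_fp_outcome(bool sae, bool masked, bool osxmmexcpt)
{
    if (sae)
        return FP_NO_FAULT;              /* exception suppressed, no #XM */
    if (masked)
        return FP_NO_FAULT;              /* masked: handled as for VEX AVX */
    return osxmmexcpt ? FP_XM : FP_UD;   /* unmasked: #XM, or #UD if OS lacks support */
}

static const char *fp_outcome_name(fp_outcome o)
{
    switch (o) {
    case FP_NO_FAULT: return "no fault";
    case FP_XM:       return "#XM";
    case FP_UD:       return "#UD";
    }
    assert(0 && "unknown outcome");
    return "?";
}
```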
Table 15-6. SIMD Instructions Requiring Explicitly Aligned Memory
Require 16-byte alignment | Require 32-byte alignment | Require 64-byte alignment*
(V)MOVDQA xmm, m128 | VMOVDQA ymm, m256 | VMOVDQA zmm, m512
(V)MOVDQA m128, xmm | VMOVDQA m256, ymm | VMOVDQA m512, zmm
(V)MOVAPS xmm, m128 | VMOVAPS ymm, m256 | VMOVAPS zmm, m512
(V)MOVAPS m128, xmm | VMOVAPS m256, ymm | VMOVAPS m512, zmm
(V)MOVAPD xmm, m128 | VMOVAPD ymm, m256 | VMOVAPD zmm, m512
(V)MOVAPD m128, xmm | VMOVAPD m256, ymm | VMOVAPD m512, zmm
(V)MOVNTPS m128, xmm | VMOVNTPS m256, ymm | VMOVNTPS m512, zmm
(V)MOVNTPD m128, xmm | VMOVNTPD m256, ymm | VMOVNTPD m512, zmm
(V)MOVNTDQ m128, xmm | VMOVNTDQ m256, ymm | VMOVNTDQ m512, zmm
(V)MOVNTDQA xmm, m128 | VMOVNTDQA ymm, m256 | VMOVNTDQA zmm, m512
Table 15-7. Instructions Not Requiring Explicit Memory Alignment
128-bit (16-byte) operands | 256-bit (32-byte) operands | 512-bit (64-byte) operands
(V)MOVDQU xmm, m128 | VMOVDQU ymm, m256 | VMOVDQU zmm, m512
(V)MOVDQU m128, xmm | VMOVDQU m256, ymm | VMOVDQU m512, zmm
(V)MOVUPS xmm, m128 | VMOVUPS ymm, m256 | VMOVUPS zmm, m512
(V)MOVUPS m128, xmm | VMOVUPS m256, ymm | VMOVUPS m512, zmm
(V)MOVUPD xmm, m128 | VMOVUPD ymm, m256 | VMOVUPD zmm, m512
(V)MOVUPD m128, xmm | VMOVUPD m256, ymm | VMOVUPD m512, zmm
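Tables 15-6 and 15-7 pair each aligned move with an unaligned counterpart, and the aligned form requires alignment equal to the full operand width. A code generator or JIT could use that relationship to pick a mnemonic at run time. The C sketch below is hypothetical (the function names are illustrative, and it only inspects the address; it does not emit or execute any instruction).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: is the aligned form (Table 15-6) safe for a memory
 * operand of vec_bytes = 16 (m128), 32 (m256) or 64 (m512) at addr? */
static bool aligned_form_ok(uintptr_t addr, size_t vec_bytes)
{
    assert(vec_bytes == 16 || vec_bytes == 32 || vec_bytes == 64);
    /* Aligned forms need alignment equal to the operand width; otherwise #GP(0). */
    return addr % vec_bytes == 0;
}

/* Hypothetical helper: choose between the aligned and unaligned
 * single-precision moves for the given operand. */
static const char *movps_mnemonic(uintptr_t addr, size_t vec_bytes)
{
    return aligned_form_ok(addr, vec_bytes) ? "VMOVAPS" : "VMOVUPS";
}
```

Falling back to the unaligned form is always architecturally safe; the aligned form mainly serves as a check that data really has the expected alignment.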