

CHAPTER 14

PROGRAMMING WITH AVX, FMA AND AVX2

Intel Advanced Vector Extensions (Intel AVX) introduces 256-bit vector processing capability. The Intel AVX
instruction set extends 128-bit SIMD instruction sets by employing a new instruction encoding scheme via a vector
extension prefix (VEX). Intel AVX also offers several enhanced features beyond those available in prior generations
of 128-bit SIMD extensions.
FMA (Fused Multiply Add) extensions enhance Intel AVX further in floating-point numeric computations. FMA
provides high-throughput arithmetic operations covering fused multiply-add, fused multiply-subtract, fused multiply
add/subtract interleave, and sign-reversed multiply on fused multiply-add and multiply-subtract.
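
The operation families listed above differ only in how the product and the addend are signed and combined. The following is a minimal sketch in portable C that models those sign patterns per element. The names `fma_kind`, `fma_shape`, and `fma_shape_vec` are hypothetical and exist only for this illustration. The sketch is not a bit-exact emulation: a hardware FMA rounds once, while plain C arithmetic may round the product and the sum separately.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration only. Each value models the operand
   and sign pattern of one FMA operation family, not its single rounding. */
typedef enum {
    FMADD,     /*  (a * b) + c                                  */
    FMSUB,     /*  (a * b) - c                                  */
    FNMADD,    /* -(a * b) + c  (sign-reversed multiply, add)   */
    FNMSUB,    /* -(a * b) - c  (sign-reversed multiply, sub)   */
    FMADDSUB,  /* even elements subtract c, odd elements add c  */
    FMSUBADD   /* even elements add c, odd elements subtract c  */
} fma_kind;

/* Compute one element; `lane` is the element index within the vector and
   matters only for the interleaved add/subtract forms. */
static double fma_shape(fma_kind kind, size_t lane, double a, double b, double c)
{
    double p = a * b;
    switch (kind) {
    case FMADD:    return  p + c;
    case FMSUB:    return  p - c;
    case FNMADD:   return -p + c;
    case FNMSUB:   return -p - c;
    case FMADDSUB: return (lane % 2 == 0) ? p - c : p + c;
    case FMSUBADD: return (lane % 2 == 0) ? p + c : p - c;
    }
    assert(0 && "unknown fma_kind");
    return 0.0;
}

/* Apply the chosen pattern to every element of n-element arrays. */
static void fma_shape_vec(fma_kind kind, size_t n, const double *a,
                          const double *b, const double *c, double *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = fma_shape(kind, i, a[i], b[i], c[i]);
}
```

Calling `fma_shape_vec(FMADDSUB, 4, a, b, c, out)` on four-element arrays mirrors the shape of a 256-bit double-precision interleaved operation: elements 0 and 2 subtract the addend, elements 1 and 3 add it.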
Intel AVX2 provides 256-bit integer SIMD extensions that accelerate computation across integer and floating-point
domains using 256-bit vector registers.
This chapter summarizes the key features of Intel AVX, FMA and AVX2.

14.1 INTEL AVX OVERVIEW

Intel AVX introduces the following architectural enhancements:

• Support for 256-bit wide vectors with the YMM vector register set.

• 256-bit floating-point instruction set enhancement with up to 2X performance gain relative to 128-bit
Streaming SIMD extensions.

• Enhancement of legacy 128-bit SIMD instruction extensions to support three-operand syntax and to simplify
compiler vectorization of high-level language expressions.

• VEX prefix-encoded instruction syntax support for generalized three-operand syntax to improve instruction
programming flexibility and efficient encoding of new instruction extensions.

• Most VEX-encoded 128-bit and 256-bit AVX instructions (with both load and computational operation
semantics) are not restricted to 16-byte or 32-byte memory alignment.

• Support for flexible deployment of 256-bit AVX code, 128-bit AVX code, legacy 128-bit code and scalar code.

With the exception of SIMD instructions operating on MMX registers, almost all legacy 128-bit SIMD instructions
have AVX equivalents that support three-operand syntax. 256-bit AVX instructions employ three-operand syntax,
and some use four-operand syntax.
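
The practical effect of the three-operand syntax described above is that the destination no longer has to double as a source. The following is a minimal sketch in portable C that contrasts the two forms for a packed single-precision add. The type `vec128` and the functions `legacy_addps` and `vex_vaddps` are hypothetical models, not intrinsics or assembler output, and they cover only the low 128 bits of a register.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the four single-precision lanes of an XMM register. */
typedef struct { float lane[4]; } vec128;

/* Models the legacy two-operand form, e.g. ADDPS xmm1, xmm2.
   The first operand is both a source and the destination, so its
   previous value is overwritten. */
static void legacy_addps(vec128 *dst_src, const vec128 *src)
{
    assert(dst_src != NULL && src != NULL);
    for (int i = 0; i < 4; i++)
        dst_src->lane[i] += src->lane[i];
}

/* Models the VEX three-operand form, e.g. VADDPS xmm1, xmm2, xmm3.
   The destination is separate, so both sources keep their values. */
static void vex_vaddps(vec128 *dst, const vec128 *src1, const vec128 *src2)
{
    assert(dst != NULL && src1 != NULL && src2 != NULL);
    for (int i = 0; i < 4; i++)
        dst->lane[i] = src1->lane[i] + src2->lane[i];
}
```

With the two-operand form, code that still needs the original first operand must copy it to another register before the add. The three-operand form avoids that extra copy, which is one reason it simplifies compiler vectorization.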

14.1.1 256-Bit Wide SIMD Register Support

Intel AVX introduces support for 256-bit wide SIMD registers (YMM0-YMM7 in operating modes that are 32-bit or
less, YMM0-YMM15 in 64-bit mode). The lower 128 bits of the YMM registers are aliased to the respective 128-bit
XMM registers.
Legacy SSE instructions (i.e., SIMD instructions operating on XMM state but not using the VEX prefix, also referred
to as non-VEX-encoded SIMD instructions) do not access the upper bits (bits 255:128) of the YMM registers. AVX
instructions with a VEX prefix and a vector length of 128 bits zero the upper bits (bits 255:128) of the destination
YMM register.
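
The difference between the two 128-bit write behaviors can be modeled in a few lines. The following is a minimal sketch in portable C, assuming a YMM register is represented as a 32-byte array whose first 16 bytes alias the XMM register. The type `ymm_model` and the functions `legacy_sse_write`, `vex128_write`, and `upper_is_zero` are hypothetical names for this illustration and do not execute any SIMD instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: bytes 0..15 alias the XMM register (bits 127:0),
   bytes 16..31 hold bits 255:128 of the YMM register. */
typedef struct { uint8_t byte[32]; } ymm_model;

/* Models a legacy (non-VEX) SSE instruction writing a 128-bit result:
   only the low 16 bytes change, the upper 16 bytes are left as they were. */
static void legacy_sse_write(ymm_model *r, const uint8_t xmm_value[16])
{
    assert(r != NULL && xmm_value != NULL);
    memcpy(r->byte, xmm_value, 16);
}

/* Models a VEX-encoded instruction with 128-bit vector length:
   the low 16 bytes receive the result and the upper 16 bytes are zeroed. */
static void vex128_write(ymm_model *r, const uint8_t xmm_value[16])
{
    assert(r != NULL && xmm_value != NULL);
    memcpy(r->byte, xmm_value, 16);
    memset(r->byte + 16, 0, 16);
}

/* Returns 1 if bits 255:128 of the modeled register are all zero. */
static int upper_is_zero(const ymm_model *r)
{
    assert(r != NULL);
    for (int i = 16; i < 32; i++)
        if (r->byte[i] != 0)
            return 0;
    return 1;
}
```

Starting from a register whose upper half holds nonzero data, `legacy_sse_write` leaves that data in place, while `vex128_write` clears it.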