
Vol. 1 14-1

CHAPTER 14

PROGRAMMING WITH AVX, FMA AND AVX2

Intel® Advanced Vector Extensions (Intel® AVX) introduces 256-bit vector processing capability. The Intel AVX 
instruction set extends 128-bit SIMD instruction sets by employing a new instruction encoding scheme via a vector 
extension prefix (VEX). Intel AVX also offers several enhanced features beyond those available in prior generations 
of 128-bit SIMD extensions. 
FMA (Fused Multiply Add) extensions enhance Intel AVX further in floating-point numeric computations. FMA 
provides high-throughput arithmetic operations that cover fused multiply-add, fused multiply-subtract, fused 
multiply add/subtract interleave, and sign-reversed multiply on fused multiply-add and multiply-subtract. 
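The operation families listed above can be illustrated with a small software model. The sketch below is not how the hardware is implemented; it assumes only that a fused operation computes the product and the sum exactly and rounds once at the end, which it emulates with exact rational arithmetic. The helper names (`fmadd`, `fmsub`, `fnmadd`, `fnmsub`, `fmaddsub`) are hypothetical and merely echo the instruction families; infinities and NaNs are not handled.

```python
from fractions import Fraction


def _fused(a, b, c, negate_product=False, subtract=False):
    """Compute +/-(a*b) +/- c exactly, then round once to a double.

    Model only: inputs must be finite floats (Fraction rejects inf/NaN).
    """
    product = Fraction(a) * Fraction(b)
    if negate_product:
        product = -product
    addend = -Fraction(c) if subtract else Fraction(c)
    return float(product + addend)  # single rounding step


def fmadd(a, b, c):
    """Fused multiply-add: (a*b) + c."""
    return _fused(a, b, c)


def fmsub(a, b, c):
    """Fused multiply-subtract: (a*b) - c."""
    return _fused(a, b, c, subtract=True)


def fnmadd(a, b, c):
    """Sign-reversed multiply, then add: -(a*b) + c."""
    return _fused(a, b, c, negate_product=True)


def fnmsub(a, b, c):
    """Sign-reversed multiply, then subtract: -(a*b) - c."""
    return _fused(a, b, c, negate_product=True, subtract=True)


def fmaddsub(xs, ys, zs):
    """Add/subtract interleave: even-indexed elements subtract, odd-indexed add."""
    return [
        fmsub(x, y, z) if i % 2 == 0 else fmadd(x, y, z)
        for i, (x, y, z) in enumerate(zip(xs, ys, zs))
    ]


# The exact product a*b is 1 - 2**-54, which a separate multiply rounds to 1.0.
a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
unfused = a * b - 1.0      # product rounded first, then subtracted
fused = fmsub(a, b, 1.0)   # rounded only once, after the subtraction
```

In this model `unfused` loses the low-order bits of the product, while `fused` retains them. That difference is the numerical motivation for fusing the two operations.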
Intel AVX2 provides 256-bit integer SIMD extensions that accelerate computation across integer and floating-point 
domains using 256-bit vector registers.
This chapter summarizes the key features of Intel AVX, FMA and AVX2.

14.1 INTEL AVX OVERVIEW

Intel AVX introduces the following architectural enhancements:

• Support for 256-bit wide vectors with the YMM vector register set. 

• 256-bit floating-point instruction set enhancement with up to 2X performance gain relative to 128-bit 
  Streaming SIMD Extensions.

• Enhancement of legacy 128-bit SIMD instruction extensions to support three-operand syntax and to simplify 
  compiler vectorization of high-level language expressions.

• VEX prefix-encoded instruction syntax support for generalized three-operand syntax to improve instruction 
  programming flexibility and efficient encoding of new instruction extensions.

• Most VEX-encoded 128-bit and 256-bit AVX instructions (with both load and computational operation 
  semantics) are not restricted to 16-byte or 32-byte memory alignment. 

• Support for flexible deployment of 256-bit AVX code, 128-bit AVX code, legacy 128-bit code, and scalar code.

With the exception of SIMD instructions operating on MMX registers, almost all legacy 128-bit SIMD instructions 
have AVX equivalents that support three-operand syntax. 256-bit AVX instructions employ three-operand syntax, 
and some use four-operand syntax. 
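The practical difference between the legacy two-operand form and the VEX three-operand form can be sketched with a toy register model. This is an illustration under stated assumptions rather than a description of the encoding: registers are modeled as Python lists of four floats in a dictionary, and the helpers `addps_legacy`, `vaddps_vex`, and `fresh_regs` are hypothetical names chosen to mirror the ADDPS and VADDPS mnemonics.

```python
def fresh_regs():
    """Return a toy register file: three registers of four floats each."""
    return {
        "xmm0": [1.0, 2.0, 3.0, 4.0],
        "xmm1": [10.0, 20.0, 30.0, 40.0],
        "xmm2": [0.0, 0.0, 0.0, 0.0],
    }


def addps_legacy(regs, dst, src):
    """Two-operand form: dst is both the first source and the destination."""
    regs[dst] = [a + b for a, b in zip(regs[dst], regs[src])]


def vaddps_vex(regs, dst, src1, src2):
    """Three-operand form: the destination is separate from both sources."""
    regs[dst] = [a + b for a, b in zip(regs[src1], regs[src2])]


# Legacy sequence that keeps both inputs intact needs an extra register copy:
#   movaps xmm2, xmm0
#   addps  xmm2, xmm1
legacy = fresh_regs()
legacy["xmm2"] = list(legacy["xmm0"])
addps_legacy(legacy, "xmm2", "xmm1")

# VEX-encoded sequence expresses the same computation in one instruction:
#   vaddps xmm2, xmm0, xmm1
vex = fresh_regs()
vaddps_vex(vex, "xmm2", "xmm0", "xmm1")
```

Both sequences leave the same sum in `xmm2` and preserve `xmm0` and `xmm1`. The three-operand form simply avoids the explicit copy, which is one reason it simplifies compiler-generated code.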

14.1.1 256-Bit Wide SIMD Register Support

Intel AVX introduces support for 256-bit wide SIMD registers (YMM0-YMM7 in operating modes that are 32-bit or 
less, YMM0-YMM15 in 64-bit mode). The lower 128 bits of the YMM registers are aliased to the respective 128-bit 
XMM registers. 
Legacy SSE instructions (i.e., SIMD instructions operating on XMM state but not using the VEX prefix, also referred 
to as non-VEX encoded SIMD instructions) will not access the upper bits (bits 255:128) of the YMM registers. AVX 
instructions with a VEX prefix and a vector length of 128 bits zero the upper bits (bits 255:128) of the YMM 
register.
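The aliasing and upper-bit rules described above can be captured in a short model. This is a sketch under simplifying assumptions: each YMM register is represented as a 256-bit Python integer, and the class `YmmFile` and its method names are hypothetical, introduced only for illustration.

```python
MASK128 = (1 << 128) - 1          # bits 127:0, the XMM alias
MASK256 = (1 << 256) - 1          # bits 255:0, the full YMM register
UPPER128 = MASK256 ^ MASK128      # bits 255:128


class YmmFile:
    """Toy model of the YMM register file (16 registers in 64-bit mode, 8 otherwise)."""

    def __init__(self, count=16):
        self.regs = [0] * count

    def write_ymm(self, index, value):
        """256-bit write: replaces the whole register."""
        self.regs[index] = value & MASK256

    def read_ymm(self, index):
        return self.regs[index]

    def read_xmm(self, index):
        """XMMn is the low 128 bits of YMMn."""
        return self.regs[index] & MASK128

    def write_xmm_legacy(self, index, value):
        """Non-VEX (legacy SSE) write: bits 255:128 are left untouched."""
        upper = self.regs[index] & UPPER128
        self.regs[index] = upper | (value & MASK128)

    def write_xmm_vex128(self, index, value):
        """VEX-encoded 128-bit write: bits 255:128 are zeroed."""
        self.regs[index] = value & MASK128


rf = YmmFile()
rf.write_ymm(0, (0xAAAA << 128) | 0x1111)

rf.write_xmm_legacy(0, 0x2222)
after_legacy = rf.read_ymm(0)     # upper half still holds 0xAAAA

rf.write_xmm_vex128(0, 0x3333)
after_vex = rf.read_ymm(0)        # upper half cleared
```

In this model, `after_legacy` keeps the original upper 128 bits, while `after_vex` has them cleared.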