

PROGRAMMING WITH AVX, FMA AND AVX2

14.5.1 FMA Instruction Operand Order and Arithmetic Behavior

FMA instruction mnemonics are defined explicitly with an ordered sequence of three digits, e.g., VFMADD132PD. The
value of each digit refers to one of the three source operands as defined by the instruction encoding specification:

• ‘1’: The first source operand (also the destination operand) in the syntactical order listed in this specification.

• ‘2’: The second source operand in the syntactical order. This is a YMM/XMM register, encoded using the VEX prefix.

• ‘3’: The third source operand in the syntactical order. The first and third operands are encoded following ModR/M
encoding rules.

The ordering of each digit within the mnemonic refers to the floating-point data listed on the right-hand side of the
arithmetic equation of each FMA operation (see Table 14-17):

• The first position in the three digits of an FMA mnemonic refers to the operand position of the first FP data
expressed in the arithmetic equation of the FMA operation, the multiplicand.

• The second position in the three digits of an FMA mnemonic refers to the operand position of the second FP data
expressed in the arithmetic equation of the FMA operation, the multiplier.

• The third position in the three digits of an FMA mnemonic refers to the operand position of the FP data being
added to or subtracted from the multiplication result.
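
The digit rules above can be illustrated with a short Python sketch. The helper name fma_operand_roles and the regular expression are hypothetical and exist only for this illustration; they are not part of any assembler or of this specification.

```python
import re

# Hypothetical pattern: operation name, three operand digits, data-type suffix.
_FMA_MNEMONIC = re.compile(r"^(VF[A-Z]+?)([123]{3})(PD|PS|SD|SS)$")

def fma_operand_roles(mnemonic):
    """Return (multiplicand, multiplier, addend) as source operand numbers.

    The position of a digit selects the role in the arithmetic equation;
    the value of the digit selects which source operand fills that role.
    """
    match = _FMA_MNEMONIC.match(mnemonic.upper())
    if match is None or sorted(match.group(2)) != ["1", "2", "3"]:
        raise ValueError(f"not a recognized FMA mnemonic: {mnemonic!r}")
    multiplicand, multiplier, addend = (int(d) for d in match.group(2))
    return multiplicand, multiplier, addend

# VFMADD132PD: operand 1 times operand 3, plus operand 2.
print(fma_operand_roles("VFMADD132PD"))
# VFMADD231PD: operand 2 times operand 3, plus operand 1 (accumulate form).
print(fma_operand_roles("VFMADD231PD"))
```

In both cases the result is written to operand 1, because the first source operand is also the destination.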

Note that the non-numerical result of an FMA operation does not follow the mathematically defined commutative
property between the multiplicand and the multiplier values (see Table 14-17). Nevertheless, for ease of
programming, software tools (such as an assembler) may support a complementary set of FMA mnemonics for each FMA
instruction, to take advantage of the mathematical property of commutative multiplication. For example, an
assembler may optionally support the complementary mnemonic “VFMADD312PD” in addition to the true
mnemonic “VFMADD132PD”. The assembler will generate the same instruction opcode sequence corresponding to
VFMADD132PD. The processor executes VFMADD132PD and reports any NaN conditions based on the definition of
VFMADD132PD. Similarly, if the complementary mnemonic VFMADD123PD is supported by an assembler at source
level, it must generate the opcode sequence corresponding to VFMADD213PD; the complementary mnemonic
VFMADD321PD must produce the opcode sequence defined by VFMADD231PD. In the absence of FMA operations
reporting a NaN result, the numerical results of using either mnemonic with an assembler supporting both
mnemonics will match the behavior defined in Table 14-17. Support for the complementary FMA mnemonics by
software tools is optional.
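
As a minimal sketch of how a tool might resolve a complementary mnemonic, the Python function below swaps the multiplicand and multiplier digits to recover the true mnemonic. The function name canonical_fma_mnemonic is hypothetical; only the three mappings stated above (312 to 132, 123 to 213, 321 to 231) are taken from the text.

```python
import re

# The three digit orders that have their own opcode sequences.
TRUE_ORDERS = {"132", "213", "231"}

def canonical_fma_mnemonic(mnemonic):
    """Map a true or complementary FMA mnemonic to its true mnemonic."""
    match = re.match(r"^(VF[A-Z]+?)([123]{3})(PD|PS|SD|SS)$", mnemonic.upper())
    if match is None:
        raise ValueError(f"not a recognized FMA mnemonic: {mnemonic!r}")
    name, digits, suffix = match.groups()
    if digits in TRUE_ORDERS:
        return name + digits + suffix
    # A complementary mnemonic exchanges multiplicand and multiplier,
    # that is, the first two digit positions.
    swapped = digits[1] + digits[0] + digits[2]
    if swapped not in TRUE_ORDERS:
        raise ValueError(f"invalid operand order: {digits}")
    return name + swapped + suffix

for source_mnemonic in ("VFMADD312PD", "VFMADD123PD", "VFMADD321PD"):
    print(source_mnemonic, "->", canonical_fma_mnemonic(source_mnemonic))
```

Because the emitted opcode is always that of the true mnemonic, any NaN reporting follows the true mnemonic's definition, not the operand order written in the source.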

14.5.2 Fused-Multiply-ADD (FMA) Numeric Behavior

FMA instructions can perform fused-multiply-add operations (including fused-multiply-subtract and other
varieties) on packed and scalar data elements in the instruction operands. Separate FMA instructions are provided to
handle different types of arithmetic operations on the three source operands.
FMA instruction syntax is defined using three source operands, and the first source operand is updated based on the
result of the arithmetic operations on the data elements of 128-bit or 256-bit operands; i.e., the first source operand
is also the destination operand.
The arithmetic FMA operation performed in an FMA instruction takes one of several forms, r=(x*y)+z, r=(x*y)-z,
r=-(x*y)+z, or r=-(x*y)-z. Packed FMA instructions can perform eight single-precision FMA operations or four
double-precision FMA operations with 256-bit vectors.
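
The four arithmetic forms can be emulated in Python as a rough sketch. It assumes the usual meaning of "fused" (the product is not rounded before the addition, so only the final result is rounded) and models that with exact rational arithmetic. The function name fused and its keyword arguments are hypothetical, and the sketch handles finite inputs only: NaN, infinities, and the sign of a zero result are not modeled.

```python
from fractions import Fraction

def fused(x, y, z, negate_product=False, subtract=False):
    """Evaluate +/-(x*y) +/- z exactly, then round once to a double."""
    product = Fraction(x) * Fraction(y)
    if negate_product:
        product = -product
    addend = -Fraction(z) if subtract else Fraction(z)
    return float(product + addend)  # single rounding step

# The four forms: r=(x*y)+z, r=(x*y)-z, r=-(x*y)+z, r=-(x*y)-z.
print(fused(2.0, 3.0, 4.0))
print(fused(2.0, 3.0, 4.0, subtract=True))
print(fused(2.0, 3.0, 4.0, negate_product=True))
print(fused(2.0, 3.0, 4.0, negate_product=True, subtract=True))

# Element counts for a 256-bit packed operation.
print(256 // 32, "single-precision elements;", 256 // 64, "double-precision elements")

# A case where the fused and unfused results differ: the exact product
# 1 - 2**-60 rounds to 1.0 when computed as a separate multiplication.
x, y, z = 1.0 + 2.0**-30, 1.0 - 2.0**-30, -1.0
print("unfused:", x * y + z, "fused:", fused(x, y, z))
```

The last line shows why the fused form matters numerically: the separate multiply-then-add loses the small term entirely, while the fused evaluation keeps it.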
Scalar FMA instructions perform only one arithmetic operation, on the low-order data element. The content of the
rest of the data elements in the lower 128 bits of the destination operand is preserved. The upper 128 bits of the
destination operand are filled with zero.
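
The scalar behavior described above can be modeled as follows. This is an illustrative sketch only: a 256-bit register is represented as a list of four doubles with element 0 holding bits 63:0, the function name scalar_fma231_sd is hypothetical, and the 231 operand order (operand 2 times operand 3, plus operand 1) is chosen arbitrarily for the example. Only finite values are handled.

```python
from fractions import Fraction

def scalar_fma231_sd(dest, src2, src3):
    """Model a 231-form scalar double-precision FMA on a 256-bit register."""
    # Only the low-order element takes part in the arithmetic; exact
    # rational arithmetic followed by float() gives a single rounding.
    low = float(Fraction(src2[0]) * Fraction(src3[0]) + Fraction(dest[0]))
    # Element 1 (rest of the lower 128 bits) is preserved from the
    # destination; elements 2 and 3 (upper 128 bits) are zeroed.
    return [low, dest[1], 0.0, 0.0]

dest = [1.0, 9.0, 7.0, 7.0]
src2 = [2.0, 5.0, 5.0, 5.0]
src3 = [3.0, 6.0, 6.0, 6.0]
print(scalar_fma231_sd(dest, src2, src3))
```

Here only element 0 changes as a result of the arithmetic (2.0 * 3.0 + 1.0), element 1 keeps its previous destination value, and the upper two elements become zero regardless of their earlier contents.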

Table 14-15. FMA Instructions

| Instruction | Description |
|---|---|
| VFNMSUB132SD/VFNMSUB213SD/VFNMSUB231SD xmm0, xmm1, xmm2/m64 | Fused Negative Multiply-Subtract of Scalar Double-Precision Floating-Point Values |
| VFNMSUB132SS/VFNMSUB213SS/VFNMSUB231SS xmm0, xmm1, xmm2/m32 | Fused Negative Multiply-Subtract of Scalar Single-Precision Floating-Point Values |