
Vol. 1 11-17

PROGRAMMING WITH INTEL® STREAMING SIMD EXTENSIONS 2 (INTEL® SSE2)

a QNaN depending on the exception condition detected. In most cases, the corresponding exception flag bit in 
MXCSR is also set. The one situation where an exception flag is not set is when an underflow condition is detected 
and it is not accompanied by an inexact result.
When operating on packed floating-point operands, the processor returns a masked result for each of the 
sub-operand computations and sets a separate set of internal exception flags for each computation. It then 
performs a logical OR on the internal exception flag settings and sets the exception flags in the MXCSR register 
according to the results of the OR operations.
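The OR step described above can be sketched in C as follows. This is an illustrative software model, not processor code: the helper name merge_lane_flags and the idea of passing the per-sub-operand flags as an array are assumptions made for the example. The bit positions used are those of the six exception flags in bits 0 through 5 of MXCSR.

```c
#include <stdint.h>

/* MXCSR exception flag bits (bits 0-5 of the register). */
enum {
    MXCSR_IE = 1u << 0,      /* invalid operation          */
    MXCSR_DE = 1u << 1,      /* denormal operand           */
    MXCSR_ZE = 1u << 2,      /* divide-by-zero             */
    MXCSR_OE = 1u << 3,      /* numeric overflow           */
    MXCSR_UE = 1u << 4,      /* numeric underflow          */
    MXCSR_PE = 1u << 5,      /* inexact result (precision) */
    MXCSR_FLAG_MASK = 0x3Fu
};

/* Hypothetical helper: OR together the internal flag sets of n
   sub-operand computations and merge them into an MXCSR image.
   The flags are sticky, so bits already set in mxcsr stay set. */
static uint32_t merge_lane_flags(uint32_t mxcsr,
                                 const uint32_t *lane_flags, int n)
{
    uint32_t merged = 0;
    for (int i = 0; i < n; i++)
        merged |= lane_flags[i] & MXCSR_FLAG_MASK;
    return mxcsr | merged;
}
```

For instance, starting from an MXCSR image of 0x1F80 (all exceptions masked, no flags set) and the per-sub-operand flag sets { DE, none, OE, DE | UE | PE }, the helper returns 0x1FBA.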
For example, Figure 11-9 shows the results of an MULPS instruction. In the example, all SIMD floating-point 
exceptions are masked. Assume that a denormal exception condition is detected prior to the multiplication of 
sub-operands X0 and Y0, no exception condition is detected for the multiplication of X1 and Y1, a numeric overflow 
exception condition is detected for the multiplication of X2 and Y2, and another denormal exception is detected 
prior to the multiplication of sub-operands X3 and Y3. Because denormal exceptions are masked, the processor 
uses the denormal source values in the multiplications of (X0 and Y0) and of (X3 and Y3), passing the results of 
the multiplications through to the destination operand. With the denormal operand, the result of the X0 and Y0 
computation is a normalized finite value, with no exceptions detected. However, the X3 and Y3 computation 
produces a tiny and inexact result. This causes the corresponding internal numeric underflow and inexact-result 
exception flags to be set.

For the multiplication of X2 and Y2, the processor stores the floating-point ∞ in the destination operand, and sets 
the corresponding internal sub-operand numeric overflow flag. The result of the X1 and Y1 multiplication is passed 
through to the destination operand, with no internal sub-operand exception flags being set. Following the 
computations, the individual sub-operand exception flags for denormal operand, numeric underflow, inexact result, 
and numeric overflow are OR'd and the corresponding flags are set in the MXCSR register.
The net result of this computation is that:
• Multiplication of X0 and Y0 produces a normalized finite result
• Multiplication of X1 and Y1 produces a normalized finite result
• Multiplication of X2 and Y2 produces a floating-point ∞ result
• Multiplication of X3 and Y3 produces a tiny, inexact, finite result
• Denormal operand, numeric overflow, numeric underflow, and inexact result flags are set in the MXCSR 
register
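The outcome listed above can be reproduced with a small software model in C. This is a sketch under stated assumptions, not a description of the hardware: the function names model_mul_lane and model_mulps and the FLAG_* constants are hypothetical, invalid-operation cases (NaN operands, 0 times ∞) are not modeled, and tininess is tested on the exact product before rounding, which is a simplification. The model also sets the inexact flag together with overflow, as IEEE 754 default handling does. On real hardware the flags would instead be read from MXCSR, for example with the _mm_getcsr intrinsic.

```c
#include <float.h>
#include <stdint.h>

#define FLAG_DE 0x02u  /* denormal operand  (MXCSR bit 1) */
#define FLAG_OE 0x08u  /* numeric overflow  (MXCSR bit 3) */
#define FLAG_UE 0x10u  /* numeric underflow (MXCSR bit 4) */
#define FLAG_PE 0x20u  /* inexact result    (MXCSR bit 5) */

static double mag(double v) { return v < 0 ? -v : v; }

/* Nonzero value smaller in magnitude than the smallest normal float. */
static int is_tiny(double v) { return v != 0.0 && mag(v) < FLT_MIN; }

/* Hypothetical model of one masked sub-operand multiply. Stores the
   masked result in *result and returns the internal exception flags. */
static uint32_t model_mul_lane(float x, float y, float *result)
{
    uint32_t flags = 0;

    /* Pre-computation check: denormal source operand. */
    if (is_tiny(x) || is_tiny(y))
        flags |= FLAG_DE;

    /* The product of two 24-bit significands fits in a double exactly. */
    double exact = (double)x * (double)y;
    float r = x * y;                      /* rounded to single precision */
    int inexact = ((double)r != exact);

    if (mag(r) > FLT_MAX && mag(x) <= FLT_MAX && mag(y) <= FLT_MAX) {
        flags |= FLAG_OE | FLAG_PE;       /* masked overflow returns infinity */
    } else {
        if (inexact)
            flags |= FLAG_PE;
        if (inexact && is_tiny(exact))    /* masked underflow: tiny and inexact */
            flags |= FLAG_UE;
    }
    *result = r;
    return flags;
}

/* Models the packed operation: four independent multiplies whose
   internal flags are OR'd into one set of MXCSR flag bits. */
static uint32_t model_mulps(const float x[4], const float y[4], float out[4])
{
    uint32_t mxcsr_flags = 0;
    for (int i = 0; i < 4; i++)
        mxcsr_flags |= model_mul_lane(x[i], y[i], &out[i]);
    return mxcsr_flags;
}
```

With x = { 0x1p-140f, 2.0f, FLT_MAX, 0.3f } and y = { 0x1p100f, 3.0f, 2.0f, 0x1p-140f } (element 0 first, so X0 and Y3 are denormal as in Figure 11-9), the model returns the DE, OE, UE, and PE bits, and the four results are a normalized finite value, 6.0, ∞, and a tiny finite value.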

11.5.3.2   Handling Unmasked Exceptions

If all exceptions are unmasked, the processor:
1. First detects any pre-computation exceptions: it ORs those exceptions, sets the appropriate exception flags, 
leaves the source and destination operands unaltered, and goes to step 2. If it does not detect any 
pre-computation exceptions, it goes to step 5.

Figure 11-9.  Example Masked Response for Packed Operations
[Figure: MULPS applied element by element to source operands X3, X2, X1, X0 (Denormal) and Y3 (Denormal), Y2, Y1, 
Y0. Results, from element 3 down to element 0: Tiny, Inexact, Finite; ∞; Normalized Finite; Normalized Finite.]