

PROGRAMMING WITH INTEL® STREAMING SIMD EXTENSIONS (INTEL® SSE)

• Returns a zero result with the sign of the true result.

• Sets the precision and underflow exception flags.

If the underflow exception is not masked, the flush-to-zero bit is ignored.
The flush-to-zero mode is not compatible with IEEE Standard 754. The IEEE-mandated masked response to underflow
is to deliver the denormalized result (see Section 4.8.3.2, “Normalized and Denormalized Finite Numbers”).
The flush-to-zero mode is provided primarily for performance reasons. At the cost of a slight precision loss, faster 
execution can be achieved for applications where underflows are common and rounding the underflow result to 
zero can be tolerated.
The flush-to-zero bit is cleared upon a power-up or reset of the processor, disabling the flush-to-zero mode.
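The flush-to-zero response described above can be sketched as a small software model. This is an illustration only, not how the processor implements the feature: the helper name model_flush_to_zero is hypothetical, the mxcsr argument is an in-memory copy of the register rather than the register itself, and the bit positions (UE = bit 4, PE = bit 5, UM = bit 11, FTZ = bit 15) are taken from the MXCSR register layout.

```c
#include <math.h>
#include <stdint.h>

/* MXCSR bit positions used by this sketch. */
#define MXCSR_UE  (1u << 4)   /* underflow exception flag */
#define MXCSR_PE  (1u << 5)   /* precision exception flag */
#define MXCSR_UM  (1u << 11)  /* underflow exception mask */
#define MXCSR_FTZ (1u << 15)  /* flush-to-zero enable */

/* Hypothetical helper: models the masked flush-to-zero response.
 * `result` is the tiny result an operation would otherwise deliver.
 * `*mxcsr` is a software copy of the register contents. */
static float model_flush_to_zero(float result, uint32_t *mxcsr)
{
    /* The FTZ bit only takes effect while the underflow exception is masked. */
    int ftz_active = (*mxcsr & MXCSR_FTZ) && (*mxcsr & MXCSR_UM);

    if (ftz_active && fpclassify(result) == FP_SUBNORMAL) {
        *mxcsr |= MXCSR_UE | MXCSR_PE;          /* set underflow and precision flags */
        return signbit(result) ? -0.0f : 0.0f;  /* zero with the sign of the true result */
    }

    /* FTZ clear, underflow unmasked, or no underflow: the model leaves the
     * value alone. Delivery of an unmasked exception is not modeled. */
    return result;
}
```

With FTZ and UM both set, passing -1e-40f (a denormal single-precision value) yields negative zero and sets UE and PE in the copy. With only FTZ set, the value and the flags are returned unchanged. On real hardware, compilers that provide xmmintrin.h expose the macro _MM_SET_FLUSH_ZERO_MODE for setting the actual bit.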

10.2.3.4   Denormals-Are-Zeros

Bit 6 (DAZ) of the MXCSR register enables the denormals-are-zeros mode, which controls the processor’s response 
to a SIMD floating-point denormal operand condition. When the denormals-are-zeros flag is set, the processor 
converts all denormal source operands to a zero with the sign of the original operand before performing any 
computations on them. The processor does not set the denormal-operand exception flag (DE), regardless of the 
setting of the denormal-operand exception mask bit (DM); and it does not generate a denormal-operand exception 
if the exception is unmasked.
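As an illustration of the operand handling just described, the following sketch models what an operation sees for one source operand. It is a software model under stated assumptions: the helper name model_daz_operand is hypothetical, mxcsr is an in-memory copy of the register, and the bit positions (DE = bit 1, DAZ = bit 6) are taken from the MXCSR register layout. With DAZ clear, the model shows only the masked response (DE flag set, denormal operand used as is).

```c
#include <math.h>
#include <stdint.h>

#define MXCSR_DE  (1u << 1)  /* denormal-operand exception flag */
#define MXCSR_DAZ (1u << 6)  /* denormals-are-zeros enable */

/* Hypothetical helper: returns the value an operation would compute with.
 * `*mxcsr` is a software copy of the register contents. */
static float model_daz_operand(float operand, uint32_t *mxcsr)
{
    if (fpclassify(operand) != FP_SUBNORMAL)
        return operand;  /* normal, zero, infinity, NaN: unaffected */

    if (*mxcsr & MXCSR_DAZ)
        return signbit(operand) ? -0.0f : 0.0f;  /* signed zero; DE is not set */

    /* DAZ clear: the denormal operand condition is flagged. Delivery of an
     * unmasked exception is not modeled here. */
    *mxcsr |= MXCSR_DE;
    return operand;
}
```

With DAZ set, -1e-40f becomes negative zero and the DE flag stays clear. With DAZ clear, the same operand is passed through unchanged and DE is set in the copy.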
The denormals-are-zeros mode is not compatible with IEEE Standard 754 (see Section 4.8.3.2, “Normalized and
Denormalized Finite Numbers”). The denormals-are-zeros mode is provided to improve processor performance for
applications such as streaming media processing, where rounding a denormal operand to zero does not appreciably
affect the quality of the processed data.
The denormals-are-zeros flag is cleared upon a power-up or reset of the processor, disabling the
denormals-are-zeros mode.
The denormals-are-zeros mode was introduced in the Pentium 4 and Intel Xeon processor with the SSE2 extensions;
however, it is fully compatible with the SSE SIMD floating-point instructions (that is, the denormals-are-zeros
flag affects the operation of the SSE SIMD floating-point instructions). In earlier IA-32 processors and in
some models of the Pentium 4 processor, this flag (bit 6) is reserved. See Section 11.6.3, “Checking for the DAZ
Flag in the MXCSR Register,” for instructions for detecting the availability of this feature.
Attempting to set bit 6 of the MXCSR register on processors that do not support the DAZ flag will cause a
general-protection exception (#GP). See Section 11.6.6, “Guidelines for Writing to the MXCSR Register,” for
instructions for preventing such general-protection exceptions by using the MXCSR_MASK value returned by the
FXSAVE instruction.
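The use of the MXCSR_MASK value can be sketched as follows. The helper names are hypothetical, and the code operates only on values already read from an FXSAVE image; it does not execute FXSAVE or write the register. It assumes the rule given in the referenced guidelines that an MXCSR_MASK value of 00000000H stands for the default mask 0000FFBFH, in which bit 6 is clear.

```c
#include <stdint.h>

#define MXCSR_DAZ          (1u << 6)    /* denormals-are-zeros enable */
#define MXCSR_MASK_DEFAULT 0x0000FFBFu  /* applies when the reported mask is 0 */

/* Hypothetical helper: the mask of MXCSR bits that may be set to 1. */
static uint32_t effective_mxcsr_mask(uint32_t reported_mask)
{
    return reported_mask ? reported_mask : MXCSR_MASK_DEFAULT;
}

/* Hypothetical helper: nonzero if bit 6 (DAZ) is writable. */
static int daz_supported(uint32_t reported_mask)
{
    return (effective_mxcsr_mask(reported_mask) & MXCSR_DAZ) != 0;
}

/* Hypothetical helper: clear every bit the processor treats as reserved,
 * so that a later write of the result does not raise #GP. */
static uint32_t sanitize_mxcsr(uint32_t wanted, uint32_t reported_mask)
{
    return wanted & effective_mxcsr_mask(reported_mask);
}
```

For example, requesting 1F80H with DAZ added (1FC0H) against a reported mask of 0 yields 1F80H, because the default mask clears bit 6. Against a reported mask of 0000FFFFH the DAZ bit is preserved.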

10.2.4   Compatibility of SSE Extensions with SSE2/SSE3/MMX and the x87 FPU

The state (XMM registers and MXCSR register) introduced into the IA-32 execution environment with the SSE 
extensions is shared with SSE2 and SSE3 extensions. SSE/SSE2/SSE3 instructions are fully compatible; they can 
be executed together in the same instruction stream with no need to save state when switching between instruction
sets.
XMM registers are independent of the x87 FPU and MMX registers, so SSE/SSE2/SSE3 operations performed on the 
XMM registers can be performed in parallel with operations on the x87 FPU and MMX registers (see Section 11.6.7, 
“Interaction of SSE/SSE2 Instructions with x87 FPU and MMX Instructions”).
The FXSAVE and FXRSTOR instructions save and restore the SSE/SSE2/SSE3 states along with the x87 FPU and 
MMX state.
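As a rough illustration of where the SSE state sits in the saved image, the sketch below reads MXCSR, MXCSR_MASK, and one XMM register out of a 512-byte FXSAVE area that has already been captured into memory. The helper names are hypothetical. The byte offsets (MXCSR at 24, MXCSR_MASK at 28, XMM0 at 160, 16 bytes per XMM register) follow the FXSAVE memory layout; the code does not execute FXSAVE itself and assumes it runs on the same little-endian machine that produced the image.

```c
#include <stdint.h>
#include <string.h>

#define FXSAVE_AREA_SIZE      512  /* size of the save area in bytes */
#define FXSAVE_OFF_MXCSR       24  /* MXCSR register image */
#define FXSAVE_OFF_MXCSR_MASK  28  /* MXCSR_MASK value */
#define FXSAVE_OFF_XMM0       160  /* XMM0; later registers follow at 16-byte steps */

/* Hypothetical helper: read a 32-bit field from the saved image. */
static uint32_t fxsave_read_u32(const uint8_t *image, size_t offset)
{
    uint32_t value;
    memcpy(&value, image + offset, sizeof value);  /* avoids unaligned access */
    return value;
}

/* Hypothetical helper: copy the four single-precision lanes of XMM<reg>. */
static void fxsave_read_xmm(const uint8_t *image, unsigned reg, float out[4])
{
    memcpy(out, image + FXSAVE_OFF_XMM0 + 16u * reg, 16);
}
```

A caller would pass fxsave_read_u32(image, FXSAVE_OFF_MXCSR_MASK) to whatever routine validates a new MXCSR value before writing it.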

10.3   SSE DATA TYPES

SSE extensions introduced one data type, the 128-bit packed single-precision floating-point data type, to the IA-32
architecture (see Figure 10-4). This data type consists of four IEEE 32-bit single-precision floating-point values