Vol. 1 11-19

PROGRAMMING WITH INTEL® STREAMING SIMD EXTENSIONS 2 (INTEL® SSE2)

The denormals-are-zeros mode set in the MXCSR register for SSE/SSE2/SSE3 instructions has no counterpart 
in the x87 FPU. For compatibility with the x87 FPU, set the denormals-are-zeros bit to 0.

An application that expects to detect x87 FPU exceptions that occur during the execution of x87 FPU
instructions will not be notified if exceptions occur during the execution of corresponding SSE/SSE2/SSE3
instructions (see footnote 1), unless the exception masks that are enabled in the x87 FPU control word have
also been enabled in the MXCSR register and the application is capable of handling SIMD floating-point
exceptions (#XM).
— Masked exceptions that occur during an SSE/SSE2/SSE3 library call cannot be detected by unmasking the 
exceptions after the call (in an attempt to generate the fault based on the fact that an exception flag is set). 
A SIMD floating-point exception flag that is set when the corresponding exception is unmasked will not 
generate a fault; only the next occurrence of that unmasked exception will generate a fault.

— An application which checks the x87 FPU status word to determine if any masked exception flags were set 
during an x87 FPU library call will also need to check the MXCSR register to detect a similar occurrence of a 
masked exception flag being set during an SSE/SSE2/SSE3 library call.
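
The MXCSR points above can be sketched in C with the SSE intrinsics from <xmmintrin.h>. This is a minimal illustration, not a definitive implementation. It assumes a GCC- or Clang-style compiler targeting x86 (the inline `fnstcw` uses GNU assembly syntax) and the default state in which all exceptions are masked. The helper names `mxcsr_clear_daz`, `mxcsr_mirror_x87_masks`, and `sse_flags_raised_by_divide` are hypothetical, and the scalar divide merely stands in for an arbitrary SSE library call.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr, _MM_* exception macros */

#define MXCSR_DAZ 0x0040u  /* MXCSR bit 6: denormals-are-zeros */

/* Clear DAZ so SSE code handles denormal operands compatibly with x87 code.
   Returns the MXCSR value now in effect. */
static unsigned int mxcsr_clear_daz(void)
{
    _mm_setcsr(_mm_getcsr() & ~MXCSR_DAZ);
    return _mm_getcsr();
}

/* Copy the six exception mask bits of the x87 control word (bits 0-5)
   into the MXCSR mask field (bits 7-12). If this unmasks an exception,
   the application must be prepared to handle #XM. */
static unsigned int mxcsr_mirror_x87_masks(void)
{
    unsigned short cw;
    __asm__ volatile ("fnstcw %0" : "=m" (cw));   /* read x87 control word */
    unsigned int masks = ((unsigned int)cw & 0x3Fu) << 7;
    _mm_setcsr((_mm_getcsr() & ~(unsigned int)_MM_MASK_MASK) | masks);
    return _mm_getcsr();
}

/* Stand-in for an SSE library call: performs num / den with DIVSS and
   returns the MXCSR exception flags (bits 0-5) that the call left set. */
static unsigned int sse_flags_raised_by_divide(float num, float den, float *out)
{
    /* volatile keeps the compiler from folding or moving the divide */
    volatile float n = num, d = den;

    _MM_SET_EXCEPTION_STATE(0);                   /* clear sticky flags */
    volatile float q = _mm_cvtss_f32(_mm_div_ss(_mm_set_ss(n), _mm_set_ss(d)));
    unsigned int flags = _MM_GET_EXCEPTION_STATE();

    *out = q;
    return flags;
}
```

A caller might test the result against `_MM_EXCEPT_DIV_ZERO`, `_MM_EXCEPT_INVALID`, and so on. Because the flags are sticky, clearing them before the call and reading them afterwards is what detects a masked exception; unmasking the exception after the call would not raise a fault for the earlier event.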

11.6 WRITING APPLICATIONS WITH SSE/SSE2 EXTENSIONS

The following sections give some guidelines for writing application programs and operating-system code that use 
the SSE and SSE2 extensions. Because SSE and SSE2 extensions share the same state and perform companion 
operations, these guidelines apply to both sets of extensions.
Chapter 13 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, discusses the
interface to the processor for context switching as well as other operating system considerations when writing
code that uses SSE/SSE2/SSE3 extensions.

11.6.1 General Guidelines for Using SSE/SSE2 Extensions

The following guidelines describe how to take full advantage of the performance gains available with the SSE and 
SSE2 extensions:

Ensure that the processor supports the SSE and SSE2 extensions.

Ensure that your operating system supports the SSE and SSE2 extensions. (Operating system support for the 
SSE extensions implies support for the SSE2 extensions and vice versa.)

Use stack and data alignment techniques to keep data properly aligned for efficient memory use.

Use the non-temporal store instructions offered with the SSE and SSE2 extensions.

Employ the optimization and scheduling techniques described in the Intel Pentium 4 Optimization Reference 
Manual (see Section 1.4, “Related Literature,” for the order number for this manual).
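
The alignment and non-temporal store guidelines can be illustrated with a short C sketch. It assumes a compiler that provides <xmmintrin.h> (including `_mm_malloc` and `_mm_free`) and a buffer whose length is a multiple of four floats; the function name `fill_stream` is hypothetical.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_set1_ps, _mm_stream_ps, _mm_sfence, _mm_malloc */

/* Fill n floats with value using non-temporal (streaming) stores.
   Assumptions: dst is 16-byte aligned and n is a multiple of 4. */
static void fill_stream(float *dst, size_t n, float value)
{
    const __m128 v = _mm_set1_ps(value);

    for (size_t i = 0; i < n; i += 4)
        _mm_stream_ps(dst + i, v);   /* MOVNTPS: store that bypasses the caches */

    /* Streaming stores are weakly ordered; fence before other code
       relies on the data being globally visible. */
    _mm_sfence();
}
```

A buffer obtained with `_mm_malloc(n * sizeof(float), 16)` and released with `_mm_free` meets the alignment assumption. Non-temporal stores tend to help only when the written data will not be read again soon, so this pattern suits large output buffers rather than small working sets.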

11.6.2 Checking for SSE/SSE2 Support

Before an application attempts to use the SSE and/or SSE2 extensions, it should check that they are present on the 
processor:
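1. Check that the processor supports the CPUID instruction. Bit 21 (the ID flag) of the EFLAGS register can be
used to check whether the processor supports the CPUID instruction: software that can set and clear this flag
is running on a processor that supports CPUID.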

2. Check that the processor supports the SSE and/or SSE2 extensions (true if CPUID.01H:EDX.SSE[bit 25] = 1 
and/or CPUID.01H:EDX.SSE2[bit 26] = 1).
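
The two steps above can be sketched in C as follows. This assumes GCC or Clang, whose <cpuid.h> header provides `__get_cpuid`; that helper returns 0 when the requested leaf is unavailable, which stands in for the explicit EFLAGS bit 21 test here. The names `cpuid_leaf1_edx`, `cpu_has_sse`, and `cpu_has_sse2` are hypothetical.

```c
#include <cpuid.h>  /* GCC/Clang: __get_cpuid */

#define CPUID_EDX_SSE   (1u << 25)  /* CPUID.01H:EDX.SSE[bit 25]  */
#define CPUID_EDX_SSE2  (1u << 26)  /* CPUID.01H:EDX.SSE2[bit 26] */

/* Return EDX from CPUID leaf 01H, or 0 if that leaf cannot be queried. */
static unsigned int cpuid_leaf1_edx(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return edx;
}

/* Nonzero if the processor reports SSE support. */
static int cpu_has_sse(void)
{
    return (cpuid_leaf1_edx() & CPUID_EDX_SSE) != 0;
}

/* Nonzero if the processor reports SSE2 support. */
static int cpu_has_sse2(void)
{
    return (cpuid_leaf1_edx() & CPUID_EDX_SSE2) != 0;
}
```

These checks report processor capability only; they say nothing about whether the operating system saves and restores SSE state.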

The operating system must provide system-level support for handling SSE state and exceptions before an
application can use the SSE and/or SSE2 extensions (see Chapter 13 in the Intel® 64 and IA-32 Architectures
Software Developer’s Manual, Volume 3A).

1. SSE3 refers to ADDSUBPD, ADDSUBPS, HADDPD, HADDPS, HSUBPD and HSUBPS; the only other SSE3 instruction that can raise 
floating-point exceptions is FISTTP: it can generate x87 FPU invalid operation and inexact result exceptions.