CHAPTER 11
PROGRAMMING WITH INTEL® STREAMING SIMD EXTENSIONS 2 (INTEL® SSE2)
The Streaming SIMD Extensions 2 (SSE2) were introduced into the IA-32 architecture in the Pentium 4 and Intel
Xeon processors. These extensions enhance the performance of IA-32 processors for advanced 3-D graphics, video
decoding/encoding, speech recognition, E-commerce, Internet, scientific, and engineering applications.
This chapter describes the SSE2 extensions and provides information to assist in writing application programs that
use these and the SSE extensions.
11.1 OVERVIEW OF SSE2 EXTENSIONS
SSE2 extensions use the single instruction multiple data (SIMD) execution model that is used with MMX technology
and SSE extensions. They extend this model with support for packed double-precision floating-point values and for
128-bit packed integers.
If CPUID.01H:EDX.SSE2[bit 26] = 1, SSE2 extensions are present.
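The feature check above can be sketched in C as follows. This is a minimal illustration, assuming a GCC or Clang toolchain on an x86 target: the <cpuid.h> header and its __get_cpuid helper are compiler conveniences, not part of the architecture, and the function name has_sse2 is hypothetical.

```c
#include <cpuid.h>   /* GCC/Clang helper header (assumption: not MSVC) */

/* Hypothetical helper: returns 1 if CPUID.01H:EDX.SSE2[bit 26] is set,
 * 0 if the bit is clear or CPUID leaf 01H is not available. */
static int has_sse2(void)
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* __get_cpuid returns 0 when the requested leaf is unsupported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;

    return (int)((edx >> 26) & 1u);
}
```

An application would typically call such a helper once at startup and select an SSE2 code path only when it returns 1.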
SSE2 extensions add the following features to the IA-32 architecture, while maintaining backward compatibility
with all existing IA-32 processors, applications and operating systems.
• Five data types:
— 128-bit packed double-precision floating-point (two IEEE Standard 754 double-precision floating-point
values packed into a double quadword)
— 128-bit packed byte integers
— 128-bit packed word integers
— 128-bit packed doubleword integers
— 128-bit packed quadword integers
• Instructions to support the additional data types and extend existing SIMD integer operations:
— Packed and scalar double-precision floating-point instructions
— Additional 64-bit and 128-bit SIMD integer instructions
— 128-bit versions of SIMD integer instructions introduced with the MMX technology and the SSE extensions
— Additional cacheability-control and instruction-ordering instructions
• Modifications to existing IA-32 instructions to support SSE2 features:
— Extensions and modifications to the CPUID instruction
— Modifications to the RDPMC instruction
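To make the data types listed above concrete, the following sketch views the same 128 bits as packed bytes, words, doublewords, or quadwords, and separately adds two packed double-precision values. It assumes a compiler that provides the SSE2 intrinsics header <emmintrin.h>. The names lane_width, double_lanes, and add_packed_doubles are hypothetical and chosen only for illustration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical selector for how the 128-bit value is divided into lanes. */
enum lane_width { LANE_BYTE, LANE_WORD, LANE_DWORD, LANE_QWORD };

/* Adds a 16-byte value to itself, interpreting it as packed integers of
 * the chosen width. Carries propagate only within a lane, so the result
 * depends on the interpretation. */
static void double_lanes(const uint8_t in[16], uint8_t out[16],
                         enum lane_width w)
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    __m128i r;

    switch (w) {
    case LANE_BYTE:  r = _mm_add_epi8(v, v);  break;  /* 16 x 8-bit  (PADDB) */
    case LANE_WORD:  r = _mm_add_epi16(v, v); break;  /*  8 x 16-bit (PADDW) */
    case LANE_DWORD: r = _mm_add_epi32(v, v); break;  /*  4 x 32-bit (PADDD) */
    default:         r = _mm_add_epi64(v, v); break;  /*  2 x 64-bit (PADDQ) */
    }
    _mm_storeu_si128((__m128i *)out, r);
}

/* Adds two packed double-precision values (two doubles per operand, ADDPD). */
static void add_packed_doubles(const double a[2], const double b[2],
                               double out[2])
{
    _mm_storeu_pd(out, _mm_add_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
}
```

For example, with every input byte set to 80H, the byte interpretation wraps each lane to 00H, while the word interpretation yields 0100H in each lane because the carry from the low byte reaches the high byte of the same word.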
These new features extend the IA-32 architecture’s SIMD programming model in three important ways:
• They provide the ability to perform SIMD operations on pairs of packed double-precision floating-point values.
This permits higher precision computations to be carried out in XMM registers, which enhances processor
performance in scientific and engineering applications and in applications that use advanced 3-D geometry
techniques (such as ray tracing). Additional flexibility is provided with instructions that operate on single
(scalar) double-precision floating-point values located in the low quadword of an XMM register.
• They provide the ability to operate on 128-bit packed integers (bytes, words, doublewords, and quadwords) in
XMM registers. This provides greater flexibility and greater throughput when performing SIMD operations on
packed integers. The capability is particularly useful for applications such as RSA authentication and RC5
encryption. Using the full set of SIMD registers, data types, and instructions provided with the MMX technology
and SSE/SSE2 extensions, programmers can develop algorithms that finely mix packed single- and double-
precision floating-point data and 64- and 128-bit packed integer data.
• SSE2 extensions enhance the support introduced with SSE extensions for controlling the cacheability of SIMD
data. SSE2 cache control instructions provide the ability to stream data in and out of the XMM registers without
polluting the caches and the ability to prefetch data before it is actually used.
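Two of the points above, scalar operation on the low quadword and streaming data out of XMM registers, can be sketched with SSE2 intrinsics. This is an illustrative sketch rather than a recommended implementation: the function names are hypothetical, and stream_copy_pd assumes that the destination is 16-byte aligned and that the element count is even.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also provides _mm_sfence) */
#include <stddef.h>

/* Scalar double-precision add (ADDSD): only the low quadword is summed;
 * the high quadword of the result is copied unchanged from a. */
static void add_low_double(const double a[2], const double b[2],
                           double out[2])
{
    _mm_storeu_pd(out, _mm_add_sd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
}

/* Copies n doubles using non-temporal stores (MOVNTPD), which hint that
 * the written data should bypass the caches.
 * Assumptions: dst is 16-byte aligned and n is even. */
static void stream_copy_pd(double *dst, const double *src, size_t n)
{
    for (size_t i = 0; i < n; i += 2)
        _mm_stream_pd(dst + i, _mm_loadu_pd(src + i));

    /* Order the weakly ordered streaming stores before later stores. */
    _mm_sfence();
}
```

Non-temporal stores tend to pay off only for large buffers that will not be read again soon. For small or immediately reused data, ordinary stores are usually the better choice.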