Vol. 1 10-1

CHAPTER 10

PROGRAMMING WITH INTEL® STREAMING SIMD EXTENSIONS (INTEL® SSE)

The streaming SIMD extensions (SSE) were introduced into the IA-32 architecture in the Pentium III processor 
family. These extensions enhance the performance of IA-32 processors for advanced 2-D and 3-D graphics, motion 
video, image processing, speech recognition, audio synthesis, telephony, and video conferencing. 
This chapter describes SSE. Chapter 11, “Programming with Intel® Streaming SIMD Extensions 2 (Intel® SSE2),” 
provides information to assist in writing application programs that use SSE2 extensions. Chapter 12, “Programming 
with Intel® SSE3, SSSE3, Intel® SSE4 and Intel® AESNI,” provides this information for SSE3 extensions.

10.1 OVERVIEW OF SSE EXTENSIONS

Intel MMX technology introduced single-instruction multiple-data (SIMD) capability into the IA-32 architecture, 
with the 64-bit MMX registers, 64-bit packed integer data types, and instructions that allowed SIMD operations to 
be performed on packed integers. SSE extensions expand the SIMD execution model by adding facilities for 
handling packed and scalar single-precision floating-point values contained in 128-bit registers.
If CPUID.01H:EDX.SSE[bit 25] = 1, SSE extensions are present.
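The feature check above can be sketched in C as follows. This is a minimal illustration, not a definitive detection routine. The function names edx_reports_sse and cpu_has_sse and the macro names are hypothetical, and the query of the running processor assumes a GCC or Clang toolchain that provides <cpuid.h> and its __get_cpuid helper. On other toolchains or architectures the sketch reports that it cannot execute CPUID.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>               /* GCC/Clang helper header (assumed toolchain) */
#define HAVE_GNU_CPUID 1
#else
#define HAVE_GNU_CPUID 0
#endif

/* Bit position of the SSE feature flag in EDX for CPUID leaf 01H. */
#define CPUID_01H_EDX_SSE_BIT 25u

/* Returns 1 if the given leaf-01H EDX value reports SSE, otherwise 0. */
int edx_reports_sse(uint32_t edx)
{
    return (int)((edx >> CPUID_01H_EDX_SSE_BIT) & 1u);
}

/* Queries the running processor.
 * Returns 1 if SSE is reported, 0 if it is not, and -1 if this build
 * has no way to execute CPUID (non-x86 target or unsupported compiler). */
int cpu_has_sse(void)
{
#if HAVE_GNU_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                /* leaf 01H is not available */
    return edx_reports_sse(edx);
#else
    return -1;
#endif
}
```

Keeping the bit test separate from the CPUID query lets the test be applied to an EDX value obtained by any other means.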
SSE extensions add the following features to the IA-32 architecture, while maintaining backward compatibility with 
all existing IA-32 processors, applications and operating systems.

Eight 128-bit data registers (called XMM registers) in non-64-bit modes; sixteen XMM registers are available in 
64-bit mode.

The 32-bit MXCSR register, which provides control and status bits for operations performed on XMM registers.

The 128-bit packed single-precision floating-point data type (four IEEE single-precision floating-point values 
packed into a double quadword).

Instructions that perform SIMD operations on single-precision floating-point values and that extend SIMD 
operations that can be performed on integers:
— 128-bit packed and scalar single-precision floating-point instructions that operate on data located in XMM 
registers

— 64-bit SIMD integer instructions that support additional operations on packed integer operands located in 
MMX registers

Instructions that save and restore the state of the MXCSR register.

Instructions that support explicit prefetching of data, control of the cacheability of data, and control of the 
ordering of store operations.

Extensions to the CPUID instruction. 
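The difference between packed and scalar operations on the 128-bit packed single-precision data type can be modeled in portable C as shown below. This is an illustrative software model only and does not execute SSE instructions. The type xmm_model and the functions packed_add and scalar_add are hypothetical names. The behavior shown follows the pattern of the ADDPS instruction (all four elements) and the ADDSS instruction (lowest element only, with the upper three elements of the destination unchanged).

```c
#include <stddef.h>

/* Hypothetical stand-in for one 128-bit XMM register holding four
 * single-precision values; element 0 is the lowest-order doubleword. */
typedef struct { float f[4]; } xmm_model;

/* Packed add: all four element pairs are summed (the ADDPS pattern). */
xmm_model packed_add(xmm_model dst, xmm_model src)
{
    for (size_t i = 0; i < 4; ++i)
        dst.f[i] += src.f[i];
    return dst;
}

/* Scalar add: only the lowest elements are summed; the upper three
 * elements of the destination pass through unchanged (the ADDSS pattern). */
xmm_model scalar_add(xmm_model dst, xmm_model src)
{
    dst.f[0] += src.f[0];
    return dst;
}
```

With a destination of {1, 2, 3, 4} and a source of {10, 20, 30, 40}, the packed form yields {11, 22, 33, 44}, while the scalar form yields {11, 2, 3, 4}.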

These features extend the IA-32 architecture’s SIMD programming model in four important ways: 

The ability to perform SIMD operations on four packed single-precision floating-point values enhances the 
performance of IA-32 processors for advanced media and communications applications that use computation-
intensive algorithms to perform repetitive operations on large arrays of simple, native data elements. 

The ability to perform SIMD single-precision floating-point operations in XMM registers and SIMD integer 
operations in MMX registers provides greater flexibility and throughput for executing applications that operate 
on large arrays of floating-point and integer data.

Cache control instructions provide the ability to stream data in and out of XMM registers without polluting the 
caches and the ability to prefetch data to selected cache levels before it is actually used. Applications that 
require regular access to large amounts of data benefit from these prefetching and streaming store capabilities. 

The SFENCE (store fence) instruction provides greater control over the ordering of store operations when using 
weakly-ordered memory types.
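The prefetch, streaming store, and store fence capabilities described above can be combined as in the following sketch. It assumes a compiler that provides the SSE intrinsics in <xmmintrin.h> (_mm_prefetch, _mm_load_ps, _mm_stream_ps, and _mm_sfence). The function name stream_copy, the USE_SSE_INTRINSICS macro, and the prefetch distance of 64 elements are hypothetical choices made for illustration, not tuned values. When the intrinsics are unavailable or the buffers are not 16-byte aligned, the sketch falls back to an ordinary copy loop.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define USE_SSE_INTRINSICS 1
#else
#define USE_SSE_INTRINSICS 0
#endif

/* Copies n floats from src to dst. Uses non-temporal (streaming) stores
 * when SSE intrinsics are available and both pointers are 16-byte
 * aligned; otherwise performs an ordinary element-by-element copy. */
void stream_copy(float *dst, const float *src, size_t n)
{
    size_t i = 0;
#if USE_SSE_INTRINSICS
    if ((uintptr_t)dst % 16 == 0 && (uintptr_t)src % 16 == 0) {
        for (; i + 4 <= n; i += 4) {
            /* Hint that data further ahead will be needed soon and should
             * be fetched with minimal cache pollution. The distance of 64
             * elements is an arbitrary illustrative value. */
            if (i + 64 < n)
                _mm_prefetch((const char *)(src + i + 64), _MM_HINT_NTA);

            __m128 v = _mm_load_ps(src + i);  /* aligned 128-bit load */
            _mm_stream_ps(dst + i, v);        /* non-temporal 128-bit store */
        }
        /* Order the weakly-ordered streaming stores ahead of any store
         * that follows this point. */
        _mm_sfence();
    }
#endif
    /* Remaining elements (or the whole array in the fallback case). */
    for (; i < n; ++i)
        dst[i] = src[i];
}
```

The fence is issued once after the loop rather than after each store, because only the ordering of the streamed data relative to later stores matters here.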