
Vol. 1 12-1

CHAPTER 12

PROGRAMMING WITH INTEL® SSE3, SSSE3, INTEL® SSE4 AND INTEL® AESNI

This chapter describes SSE3, SSSE3, and SSE4 and provides information to assist in writing application programs that 
use these extensions. 
AESNI and PCLMULQDQ are instruction extensions targeted to accelerate high-speed block encryption and 
cryptographic processing. Section 12.13 covers these instructions and their relationship to the Advanced Encryption 
Standard (AES).

12.1 PROGRAMMING ENVIRONMENT AND DATA TYPES

The programming environment for using SSE3, SSSE3, and SSE4 is unchanged from that shown in Figure 3-1 and 
Figure 3-2. SSE3, SSSE3, and SSE4 do not introduce new data types. XMM registers are used to operate on packed 
integer data, single-precision floating-point data, or double-precision floating-point data. 
One SSE3 instruction (FISTTP) uses the x87 FPU for x87-style programming. Two SSE3 instructions (MONITOR and 
MWAIT) use the general-purpose registers for thread synchronization. The MXCSR register governs SIMD 
floating-point operations. Note, however, that the x87 FPU control word does not affect FISTTP, even though it is 
executed by the x87 FPU, other than by unmasking an invalid operand or inexact result exception.
SSE4 instructions do not use MMX registers. The majority of SSE4.2¹ instructions and SSE4.1 instructions operate 
on XMM registers.

12.1.1 SSE3, SSSE3, SSE4 in 64-Bit Mode and Compatibility Mode

In compatibility mode, SSE3, SSSE3, and SSE4 function as they do in protected mode. In 64-bit mode, eight 
additional XMM registers are accessible. Registers XMM8-XMM15 are accessed by using REX prefixes. 
Memory operands are specified using the ModR/M, SIB encoding described in Section 3.7.5.
Some SSE3, SSSE3, and SSE4 instructions may be used to operate on general-purpose registers. Use the REX.W 
prefix to access 64-bit general-purpose registers. Note that if a REX prefix is used when it has no meaning, the 
prefix is ignored.
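
The role of the REX prefix described above can be sketched in C. The bit layout used here (0100WRXB) comes from the general Intel 64 instruction-format description rather than from this section, and the helper names rex_prefix and rex_for_regs are hypothetical, introduced only for illustration. This is a minimal sketch of how the prefix byte is composed, not an assembler.

```c
#include <assert.h>
#include <stdint.h>

/* Compose a REX prefix byte with the layout 0100WRXB.
 *   w: 1 selects a 64-bit operand size (REX.W)
 *   r: high bit of the ModR/M reg field (reaches XMM8-XMM15 or R8-R15)
 *   x: high bit of the SIB index field
 *   b: high bit of the ModR/M r/m field or the SIB base field
 * Illustrative helper; the name is an assumption, not an Intel API. */
static uint8_t rex_prefix(unsigned w, unsigned r, unsigned x, unsigned b)
{
    assert(w <= 1 && r <= 1 && x <= 1 && b <= 1);
    return (uint8_t)(0x40u | (w << 3) | (r << 2) | (x << 1) | b);
}

/* REX byte for a register-to-register form in which register number
 * `reg` (0-15) is encoded in ModR/M.reg and `rm` (0-15) in ModR/M.r/m.
 * Only bit 3 of each register number goes into the prefix; the low
 * three bits stay in the ModR/M byte. */
static uint8_t rex_for_regs(unsigned reg, unsigned rm, unsigned w)
{
    assert(reg <= 15 && rm <= 15);
    return rex_prefix(w, reg >> 3, 0, rm >> 3);
}
```

For example, rex_for_regs(9, 1, 0) yields 0x44: XMM9 in the reg field needs REX.R, while XMM1 in the r/m field needs no extension bit. Setting w to 1, as in rex_prefix(1, 0, 0, 0), yields 0x48, the plain REX.W prefix used to reach 64-bit general-purpose registers.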

12.1.2 Compatibility of SSE3/SSSE3 with MMX Technology, the x87 FPU Environment, and SSE/SSE2 Extensions

SSE3, SSSE3, and SSE4 do not introduce any new state to the Intel 64 and IA-32 execution environments. 
For SIMD and x87 programming, the FXSAVE and FXRSTOR instructions save and restore the architectural states 
of XMM, MXCSR, x87 FPU, and MMX registers. The MONITOR and MWAIT instructions use general-purpose registers 
on input; they do not modify the content of those registers.

12.1.3 Horizontal and Asymmetric Processing

Many SSE/SSE2/SSE3/SSSE3 instructions accelerate SIMD data processing using a model referred to as vertical 
computation. Using this model, data flow is vertical between the data elements of the inputs and the output. 
Figure 12-1 illustrates the asymmetric processing of the SSE3 instruction ADDSUBPD. Figure 12-2 illustrates the 
horizontal data movement of the SSE3 instruction HADDPD. 
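
The three data-flow patterns named above can be compared with a scalar reference model in portable C. This is a sketch of the element-wise semantics only, assuming element 0 is the low 64 bits of the XMM register; the type pd2 and the model_* function names are hypothetical, and the code does not execute the actual instructions.

```c
#include <assert.h>

/* Two packed double-precision elements; e[0] models the low 64 bits. */
typedef struct { double e[2]; } pd2;

/* Vertical computation (e.g. ADDPD): each result element depends only
 * on the same-numbered element of each input. */
static pd2 model_addpd(pd2 dst, pd2 src)
{
    pd2 r = {{ dst.e[0] + src.e[0], dst.e[1] + src.e[1] }};
    return r;
}

/* Asymmetric processing (ADDSUBPD): still element-by-element, but the
 * low element is subtracted while the high element is added. */
static pd2 model_addsubpd(pd2 dst, pd2 src)
{
    pd2 r = {{ dst.e[0] - src.e[0], dst.e[1] + src.e[1] }};
    return r;
}

/* Horizontal processing (HADDPD): adjacent elements of the same operand
 * are added; the destination pair fills the low result element and the
 * source pair fills the high result element. */
static pd2 model_haddpd(pd2 dst, pd2 src)
{
    pd2 r = {{ dst.e[0] + dst.e[1], src.e[0] + src.e[1] }};
    return r;
}

/* Small self-check using exactly representable values. */
void check_models(void)
{
    pd2 d = {{ 1.0, 2.0 }}, s = {{ 10.0, 20.0 }};
    assert(model_addpd(d, s).e[0] == 11.0);
    assert(model_addsubpd(d, s).e[0] == -9.0);
    assert(model_haddpd(d, s).e[1] == 30.0);
}
```

With dst = (1.0, 2.0) and src = (10.0, 20.0), the vertical model gives (11.0, 22.0), the asymmetric model gives (-9.0, 22.0), and the horizontal model gives (3.0, 30.0).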

1. Although the presence of CRC32 support is enumerated by CPUID.01:ECX[SSE4.2] = 1, CRC32 operates on general-purpose registers.
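
The feature flag named in footnote 1 can be tested once the ECX value returned by CPUID leaf 01H is in hand. The sketch below decodes that value in portable C. The bit positions are taken from the CPUID instruction description, not from this chapter, and the constant and function names are hypothetical. Obtaining the ECX value itself is compiler-specific and is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Feature bits in CPUID.01H:ECX for the extensions covered by this
 * chapter. Positions are assumed from the CPUID documentation. */
enum {
    CPUID1_ECX_SSE3      = 1u << 0,
    CPUID1_ECX_PCLMULQDQ = 1u << 1,
    CPUID1_ECX_SSSE3     = 1u << 9,
    CPUID1_ECX_SSE4_1    = 1u << 19,
    CPUID1_ECX_SSE4_2    = 1u << 20,
    CPUID1_ECX_AESNI     = 1u << 25
};

/* Return 1 if every bit in `mask` is set in the reported ECX value. */
static int has_feature(uint32_t ecx, uint32_t mask)
{
    assert(mask != 0);
    return (ecx & mask) == mask;
}

/* CRC32 is enumerated by the SSE4.2 flag even though the instruction
 * operates on general-purpose registers rather than XMM registers. */
static int crc32_available(uint32_t ecx)
{
    return has_feature(ecx, CPUID1_ECX_SSE4_2);
}
```

A caller would pass the ECX value reported by the processor, for example crc32_available(ecx), and fall back to a software CRC when the result is 0.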