10-6 Vol. 1

PROGRAMMING WITH INTEL® STREAMING SIMD EXTENSIONS (INTEL® SSE)

packed into a double quadword. (See Figure 4-3 for the layout of a single-precision floating-point value; refer to 
Section 4.2.2, “Floating-Point Data Types,” for a detailed description of the single-precision floating-point format.)

This 128-bit packed single-precision floating-point data type is operated on in the XMM registers or in memory. 
Conversion instructions are provided to convert two packed single-precision floating-point values into two packed 
doubleword integers or a scalar single-precision floating-point value into a doubleword integer (see Figure 11-8).
SSE extensions provide conversion instructions between XMM registers and MMX registers, and between XMM 
registers and general-purpose registers. See Figure 11-8.
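
As a rough illustration of the scalar conversion path between an XMM register and a general-purpose register, the sketch below uses the compiler intrinsics from <xmmintrin.h> that correspond to CVTSS2SI and CVTTSS2SI. The function names to_int_rounded and to_int_truncated are hypothetical, and the code assumes an x86 compiler with SSE enabled and the default MXCSR rounding mode (round to nearest even). The packed conversions into MMX registers are not shown.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: CVTSS2SI converts the low single-precision value
   of an XMM register to a doubleword integer, rounding according to the
   current MXCSR rounding mode (round to nearest even by default). */
int to_int_rounded(float x)
{
    return _mm_cvtss_si32(_mm_set_ss(x));
}

/* Hypothetical helper: CVTTSS2SI performs the same conversion but
   always truncates toward zero, regardless of the MXCSR rounding mode. */
int to_int_truncated(float x)
{
    return _mm_cvttss_si32(_mm_set_ss(x));
}
```

Under these assumptions, to_int_rounded(2.5f) yields 2 and to_int_rounded(3.5f) yields 4 (ties go to the even integer), while to_int_truncated(2.9f) yields 2.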
The address of a 128-bit packed memory operand must be aligned on a 16-byte boundary, except in the following 
cases: 

The MOVUPS instruction supports unaligned accesses.

Scalar instructions use a 4-byte memory operand, which is not subject to alignment requirements.
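
The alignment rule and its two exceptions can be sketched with compiler intrinsics, assuming an x86 compiler that provides <xmmintrin.h>. Here _mm_load_ps corresponds to the aligned move (MOVAPS), _mm_loadu_ps to MOVUPS, and _mm_load_ss to the scalar MOVSS load. The helper names is_aligned16, load4, and load_scalar are hypothetical and exist only for illustration.

```c
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: nonzero if p lies on a 16-byte boundary. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: load four packed single-precision values.
   The aligned load (MOVAPS) is only legal on a 16-byte boundary;
   otherwise fall back to the unaligned load (MOVUPS). */
__m128 load4(const float *p)
{
    return is_aligned16(p) ? _mm_load_ps(p) : _mm_loadu_ps(p);
}

/* Hypothetical helper: scalar load (MOVSS) of a 4-byte operand,
   which carries no 16-byte alignment requirement. */
float load_scalar(const float *p)
{
    return _mm_cvtss_f32(_mm_load_ss(p));
}
```

In practice many programs simply call the unaligned load everywhere; the branch here only makes the distinction between the two cases visible.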

Figure 4-2 shows the byte order of 128-bit (double quadword) data types in memory.

10.4 SSE INSTRUCTION SET

SSE instructions are divided into four functional groups:

Packed and scalar single-precision floating-point instructions

64-bit SIMD integer instructions

State management instructions

Cacheability control, prefetch, and memory ordering instructions

The following sections give an overview of each of the instructions in these groups.

10.4.1 SSE Packed and Scalar Floating-Point Instructions

The packed and scalar single-precision floating-point instructions are divided into the following subgroups:

Data movement instructions

Arithmetic instructions

Logical instructions

Comparison instructions

Shuffle instructions

Conversion instructions

The packed single-precision floating-point instructions perform SIMD operations on packed single-precision 
floating-point operands (see Figure 10-5). Each source operand contains four single-precision floating-point 
values, and the destination operand contains the results of the operation (OP) performed in parallel on the 
corresponding values (X0 and Y0, X1 and Y1, X2 and Y2, and X3 and Y3) in each operand.
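
A minimal sketch of the packed operation described above, taking OP to be addition (ADDPS) and assuming an x86 compiler that provides <xmmintrin.h>. The function name add4 and the array parameters are hypothetical; unaligned loads and stores are used so that the caller's arrays need no particular alignment.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: computes out[i] = x[i] + y[i] for i = 0..3.
   A single ADDPS (_mm_add_ps) adds all four pairs (X0 and Y0, X1 and Y1,
   X2 and Y2, X3 and Y3) in parallel. */
void add4(const float *x, const float *y, float *out)
{
    __m128 vx = _mm_loadu_ps(x);   /* X3 X2 X1 X0 */
    __m128 vy = _mm_loadu_ps(y);   /* Y3 Y2 Y1 Y0 */
    _mm_storeu_ps(out, _mm_add_ps(vx, vy));
}
```

With x = {1, 2, 3, 4} and y = {10, 20, 30, 40}, the output array holds {11, 22, 33, 44}. Swapping _mm_add_ps for _mm_sub_ps, _mm_mul_ps, or _mm_div_ps gives the other packed arithmetic operations of the same shape.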

Figure 10-4.  128-Bit Packed Single-Precision Floating-Point Data Type
[Figure: a 128-bit field containing four single-precision floating-point values, occupying bits 31:0, 63:32, 95:64, and 127:96.]