
10-6 Vol. 1

PROGRAMMING WITH INTEL® STREAMING SIMD EXTENSIONS (INTEL® SSE)

packed into a double quadword. (See Figure 4-3 for the layout of a single-precision floating-point value; refer to
Section 4.2.2, “Floating-Point Data Types,” for a detailed description of the single-precision floating-point format.)

This 128-bit packed single-precision floating-point data type is operated on in the XMM registers or in memory.
Conversion instructions are provided to convert two packed single-precision floating-point values into two packed
doubleword integers or a scalar single-precision floating-point value into a doubleword integer (see Figure 11-8).
SSE extensions also provide conversion instructions between XMM registers and MMX registers, and between XMM
registers and general-purpose registers. See Figure 11-8.
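The following minimal C sketch makes the scalar conversion concrete. It uses the compiler intrinsics `_mm_cvtss_si32` (CVTSS2SI) and `_mm_cvttss_si32` (CVTTSS2SI) from `<xmmintrin.h>`. These intrinsic names come from the common GCC/Clang/MSVC header, not from this section, and the wrapper names are hypothetical. The sketch assumes MXCSR holds its default round-to-nearest setting, and it leaves out the MMX-register forms such as CVTPS2PI.

```c
#include <xmmintrin.h> /* SSE intrinsics: _mm_set_ss, _mm_cvtss_si32, _mm_cvttss_si32 */

/* CVTSS2SI: convert the low single-precision element to a signed 32-bit
   integer, rounding per MXCSR.RC (round to nearest even by default). */
static int convert_rounded(float x)
{
    return _mm_cvtss_si32(_mm_set_ss(x));
}

/* CVTTSS2SI: the same conversion, but always truncating toward zero. */
static int convert_truncated(float x)
{
    return _mm_cvttss_si32(_mm_set_ss(x));
}
```

The truncating form matches the behavior of a C `(int)` cast without any change to the MXCSR rounding mode.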
The address of a 128-bit packed memory operand must be aligned on a 16-byte boundary, except in the following
cases:

• The MOVUPS instruction supports unaligned accesses.
• Scalar instructions, whose 4-byte memory operands are not subject to alignment requirements.
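The C sketch below illustrates the alignment rule. It assumes the SSE intrinsics from `<xmmintrin.h>` and a C11 compiler, which is needed for `_Alignas` in the calling code. `add4` and `load_scalar` are hypothetical helper names.

```c
#include <xmmintrin.h> /* SSE intrinsics: _mm_load_ps, _mm_loadu_ps, _mm_load_ss */

/* Add four floats from each array, lane by lane.
   'aligned' must start on a 16-byte boundary: MOVAPS (_mm_load_ps) raises a
   general-protection exception (#GP) otherwise.
   'any' may start at any address because MOVUPS (_mm_loadu_ps) permits it. */
static void add4(const float *aligned, const float *any, float *out)
{
    __m128 a = _mm_load_ps(aligned);
    __m128 b = _mm_loadu_ps(any);
    _mm_storeu_ps(out, _mm_add_ps(a, b)); /* unaligned store: out needs no alignment */
}

/* Scalar form: MOVSS (_mm_load_ss) reads one 4-byte operand with no
   alignment requirement, then the low element is returned as a float. */
static float load_scalar(const float *p)
{
    return _mm_cvtss_f32(_mm_load_ss(p));
}
```

For example, with `_Alignas(16) float buf[8]`, `add4(buf, buf + 1, out)` is valid even though `buf + 1` is only 4-byte aligned, because only the first pointer goes through the aligned load.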

Figure 4-2 shows the byte order of 128-bit (double quadword) data types in memory.
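One way to observe this byte order is to copy a vector's 16 bytes out of memory. The C sketch below assumes `<xmmintrin.h>` intrinsics, and `vector_bytes` is a hypothetical helper. Consider a vector whose element 0 is 1.0f (encoding 3F800000H). Storing it places byte 00H at the lowest address and 3FH at offset 3, because the lowest-addressed byte holds bits 7:0.

```c
#include <string.h>    /* memcpy */
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_storeu_ps */

/* Copy the 16 bytes of a packed single-precision vector in memory order
   (bytes[0] is the lowest address, holding bits 7:0 of element 0). */
static void vector_bytes(__m128 v, unsigned char bytes[16])
{
    float tmp[4];
    _mm_storeu_ps(tmp, v);  /* store all four elements */
    memcpy(bytes, tmp, 16); /* reinterpret as raw bytes without aliasing issues */
}
```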

10.4 SSE INSTRUCTION SET

SSE instructions are divided into four functional groups:

• Packed and scalar single-precision floating-point instructions
• 64-bit SIMD integer instructions
• State management instructions
• Cacheability control, prefetch, and memory ordering instructions
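The C sketch below shows how three of these groups map onto common C intrinsics. A packed single-precision value is built with `_mm_set1_ps`. It is written with a non-temporal store (`_mm_stream_ps`, MOVNTPS) followed by SFENCE (`_mm_sfence`), and MXCSR is read with STMXCSR (`_mm_getcsr`). The 64-bit SIMD integer group operates on MMX registers and is omitted here. The helper names are hypothetical, and the sketch assumes `dst` is 16-byte aligned and `n` is a multiple of 4.

```c
#include <stddef.h>    /* size_t */
#include <xmmintrin.h> /* SSE intrinsics: _mm_stream_ps, _mm_sfence, _mm_getcsr */

/* Fill n floats at dst (16-byte aligned, n a multiple of 4) using
   cacheability-control stores, then order them with a store fence. */
static void fill_streaming(float *dst, size_t n, float value)
{
    __m128 v = _mm_set1_ps(value);   /* same value in all four elements */
    for (size_t i = 0; i < n; i += 4)
        _mm_stream_ps(dst + i, v);   /* MOVNTPS: non-temporal store, needs 16-byte alignment */
    _mm_sfence();                    /* SFENCE: earlier stores become globally visible before later ones */
}

/* State management: STMXCSR returns the MXCSR control/status register. */
static unsigned int read_mxcsr(void)
{
    return _mm_getcsr();
}
```

Under the default settings used by common x86-64 ABIs, `read_mxcsr() & 0x1F80` equals 1F80H. This means all six SIMD floating-point exceptions are masked (MXCSR bits 7 through 12).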

The following sections give an overview of each of the instructions in these groups.

10.4.1 SSE Packed and Scalar Floating-Point Instructions

The packed and scalar single-precision floating-point instructions are divided into the following subgroups:

• Data movement instructions
• Arithmetic instructions
• Logical instructions
• Comparison instructions
• Shuffle instructions
• Conversion instructions
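Several of these subgroups are often combined. The C sketch below assumes `<xmmintrin.h>` intrinsics and uses hypothetical helper names. It uses a comparison (CMPPS via `_mm_cmplt_ps`) to build a per-element mask. Logical instructions (ANDPS, ANDNPS, ORPS) then blend two vectors with that mask. Finally, a shuffle (SHUFPS via `_mm_shuffle_ps`) reverses the element order.

```c
#include <xmmintrin.h> /* SSE intrinsics: compare, logical, shuffle */

/* Element-wise "a < b ? a : b" without branches.
   CMPLTPS sets an element to all ones where a < b and to all zeros elsewhere;
   ANDPS keeps a where the mask is set, ANDNPS keeps b where it is clear,
   and ORPS merges the two halves. */
static __m128 select_min(__m128 a, __m128 b)
{
    __m128 mask = _mm_cmplt_ps(a, b);           /* comparison */
    return _mm_or_ps(_mm_and_ps(mask, a),       /* logical: mask & a */
                     _mm_andnot_ps(mask, b));   /* logical: ~mask & b */
}

/* SHUFPS with the same register as both sources: element 0 takes v[3],
   element 1 takes v[2], element 2 takes v[1], element 3 takes v[0]. */
static __m128 reverse4(__m128 v)
{
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3));
}
```

For this particular selection SSE also offers MINPS (`_mm_min_ps`). The mask-and-blend pattern is the general form and works with any comparison predicate.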

The packed single-precision floating-point instructions perform SIMD operations on packed single-precision
floating-point operands (see Figure 10-5). Each source operand contains four single-precision floating-point
values, and the destination operand contains the results of the operation (OP) performed in parallel on the
corresponding values (X0 and Y0, X1 and Y1, X2 and Y2, and X3 and Y3) in each operand.
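Using the X/Y notation above, the following C sketch contrasts the packed ADDPS with the scalar ADDSS. It assumes SSE intrinsics from `<xmmintrin.h>`, and the helper names are hypothetical. ADDSS applies OP to X0 and Y0 only and passes elements 1 through 3 of the first operand through unchanged.

```c
#include <xmmintrin.h> /* SSE intrinsics: _mm_add_ps, _mm_add_ss */

/* Packed: ADDPS computes X0+Y0, X1+Y1, X2+Y2, X3+Y3 in parallel. */
static void add_packed(const float x[4], const float y[4], float out[4])
{
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(x), _mm_loadu_ps(y)));
}

/* Scalar: ADDSS computes X0+Y0 only; elements 1-3 of the result come from x. */
static void add_scalar(const float x[4], const float y[4], float out[4])
{
    _mm_storeu_ps(out, _mm_add_ss(_mm_loadu_ps(x), _mm_loadu_ps(y)));
}
```

With x = {1, 2, 3, 4} and y = {10, 20, 30, 40}, the packed form yields {11, 22, 33, 44} and the scalar form yields {11, 2, 3, 4}.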

Figure 10-4. 128-Bit Packed Single-Precision Floating-Point Data Type (bits 127 through 0 contain four single-precision floating-point values)
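The element order in Figure 10-4 can be checked in C with the SSE intrinsics from `<xmmintrin.h>`. `packed_element` is a hypothetical helper. Element 0 occupies bits 31:0, and element 3 occupies bits 127:96.

```c
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_storeu_ps */

/* Return element i (0..3) of a packed single-precision vector.
   Element 0 is bits 31:0; element 3 is bits 127:96. */
static float packed_element(__m128 v, int i)
{
    float tmp[4];
    _mm_storeu_ps(tmp, v); /* MOVUPS: tmp need not be 16-byte aligned */
    return tmp[i];
}
```

`_mm_set_ps` lists its arguments from element 3 down to element 0, so `_mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f)` puts 1.0f in bits 31:0. `_mm_setr_ps` takes them in the opposite order.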