PROGRAMMING WITH INTEL® STREAMING SIMD EXTENSIONS (INTEL® SSE)
packed into a double quadword. (See Figure 4-3 for the layout of a single-precision floating-point value; refer to
Section 4.2.2, “Floating-Point Data Types,” for a detailed description of the single-precision floating-point format.)
This 128-bit packed single-precision floating-point data type is operated on in the XMM registers or in memory.
Conversion instructions are provided to convert two packed single-precision floating-point values into two packed
doubleword integers or a scalar single-precision floating-point value into a doubleword integer (see Figure 11-8).
SSE extensions provide conversion instructions between XMM registers and MMX registers, and between XMM
registers and general-purpose registers. See Figure 11-8.
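The scalar conversions between an XMM register and a general-purpose register can be illustrated with a short C sketch. It assumes a compiler that provides the SSE intrinsics header <xmmintrin.h> and an MXCSR register left in its default rounding mode (round to nearest even). The function names are hypothetical and chosen only for this example; the packed conversions that target MMX registers are not shown.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: convert a single-precision value to a 32-bit integer.
   _mm_cvtss_si32 typically compiles to CVTSS2SI and rounds according to
   the current MXCSR rounding mode (round to nearest even by default). */
int float_to_int_rounded(float x)
{
    return _mm_cvtss_si32(_mm_set_ss(x));
}

/* Hypothetical helper: the same conversion, but always truncating toward
   zero. _mm_cvttss_si32 typically compiles to CVTTSS2SI. */
int float_to_int_truncated(float x)
{
    return _mm_cvttss_si32(_mm_set_ss(x));
}

/* Hypothetical helper: convert a 32-bit integer to a single-precision
   value. _mm_cvtsi32_ss typically compiles to CVTSI2SS and writes the
   result to the low element of the XMM register. */
float int_to_float(int n)
{
    return _mm_cvtss_f32(_mm_cvtsi32_ss(_mm_setzero_ps(), n));
}
```

Under the default rounding mode, float_to_int_rounded(2.5f) yields 2 and float_to_int_rounded(3.5f) yields 4, because ties round to the even integer, whereas float_to_int_truncated(-2.9f) yields -2.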
The address of a 128-bit packed memory operand must be aligned on a 16-byte boundary, except in the following
cases:
• The MOVUPS instruction supports unaligned accesses.
• Scalar instructions use a 4-byte memory operand, which is not subject to alignment requirements.
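The alignment rule above can be sketched in C as follows. This is an illustration only, assuming a compiler that provides <xmmintrin.h>; the helper names is_aligned_16 and sum4 are hypothetical. The intrinsic _mm_load_ps requires a 16-byte-aligned address and typically compiles to MOVAPS, while _mm_loadu_ps accepts any address and typically compiles to MOVUPS.

```c
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: returns nonzero if p lies on a 16-byte boundary. */
int is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: sum the four floats starting at p.
   Uses the aligned load when the address permits it and falls back
   to the unaligned load otherwise. */
float sum4(const float *p)
{
    __m128 v = is_aligned_16(p) ? _mm_load_ps(p) : _mm_loadu_ps(p);
    float out[4];
    _mm_storeu_ps(out, v);  /* unaligned store: out has no alignment guarantee */
    return out[0] + out[1] + out[2] + out[3];
}
```

Calling _mm_load_ps directly on an address that is not 16-byte aligned would normally raise a general-protection exception, which is why the sketch checks the address first. In practice, code often either guarantees alignment at allocation time or uses the unaligned form throughout.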
Figure 4-2 shows the byte order of 128-bit (double quadword) data types in memory.
10.4 SSE INSTRUCTION SET
SSE instructions are divided into four functional groups:
• Packed and scalar single-precision floating-point instructions
• 64-bit SIMD integer instructions
• State management instructions
• Cacheability control, prefetch, and memory ordering instructions
The following sections give an overview of each of the instructions in these groups.
10.4.1 SSE Packed and Scalar Floating-Point Instructions
The packed and scalar single-precision floating-point instructions are divided into the following subgroups:
• Data movement instructions
• Arithmetic instructions
• Logical instructions
• Comparison instructions
• Shuffle instructions
• Conversion instructions
The packed single-precision floating-point instructions perform SIMD operations on packed single-precision
floating-point operands (see Figure 10-5). Each source operand contains four single-precision floating-point
values, and the destination operand contains the results of the operation (OP) performed in parallel on the
corresponding values (X0 and Y0, X1 and Y1, X2 and Y2, and X3 and Y3) in each operand.
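As a minimal sketch of a packed operation, the following C function takes OP to be addition and applies it to the four pairs of values in parallel. It assumes a compiler that provides <xmmintrin.h>; the function name packed_add is hypothetical. The intrinsic _mm_add_ps typically compiles to ADDPS, and the unaligned load and store intrinsics are used so that the arrays need no particular alignment.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: result[i] = x[i] + y[i] for i = 0..3,
   computed with a single packed single-precision addition. */
void packed_add(const float x[4], const float y[4], float result[4])
{
    __m128 vx = _mm_loadu_ps(x);               /* X3 X2 X1 X0 */
    __m128 vy = _mm_loadu_ps(y);               /* Y3 Y2 Y1 Y0 */
    _mm_storeu_ps(result, _mm_add_ps(vx, vy)); /* Xi + Yi for each i */
}
```

For x = {1, 2, 3, 4} and y = {10, 20, 30, 40}, the function stores {11, 22, 33, 44} in result.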
Figure 10-4. 128-Bit Packed Single-Precision Floating-Point Data Type
[Figure: a 128-bit field containing four single-precision floating-point values, occupying bits 31:0, 63:32, 95:64, and 127:96.]