Vol. 1 11-23

PROGRAMMING WITH INTEL® STREAMING SIMD EXTENSIONS 2 (INTEL® SSE2)

may result in a SIMD floating-point exception (such as numeric overflow [#O] or invalid operation [#I]) being 
generated, but the actual source of the problem (inconsistent data types) is not detected.
The ability to operate on an operand whose data type is inconsistent with the typing of the instruction 
being executed permits some valid operations to be performed. For example, the following instructions load a 
packed double-precision floating-point operand from memory into register XMM0 and a mask into register XMM1; 
they then use XORPD to toggle the sign bits of the two packed values in register XMM0.
    movapd  xmm0, [eax]   ; EAX register contains pointer to packed
                          ; double-precision floating-point operand
    movapd  xmm1, [ebx]   ; EBX register contains pointer to packed
                          ; double-precision floating-point mask
    xorpd   xmm0, xmm1    ; XOR operation toggles sign bits using
                          ; the mask in xmm1

In this example, XORPS or PXOR can be used in place of XORPD and yield the same correct result. However, 
because of the mismatch between the operand data type and the instruction data type, a latency penalty will 
be incurred due to the way the instructions are implemented at the microarchitecture level. 
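The sign-toggle operation above can be sketched in C with SSE2 intrinsics. This is a minimal illustration, not part of the original listing: the function names toggle_signs_pd and toggle_signs_int are hypothetical, the mask is built with _mm_set1_pd(-0.0) rather than loaded from memory, and a compiler is free to choose which move and XOR instructions it actually emits for each intrinsic.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Toggle the sign bit of the two packed doubles at src and write them to dst.
 * Both pointers are assumed to be 16-byte aligned, as MOVAPD requires. */
static void toggle_signs_pd(const double *src, double *dst)
{
    assert(((uintptr_t)src % 16) == 0 && ((uintptr_t)dst % 16) == 0);
    __m128d value = _mm_load_pd(src);        /* aligned packed-double load  */
    __m128d mask  = _mm_set1_pd(-0.0);       /* only bit 63 of each lane set */
    _mm_store_pd(dst, _mm_xor_pd(value, mask));  /* double-typed XOR */
}

/* Same bit-level operation expressed with the integer XOR (PXOR-style).
 * The result is identical; only the instruction typing differs. */
static void toggle_signs_int(const double *src, double *dst)
{
    assert(((uintptr_t)src % 16) == 0 && ((uintptr_t)dst % 16) == 0);
    __m128i value = _mm_load_si128((const __m128i *)src);
    __m128i mask  = _mm_castpd_si128(_mm_set1_pd(-0.0));
    _mm_store_si128((__m128i *)dst, _mm_xor_si128(value, mask));
}
```

Both helpers produce the same bit pattern for the same input, which mirrors the statement that XORPS or PXOR yields the same result as XORPD. The example demonstrates only functional equivalence; it does not measure the latency penalty.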
Latency penalties can also be incurred by using move instructions of the wrong type. For example, MOVAPS and 
MOVAPD can both be used to move a packed single-precision operand from memory to an XMM register. However, 
if MOVAPD is used, a latency penalty will be incurred when a correctly typed instruction attempts to use the data in 
the register.
Note that these latency penalties are not incurred when moving data from XMM registers to memory.

11.6.10  Interfacing with SSE/SSE2 Procedures and Functions

SSE and SSE2 extensions allow direct access to XMM registers. This means that all existing interface conventions 
between procedures and functions that apply to the use of the general-purpose registers (EAX, EBX, etc.) also 
apply to XMM register usage.

11.6.10.1   Passing Parameters in XMM Registers

The state of XMM registers is preserved across procedure (or function) boundaries. Parameters can be passed from 
one procedure to another using XMM registers.
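As a hedged illustration, the C sketch below passes packed operands between functions by value. Under calling conventions that assign vector arguments to registers (the System V x86-64 ABI is one such convention), __m128d arguments and return values travel in XMM registers; other conventions, or inlining by the compiler, may behave differently. The names scale_pair and scale_array are hypothetical.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Multiply each packed double in v by the matching element of factor.
 * Both operands are received by value and the result is returned by value. */
static __m128d scale_pair(__m128d v, __m128d factor)
{
    return _mm_mul_pd(v, factor);
}

/* Scale an array of doubles in place, two elements per call to scale_pair.
 * n is assumed to be even; unaligned loads are used so data needs no
 * particular alignment. */
static void scale_array(double *data, size_t n, double factor)
{
    assert(n % 2 == 0);
    __m128d f = _mm_set1_pd(factor);
    for (size_t i = 0; i < n; i += 2) {
        __m128d v = _mm_loadu_pd(data + i);
        _mm_storeu_pd(data + i, scale_pair(v, f));
    }
}
```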

11.6.10.2   Saving XMM Register State on a Procedure or Function Call

The state of XMM registers can be saved in two ways: using an FXSAVE instruction or a move instruction. FXSAVE 
saves the state of all XMM registers (along with the state of MXCSR and the x87 FPU registers). This instruction is 
typically used for major changes in the context of the execution environment, such as a task switch. FXRSTOR 
restores the XMM, MXCSR, and x87 FPU registers stored with FXSAVE.
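The following sketch shows the save half of this mechanism from C. It assumes a GCC- or Clang-compatible compiler (for the inline assembly syntax) and an operating system that has enabled FXSAVE support. The type fxsave_area and the helper names are hypothetical. The save area is 512 bytes and must be 16-byte aligned; the MXCSR image is read from byte offset 24 of that area. FXRSTOR is deliberately omitted here, because restoring registers behind the compiler's back would overwrite values it may be keeping in XMM registers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>  /* _mm_getcsr */

/* 512-byte, 16-byte-aligned memory image written by FXSAVE. */
typedef struct {
    _Alignas(16) uint8_t bytes[512];
} fxsave_area;

/* Capture the x87 FPU, MXCSR, and XMM register state into area. */
static void capture_state(fxsave_area *area)
{
    assert(((uintptr_t)area % 16) == 0);
    __asm__ volatile("fxsave %0" : "=m"(*area));
}

/* Extract the saved MXCSR value (bytes 24..27 of the save area). */
static uint32_t saved_mxcsr(const fxsave_area *area)
{
    uint32_t mxcsr;
    memcpy(&mxcsr, area->bytes + 24, sizeof mxcsr);
    return mxcsr;
}
```

Comparing saved_mxcsr() against the value returned by _mm_getcsr() is a simple way to confirm that the save area was written.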
In cases where only XMM registers must be saved, or where selected XMM registers need to be saved, move 
instructions (MOVAPS, MOVUPS, MOVSS, MOVAPD, MOVUPD, MOVSD, MOVDQA, and MOVDQU) can be used. 
These instructions can also be used to restore the contents of XMM registers. To avoid performance degradation 
when saving XMM registers to memory or when loading XMM registers from memory, be sure to use the 
appropriately typed move instructions.
The move instructions can also be used to save the contents of XMM registers on the stack. Here, the stack pointer 
(in the ESP register) can be used as the memory address of the next available byte in the stack. Note that the stack 
pointer is not automatically adjusted when using a move instruction (as it is with PUSH). 
A procedure that uses a move instruction to save the contents of an XMM register to the stack is responsible for 
decrementing the value in the ESP register by 16. Likewise, a procedure that uses a move instruction to load an XMM 
register from the stack must also increment the ESP register by 16. To avoid performance degradation when moving the 
contents of XMM registers, use the appropriately typed move instructions.
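C code cannot manipulate ESP directly, so the sketch below models the stack bookkeeping described above with a small software stack. The type xmm_stack and every helper name are hypothetical. The pointer sp plays the role of ESP: a save moves it down by 16 bytes before the store, and a restore moves it up by 16 bytes after the load. Separate helpers for packed doubles and packed integers stand in for choosing an appropriately typed move; the instruction a compiler actually emits for each intrinsic may differ.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical downward-growing stack standing in for the ESP-addressed stack. */
typedef struct {
    _Alignas(16) uint8_t storage[256];
    uint8_t *sp;  /* plays the role of ESP */
} xmm_stack;

static void xmm_stack_init(xmm_stack *s)
{
    s->sp = s->storage + sizeof s->storage;  /* empty stack: sp at the top */
}

/* Save packed doubles: decrement by 16, then store (double-typed move). */
static void push_pd(xmm_stack *s, __m128d v)
{
    assert(s->sp - s->storage >= 16);
    s->sp -= 16;
    _mm_store_pd((double *)s->sp, v);
}

/* Restore packed doubles: load, then increment by 16. */
static __m128d pop_pd(xmm_stack *s)
{
    assert(s->sp + 16 <= s->storage + sizeof s->storage);
    __m128d v = _mm_load_pd((const double *)s->sp);
    s->sp += 16;
    return v;
}

/* Save packed integers with the integer-typed move. */
static void push_si128(xmm_stack *s, __m128i v)
{
    assert(s->sp - s->storage >= 16);
    s->sp -= 16;
    _mm_store_si128((__m128i *)s->sp, v);
}

/* Restore packed integers with the integer-typed move. */
static __m128i pop_si128(xmm_stack *s)
{
    assert(s->sp + 16 <= s->storage + sizeof s->storage);
    __m128i v = _mm_load_si128((const __m128i *)s->sp);
    s->sp += 16;
    return v;
}
```

Values come back in last-in, first-out order, and sp returns to its starting position once every saved value has been restored.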