PROGRAMMING WITH INTEL® STREAMING SIMD EXTENSIONS 2 (INTEL® SSE2)
may result in a SIMD floating-point exception (such as numeric overflow [#O] or invalid operation [#I]) being
generated, but the actual source of the problem (inconsistent data types) is not detected.
The ability to operate on an operand whose data type is inconsistent with the typing of the instruction being
executed permits some valid operations to be performed. For example, the following instructions load a
packed double-precision floating-point operand from memory to register XMM0, and a mask to register XMM1;
then they use XORPD to toggle the sign bits of the two packed values in register XMM0.
    movapd xmm0, [eax]   ; EAX register contains pointer to packed
                         ; double-precision floating-point operand
    movaps xmm1, [ebx]   ; EBX register contains pointer to packed
                         ; double-precision floating-point mask
    xorpd  xmm0, xmm1    ; XOR operation toggles sign bits using
                         ; the mask in xmm1
In this example, XORPS or PXOR can be used in place of XORPD and yield the same correct result. However,
because the instruction data type would then not match the operand data type, a latency penalty will
be incurred due to the way the instructions are implemented at the microarchitecture level.
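The sign-toggle sequence can be sketched in C with SSE2 intrinsics, which x86 compilers usually map onto the same instructions. This is a minimal illustration under stated assumptions, not code from the manual: the function name toggle_signs is hypothetical, and the mask is built with _mm_set1_pd(-0.0) (only the sign bit set in each element) instead of being loaded from memory. The compiler remains free to emit XORPD, XORPS, or PXOR for _mm_xor_pd; the resulting bit pattern is the same in each case.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>   /* SSE2 intrinsics */

/* Hypothetical helper: flips the sign of two packed doubles.
   src and dst must be 16-byte aligned, as the aligned moves require. */
void toggle_signs(const double *src, double *dst)
{
    assert(((uintptr_t)src & 15u) == 0);
    assert(((uintptr_t)dst & 15u) == 0);

    __m128d values = _mm_load_pd(src);          /* typically MOVAPD */
    __m128d mask   = _mm_set1_pd(-0.0);         /* sign bit set in both elements */
    __m128d result = _mm_xor_pd(values, mask);  /* typically XORPD */

    _mm_store_pd(dst, result);                  /* typically MOVAPD */
}
```

Because XOR with a set sign bit inverts that bit, positive inputs become negative and negative inputs become positive; all other bits are left untouched.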
Latency penalties can also be incurred by using move instructions of the wrong type. For example, MOVAPS and
MOVAPD can both be used to move a packed single-precision operand from memory to an XMM register. However,
if MOVAPD is used, a latency penalty will be incurred when a correctly typed instruction attempts to use the data in
the register.
Note that these latency penalties are not incurred when moving data from XMM registers to memory.
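That the two load forms are functionally interchangeable can be checked with a short C sketch. The helper name same_bits_either_load is hypothetical; _mm_castpd_ps only reinterprets the register contents and generates no instruction. The sketch shows that the bit patterns match. It does not show the latency difference, which depends on the microarchitecture and is not observable at this level, and a compiler may substitute one move form for another.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>   /* SSE2 intrinsics */

/* Returns 1 if loading four packed floats with the single-precision load
   and with the double-precision load yields identical register contents. */
int same_bits_either_load(const float *src)
{
    assert(((uintptr_t)src & 15u) == 0);   /* aligned loads need 16-byte alignment */

    /* Correctly typed load for single-precision data (typically MOVAPS). */
    __m128 typed = _mm_load_ps(src);

    /* Mistyped load of the same 16 bytes (typically MOVAPD). */
    __m128 mistyped = _mm_castpd_ps(_mm_load_pd((const double *)src));

    /* Compare all 16 bytes as integers so that any bit pattern,
       including NaN encodings, is compared exactly. */
    __m128i eq = _mm_cmpeq_epi8(_mm_castps_si128(typed),
                                _mm_castps_si128(mistyped));
    return _mm_movemask_epi8(eq) == 0xFFFF;
}
```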
11.6.10 Interfacing with SSE/SSE2 Procedures and Functions
SSE and SSE2 extensions allow direct access to XMM registers. This means that all existing interface conventions
between procedures and functions that apply to the use of the general-purpose registers (EAX, EBX, etc.) also
apply to XMM register usage.
11.6.10.1 Passing Parameters in XMM Registers
The state of XMM registers is preserved across procedure (or function) boundaries. Parameters can be passed from
one procedure to another using XMM registers.
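In compiled code, parameter passing in XMM registers is what typically happens when vector types are used as function arguments. The sketch below assumes a calling convention that passes __m128d values in XMM registers, as the System V x86-64 ABI does; other conventions may pass them differently (for example, by reference), so the register assignments in the comments are assumptions. The function names scale_pair and scale_pair_to_array are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>   /* SSE2 intrinsics */

/* Hypothetical callee. Under the System V x86-64 ABI (an assumption),
   v arrives in XMM0, factor in XMM1, and the result is returned in XMM0. */
__m128d scale_pair(__m128d v, __m128d factor)
{
    return _mm_mul_pd(v, factor);   /* typically MULPD */
}

/* Hypothetical caller: packs scalars, calls the callee, unpacks the result. */
void scale_pair_to_array(double a, double b, double k, double out[2])
{
    assert(out != NULL);

    /* _mm_set_pd takes the high element first, so a lands in element 0. */
    __m128d r = scale_pair(_mm_set_pd(b, a), _mm_set1_pd(k));

    _mm_storeu_pd(out, r);          /* unaligned store: out need not be aligned */
}
```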
11.6.10.2 Saving XMM Register State on a Procedure or Function Call
The state of XMM registers can be saved in two ways: using an FXSAVE instruction or a move instruction. FXSAVE
saves the state of all XMM registers (along with the state of MXCSR and the x87 FPU registers). This instruction is
typically used for major changes in the context of the execution environment, such as a task switch. FXRSTOR
restores the XMM, MXCSR, and x87 FPU registers stored with FXSAVE.
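A minimal sketch of the FXSAVE/FXRSTOR pair follows, written in C with GCC/Clang-style inline assembly (an assumption; other toolchains provide _fxsave and _fxrstor intrinsics instead). The type and function names are hypothetical. The 512-byte, 16-byte-aligned save area and the MXCSR image at byte offset 24 follow the FXSAVE memory layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* FXSAVE and FXRSTOR operate on a 512-byte region aligned to 16 bytes. */
typedef struct {
    _Alignas(16) uint8_t bytes[512];
} fxsave_area;

/* Saves the x87 FPU, MXCSR, and XMM register state into *area. */
void save_fp_simd_state(fxsave_area *area)
{
    assert(((uintptr_t)area & 15u) == 0);
    __asm__ volatile("fxsave %0" : "=m"(*area));
}

/* Reloads the state previously written by save_fp_simd_state. */
void restore_fp_simd_state(const fxsave_area *area)
{
    assert(((uintptr_t)area & 15u) == 0);
    __asm__ volatile("fxrstor %0" : : "m"(*area) : "memory");
}

/* Reads the MXCSR image, stored at byte offset 24 of the save area. */
uint32_t saved_mxcsr(const fxsave_area *area)
{
    uint32_t mxcsr;
    memcpy(&mxcsr, &area->bytes[24], sizeof mxcsr);
    return mxcsr;
}
```

In ordinary application code the compiler and operating system manage this state; a sketch like this is mainly of interest in context-switch or runtime-library code.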
In cases where only XMM registers must be saved, or where selected XMM registers need to be saved, move
instructions (MOVAPS, MOVUPS, MOVSS, MOVAPD, MOVUPD, MOVSD, MOVDQA, and MOVDQU) can be used.
These instructions can also be used to restore the contents of XMM registers. To avoid performance degradation
when saving XMM registers to memory or when loading XMM registers from memory, be sure to use the
appropriately typed move instructions.
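As a hedged illustration, the C sketch below pairs each data type with its matching store and load intrinsic. The comments name the instruction a compiler would typically select, though a compiler may substitute another move of the same width. The structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>   /* SSE2 intrinsics */

/* Hypothetical save area: one 16-byte-aligned slot per saved register. */
typedef struct {
    _Alignas(16) float   ps_slot[4];   /* packed single-precision */
    _Alignas(16) double  pd_slot[2];   /* packed double-precision */
    _Alignas(16) int32_t dq_slot[4];   /* packed integers */
} xmm_save_slots;

/* Saves each value with the store that matches its data type. */
void save_typed(xmm_save_slots *s, __m128 ps, __m128d pd, __m128i dq)
{
    assert(((uintptr_t)s & 15u) == 0);
    _mm_store_ps(s->ps_slot, ps);                 /* typically MOVAPS */
    _mm_store_pd(s->pd_slot, pd);                 /* typically MOVAPD */
    _mm_store_si128((__m128i *)s->dq_slot, dq);   /* typically MOVDQA */
}

/* Restores each value with the load that matches its data type. */
void restore_typed(const xmm_save_slots *s,
                   __m128 *ps, __m128d *pd, __m128i *dq)
{
    *ps = _mm_load_ps(s->ps_slot);                        /* typically MOVAPS */
    *pd = _mm_load_pd(s->pd_slot);                        /* typically MOVAPD */
    *dq = _mm_load_si128((const __m128i *)s->dq_slot);    /* typically MOVDQA */
}
```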
The move instructions can also be used to save the contents of XMM registers on the stack. Here, the stack pointer
(in the ESP register) can be used as the memory address to the next available byte in the stack. Note that the stack
pointer is not automatically adjusted when using a move instruction (as it is with PUSH, which decrements it).
A move-instruction procedure that saves the contents of an XMM register to the stack is responsible for
decrementing the value in the ESP register by 16. Likewise, a move-instruction procedure that loads an XMM register
from the stack must also increment the ESP register by 16. To avoid performance degradation when moving the
contents of XMM registers, use the appropriately typed move instructions.
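The stack-pointer bookkeeping can be modelled in C. In the sketch below, a hypothetical sim_stack structure stands in for the stack and its sp field for ESP; real code would adjust ESP itself (for example, sub esp, 16 before the store and add esp, 16 after the load). Unaligned moves are used on the assumption that the stack pointer is not guaranteed to be 16-byte aligned. All names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>   /* SSE2 intrinsics */

/* Hypothetical model of a downward-growing stack; sp plays the role of ESP. */
typedef struct {
    unsigned char mem[256];
    unsigned char *sp;      /* address of the most recently saved item */
} sim_stack;

void sim_stack_init(sim_stack *s)
{
    s->sp = s->mem + sizeof s->mem;   /* empty stack: sp is one past the end */
}

/* Saves one packed double-precision register image on the model stack. */
void push_xmm_pd(sim_stack *s, __m128d value)
{
    assert((size_t)(s->sp - s->mem) >= 16);   /* room for one 16-byte item */
    s->sp -= 16;                              /* sub esp, 16        */
    _mm_storeu_pd((double *)s->sp, value);    /* movupd [esp], xmm  */
}

/* Reloads the most recently saved register image and releases its slot. */
__m128d pop_xmm_pd(sim_stack *s)
{
    assert(s->sp + 16 <= s->mem + sizeof s->mem);          /* stack not empty */
    __m128d value = _mm_loadu_pd((const double *)s->sp);   /* movupd xmm, [esp] */
    s->sp += 16;                                           /* add esp, 16       */
    return value;
}
```

Values come back in last-in, first-out order, and the model stack pointer returns to its starting value once every saved item has been reloaded.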