PROGRAMMING WITH INTEL® SSE3, SSSE3, INTEL® SSE4 AND INTEL® AESNI

MONITOR and MWAIT are targeted for system software that supports efficient thread synchronization. See Chapter 13 
of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for details.

12.4.3 Enable FTZ and DAZ for SIMD Floating-Point Computation

Enabling the FTZ and DAZ flags in the MXCSR register is likely to accelerate SIMD floating-point computation where 
strict compliance with IEEE Standard 754-1985 is not required. The FTZ flag is available on Intel 64 and IA-32 
processors that support SSE; DAZ is available on Intel 64 processors and on most IA-32 processors that support 
SSE/SSE2/SSE3. 
Software can detect the presence of DAZ, modify the MXCSR register, and save and restore state information by 
following the techniques discussed in Section 11.6.3 through Section 11.6.6.
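
The flag manipulation described above can be sketched in C with the `_mm_getcsr` and `_mm_setcsr` intrinsics. This is a minimal illustration, not a definitive implementation: the helper names `enable_ftz_daz`, `restore_mxcsr`, and `mul_at_runtime` are hypothetical, and the sketch assumes that DAZ support has already been confirmed by the detection technique referenced above, because setting a reserved MXCSR bit raises a general-protection exception.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr */

/* MXCSR control bits. */
#define MXCSR_DAZ (1u << 6)   /* denormals-are-zeros */
#define MXCSR_FTZ (1u << 15)  /* flush-to-zero */

/* Hypothetical helper: sets FTZ and DAZ and returns the previous MXCSR
   value so the caller can restore it later.
   Assumption: DAZ support has already been verified. */
static unsigned int enable_ftz_daz(void)
{
    unsigned int saved = _mm_getcsr();
    _mm_setcsr(saved | MXCSR_FTZ | MXCSR_DAZ);
    return saved;
}

/* Hypothetical helper: restores an MXCSR value returned by enable_ftz_daz(). */
static void restore_mxcsr(unsigned int saved)
{
    _mm_setcsr(saved);
}

/* Multiplies through volatile temporaries so the compiler cannot fold the
   product at compile time; the multiply runs under the current MXCSR. */
static float mul_at_runtime(float a, float b)
{
    volatile float x = a, y = b;
    return x * y;
}
```

With FTZ set and the underflow exception masked, a product that would otherwise be denormal, such as `mul_at_runtime(1.17549435e-38f, 0.5f)` on a build that uses SSE scalar arithmetic, is returned as zero. Saving and restoring the previous MXCSR value keeps the change local to the code that tolerates non-IEEE results.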

12.4.4 Programming SSE3 with SSE/SSE2 Extensions

SIMD instructions in SSE3 extensions are intended to complement the use of SSE/SSE2 in programming SIMD 
applications. Application software that intends to use SSE3 instructions should also check for the availability of 
SSE/SSE2 instructions.
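
The combined availability check can be sketched as follows. This is an illustrative sketch that assumes a GCC- or Clang-compatible compiler providing `<cpuid.h>`; the function name `has_sse_sse2_sse3` is hypothetical. It tests only the CPUID feature flags and does not cover the operating-system support that is also required before these instructions can be used.

```c
#include <cpuid.h>  /* __get_cpuid (GCC/Clang, x86 only) */

/* Feature flags reported by CPUID leaf 1. */
#define CPUID1_EDX_SSE  (1u << 25)
#define CPUID1_EDX_SSE2 (1u << 26)
#define CPUID1_ECX_SSE3 (1u << 0)

/* Hypothetical helper: returns 1 only if SSE, SSE2, and SSE3 are all
   reported by the processor, otherwise 0. */
static int has_sse_sse2_sse3(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;  /* CPUID leaf 1 is not available */

    return (edx & CPUID1_EDX_SSE) != 0
        && (edx & CPUID1_EDX_SSE2) != 0
        && (ecx & CPUID1_ECX_SSE3) != 0;
}
```

An application would typically call such a check once at startup and select an SSE3 code path only when it returns 1.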
The FISTTP instruction in SSE3 is intended to accelerate x87-style programming where performance is limited by 
frequent floating-point-to-integer conversions that require repeatedly modifying the x87 FPU control word. 
Because FISTTP always truncates, its use can eliminate the need to access the x87 FPU control word.
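
As a rough illustration of a FISTTP conversion, the sketch below issues the instruction through GCC-style inline assembly. It assumes an SSE3-capable x86 processor and a GCC- or Clang-compatible compiler; the function name `trunc_with_fisttp` is hypothetical. Compilers may also emit FISTTP on their own for an ordinary cast when x87 code generation and SSE3 are both enabled, so this is a demonstration of the instruction rather than a recommended coding pattern.

```c
/* Hypothetical helper: converts x to a 32-bit integer by truncation using
   FISTTP. The result does not depend on the rounding mode held in the
   x87 FPU control word, so the control word is neither read nor written.
   Assumption: the processor supports SSE3. */
static int trunc_with_fisttp(double x)
{
    int result;

    /* "t" places x in ST(0). FISTTP pops ST(0), so "st" is listed as
       clobbered, as GCC requires for inputs popped by the asm. */
    __asm__ ("fisttpl %0" : "=m" (result) : "t" (x) : "st");
    return result;
}
```

For example, `trunc_with_fisttp(2.9)` yields 2 and `trunc_with_fisttp(-2.9)` yields -2, which matches C cast semantics.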

12.5 OVERVIEW OF SSSE3 INSTRUCTIONS

SSSE3 provides 32 instructions to accelerate a variety of multimedia and signal processing applications employing 
SIMD integer data. See:

• Section 12.6, “SSSE3 Instructions,” provides an introduction to individual SSSE3 instructions.

• Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volumes 2A & 2B, provide detailed 
  information on individual instructions.

• Chapter 13, “System Programming for Instruction Set Extensions and Processor Extended States,” in the 
  Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, gives guidelines for integrating 
  SSE/SSE2/SSE3/SSSE3 extensions into an operating-system environment.

12.6 SSSE3 INSTRUCTIONS

SSSE3 instructions include:

• Twelve instructions that perform horizontal addition or subtraction operations.

• Six instructions that evaluate absolute values.

• Two instructions that perform multiply and add operations and speed up the evaluation of dot products.

• Two instructions that accelerate packed-integer multiply operations and produce integer values with scaling.

• Two instructions that perform a byte-wise, in-place shuffle according to the second shuffle control operand.

• Six instructions that negate packed integers in the destination operand if the corresponding element in the 
  source operand is less than zero.

• Two instructions that align data from the composite of two operands.
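
The arithmetic groups in the list above can be illustrated with the C intrinsics declared in `<tmmintrin.h>`. The sketch below is illustrative only: the wrapper names (`hadd16`, `abs16`, `sign16`, `maddubs`, `mulhrs16`) and the `SSSE3_FN` macro are hypothetical, only the 128-bit word and byte forms are shown, and a GCC- or Clang-compatible compiler on an SSSE3-capable processor is assumed.

```c
#include <stdint.h>
#include <tmmintrin.h>  /* SSSE3 intrinsics */

/* Assumption: GCC or Clang. Enables SSSE3 code generation per function,
   without a global -mssse3 flag. */
#define SSSE3_FN __attribute__((target("ssse3")))

/* PHADDW: sums adjacent 16-bit pairs. Sums from a fill out[0..3],
   sums from b fill out[4..7]. */
SSSE3_FN static void hadd16(const int16_t a[8], const int16_t b[8], int16_t out[8])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_hadd_epi16(va, vb));
}

/* PABSW: absolute value of each 16-bit element. */
SSSE3_FN static void abs16(const int16_t a[8], int16_t out[8])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    _mm_storeu_si128((__m128i *)out, _mm_abs_epi16(va));
}

/* PSIGNW: negates a[i] where s[i] < 0, zeroes it where s[i] == 0,
   and passes it through where s[i] > 0. */
SSSE3_FN static void sign16(const int16_t a[8], const int16_t s[8], int16_t out[8])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vs = _mm_loadu_si128((const __m128i *)s);
    _mm_storeu_si128((__m128i *)out, _mm_sign_epi16(va, vs));
}

/* PMADDUBSW: multiplies unsigned bytes of u by signed bytes of s and adds
   each adjacent pair of products with signed 16-bit saturation. */
SSSE3_FN static void maddubs(const uint8_t u[16], const int8_t s[16], int16_t out[8])
{
    __m128i vu = _mm_loadu_si128((const __m128i *)u);
    __m128i vs = _mm_loadu_si128((const __m128i *)s);
    _mm_storeu_si128((__m128i *)out, _mm_maddubs_epi16(vu, vs));
}

/* PMULHRSW: 16-bit multiply with rounding and scaling, computed per
   element as (((a * b) >> 14) + 1) >> 1. */
SSSE3_FN static void mulhrs16(const int16_t a[8], const int16_t b[8], int16_t out[8])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_mulhrs_epi16(va, vb));
}
```

As a usage example, calling `hadd16` with a = {1, 2, 3, 4, 5, 6, 7, 8} and b = {10, 20, 30, 40, 50, 60, 70, 80} writes {3, 7, 11, 15, 30, 70, 110, 150}. Calling `mulhrs16` with both inputs equal to 16384 (0.5 in a signed 1.15 fixed-point format) produces 8192 (0.25), which shows the scaling behavior of the packed multiply.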

The operands of these instructions are packed integers of byte, word, or doubleword sizes. The operands are 
stored as 64-bit or 128-bit data in MMX registers, XMM registers, or memory.
The instructions are discussed in more detail in the following paragraphs.
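
The byte shuffle and the align operation listed above can be sketched with the `_mm_shuffle_epi8` and `_mm_alignr_epi8` intrinsics on 128-bit operands. The names `reverse_bytes`, `window_from_byte3`, and `SSSE3_FN` are hypothetical and chosen for illustration; a GCC- or Clang-compatible compiler and an SSSE3-capable processor are assumed.

```c
#include <stdint.h>
#include <tmmintrin.h>  /* SSSE3 intrinsics */

/* Assumption: GCC or Clang. Enables SSSE3 code generation per function,
   without a global -mssse3 flag. */
#define SSSE3_FN __attribute__((target("ssse3")))

/* PSHUFB: out[i] = in[control[i] & 0x0F]. A control byte with bit 7 set
   would write zero instead. This control vector reverses the 16 bytes. */
SSSE3_FN static void reverse_bytes(const uint8_t in[16], uint8_t out[16])
{
    const __m128i control = _mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8,
                                          7, 6, 5, 4, 3, 2, 1, 0);
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_shuffle_epi8(v, control));
}

/* PALIGNR: concatenates two 16-byte operands, shifts the 32-byte
   composite right by a constant byte count, and keeps the low 16 bytes.
   With lo = buf[0..15] and hi = buf[16..31], a shift of 3 extracts
   buf[3..18]. The shift count must be a compile-time constant. */
SSSE3_FN static void window_from_byte3(const uint8_t buf[32], uint8_t out[16])
{
    __m128i lo = _mm_loadu_si128((const __m128i *)buf);
    __m128i hi = _mm_loadu_si128((const __m128i *)(buf + 16));
    _mm_storeu_si128((__m128i *)out, _mm_alignr_epi8(hi, lo, 3));
}
```

If `buf[i]` holds the value `i`, `window_from_byte3` writes the values 3 through 18, which is the same result as an unaligned 16-byte load at offset 3, built here from two loads at 16-byte offsets.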