

PROGRAMMING WITH INTEL® SSE3, SSSE3, INTEL® SSE4 AND INTEL® AESNI

The HSUBPD instruction performs a double-precision subtraction on contiguous data elements. The first data 
element of the result is obtained by subtracting the second element of the first operand from the first element of 
the first operand; the second element by subtracting the second element of the second operand from the first 
element of the second operand.

HSUBPD OperandA OperandB
— OperandA (128 bits, two data elements): a1, a0
— OperandB (128 bits, two data elements): b1, b0
— Result (Stored in OperandA): b0 - b1, a0 - a1
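
The element ordering of HSUBPD can be checked against a scalar model. The C sketch below is illustrative only: the type vec2d and the function hsubpd_model are hypothetical names, not part of any Intel header, and the code emulates the arithmetic rather than issuing the instruction. Compilers that support SSE3 typically expose the instruction itself through the _mm_hsub_pd intrinsic declared in pmmintrin.h.

```c
/* Scalar model of HSUBPD. All names here are hypothetical and exist
 * only for illustration. */
typedef struct {
    double e0; /* element 0: bits 63:0 of the 128-bit operand   */
    double e1; /* element 1: bits 127:64 of the 128-bit operand */
} vec2d;

/* Returns the value that HSUBPD would leave in OperandA. */
static vec2d hsubpd_model(vec2d a, vec2d b)
{
    vec2d r;
    r.e0 = a.e0 - a.e1; /* first result element:  a0 - a1 */
    r.e1 = b.e0 - b.e1; /* second result element: b0 - b1 */
    return r;
}
```

For example, with OperandA holding (a1, a0) = (2.0, 5.0) and OperandB holding (b1, b0) = (1.0, 8.0), the model returns (b0 - b1, a0 - a1) = (7.0, 3.0).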

12.3.6 Two Thread Synchronization Instructions

The MONITOR instruction sets up an address range that is used to monitor write-back stores. 
MWAIT enables a logical processor to enter an optimized state while waiting for a write-back store to the 
address range set up by MONITOR. MONITOR and MWAIT require the use of general-purpose registers for their 
inputs. The registers used by MONITOR and MWAIT must be initialized properly; register content is not modified 
by these instructions.

12.4 WRITING APPLICATIONS WITH SSE3 EXTENSIONS

The following sections give guidelines for writing application programs and operating-system code that use SSE3 
instructions. 

12.4.1 Guidelines for Using SSE3 Extensions

The following guidelines describe how to maximize the benefits of using SSE3 extensions:

Check that the processor supports SSE3 extensions.
— Applications may need to ensure that the target operating system supports SSE3. (Operating system 
support for the SSE extensions implies sufficient support for SSE2 extensions and SSE3 extensions.) 

Ensure your operating system supports MONITOR and MWAIT.

Employ the optimization and scheduling techniques described in the Intel® 64 and IA-32 Architectures 
Optimization Reference Manual (see Section 1.4, “Related Literature”).

12.4.2 Checking for SSE3 Support

Before an application attempts to use the SIMD subset of SSE3 extensions, the application should follow the steps 
illustrated in Section 11.6.2, “Checking for SSE/SSE2 Support.” Next, use the additional step provided below:

Check that the processor supports the SIMD and x87 SSE3 extensions (if CPUID.01H:ECX.SSE3[bit 0] = 1). 
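
As a minimal sketch of this check, the C code below tests bit 0 of the ECX value returned by CPUID leaf 01H. The helper names and the EXAMPLE_HAVE_CPUID_H macro are hypothetical. The wrapper assumes a GCC or Clang toolchain targeting x86, where cpuid.h provides __get_cpuid; other compilers need their own CPUID intrinsic. The sketch covers only the processor check, not the operating-system steps of Section 11.6.2.

```c
#include <stdint.h>

/* Assumption: GCC or Clang on x86, where <cpuid.h> provides __get_cpuid. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define EXAMPLE_HAVE_CPUID_H 1
#endif

/* Decodes CPUID.01H:ECX.SSE3[bit 0] from a raw ECX value. */
static int ecx_reports_sse3(uint32_t ecx)
{
    return (int)(ecx & 1u);
}

/* Returns 1 if the processor reports SSE3. Returns 0 if it does not,
 * or if this build has no way to execute CPUID. */
static int cpu_has_sse3(void)
{
#ifdef EXAMPLE_HAVE_CPUID_H
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return ecx_reports_sse3(ecx);
#endif
    return 0;
}
```

Keeping the bit test separate from the CPUID query lets the decoding be verified with fixed register values on any machine.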

An operating system that provides application support for SSE and SSE2 also provides sufficient application 
support for SSE3. To use FISTTP, software only needs to check support for SSE3.

In the initial implementation of MONITOR and MWAIT, these two instructions are available at ring 0 and 
conditionally available at ring levels greater than 0. Before an application attempts to use the MONITOR and 
MWAIT instructions, the application should use the following steps:
1. Check that the processor supports MONITOR and MWAIT. If CPUID.01H:ECX.MONITOR[bit 3] = 1, MONITOR 
and MWAIT are available at ring 0. 
2. Query the smallest and largest line size that MONITOR uses. Use CPUID.05H:EAX.smallest[bits 
15:0];EBX.largest[bits 15:0]. Values are returned in bytes in EAX and EBX.
3. Ensure the memory address range(s) that will be supplied to MONITOR meets memory type requirements.
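
Steps 1 and 2 can be sketched in C as follows. The struct, function, and macro names are hypothetical. The decoding function takes raw register values so that it can be checked without executing CPUID, and the wrapper assumes a GCC or Clang toolchain on x86, where cpuid.h provides __get_cpuid. Step 3 (memory type requirements) and availability at ring levels greater than 0 are not covered.

```c
#include <stdint.h>

/* Assumption: GCC or Clang on x86, where <cpuid.h> provides __get_cpuid. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define EXAMPLE_HAVE_CPUID_H 1
#endif

typedef struct {
    int      supported;     /* CPUID.01H:ECX.MONITOR[bit 3]        */
    uint32_t smallest_line; /* CPUID.05H:EAX[bits 15:0], in bytes  */
    uint32_t largest_line;  /* CPUID.05H:EBX[bits 15:0], in bytes  */
} monitor_info;

/* Decodes the MONITOR/MWAIT fields from raw CPUID register values. */
static monitor_info decode_monitor_info(uint32_t leaf1_ecx,
                                        uint32_t leaf5_eax,
                                        uint32_t leaf5_ebx)
{
    monitor_info info;
    info.supported     = (int)((leaf1_ecx >> 3) & 1u);
    info.smallest_line = leaf5_eax & 0xFFFFu;
    info.largest_line  = leaf5_ebx & 0xFFFFu;
    return info;
}

/* Runs steps 1 and 2 on the current processor. Returns all zeros if
 * MONITOR/MWAIT is not reported or CPUID cannot be executed here. */
static monitor_info query_monitor_info(void)
{
    monitor_info info = {0, 0, 0};
#ifdef EXAMPLE_HAVE_CPUID_H
    unsigned int a1, b1, c1, d1; /* leaf 01H registers */
    unsigned int a5, b5, c5, d5; /* leaf 05H registers */
    if (__get_cpuid(1, &a1, &b1, &c1, &d1) && ((c1 >> 3) & 1u) &&
        __get_cpuid(5, &a5, &b5, &c5, &d5))
        info = decode_monitor_info(c1, a5, b5);
#endif
    return info;
}
```

For example, decoding a leaf 01H ECX value with bit 3 set together with leaf 05H values EAX = EBX = 0x40 reports MONITOR/MWAIT as supported with a smallest and largest monitor line size of 64 bytes.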