8-12 Vol. 3A


Because the Intel-64 memory-ordering model prevents loads from being reordered (see Section), 
processor 3’s loads occur in order and, therefore, processor 1’s XCHG occurs before processor 3’s load from x.

Since processor 0’s XCHG into x occurs before processor 1’s XCHG (by assumption), it occurs before 
processor 3’s load from x. Thus, r6 = 1.

A similar argument (referring instead to processor 2’s loads) applies if processor 1’s XCHG occurs before 
processor 0’s XCHG.  

Loads and Stores Are Not Reordered with Locked Instructions

The memory-ordering model prevents loads and stores from being reordered with locked instructions that execute 
earlier or later. The examples in this section illustrate only cases in which a locked instruction is executed before a 
load or a store. The reader should note that reordering is prevented also if the locked instruction is executed after 
a load or a store.
The first example (Example 8-9) illustrates that loads may not be reordered with earlier locked instructions.

As explained in Section, there is a total order of the executions of locked instructions. Without loss of 
generality, suppose that processor 0’s XCHG occurs first.
Because the Intel-64 memory-ordering model prevents processor 1’s load from being reordered with its earlier 
XCHG, processor 1’s XCHG occurs before its load from x. Processor 0’s XCHG into x therefore also occurs before 
processor 1’s load from x, which implies r4 = 1.
A similar argument (referring instead to processor 0’s accesses) applies if processor 1’s XCHG occurs before 
processor 0’s XCHG; in that case r2 = 1.
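
As a rough cross-check of the argument for Example 8-9, the following Python sketch enumerates every execution of the two-processor program on a simplified store-buffer machine. The machine is an assumption of this sketch, not Intel’s definition of the memory-ordering model: each processor has a FIFO store buffer, a plain store enters that buffer, a load reads the newest matching entry in its own buffer or else memory, and XCHG runs only when the buffer is empty and swaps with memory atomically. The names reachable_outcomes, locked, and plain are hypothetical.

```python
def reachable_outcomes(programs, init_regs, observed):
    """Collect the final values of the `observed` registers over all executions."""
    n = len(programs)
    results = set()

    def explore(pcs, mem, bufs, regs):
        done = all(pc == len(prog) for pc, prog in zip(pcs, programs))
        if done and not any(bufs):
            results.add(tuple(regs[r] for r in observed))
            return
        for p in range(n):
            if bufs[p]:
                # Drain processor p's oldest buffered store into memory.
                (d_addr, d_val), rest = bufs[p][0], bufs[p][1:]
                explore(pcs, {**mem, d_addr: d_val},
                        bufs[:p] + (rest,) + bufs[p + 1:], regs)
            if pcs[p] == len(programs[p]):
                continue
            op, addr, reg = programs[p][pcs[p]]
            nxt = pcs[:p] + (pcs[p] + 1,) + pcs[p + 1:]
            if op == "xchg" and not bufs[p]:
                # Locked access: swap register and memory in one atomic step.
                explore(nxt, {**mem, addr: regs[reg]}, bufs,
                        {**regs, reg: mem[addr]})
            elif op == "store":
                # Plain store: becomes visible to others only after draining.
                grown = bufs[p] + ((addr, regs[reg]),)
                explore(nxt, mem, bufs[:p] + (grown,) + bufs[p + 1:], regs)
            elif op == "load":
                val = mem[addr]
                for b_addr, b_val in bufs[p]:  # newest matching entry wins
                    if b_addr == addr:
                        val = b_val
                explore(nxt, mem, bufs, {**regs, reg: val})

    explore((0,) * n, {"x": 0, "y": 0}, ((),) * n, dict(init_regs))
    return results


# Example 8-9 as written, and a contrast variant with plain stores.
locked = [[("xchg", "x", "r1"), ("load", "y", "r2")],
          [("xchg", "y", "r3"), ("load", "x", "r4")]]
plain = [[("store", "x", "r1"), ("load", "y", "r2")],
         [("store", "y", "r3"), ("load", "x", "r4")]]
init = {"r1": 1, "r3": 1}

locked_outcomes = reachable_outcomes(locked, init, ("r2", "r4"))
plain_outcomes = reachable_outcomes(plain, init, ("r2", "r4"))
print("XCHG then load:", sorted(locked_outcomes))
print("MOV  then load:", sorted(plain_outcomes))
```

Under these assumptions the locked program never finishes with r2 = 0 and r4 = 0, whereas the plain-store variant (included only for contrast) can, because both stores may still sit in their buffers when the loads execute.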
The second example (Example 8-10) illustrates that a store may not be reordered with an earlier locked instruction.

Assume r2 = 1.

Because r2 = 1, processor 0’s store to y occurs before processor 1’s load from y.

Because the memory-ordering model prevents a store from being reordered with an earlier locked instruction, 
processor 0’s XCHG into x occurs before its store to y. Thus, processor 0’s XCHG into x occurs before 
processor 1’s load from y.

Because the memory-ordering model prevents loads from being reordered (see Section), processor 1’s 
loads occur in order and, therefore, processor 0’s XCHG into x occurs before processor 1’s load from x. Thus, 
r3 = 1.
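
The chain of "occurs before" steps for Example 8-10 can also be checked by brute force. The Python sketch below assumes, as a simplification, that the four memory accesses take effect in a single global order, and it enumerates every such order that respects a given list of ordering rules. The event labels and the function name outcomes are hypothetical, and the second call describes an imaginary machine that lacks the locked-instruction rule.

```python
from itertools import permutations

# The four memory accesses of Example 8-10 (labels chosen for this sketch).
EVENTS = ["P0_xchg_x", "P0_store_y", "P1_load_y", "P1_load_x"]


def outcomes(required_order):
    """Return every (r2, r3) pair over all global orders of EVENTS that
    keep each (earlier, later) pair of required_order in that order."""
    seen = set()
    for order in permutations(EVENTS):
        pos = {event: i for i, event in enumerate(order)}
        if any(pos[a] > pos[b] for a, b in required_order):
            continue  # this order violates one of the ordering rules
        mem = {"x": 0, "y": 0}
        regs = {}
        for event in order:
            if event == "P0_xchg_x":
                mem["x"] = 1              # r1 = 1 is swapped into x
            elif event == "P0_store_y":
                mem["y"] = 1
            elif event == "P1_load_y":
                regs["r2"] = mem["y"]
            else:                         # P1_load_x
                regs["r3"] = mem["x"]
        seen.add((regs["r2"], regs["r3"]))
    return seen


# Both rules used in the argument: the store stays after the earlier XCHG,
# and processor 1's loads stay in program order.
with_rules = outcomes([("P0_xchg_x", "P0_store_y"),
                       ("P1_load_y", "P1_load_x")])

# Hypothetical machine that could move the store ahead of the XCHG.
without_lock_rule = outcomes([("P1_load_y", "P1_load_x")])

print("with both rules:       ", sorted(with_rules))
print("without the lock rule: ", sorted(without_lock_rule))
```

With both rules in force the pair r2 = 1, r3 = 0 does not appear; once the rule tying the store to the earlier XCHG is dropped, it does, which is the reordering the example rules out.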


Fast-String Operation and Out-of-Order Stores

Section of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, describes an 
optimization of repeated string operations called fast-string operation.

Example 8-9.  Loads Are not Reordered with Locks

Processor 0             Processor 1
xchg [ _x], r1          xchg [ _y], r3
mov r2, [ _y]           mov r4, [ _x]

Initially x = y = 0, r1 = r3 = 1
r2 = 0 and r4 = 0 is not allowed

Example 8-10.  Stores Are not Reordered with Locks

Processor 0             Processor 1
xchg [ _x], r1          mov r2, [ _y]
mov [ _y], 1            mov r3, [ _x]

Initially x = y = 0, r1 = 1
r2 = 1 and r3 = 0 is not allowed