Vol. 3A 8-13

MULTIPLE-PROCESSOR MANAGEMENT

As explained in that section, the stores produced by fast-string operation may appear to execute out of order. Soft-
ware dependent upon sequential store ordering should not use string operations for the entire data structure to be 
stored. Data and semaphores should be separated. Order-dependent code should write to a discrete semaphore 
variable after any string operations to allow correctly ordered data to be seen by all processors. Atomicity of load 
and store operations is guaranteed only for native data elements of the string with native data size, and only if they 
are included in a single cache line.
Section 8.2.4.1 and Section 8.2.4.2 provide further explanation and examples.
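
The recommendation to separate data and semaphores can be illustrated with a minimal C11 sketch. It is not taken from this manual: the names mailbox, publish, and try_consume are hypothetical, and the sketch assumes that the compiler may turn the fill loop into a string store such as REP STOSD. The producer writes the flag with a single ordinary store only after the bulk fill, and the consumer reads the payload only after observing the flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAYLOAD_WORDS 128

/* Hypothetical shared record: payload and flag are separate objects. */
struct mailbox {
    uint32_t   payload[PAYLOAD_WORDS];
    atomic_int ready;   /* discrete semaphore variable, written last */
};

/* Producer: bulk-fill the payload, then signal with one separate store. */
void publish(struct mailbox *m, uint32_t value)
{
    assert(m != NULL);
    for (size_t i = 0; i < PAYLOAD_WORDS; i++)
        m->payload[i] = value;   /* assumption: may compile to a string store */
    /* Release ordering keeps the compiler from sinking the fill below the flag. */
    atomic_store_explicit(&m->ready, 1, memory_order_release);
}

/* Consumer: copies the payload and returns 1 only after seeing the flag. */
int try_consume(struct mailbox *m, uint32_t out[PAYLOAD_WORDS])
{
    assert(m != NULL && out != NULL);
    if (!atomic_load_explicit(&m->ready, memory_order_acquire))
        return 0;                /* data not yet published */
    memcpy(out, m->payload, sizeof m->payload);
    return 1;
}
```

The flag is deliberately not the last element of the payload array: a store inside the string operation carries no ordering guarantee relative to the other stores of that same operation.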

8.2.4.1  Memory-Ordering Model for String Operations on Write-Back (WB) Memory

This section deals with the memory-ordering model for string operations on write-back (WB) memory for the Intel 
64 architecture. 
The memory-ordering model respects the following principles:
1. Stores within a single string operation may be executed out of order.
2. Stores from separate string operations (for example, stores from consecutive string operations) do not execute 
out of order. All the stores from an earlier string operation will complete before any store from a later string 
operation. 
3. String operations are not reordered with other store operations.
Fast string operations (e.g., string operations initiated with the MOVS/STOS instructions and the REP prefix) may be 
interrupted by exceptions or interrupts. The interrupts are precise but may be delayed; for example, the interruptions 
may be taken at cache line boundaries, after every few iterations of the loop, or after operating on every few 
bytes. Different implementations may choose different options, or may even choose not to delay interrupt 
handling, so software should not rely on the delay. When the interrupt/trap handler is reached, the source/destina-
tion registers point to the next string element to be operated on, while the EIP stored in the stack points to the 
string instruction, and the ECX register has the value it held following the last successful iteration. The return from 
that trap/interrupt handler should cause the string instruction to be resumed from the point where it was inter-
rupted.
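
The register state seen by a handler can be sketched with a small C model. It is a simplification under stated assumptions, not a description of any implementation: the struct rep_stosd_state and the function rep_stosd_run are hypothetical names, the direction flag is assumed clear, stores are modeled in program order, and the budget parameter stands in for the point at which an interrupt happens to be taken.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of the register state used by REP STOSD. */
struct rep_stosd_state {
    uint32_t *edi;   /* next destination element */
    uint32_t  ecx;   /* iterations remaining */
    uint32_t  eax;   /* value to store */
};

/* Run at most `budget` iterations, then stop as if an interrupt were taken.
 * Returns 1 if the instruction completed (ECX reached 0). Returns 0 if it
 * was interrupted; in that case the saved EIP would still point at the
 * string instruction, so returning from the handler re-enters this function. */
int rep_stosd_run(struct rep_stosd_state *s, uint32_t budget)
{
    assert(s != NULL);
    while (s->ecx != 0 && budget != 0) {
        *s->edi++ = s->eax;   /* store, then advance (DF = 0 assumed) */
        s->ecx--;
        budget--;
    }
    return s->ecx == 0;
}
```

Calling rep_stosd_run a second time with the same state models the resumed instruction: it continues from the element EDI points to, with the count left in ECX.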
The string operation memory-ordering principles (items 2 and 3 above) should be interpreted by taking the 
interruptibility of fast string operations into account. For example, if a fast string operation is interrupted after k 
iterations, then stores performed by the interrupt handler will become visible after the fast string stores from iteration 
0 to k, and before the fast string stores from the (k+1)th iteration onward. 
Stores within a single string operation may execute out of order (item 1 above) only if fast-string operation is 
enabled. Fast-string operation is enabled or disabled through the IA32_MISC_ENABLE model-specific register. 
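
A minimal sketch of testing the enable bit follows. The text above names only the register; the MSR address 1A0H and the use of bit 0 as the fast-strings enable are assumptions based on the architectural MSR listing and should be checked against the MSR tables for the target processor. The helper names are hypothetical. Reading or writing the MSR itself requires privileged code (RDMSR/WRMSR) and is not shown; the functions operate on a value that has already been read.

```c
#include <assert.h>
#include <stdint.h>

#define IA32_MISC_ENABLE_ADDR  0x1A0u              /* assumed MSR address */
#define FAST_STRINGS_ENABLE    (UINT64_C(1) << 0)  /* assumed: bit 0 */

/* Combine the EDX:EAX halves returned by RDMSR into one 64-bit value. */
uint64_t msr_from_edx_eax(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Returns 1 if the fast-strings enable bit is set in the MSR value. */
int fast_strings_enabled(uint64_t misc_enable)
{
    return (misc_enable & FAST_STRINGS_ENABLE) != 0;
}

/* Returns the MSR value with the enable bit set or cleared; all other
 * bits are preserved, as a read-modify-write sequence would require. */
uint64_t with_fast_strings(uint64_t misc_enable, int enable)
{
    assert(enable == 0 || enable == 1);   /* caller passes a boolean */
    return enable ? (misc_enable | FAST_STRINGS_ENABLE)
                  : (misc_enable & ~FAST_STRINGS_ENABLE);
}
```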

8.2.4.2  Examples Illustrating Memory-Ordering Principles for String Operations

The following examples use the same notation and conventions as described in Section 8.2.3.1.
In Example 8-11, processor 0 performs one doubleword string store operation (128 iterations) via rep:stosd, 
writing the value 1 (value in EAX) into a block of 512 bytes starting at location _x (kept in ES:EDI) in ascending order. 
Since each iteration stores a doubleword (4 bytes), the operation is repeated 128 times (value in ECX). The block 
of memory initially contains 0. Processor 1 reads two memory locations that are part of the memory block 
being updated by processor 0, i.e., locations in the range _x to (_x+511).

Example 8-11.  Stores Within a String Operation May be Reordered

Processor 0            Processor 1
rep:stosd [ _x]        mov r1, [ _z]
                       mov r2, [ _y]

Initially on processor 0: EAX = 1, ECX = 128, ES:EDI = _x
Initially [_x] to 511[_x] = 0, _x <= _y < _z < _x+512
r1 = 1 and r2 = 0 is allowed
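
The reasoning behind Example 8-11 can be checked with a small C model. It is a sketch under stated assumptions: _y and _z are taken to fall in distinct doubleword elements, identified by their indices y and z within the block, and the array order is a hypothetical description of the sequence in which the 128 stores become visible. Because processor 1 reads _z before _y, the outcome r1 = 1 and r2 = 0 requires the store to _z to become visible before the store to _y.

```c
#include <assert.h>
#include <stddef.h>

#define NSTORES 128   /* doubleword stores performed by rep:stosd */

/* order[k] is the index of the element whose store becomes visible k-th.
 * Returns 1 if processor 1 can observe r1 = 1 (element z) followed by
 * r2 = 0 (element y), i.e. if z becomes visible before y. */
int r1_1_r2_0_possible(const size_t order[NSTORES], size_t y, size_t z)
{
    size_t pos_y = NSTORES, pos_z = NSTORES;
    assert(y < z && z < NSTORES);          /* _x <= _y < _z < _x+512 */
    for (size_t k = 0; k < NSTORES; k++) {
        if (order[k] == y) pos_y = k;
        if (order[k] == z) pos_z = k;
    }
    return pos_z < pos_y;
}
```

With stores visible in ascending order the function returns 0 for every valid pair, which corresponds to fast-string operation being disabled. Any visibility order that places a higher element ahead of a lower one makes the outcome possible for that pair.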