8-4 Vol. 3A

MULTIPLE-PROCESSOR MANAGEMENT

A locked instruction is guaranteed to lock only the area of memory defined by the destination operand, but may be 
interpreted by the system as a lock for a larger memory area.
Software should access semaphores (shared memory used for signalling between multiple processors) using 
identical addresses and operand lengths. For example, if one processor accesses a semaphore using a word access, 
other processors should not access the semaphore using a byte access. 
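
As an illustration of the rule above, the following is a minimal sketch in C11, not taken from the manual. The names sem32_t, sem32_init, sem32_try_acquire, and sem32_release are hypothetical. Every access goes through the same 32-bit object, so all processors use the same address and operand length. That the compiler emits LOCK-prefixed instructions for these operations on x86 is an assumption about typical code generation.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical counting semaphore. The count is only ever touched
   through this one 32-bit atomic object, never through a byte or
   word alias of the same memory. */
typedef struct {
    _Atomic uint32_t count;
} sem32_t;

static void sem32_init(sem32_t *s, uint32_t initial) {
    atomic_init(&s->count, initial);
}

/* Try to take one unit. Returns 1 on success, 0 if the count was zero. */
static int sem32_try_acquire(sem32_t *s) {
    uint32_t old = atomic_load(&s->count);
    while (old != 0) {
        /* On failure, old is reloaded with the current count. */
        if (atomic_compare_exchange_weak(&s->count, &old, old - 1)) {
            return 1;
        }
    }
    return 0;
}

/* Return one unit to the semaphore. */
static void sem32_release(sem32_t *s) {
    atomic_fetch_add(&s->count, 1);
}
```

Mixing this with, for example, a byte store to the low byte of count would break the identical-operand-length guideline, even though the address is the same.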

NOTE

Do not implement semaphores using the WC memory type. Do not perform non-temporal stores to 
a cache line containing a location used to implement a semaphore.

The integrity of a bus lock is not affected by the alignment of the memory field. The LOCK semantics are followed 
for as many bus cycles as necessary to update the entire operand. However, it is recommended that locked accesses 
be aligned on their natural boundaries for better system performance:

Any boundary for an 8-bit access (locked or otherwise).

16-bit boundary for locked word accesses.

32-bit boundary for locked doubleword accesses.

64-bit boundary for locked quadword accesses.
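
The alignment recommendations above can be checked and enforced in C11 roughly as follows. This is a sketch under stated assumptions: the structure shared_counters and the helper is_naturally_aligned are hypothetical names, and the alignas specifiers simply request the natural boundary for each field size.

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if p lies on the natural boundary for an operand of
   `size` bytes (1, 2, 4, or 8), otherwise 0. */
static int is_naturally_aligned(const void *p, size_t size) {
    return ((uintptr_t)p % size) == 0;
}

/* Hypothetical record of shared fields that are updated atomically.
   Each field is explicitly placed on its natural boundary. */
struct shared_counters {
    alignas(8) _Atomic uint64_t total;  /* quadword: 64-bit boundary   */
    alignas(4) _Atomic uint32_t hits;   /* doubleword: 32-bit boundary */
    alignas(2) _Atomic uint16_t flags;  /* word: 16-bit boundary       */
};
```

A caller could assert is_naturally_aligned(&c.total, 8) during initialization to catch a misaligned layout, for example one produced by a packed structure.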

Locked operations are atomic with respect to all other memory operations and all externally visible events. Only 
instruction fetch and page table accesses can pass locked instructions. Locked instructions can be used to 
synchronize data written by one processor and read by another processor.
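
One way to picture this synchronizing use is a flag-guarded mailbox, sketched below in C11. The names mailbox_t, mailbox_publish, and mailbox_try_read are hypothetical. The sketch assumes the compiler implements the sequentially consistent atomic_exchange with an XCHG or LOCK-prefixed instruction on x86, which is typical but not guaranteed by the C standard.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical mailbox shared by a writing and a reading processor. */
typedef struct {
    uint32_t payload;        /* ordinary data, written before the flag */
    _Atomic uint32_t ready;  /* flag, always accessed as 32 bits       */
} mailbox_t;

/* Writer: store the data, then set the flag with an atomic exchange.
   The exchange orders the payload store before the flag update. */
static void mailbox_publish(mailbox_t *m, uint32_t value) {
    m->payload = value;
    atomic_exchange(&m->ready, 1u);
}

/* Reader: atomically clear the flag. If it was set, the payload
   written by the publisher is visible and is copied to *out.
   Returns 1 if a value was read, 0 if nothing was published. */
static int mailbox_try_read(mailbox_t *m, uint32_t *out) {
    if (atomic_exchange(&m->ready, 0u) == 0u) {
        return 0;
    }
    *out = m->payload;
    return 1;
}
```
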
For the P6 family processors, locked operations serialize all outstanding load and store operations (that is, wait for 
them to complete). This rule is also true for the Pentium 4 and Intel Xeon processors, with one exception. Load 
operations that reference weakly ordered memory types (such as the WC memory type) may not be serialized.
Locked instructions should not be used to ensure that data written can be fetched as instructions. 

NOTE

The locked instructions for the current versions of the Pentium 4, Intel Xeon, P6 family, Pentium, 
and Intel486 processors allow data written to be fetched as instructions. However, Intel 
recommends that developers who require the use of self-modifying code use a different 
synchronizing mechanism, described in the following sections.

8.1.3 Handling Self- and Cross-Modifying Code

The act of a processor writing data into a currently executing code segment with the intent of executing that data 
as code is called self-modifying code. IA-32 processors exhibit model-specific behavior when executing 
self-modified code, depending upon how far ahead of the current execution pointer the code has been modified. 
As processor microarchitectures become more complex and start to speculatively execute code ahead of the 
retirement point (as in P6 and more recent processor families), the rules regarding which code should execute, pre- or 
post-modification, become blurred. To write self-modifying code and ensure that it is compliant with current and 
future versions of the IA-32 architectures, use one of the following coding options:

(* OPTION 1 *)
Store modified code (as data) into code segment; 
Jump to new code or an intermediate location;
Execute new code;

(* OPTION 2 *)
Store modified code (as data) into code segment;
Execute a serializing instruction; (* For example, CPUID instruction *)
Execute new code;

The use of one of these options is not required for programs intended to run on the Pentium or Intel486 processors, 
but is recommended to ensure compatibility with the P6 and more recent processor families.
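
OPTION 2 can be sketched in C as shown below. This is an illustrative example only, assuming an x86 or x86-64 Linux system, a GCC- or Clang-compatible compiler (for <cpuid.h> and inline asm), and a kernel policy that permits a mapping that is both writable and executable. The names run_patched and smc_demo are hypothetical. The indirect call at the end is also a jump to the new code, so the sketch happens to satisfy OPTION 1 as well.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <cpuid.h>        /* __cpuid macro (GCC/Clang) */
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

typedef int (*code_fn)(void);

/* Writes "mov eax, imm32 ; ret" into buf, serializes, then runs it. */
static int run_patched(uint8_t *buf, int32_t value) {
    uint8_t code[6] = { 0xB8, 0, 0, 0, 0, 0xC3 };
    unsigned int a, b, c, d;

    assert(buf != NULL);
    memcpy(&code[1], &value, sizeof value);  /* fill in the immediate */
    memcpy(buf, code, sizeof code);          /* store modified code (as data) */

    /* Compiler barrier: keep the stores above ahead of CPUID. */
    __asm__ __volatile__("" ::: "memory");
    __cpuid(0, a, b, c, d);                  /* serializing instruction */
    (void)a; (void)b; (void)c; (void)d;

    return ((code_fn)(uintptr_t)buf)();      /* execute new code */
}

/* Patches the same buffer twice. Returns first * 10 + second,
   or -1 if the executable mapping could not be created. */
static int smc_demo(void) {
    uint8_t *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int first, second;

    if (buf == MAP_FAILED) {
        return -1;
    }
    first = run_patched(buf, 1);
    second = run_patched(buf, 2);  /* overwrites code that already ran */
    munmap(buf, 4096);
    return first * 10 + second;
}
```

The second call to run_patched overwrites bytes that have already been executed, which is the case where the serializing step matters on processors that fetch and execute ahead of the retirement point.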