8-6 Vol. 3A
MULTIPLE-PROCESSOR MANAGEMENT
illustrating the behavior of the memory-ordering model on IA-32 and Intel-64 processors. Section 8.2.4 considers
the special treatment of stores for string operations and Section 8.2.5 discusses how memory-ordering behavior
may be modified through the use of specific instructions.
8.2.1 Memory Ordering in the Intel® Pentium® and Intel486™ Processors
The Pentium and Intel486 processors follow the processor-ordered memory model; however, they operate as
strongly-ordered processors under most circumstances. Reads and writes always appear in programmed order at
the system bus, with one exception in which processor ordering is exhibited: read misses are permitted to go
ahead of buffered writes on the system bus when all the buffered writes are cache hits and, therefore, are not
directed to the same address being accessed by the read miss.
In the case of I/O operations, both reads and writes always appear in programmed order.
Software intended to operate correctly on processor-ordered processors (such as the Pentium 4, Intel Xeon, and P6
family processors) should not depend on the relatively strong ordering of the Pentium or Intel486 processors.
Instead, it should ensure that accesses to shared variables that are intended to control concurrent execution
among processors are explicitly required to obey program ordering through the use of appropriate locking or
serializing operations (see Section 8.2.5, “Strengthening or Weakening the Memory-Ordering Model”).
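As a minimal sketch of the "appropriate locking" recommended above, the following C11 fragment guards a shared variable with an exchange-based lock. The names `guard`, `shared_counter`, `lock_try_acquire`, `lock_acquire`, `lock_release`, and `increment_shared` are hypothetical and do not come from this manual. The sketch assumes a C11 compiler that lowers `atomic_exchange` to a locked exchange (such as XCHG) on IA-32 and Intel 64 targets; that mapping is typical but compiler-dependent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names for illustration; none of them appear in the manual. */
atomic_bool guard = false;   /* true while some processor holds the lock */
int shared_counter = 0;      /* shared variable protected by guard */

/* Try once to take the lock. The atomic exchange returns the previous
 * value, so a previous value of false means this caller now owns the lock. */
bool lock_try_acquire(void) {
    return !atomic_exchange_explicit(&guard, true, memory_order_acquire);
}

/* Spin until the lock is taken. */
void lock_acquire(void) {
    while (!lock_try_acquire()) {
        /* busy-wait; a real implementation would likely pause or back off */
    }
}

/* Release the lock. The release store keeps the protected accesses from
 * moving after the unlock. */
void lock_release(void) {
    assert(atomic_load_explicit(&guard, memory_order_relaxed)); /* must be held */
    atomic_store_explicit(&guard, false, memory_order_release);
}

/* Update the shared variable only while holding the lock. */
int increment_shared(void) {
    lock_acquire();
    int value = ++shared_counter;
    lock_release();
    return value;
}
```

Each thread that updates `shared_counter` would call `increment_shared()`. Because every access is bracketed by the lock, the code does not rely on the stronger ordering that the Pentium and Intel486 processors happen to provide.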
8.2.2 Memory Ordering in P6 and More Recent Processor Families
The Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium 4, and P6 family processors also use a processor-ordered
memory-ordering model that can be further defined as “write ordered with store-buffer forwarding.” This model
can be characterized as follows.
The memory-ordering principles for single-processor and multiple-processor systems are written from the
perspective of software executing on the processor, where the term “processor” refers to a logical processor.
For example, a physical processor supporting multiple cores and/or Intel Hyper-Threading Technology is treated
as a multi-processor system. In a single-processor system, for memory regions defined as write-back cacheable,
the memory-ordering model respects the following principles:
• Reads are not reordered with other reads.
• Writes are not reordered with older reads.
• Writes to memory are not reordered with other writes, with the following exceptions:
— streaming stores (writes) executed with the non-temporal move instructions (MOVNTI, MOVNTQ,
MOVNTDQ, MOVNTPS, and MOVNTPD); and
— string operations (see Section 8.2.4.1).
• No write to memory may be reordered with an execution of the CLFLUSH instruction; a write may be reordered
  with an execution of the CLFLUSHOPT instruction that flushes a cache line other than the one being written.¹
  Executions of the CLFLUSH instruction are not reordered with each other. Executions of CLFLUSHOPT that
  access different cache lines may be reordered with each other. An execution of CLFLUSHOPT may be reordered
  with an execution of CLFLUSH that accesses a different cache line.
• Reads may be reordered with older writes to different locations but not with older writes to the same location.
• Reads or writes cannot be reordered with I/O instructions, locked instructions, or serializing instructions.
• Reads cannot pass earlier LFENCE and MFENCE instructions.
• Writes and executions of CLFLUSH and CLFLUSHOPT cannot pass earlier LFENCE, SFENCE, and MFENCE
  instructions.
• LFENCE instructions cannot pass earlier reads.
• SFENCE instructions cannot pass earlier writes or executions of CLFLUSH and CLFLUSHOPT.
• MFENCE instructions cannot pass earlier reads, writes, or executions of CLFLUSH and CLFLUSHOPT.
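The principle that reads may be reordered with older writes to different locations is the one most likely to surprise software, so a small sketch may help. In the C11 fragment below, each of two logical processors writes its own flag and then reads the other's. Without a fence between the write and the read, both reads could return 0; with a full fence on both sides, that outcome is excluded. The names `x`, `y`, `proc0`, `proc1`, and `check_outcome` are hypothetical. The sketch assumes that the compiler implements `atomic_thread_fence(memory_order_seq_cst)` with MFENCE or a locked instruction on IA-32 and Intel 64 targets, which is common but not guaranteed by this manual.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared flags, both initially 0. */
atomic_int x = 0;
atomic_int y = 0;

/* Code for logical processor 0: write x, full fence, then read y. */
int proc0(void) {
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    /* Full fence: the following read may not pass the preceding write. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&y, memory_order_relaxed);
}

/* Code for logical processor 1: write y, full fence, then read x. */
int proc1(void) {
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&x, memory_order_relaxed);
}

/* With the fences in place, the two reads cannot both return 0. */
void check_outcome(int r0, int r1) {
    assert(!(r0 == 0 && r1 == 0));
}
```

In a real test, `proc0` and `proc1` would run concurrently on different threads and `check_outcome` would be applied to their return values. Removing either fence makes the outcome `r0 == 0 && r1 == 0` possible on processors that follow the model described here.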
1. Earlier versions of this manual specified that writes to memory may be reordered with executions of the CLFLUSH instruction. No
processors implementing the CLFLUSH instruction allow such reordering.
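The streaming-store exception to write ordering listed in the principles above, together with the rule that writes cannot pass an earlier SFENCE, can be sketched as follows. The fragment fills a buffer with non-temporal stores and issues SFENCE before publishing a ready flag, so the flag write cannot become visible ahead of the buffer contents. The names `buffer_ready`, `fill_and_publish`, and `read_if_ready` are hypothetical. The sketch assumes a compiler that provides the SSE2 intrinsics `_mm_stream_si32` (MOVNTI) and `_mm_sfence` (SFENCE) in `<emmintrin.h>`; where SSE2 is unavailable it falls back to ordinary stores, which is an assumption made here for portability only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_stream_si32 (MOVNTI), _mm_sfence (SFENCE) */
#endif

/* Hypothetical flag announcing that the buffer has been filled. */
atomic_int buffer_ready = 0;

/* Producer: fill dst[0..n) with value, then publish the ready flag. */
void fill_and_publish(int *dst, size_t n, int value) {
    assert(dst != NULL);
    for (size_t i = 0; i < n; i++) {
#if defined(__SSE2__)
        _mm_stream_si32(&dst[i], value);  /* non-temporal (streaming) store */
#else
        dst[i] = value;                   /* fallback: ordinary store */
#endif
    }
#if defined(__SSE2__)
    /* Streaming stores are not ordered with other writes, so fence them
     * before the flag write that follows. */
    _mm_sfence();
#endif
    atomic_store_explicit(&buffer_ready, 1, memory_order_release);
}

/* Consumer: read src[i] only if the buffer has been published.
 * Returns 1 and stores the element in *out on success, 0 otherwise. */
int read_if_ready(const int *src, size_t i, int *out) {
    if (!atomic_load_explicit(&buffer_ready, memory_order_acquire)) {
        return 0;
    }
    *out = src[i];
    return 1;
}
```

The explicit `_mm_sfence()` is kept even though the flag is written with release ordering, because a compiler's release store is not necessarily designed to order non-temporal stores.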