8-2 Vol. 3A

MULTIPLE-PROCESSOR MANAGEMENT

• Cache coherency protocols that ensure that atomic operations can be carried out on cached data structures 
(cache lock); this mechanism is present in the Pentium 4, Intel Xeon, and P6 family processors.

These mechanisms are interdependent in the following ways. Certain basic memory transactions (such as reading 
or writing a byte in system memory) are always guaranteed to be handled atomically. That is, once started, the 
processor guarantees that the operation will be completed before another processor or bus agent is allowed access 
to the memory location. The processor also supports bus locking for performing selected memory operations (such 
as a read-modify-write operation in a shared area of memory) that typically need to be handled atomically, but are 
not automatically handled this way. Because frequently used memory locations are often cached in a processor’s L1 
or L2 caches, atomic operations can often be carried out inside a processor’s caches without asserting the bus lock. 
Here the processor’s cache coherency protocols ensure that other processors that are caching the same memory 
locations are managed properly while atomic operations are performed on cached memory locations.
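As an illustration of a read-modify-write operation that is not automatically atomic and must be requested explicitly, the following minimal C11 sketch uses `<stdatomic.h>`. The function names `counter_increment` and `try_claim` are hypothetical and chosen for this example only. On x86 targets, compilers such as GCC and Clang typically translate these calls into LOCK-prefixed instructions; whether the processor then asserts a bus lock or handles the operation in its cache is decided by the hardware, not by this code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Atomically add 1 to a shared counter and return the new value.
 * atomic_fetch_add is a single indivisible read-modify-write, so
 * increments from different processors cannot be lost. */
uint32_t counter_increment(_Atomic uint32_t *counter)
{
    /* atomic_fetch_add returns the value held before the addition. */
    return atomic_fetch_add(counter, 1) + 1;
}

/* Try to claim ownership of a shared word. The word holds 0 while it
 * is unowned (a convention assumed for this sketch). Returns true only
 * for the caller whose compare-and-exchange replaced 0 with my_id. */
bool try_claim(_Atomic uint32_t *owner, uint32_t my_id)
{
    uint32_t expected = 0;
    return atomic_compare_exchange_strong(owner, &expected, my_id);
}
```

A plain `*counter = *counter + 1` on a non-atomic variable would compile to separate load and store steps, which is the case the text describes as not automatically handled atomically.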

NOTE

Where there are contested lock accesses, software may need to implement algorithms that ensure 
fair access to resources in order to prevent lock starvation. The hardware provides no resource that 
guarantees fairness to participating agents. It is the responsibility of software to manage the 
fairness of semaphores and exclusive locking functions.
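One common software technique for fair access is a ticket lock, which grants the lock in the order in which agents requested it. The sketch below is a minimal C11 version under stated assumptions: the type `ticket_lock` and its functions are hypothetical names, the lock busy-waits, and the ticket counters are assumed not to be outpaced by wraparound. It is not taken from this manual.

```c
#include <stdatomic.h>

/* A ticket lock: each waiter draws a ticket number and waits until
 * that number is served, which gives first-come, first-served order. */
typedef struct {
    atomic_uint next_ticket;  /* next ticket number to hand out */
    atomic_uint now_serving;  /* ticket number that currently owns the lock */
} ticket_lock;

void ticket_lock_init(ticket_lock *l)
{
    atomic_init(&l->next_ticket, 0);
    atomic_init(&l->now_serving, 0);
}

/* Draw a ticket with an atomic fetch-and-add, then spin until it is
 * served. Returns the ticket number that was drawn. */
unsigned ticket_lock_acquire(ticket_lock *l)
{
    unsigned my_ticket =
        atomic_fetch_add_explicit(&l->next_ticket, 1, memory_order_relaxed);
    while (atomic_load_explicit(&l->now_serving, memory_order_acquire)
           != my_ticket) {
        /* busy-wait; a production lock would add a pause or back-off */
    }
    return my_ticket;
}

/* Pass the lock to the next ticket holder. Only the current owner
 * writes now_serving, so a plain load followed by a store suffices. */
void ticket_lock_release(ticket_lock *l)
{
    unsigned current =
        atomic_load_explicit(&l->now_serving, memory_order_relaxed);
    atomic_store_explicit(&l->now_serving, current + 1, memory_order_release);
}
```

Because tickets are handed out by a single atomic increment, no waiter can be overtaken indefinitely, which is the starvation problem the note above leaves to software.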

The mechanisms for handling locked atomic operations have evolved with the complexity of IA-32 processors. More 
recent IA-32 processors (such as the Pentium 4, Intel Xeon, and P6 family processors) and Intel 64 provide a more 
refined locking mechanism than earlier processors. These mechanisms are described in the following sections.

8.1.1 Guaranteed Atomic Operations

The Intel486 processor (and newer processors since) guarantees that the following basic memory operations will 
always be carried out atomically:

• Reading or writing a byte
• Reading or writing a word aligned on a 16-bit boundary
• Reading or writing a doubleword aligned on a 32-bit boundary

The Pentium processor (and newer processors since) guarantees that the following additional memory operations 
will always be carried out atomically:

• Reading or writing a quadword aligned on a 64-bit boundary
• 16-bit accesses to uncached memory locations that fit within a 32-bit data bus

The P6 family processors (and newer processors since) guarantee that the following additional memory operation 
will always be carried out atomically:

• Unaligned 16-, 32-, and 64-bit accesses to cached memory that fit within a cache line
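The rules listed above can be restated as a small predicate. The following C sketch is illustrative only: the enum `cpu_gen`, the function `access_is_guaranteed_atomic`, and its parameters are hypothetical names, the cache line size is supplied by the caller, and the phrase "fit within a 32-bit data bus" is read here as "does not straddle a 4-byte boundary", which is an interpretation rather than a statement from the text.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Processor generations named in the text, oldest first (hypothetical enum). */
enum cpu_gen { GEN_INTEL486, GEN_PENTIUM, GEN_P6_OR_LATER };

/* Returns true if a single read or write of `size` bytes at `addr`
 * falls under one of the guaranteed-atomic cases listed above. */
bool access_is_guaranteed_atomic(enum cpu_gen gen, uintptr_t addr,
                                 size_t size, bool cached, size_t line_size)
{
    bool aligned = (size != 0) && (addr % size == 0);

    /* Intel486 and later: bytes, aligned words, aligned doublewords. */
    if (size == 1)
        return true;
    if ((size == 2 || size == 4) && aligned)
        return true;

    /* Pentium and later: aligned quadwords. */
    if (gen >= GEN_PENTIUM && size == 8 && aligned)
        return true;

    /* Pentium and later: 16-bit uncached accesses that fit within a
     * 32-bit data bus (interpreted as: within one 4-byte unit). */
    if (gen >= GEN_PENTIUM && !cached && size == 2 && (addr % 4) + 2 <= 4)
        return true;

    /* P6 family and later: unaligned 16-, 32-, and 64-bit accesses to
     * cached memory that fit within a single cache line. */
    if (gen >= GEN_P6_OR_LATER && cached &&
        (size == 2 || size == 4 || size == 8) &&
        (addr % line_size) + size <= line_size)
        return true;

    return false;
}
```

For example, with a 64-byte line, a 4-byte cached access at address 0x1001 is covered on a P6 family processor but not on a Pentium processor, while a 4-byte access at 0x103E crosses a line boundary and is covered on neither.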

Accesses to cacheable memory that are split across cache lines and page boundaries are not guaranteed to be 
atomic by the Intel Core 2 Duo, Intel® Atom™, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, P6 family, 
Pentium, and Intel486 processors. The Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium M, Pentium 4, Intel 
Xeon, and P6 family processors provide bus control signals that permit external memory subsystems to make split 
accesses atomic; however, nonaligned data accesses will seriously impact the performance of the processor and 
should be avoided.

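One way software can avoid split accesses is to declare shared data with explicit alignment. The C11 sketch below assumes a 64-byte cache line (`CACHE_LINE_SIZE` is an assumption of this example, not a value given in the text); `struct shared_slot` and `crosses_cache_line` are hypothetical names.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64  /* assumed line size; check the target processor */

/* A 64-bit shared value placed at the start of its own cache line, so
 * an access to it can never be split across two lines. */
struct shared_slot {
    alignas(CACHE_LINE_SIZE) uint64_t value;
};

/* Compile-time checks that the layout is what the comment claims. */
static_assert(alignof(struct shared_slot) == CACHE_LINE_SIZE,
              "slot must be cache-line aligned");
static_assert(sizeof(struct shared_slot) == CACHE_LINE_SIZE,
              "slot must occupy exactly one cache line");

/* Returns true if an access of `size` bytes starting at `p` would
 * extend past the end of the cache line that contains `p`. */
bool crosses_cache_line(const void *p, size_t size)
{
    uintptr_t offset = (uintptr_t)p % CACHE_LINE_SIZE;
    return offset + size > CACHE_LINE_SIZE;
}
```

Padding each slot to a full line trades memory for predictability; aligning a 64-bit value only to its natural 8-byte boundary is also enough to keep a single access from being split.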
An x87 instruction or an SSE instruction that accesses data larger than a quadword may be implemented using 
multiple memory accesses. If such an instruction stores to memory, some of the accesses may complete (writing 
to memory) while another causes the operation to fault for architectural reasons (e.g., due to a page-table entry that 
is marked “not present”). In this case, the effects of the completed accesses may be visible to software even 
though the overall instruction caused a fault. If TLB invalidation has been delayed (see Section 4.10.4.4), such 
page faults may occur even if all accesses are to the same page.