8-2 Vol. 3A

MULTIPLE-PROCESSOR MANAGEMENT

• Cache coherency protocols that ensure that atomic operations can be carried out on cached data structures
  (cache lock); this mechanism is present in the Pentium 4, Intel Xeon, and P6 family processors.

These mechanisms are interdependent in the following ways. Certain basic memory transactions (such as reading
or writing a byte in system memory) are always guaranteed to be handled atomically. That is, once started, the
processor guarantees that the operation will be completed before another processor or bus agent is allowed access
to the memory location. The processor also supports bus locking for performing selected memory operations (such
as a read-modify-write operation in a shared area of memory) that typically need to be handled atomically, but are
not automatically handled this way. Because frequently used memory locations are often cached in a processor’s L1
or L2 caches, atomic operations can often be carried out inside a processor’s caches without asserting the bus lock.
Here the processor’s cache coherency protocols ensure that other processors that are caching the same memory
locations are managed properly while atomic operations are performed on cached memory locations.
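The distinction between an access that is atomic on its own and a read-modify-write that must be locked can be sketched in C11 as follows. This is a minimal illustration, not a description of how any particular compiler or processor implements it; the names shared_counter, counter_increment, and counter_read are hypothetical. On x86 targets, compilers typically implement atomic_fetch_add with a LOCK-prefixed instruction and an aligned atomic load with an ordinary move, but the generated code is compiler-dependent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* ATOMIC_INT_LOCK_FREE == 2 means int-sized atomics are always lock-free. */
static_assert(ATOMIC_INT_LOCK_FREE == 2, "int-sized atomics must be lock-free");

/* Hypothetical shared counter; the name is illustrative only. */
_Atomic uint32_t shared_counter = 0;

/* Read-modify-write: not atomic by default, so it is requested explicitly.
   Returns the value the counter held before the increment. */
uint32_t counter_increment(void)
{
    return atomic_fetch_add(&shared_counter, 1u);
}

/* Aligned 32-bit read: one of the basic accesses that is atomic on its own. */
uint32_t counter_read(void)
{
    return atomic_load(&shared_counter);
}
```

Each call to counter_increment returns a distinct previous value even when several processors call it concurrently, which a plain `shared_counter = shared_counter + 1` on a non-atomic variable would not guarantee.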

NOTE

Where there are contested lock accesses, software may need to implement algorithms that ensure
fair access to resources in order to prevent lock starvation. The hardware provides no resource that
guarantees fairness to participating agents. It is the responsibility of software to manage the
fairness of semaphores and exclusive locking functions.
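One common software technique for fair lock access is a ticket lock, which serves waiters in arrival order. The following C11 sketch is an illustration under that assumption and is not a mechanism provided by the processor; the type ticket_lock_t and the functions below are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical ticket lock: waiters are served in FIFO order. */
typedef struct {
    atomic_uint next_ticket;  /* next ticket number to hand out */
    atomic_uint now_serving;  /* ticket currently allowed to hold the lock */
} ticket_lock_t;

void ticket_lock_init(ticket_lock_t *lock)
{
    assert(lock != NULL);
    atomic_init(&lock->next_ticket, 0u);
    atomic_init(&lock->now_serving, 0u);
}

/* Take a ticket with an atomic read-modify-write, then spin until that
   ticket is being served. Returns the ticket number that was drawn. */
unsigned ticket_lock_acquire(ticket_lock_t *lock)
{
    assert(lock != NULL);
    unsigned my_ticket = atomic_fetch_add(&lock->next_ticket, 1u);
    while (atomic_load(&lock->now_serving) != my_ticket) {
        /* Busy-wait; production code would usually add a pause or backoff. */
    }
    return my_ticket;
}

/* Hand the lock to the holder of the next ticket in line. */
void ticket_lock_release(ticket_lock_t *lock)
{
    assert(lock != NULL);
    atomic_fetch_add(&lock->now_serving, 1u);
}
```

Because tickets are handed out by a single atomic increment, no waiter can be overtaken by a later arrival, which is the starvation-avoidance property the note asks software to provide.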

The mechanisms for handling locked atomic operations have evolved with the complexity of IA-32 processors. More
recent IA-32 processors (such as the Pentium 4, Intel Xeon, and P6 family processors) and Intel 64 provide a more
refined locking mechanism than earlier processors. These mechanisms are described in the following sections.

8.1.1 Guaranteed Atomic Operations

The Intel486 processor (and newer processors since) guarantees that the following basic memory operations will
always be carried out atomically:

• Reading or writing a byte
• Reading or writing a word aligned on a 16-bit boundary
• Reading or writing a doubleword aligned on a 32-bit boundary

The Pentium processor (and newer processors since) guarantees that the following additional memory operations
will always be carried out atomically:

• Reading or writing a quadword aligned on a 64-bit boundary
• 16-bit accesses to uncached memory locations that fit within a 32-bit data bus

The P6 family processors (and newer processors since) guarantee that the following additional memory operation
will always be carried out atomically:

• Unaligned 16-, 32-, and 64-bit accesses to cached memory that fit within a cache line
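The guarantees listed above can be summarized as a small decision function. The C sketch below is a model of the rules as stated in this section, not a query of real hardware. The enum cpu_gen_t, the function is_guaranteed_atomic, and the line_bytes parameter are hypothetical, and the reading of "fit within a 32-bit data bus" as "stays inside one aligned 4-byte window" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Processor generations named in the text, oldest first (hypothetical enum). */
typedef enum { CPU_INTEL486, CPU_PENTIUM, CPU_P6_OR_NEWER } cpu_gen_t;

/* Returns true if the access falls under one of the listed guarantees.
   line_bytes is the cache line size of the modeled processor. */
bool is_guaranteed_atomic(cpu_gen_t gen, uintptr_t addr, size_t size,
                          bool cached, size_t line_bytes)
{
    assert(line_bytes != 0);
    if (size == 0)
        return false;

    bool aligned = (addr % size) == 0;

    /* Intel486 and newer: any byte, aligned word, aligned doubleword. */
    if (size == 1 || ((size == 2 || size == 4) && aligned))
        return true;

    if (gen >= CPU_PENTIUM) {
        /* Aligned quadword. */
        if (size == 8 && aligned)
            return true;
        /* 16-bit uncached access within a 32-bit data bus; modeled here
           (an assumption) as staying inside one aligned 4-byte window. */
        if (size == 2 && !cached && (addr % 4) + size <= 4)
            return true;
    }

    if (gen >= CPU_P6_OR_NEWER) {
        /* Unaligned 16-, 32-, or 64-bit cached access inside one cache line. */
        if (cached && (size == 2 || size == 4 || size == 8) &&
            (addr % line_bytes) + size <= line_bytes)
            return true;
    }
    return false;
}
```

For example, an unaligned doubleword at offset 1 of a 64-byte cache line is covered only from the P6 family onward, while the same access at offset 62 crosses the line and is not covered on any generation.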

Accesses to cacheable memory that are split across cache lines and page boundaries are not guaranteed to be
atomic by the Intel Core 2 Duo, Intel Atom™, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, P6 family,
Pentium, and Intel486 processors. The Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium M, Pentium 4, Intel
Xeon, and P6 family processors provide bus control signals that permit external memory subsystems to make split
accesses atomic; however, nonaligned data accesses will seriously impact the performance of the processor and
should be avoided.
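Software can avoid split accesses by keeping shared data naturally aligned. The C11 sketch below shows one way to test whether an access straddles a boundary and to force alignment on a shared field. The helper crosses_boundary and the structure shared_state are hypothetical names, and the boundary sizes passed in (for example, 64-byte lines or 4096-byte pages) are assumptions that vary by processor and paging configuration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if [addr, addr + size) spans more than one boundary-sized block,
   e.g. boundary = 64 for an assumed cache line or 4096 for an assumed page. */
bool crosses_boundary(uintptr_t addr, size_t size, size_t boundary)
{
    assert(boundary != 0);
    if (size == 0)
        return false;
    return (addr / boundary) != ((addr + size - 1) / boundary);
}

/* Natural (8-byte) alignment keeps the 64-bit field from straddling any
   cache line or page whose size is a multiple of 8 bytes. */
struct shared_state {
    alignas(8) uint64_t sequence;
};
```

A check such as `crosses_boundary((uintptr_t)&s.sequence, sizeof s.sequence, 64)` evaluates to false for any properly allocated struct shared_state, because an 8-byte object on an 8-byte boundary cannot span two 64-byte lines.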

An x87 instruction or an SSE instruction that accesses data larger than a quadword may be implemented using
multiple memory accesses. If such an instruction stores to memory, some of the accesses may complete (writing
to memory) while another causes the operation to fault for architectural reasons (e.g., due to a page-table entry
that is marked “not present”). In this case, the effects of the completed accesses may be visible to software even
though the overall instruction caused a fault. If TLB invalidation has been delayed (see Section 4.10.4.4), such
page faults may occur even if all accesses are to the same page.