Vol. 3A 8-1

CHAPTER 8

MULTIPLE-PROCESSOR MANAGEMENT

The Intel 64 and IA-32 architectures provide mechanisms for managing and improving the performance of multiple 
processors connected to the same system bus. These include:

Bus locking and/or cache coherency management for performing atomic operations on system memory.

Serializing instructions.

An advanced programmable interrupt controller (APIC) located on the processor chip (see Chapter 10, 
“Advanced Programmable Interrupt Controller (APIC)”). This feature was introduced by the Pentium processor.

A second-level cache (level 2, L2). For the Pentium 4, Intel Xeon, and P6 family processors, the L2 cache is 
included in the processor package and is tightly coupled to the processor. For the Pentium and Intel486 
processors, pins are provided to support an external L2 cache.

A third-level cache (level 3, L3). For Intel Xeon processors, the L3 cache is included in the processor package 
and is tightly coupled to the processor.

Intel Hyper-Threading Technology. This extension to the Intel 64 and IA-32 architectures enables a single 
processor core to execute two or more threads concurrently (see Section 8.5, “Intel® Hyper-Threading 
Technology and Intel® Multi-Core Technology”).

These mechanisms are particularly useful in symmetric-multiprocessing (SMP) systems. However, they can also be 
used when an Intel 64 or IA-32 processor and a special-purpose processor (such as a communications, graphics, 
or video processor) share the system bus.

These multiprocessing mechanisms have the following characteristics:

To maintain system memory coherency — When two or more processors are attempting simultaneously to 
access the same address in system memory, some communication mechanism or memory access protocol 
must be available to promote data coherency and, in some instances, to allow one processor to temporarily lock 
a memory location.

To maintain cache consistency — When one processor accesses data cached on another processor, it must not 
receive incorrect data. If it modifies data, all other processors that access that data must receive the modified 
data.

To allow predictable ordering of writes to memory — In some circumstances, it is important that memory writes 
be observed externally in precisely the same order as programmed.

To distribute interrupt handling among a group of processors — When several processors are operating in a 
system in parallel, it is useful to have a centralized mechanism for receiving interrupts and distributing them to 
available processors for servicing.

To increase system performance by exploiting the multi-threaded and multi-process nature of contemporary 
operating systems and applications.
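
As an illustration of the first goal above (letting one processor temporarily lock a memory location), the following is a minimal sketch in C11, not code taken from this manual. The type `guarded_counter` and the `gc_*` helper names are hypothetical and exist only for this example. The sketch relies on the standard `<stdatomic.h>` atomic read-modify-write operations; on Intel 64 and IA-32 targets, compilers commonly implement such operations with an XCHG instruction or a LOCK-prefixed instruction, though the exact code generated is up to the compiler.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared structure: a lock flag guarding a plain counter. */
typedef struct {
    atomic_flag locked; /* set while some thread holds the lock */
    long value;         /* protected data; touched only while holding the lock */
} guarded_counter;

/* Try once to take the lock; returns true if the caller now owns it.
 * The test-and-set is a single atomic read-modify-write, so two
 * processors can never both see the flag as previously clear. */
static bool gc_try_lock(guarded_counter *c)
{
    return !atomic_flag_test_and_set_explicit(&c->locked, memory_order_acquire);
}

/* Spin until the lock is acquired. */
static void gc_lock(guarded_counter *c)
{
    while (!gc_try_lock(c)) {
        /* busy-wait; a production lock would back off or yield here */
    }
}

/* Release the lock; the release ordering makes the writes performed
 * while holding the lock visible to the next thread that acquires it. */
static void gc_unlock(guarded_counter *c)
{
    atomic_flag_clear_explicit(&c->locked, memory_order_release);
}

/* Increment the protected counter and return its new value. */
static long gc_increment(guarded_counter *c)
{
    gc_lock(c);
    long v = ++c->value;
    gc_unlock(c);
    return v;
}
```

A counter would be initialized as `guarded_counter c = { ATOMIC_FLAG_INIT, 0 };` and `gc_increment(&c)` could then be called from any number of threads. The acquire and release orderings are one reasonable choice for a simple spin lock; they are an assumption of this sketch rather than a requirement stated in the text.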

The caching mechanism and cache consistency of Intel 64 and IA-32 processors are discussed in Chapter 11. The 
APIC architecture is described in Chapter 10. Bus and memory locking, serializing instructions, memory ordering, 
and Intel Hyper-Threading Technology are discussed in the following sections. 

8.1 LOCKED ATOMIC OPERATIONS

The 32-bit IA-32 processors support locked atomic operations on locations in system memory. These operations 
are typically used to manage shared data structures (such as semaphores, segment descriptors, system segments, 
or page tables) in which two or more processors may try simultaneously to modify the same field or flag. The 
processor uses three interdependent mechanisms for carrying out locked atomic operations:

Guaranteed atomic operations

Bus locking, using the LOCK# signal and the LOCK instruction prefix