8-52 Vol. 3A

MULTIPLE-PROCESSOR MANAGEMENT

8.10.6.6   Eliminate Execution-Based Timing Loops

Intel discourages the use of timing loops that depend on a processor’s execution speed to measure time. There are 
several reasons:

Timing loops cause problems when they are calibrated on an IA-32 processor running at one frequency and then 
executed on a processor running at another frequency. 

Routines for calibrating execution-based timing loops produce unpredictable results when run on an IA-32 
processor supporting Intel Hyper-Threading Technology. This is due to the sharing of execution resources 
between the logical processors within a physical package. 

To avoid the problems described, timing loop routines must use a timing mechanism for the loop that does not 
depend on the execution speed of the logical processors in the system. The following sources are generally 
available:

A high resolution system timer (for example, an Intel 8254).

A high resolution timer within the processor (such as the local APIC timer or the time-stamp counter).

For additional information, see the Intel® 64 and IA-32 Architectures Optimization Reference Manual.
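
The recommendation above can be illustrated with a minimal sketch of a delay routine whose duration is set by a clock reading rather than by a calibrated iteration count. This is only a user-level analogue: it assumes a POSIX system and uses clock_gettime(CLOCK_MONOTONIC) as a stand-in for the hardware timers listed above, which is not something this section prescribes. The helper names monotonic_ns and delay_at_least_ns are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: current CLOCK_MONOTONIC reading in nanoseconds.
 * Assumption: the OS clock stands in for a hardware timer source. */
static uint64_t monotonic_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical helper: busy-wait until at least delay_ns nanoseconds have
 * elapsed and return the elapsed time. The loop exits on elapsed time, so
 * the delay does not depend on how fast the loop body executes. */
static uint64_t delay_at_least_ns(uint64_t delay_ns)
{
    uint64_t start = monotonic_ns();
    uint64_t elapsed;

    do {
        elapsed = monotonic_ns() - start;
    } while (elapsed < delay_ns);

    return elapsed;
}
```

Because the exit condition is a time comparison, the same binary should give the same minimum delay on processors running at different frequencies, or on a logical processor that shares execution resources with another.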

8.10.6.7   Place Locks and Semaphores in Aligned, 128-Byte Blocks of Memory

When software uses locks or semaphores to synchronize processes, threads, or other code sections, Intel 
recommends that only one lock or semaphore be present within a cache line (or 128-byte sector, if 128-byte 
sectors are supported). In processors based on Intel NetBurst microarchitecture (which support a 128-byte sector 
consisting of two cache lines), following this recommendation means that each lock or semaphore should be 
contained in a 128-byte block of memory that begins on a 128-byte boundary. This practice minimizes the bus 
traffic required to service locks.
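
One way to follow this recommendation in C11 is sketched below. It assumes a C11 compiler with <stdalign.h> and <stdatomic.h>; the type padded_lock, the constant LOCK_BLOCK_SIZE, and the acquire/release helpers are hypothetical names chosen for illustration, and the simple test-and-set spin lock is only a placeholder for whatever lock the software actually uses.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 128 bytes, matching the 128-byte sector described above. */
#define LOCK_BLOCK_SIZE 128

/* Hypothetical lock type. Aligning the only member to 128 bytes makes the
 * whole struct 128-byte aligned and pads its size to 128 bytes, so no two
 * locks in an array can share a block. */
typedef struct {
    alignas(LOCK_BLOCK_SIZE) atomic_flag flag;
} padded_lock;

static_assert(sizeof(padded_lock) == LOCK_BLOCK_SIZE,
              "each lock must occupy exactly one 128-byte block");
static_assert(alignof(padded_lock) == LOCK_BLOCK_SIZE,
              "each lock must begin on a 128-byte boundary");

/* Spin until the flag was previously clear (placeholder lock algorithm). */
static inline void padded_lock_acquire(padded_lock *lock)
{
    while (atomic_flag_test_and_set_explicit(&lock->flag,
                                             memory_order_acquire)) {
        /* busy-wait */
    }
}

static inline void padded_lock_release(padded_lock *lock)
{
    atomic_flag_clear_explicit(&lock->flag, memory_order_release);
}
```

The static assertions turn the layout requirement into a compile-time check, so a later change that adds fields past 128 bytes fails to build instead of silently spanning two blocks.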

8.11   MP INITIALIZATION FOR P6 FAMILY PROCESSORS

This section describes the MP initialization process for systems that use multiple P6 family processors. This process 
uses the MP initialization protocol that was introduced with the Pentium Pro processor (see Section 8.4, 
“Multiple-Processor (MP) Initialization”). For P6 family processors, this protocol is typically used to boot 2 or 4 
processors that reside on a single system bus; however, it can support from 2 to 15 processors in a multi-clustered 
system when the APIC busses are tied together. Larger systems are not supported.

8.11.1   Overview of the MP Initialization Process For P6 Family Processors

During the execution of the MP initialization protocol, one processor is selected as the bootstrap processor (BSP) 
and the remaining processors are designated as application processors (APs); see Section 8.4.1, “BSP and AP 
Processors.” Thereafter, the BSP manages the initialization of itself and the APs. This initialization includes 
executing BIOS initialization code and operating-system initialization code.

The MP protocol imposes the following requirements and restrictions on the system:

An APIC clock (APICLK) must be provided.

The MP protocol will be executed only after a power-up or RESET. If the MP protocol has been completed and a 
BSP has been chosen, subsequent INITs (either to a specific processor or system wide) do not cause the MP 
protocol to be repeated. Instead, each processor examines its BSP flag (in the APIC_BASE MSR) to determine 
whether it should execute the BIOS boot-strap code (if it is the BSP) or enter a wait-for-SIPI state (if it is an 
AP).

All devices in the system that are capable of delivering interrupts to the processors must be inhibited from 
doing so for the duration of the MP initialization protocol. The time during which interrupts must be inhibited 
includes the window between when the BSP issues an INIT-SIPI-SIPI sequence to an AP and when the AP 
responds to the last SIPI in the sequence.
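
The BSP flag check described in the requirements above can be sketched as follows. The sketch takes the APIC_BASE MSR value as a parameter, because reading the MSR itself requires RDMSR at privilege level 0. The BSP flag is bit 8 of that MSR (address 1BH); the enum, the function names, and the sample values used afterward are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define APIC_BASE_MSR_ADDR  0x1Bu               /* MSR address of APIC_BASE */
#define APIC_BASE_BSP_FLAG  (UINT64_C(1) << 8)  /* BSP flag, bit 8 */

static_assert(APIC_BASE_BSP_FLAG == 0x100, "BSP flag is bit 8");

/* Hypothetical names for the two outcomes the text describes. */
typedef enum {
    INIT_RUN_BIOS_BOOTSTRAP,  /* processor is the BSP */
    INIT_WAIT_FOR_SIPI        /* processor is an AP */
} init_action;

/* True if the given APIC_BASE MSR value has the BSP flag set. */
static inline bool apic_base_is_bsp(uint64_t apic_base_msr)
{
    return (apic_base_msr & APIC_BASE_BSP_FLAG) != 0;
}

/* Decide what a processor does on an INIT that arrives after the MP
 * protocol has completed; the protocol itself is not repeated. */
static inline init_action action_after_init(uint64_t apic_base_msr)
{
    return apic_base_is_bsp(apic_base_msr) ? INIT_RUN_BIOS_BOOTSTRAP
                                           : INIT_WAIT_FOR_SIPI;
}
```

For example, assuming an APIC base address of FEE00000H with the APIC enabled, an MSR value of FEE00900H (bit 8 set) selects the BIOS bootstrap path, while FEE00800H (bit 8 clear) selects the wait-for-SIPI state.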

The following special-purpose interprocessor interrupts (IPIs) are used during the boot phase of the MP 
initialization protocol. These IPIs are broadcast on the APIC bus.