background image

Vol. 3B 15-29

MACHINE-CHECK ARCHITECTURE

Example 15-3.  Machine-Check Error Logging Pseudocode

Assume that execution is restartable;

IF the processor supports MCA

THEN

FOR each bank of machine-check registers 

DO

READ IA32_MCi_STATUS;

IF VAL flag in IA32

_

MCi_STATUS = 1

THEN

IF ADDRV flag in IA32_MCi_STATUS = 1

THEN READ IA32_MCi_ADDR; 

FI;
IF MISCV flag in IA32_MCi_STATUS = 1

THEN READ IA32_MCi_MISC;

FI;
IF MCIP flag in IA32_MCG_STATUS = 1

(* Machine-check exception is in progress *) 
AND PCC flag in IA32_MCi_STATUS = 1

OR RIPV flag in IA32_MCG_STATUS = 0

(* execution is not restartable *)

THEN 

RESTARTABILITY = FALSE;

return RESTARTABILITY to calling procedure;

FI;
Save time-stamp counter and processor ID;

Set IA32_MCi_STATUS to all 0s;

Execute serializing instruction (i.e., CPUID);

FI;

OD;

FI;

If the processor supports the machine-check architecture, the utility reads through the banks of error-reporting 
registers looking for valid register entries. It then saves the values of the IA32_MCi_STATUS, IA32_MCi_ADDR, 
IA32_MCi_MISC and IA32_MCG_STATUS registers for each bank that is valid. The routine minimizes processing 
time by recording the raw data into a system data structure or file, reducing the overhead associated with polling. 
User utilities analyze the collected data in an off-line environment.
When the MCIP flag is set in the IA32_MCG_STATUS register, a machine-check exception is in progress and the 
machine-check exception handler has called the exception logging routine. 
Once the logging process has been completed the exception-handling routine must determine whether execution 
can be restarted, which is usually possible when damage has not occurred (The PCC flag is clear, in the 
IA32_MCi_STATUS register) and when the processor can guarantee that execution is restartable (the RIPV flag is 
set in the IA32_MCG_STATUS register). If execution cannot be restarted, the system is not recoverable and the 
exception-handling routine should signal the console appropriately before returning the error status to the Oper-
ating System kernel for subsequent shutdown.
The machine-check architecture allows buffering of exceptions from a given error-reporting bank although the 
Pentium 4, Intel Xeon, Intel Atom, and P6 family processors do not implement this feature. The error logging 
routine should provide compatibility with future processors by reading each hardware error-reporting bank's 
IA32_MCi_STATUS register and then writing 0s to clear the OVER and VAL flags in this register. The error logging 
utility should re-read the IA32_MCi_STATUS register for the bank ensuring that the valid bit is clear. The processor 
will write the next error into the register bank and set the VAL flags. 
Additional information that should be stored by the exception-logging routine includes the processor’s time-stamp 
counter value, which provides a mechanism to indicate the frequency of exceptions. A multiprocessing operating 
system stores the identity of the processor node incurring the exception using a unique identifier, such as the 
processor’s APIC ID (see Section 10.8, “Handling Interrupts”). 
The basic algorithm given in Example 15-3 can be modified to provide more robust recovery techniques. For 
example, software has the flexibility to attempt recovery using information unavailable to the hardware. Specifi-
cally, the machine-check exception handler can, after logging carefully analyze the error-reporting registers when 
the error-logging routine reports an error that does not allow execution to be restarted. These recovery techniques