Vol. 3B 15-27
MACHINE-CHECK ARCHITECTURE
To use the machine-check exception, the operating system or executive software must provide a machine-check
exception handler. This handler may need to be designed specifically for each family of processors.
A special program or utility is required to log machine errors.
Guidelines for writing a machine-check exception handler or a machine-error logging utility are given in the
following sections.
15.10.1 Machine-Check Exception Handler
The machine-check exception (#MC) corresponds to vector 18. To service machine-check exceptions, a trap gate
must be added to the IDT. The pointer in the trap gate must point to a machine-check exception handler. Two
approaches can be taken to designing the exception handler:
1. The handler can merely log all the machine status and error information, then call a debugger or shut down the
system.
2. The handler can analyze the reported error information and, in some cases, attempt to correct the error and
restart the processor.
For Pentium 4, Intel Xeon, Intel Atom, P6 family, and Pentium processors, virtually no machine-check condition
can be corrected (such conditions result in abort-type exceptions). The logging of status and error information is
therefore a baseline implementation requirement.
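As an illustration of the IDT requirement described above, the following is a minimal sketch of building a trap gate
for vector 18. It assumes the 64-bit mode gate descriptor layout; the names idt_gate64, make_trap_gate, and
install_mc_gate are hypothetical and are not defined by the architecture. Loading the IDT and the handler body
itself are outside the scope of the sketch.

```c
#include <stdint.h>

/* Layout of a 64-bit mode IDT gate descriptor (16 bytes, no padding).
 * Assumption: the system runs in 64-bit mode; 32-bit gates are 8 bytes. */
struct idt_gate64 {
    uint16_t offset_low;   /* handler address bits 15:0            */
    uint16_t selector;     /* code-segment selector of the handler */
    uint8_t  ist;          /* bits 2:0: IST index, 0 = not used    */
    uint8_t  type_attr;    /* P (bit 7), DPL (6:5), type (3:0)     */
    uint16_t offset_mid;   /* handler address bits 31:16           */
    uint32_t offset_high;  /* handler address bits 63:32           */
    uint32_t reserved;     /* must be zero                         */
};

enum {
    MC_VECTOR      = 18,   /* #MC vector number       */
    GATE_PRESENT   = 0x80, /* P flag                  */
    GATE_TYPE_TRAP = 0x0F  /* trap gate, DPL 0 implied */
};

/* Hypothetical helper: encode a present, DPL-0 trap gate. */
static struct idt_gate64 make_trap_gate(uint64_t handler, uint16_t cs,
                                        uint8_t ist)
{
    struct idt_gate64 g;

    g.offset_low  = (uint16_t)(handler & 0xFFFF);
    g.selector    = cs;
    g.ist         = (uint8_t)(ist & 0x7);
    g.type_attr   = GATE_PRESENT | GATE_TYPE_TRAP;
    g.offset_mid  = (uint16_t)((handler >> 16) & 0xFFFF);
    g.offset_high = (uint32_t)(handler >> 32);
    g.reserved    = 0;
    return g;
}

/* Hypothetical helper: place the gate in the slot for vector 18. */
static void install_mc_gate(struct idt_gate64 *idt, uint64_t handler,
                            uint16_t cs, uint8_t ist)
{
    idt[MC_VECTOR] = make_trap_gate(handler, cs, ist);
}
```

In 64-bit mode, a handler of this kind is often given its own IST stack so that it can run even when the interrupted
stack is unusable; whether to do so is a design choice left to the operating system.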
When IA32_MCG_CAP[24] is clear, consider the following when writing a machine-check exception handler:
• To determine the nature of the error, the handler must read each of the error-reporting register banks. The
count field in the IA32_MCG_CAP register gives the number of register banks. The first register of register
bank 0 is at address 400H.
• The VAL (valid) flag in each IA32_MCi_STATUS register indicates whether the error information in the register
is valid. If this flag is clear, the registers in that bank do not contain valid error information and do not need to
be checked.
• To write a portable exception handler, only the MCA error code field in the IA32_MCi_STATUS register should be
checked. See Section 15.9, “Interpreting the MCA Error Codes,” for information that can be used to write an
algorithm to interpret this field.
• Correctable errors are corrected automatically by the processor. The UC (uncorrected) flag in each
IA32_MCi_STATUS register indicates whether the processor automatically corrected the error: the flag is clear
when the error was corrected and set when it was not.
• The PCC and OVER flags in each IA32_MCi_STATUS register, together with the RIPV flag in the
IA32_MCG_STATUS register, indicate whether recovery from the error is possible. If PCC or OVER is set,
recovery is not possible. If RIPV is not set, program execution cannot be restarted reliably. When recovery is
not possible, the handler typically records the error information and signals an abort to the operating system.
• The RIPV flag in the IA32_MCG_STATUS register indicates whether the program can be restarted at the
instruction indicated by the instruction pointer (the address of the instruction pushed on the stack when the
exception was generated). If this flag is clear, it may still be possible to restart the processor (for debugging
purposes), but not without loss of program continuity.
• For unrecoverable errors, the EIPV flag in the IA32_MCG_STATUS register indicates whether the instruction
indicated by the instruction pointer pushed on the stack (when the exception was generated) is related to the
error. If the flag is clear, the instruction indicated by the pushed instruction pointer may not be related to the
error.
• The MCIP flag in the IA32_MCG_STATUS register indicates whether a machine-check exception was generated.
Before returning from the machine-check exception handler, software should clear this flag so that it can be
used reliably by an error logging utility. The MCIP flag is also used to detect recursion, which the
machine-check architecture does not support. If a machine-check condition occurs while MCIP is set, the
processor enters the shutdown state.
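The considerations listed above can be sketched in C as follows. This is an illustration only, not the procedure of
Example 15-2. The MSR addresses and bit positions are the architectural ones; the names classify_bank, scan_banks,
mc_verdict, and the rdmsr_fn callback are hypothetical, and fake_rdmsr is a stand-in with made-up values so that the
sketch can run in user space. The classification applies only when IA32_MCG_CAP[24] is clear, and it reads the
PCC/OVER rule conservatively by treating either flag as fatal.

```c
#include <stdint.h>

#define MSR_IA32_MCG_CAP     0x179u
#define MSR_IA32_MCG_STATUS  0x17Au
#define MSR_IA32_MC0_CTL     0x400u        /* first register of bank 0 */

#define MCG_CAP_COUNT(cap)   ((unsigned)((cap) & 0xFF))  /* bits 7:0 */

#define MCG_STATUS_RIPV      (1ULL << 0)
#define MCG_STATUS_EIPV      (1ULL << 1)
#define MCG_STATUS_MCIP      (1ULL << 2)

#define MCI_STATUS_VAL       (1ULL << 63)
#define MCI_STATUS_OVER      (1ULL << 62)
#define MCI_STATUS_UC        (1ULL << 61)
#define MCI_STATUS_PCC       (1ULL << 57)
#define MCI_STATUS_MCACOD(s) ((uint16_t)((s) & 0xFFFF))  /* bits 15:0 */

/* Hypothetical callback; a real handler would execute RDMSR at CPL 0. */
typedef uint64_t (*rdmsr_fn)(uint32_t msr);

/* Each bank has four registers: CTL, STATUS, ADDR, MISC. */
static uint32_t mci_status_msr(unsigned bank)
{
    return MSR_IA32_MC0_CTL + 4u * bank + 1u;
}

/* Ordered from least to most severe so verdicts can be compared. */
enum mc_verdict { MC_NO_ERROR, MC_CORRECTED, MC_RESTARTABLE, MC_FATAL };

static enum mc_verdict classify_bank(uint64_t status, uint64_t mcg_status)
{
    if (!(status & MCI_STATUS_VAL))
        return MC_NO_ERROR;              /* nothing valid in this bank  */
    if (status & (MCI_STATUS_PCC | MCI_STATUS_OVER))
        return MC_FATAL;                 /* recovery is not possible    */
    if (!(status & MCI_STATUS_UC))
        return MC_CORRECTED;             /* processor fixed it; log it  */
    if (!(mcg_status & MCG_STATUS_RIPV))
        return MC_FATAL;                 /* cannot restart reliably     */
    return MC_RESTARTABLE;
}

/* Read every bank and return the most severe verdict found. */
static enum mc_verdict scan_banks(rdmsr_fn rdmsr)
{
    uint64_t cap = rdmsr(MSR_IA32_MCG_CAP);
    uint64_t mcg = rdmsr(MSR_IA32_MCG_STATUS);
    enum mc_verdict worst = MC_NO_ERROR;

    for (unsigned i = 0; i < MCG_CAP_COUNT(cap); i++) {
        enum mc_verdict v = classify_bank(rdmsr(mci_status_msr(i)), mcg);
        if (v > worst)
            worst = v;
    }
    return worst;
}

/* Value to write back to IA32_MCG_STATUS before the handler returns. */
static uint64_t mcg_status_clear_mcip(uint64_t mcg_status)
{
    return mcg_status & ~MCG_STATUS_MCIP;
}

/* Stand-in for RDMSR with made-up values: two banks, bank 1 holding a
 * valid, uncorrected error with an arbitrary MCA error code. */
static uint64_t fake_rdmsr(uint32_t msr)
{
    if (msr == MSR_IA32_MCG_CAP)
        return 2;
    if (msr == MSR_IA32_MCG_STATUS)
        return MCG_STATUS_MCIP | MCG_STATUS_RIPV;
    if (msr == mci_status_msr(1))
        return MCI_STATUS_VAL | MCI_STATUS_UC | 0x0001;
    return 0;
}
```

Passing the MSR reader as a callback keeps the decision logic free of privileged instructions, so it can be exercised
against recorded or synthetic register values. A portable handler would additionally log MCI_STATUS_MCACOD for each
valid bank before acting on the verdict.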
Example 15-2 gives typical steps carried out by a machine-check exception handler.