
Vol. 3B 15-31

MACHINE-CHECK ARCHITECTURE

When the EN flag is zero but the VAL and UC flags are one in the IA32_MCi_STATUS register, the reported
uncorrected error in this bank is not enabled. Because uncorrected errors with the EN flag = 0 are not the source of
machine-check exceptions, the MCE handler should log and clear non-enabled errors when the S bit is set, and
should continue searching for enabled errors in the other IA32_MCi_STATUS registers. Note that when
IA32_MCG_CAP[24] is 0, any uncorrected error condition (VAL=1 and UC=1), including one with the EN
flag cleared, is fatal, and the handler must signal the operating system to reset the system. For errors that
do not generate machine-check exceptions, the EN flag has no meaning.
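As a minimal sketch of the triage described above, the C function below classifies one bank from raw register values that the caller has already read with RDMSR. It assumes the architectural bit positions VAL = bit 63, UC = bit 61, EN = bit 60 and S = bit 56 of IA32_MCi_STATUS, and bit 24 of IA32_MCG_CAP; the macro names, the enum and the function name are hypothetical. For S = 0 the text only says to keep searching, so the sketch leaves such a bank untouched.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions; the macro names are illustrative only. */
#define MCI_STATUS_VAL (1ULL << 63)
#define MCI_STATUS_UC  (1ULL << 61)
#define MCI_STATUS_EN  (1ULL << 60)
#define MCI_STATUS_S   (1ULL << 56)
#define MCG_CAP_SER_P  (1ULL << 24) /* IA32_MCG_CAP[24] */

/* Hypothetical per-bank verdicts. */
enum ne_verdict {
    NE_NOT_APPLICABLE,     /* no valid uncorrected error, or EN = 1 (handled elsewhere) */
    NE_LOG_CLEAR_CONTINUE, /* non-enabled, S = 1: log, clear, keep scanning */
    NE_CONTINUE,           /* non-enabled, S = 0: not an MCE source, keep scanning */
    NE_RESET_SYSTEM        /* IA32_MCG_CAP[24] = 0: any uncorrected error is fatal */
};

enum ne_verdict triage_non_enabled(uint64_t mcg_cap, uint64_t mci_status)
{
    bool val = mci_status & MCI_STATUS_VAL;
    bool uc  = mci_status & MCI_STATUS_UC;

    if (!(val && uc))
        return NE_NOT_APPLICABLE;
    /* IA32_MCG_CAP[24] clear: fatal whether or not EN is set. */
    if (!(mcg_cap & MCG_CAP_SER_P))
        return NE_RESET_SYSTEM;
    /* Enabled uncorrected errors are triaged by other checks. */
    if (mci_status & MCI_STATUS_EN)
        return NE_NOT_APPLICABLE;
    return (mci_status & MCI_STATUS_S) ? NE_LOG_CLEAR_CONTINUE : NE_CONTINUE;
}
```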

When the VAL flag is one, the UC flag is one, the EN flag is one and the PCC flag is zero in the
IA32_MCi_STATUS register, the error in this bank is an uncorrected recoverable (UCR) error. The MCE handler
needs to examine the S and AR flags to determine the type of the UCR error and whether software error
recovery is possible.
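The UCR condition can be tested with a single mask comparison: VAL, UC and EN must be set and PCC must be clear. A minimal C sketch, assuming the architectural positions VAL = bit 63, UC = bit 61, EN = bit 60 and PCC = bit 57 of IA32_MCi_STATUS; the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCI_STATUS_VAL (1ULL << 63)
#define MCI_STATUS_UC  (1ULL << 61)
#define MCI_STATUS_EN  (1ULL << 60)
#define MCI_STATUS_PCC (1ULL << 57)

/* True when IA32_MCi_STATUS describes an uncorrected recoverable (UCR) error:
 * VAL = 1, UC = 1, EN = 1 and PCC = 0. S and AR are examined afterwards. */
bool is_ucr(uint64_t mci_status)
{
    const uint64_t checked = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN | MCI_STATUS_PCC;
    const uint64_t wanted  = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN;
    return (mci_status & checked) == wanted;
}
```

Bits outside the mask (S, AR, the error codes) do not affect the result, so the same status value can be passed on unchanged to the S/AR checks.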

When both the S and the AR flags are clear in the IA32_MCi_STATUS register for the UCR error (VAL=1, UC=1,
EN=x and PCC=0, where x means EN is ignored), the error in this bank is an uncorrected no-action required error
(UCNA). UCNA errors are uncorrected but do not require any OS recovery action to continue execution. These
errors indicate that some data in the system is corrupt, but that data has not been consumed and might never be
consumed. If that data is consumed, a non-UCNA machine-check exception will be generated. UCNA errors are
signaled in the same way as corrected machine-check errors, and the CMCI and CMC polling handlers are primarily
responsible for handling UCNA errors. As with corrected errors, the MCA handler can optionally log and clear UCNA
errors, provided it can avoid an undesired race condition with the CMCI or CMC polling handler. Because UCNA
errors are not the source of machine-check exceptions, the MCA handler should continue searching for uncorrected
or software recoverable errors in all other MC banks.
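A minimal sketch of the UCNA test and of the "skip and keep searching" rule, assuming the architectural positions VAL = 63, UC = 61, EN = 60, PCC = 57, S = 56 and AR = 55, and that the caller has already read every bank's IA32_MCi_STATUS into an array. The names are hypothetical; leaving UCNA banks to the CMCI/CMC polling handler is the choice this sketch makes among the options the text allows.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MCI_STATUS_VAL (1ULL << 63)
#define MCI_STATUS_UC  (1ULL << 61)
#define MCI_STATUS_EN  (1ULL << 60)
#define MCI_STATUS_PCC (1ULL << 57)
#define MCI_STATUS_S   (1ULL << 56)
#define MCI_STATUS_AR  (1ULL << 55)

/* UCNA: VAL = 1, UC = 1, PCC = 0, S = 0, AR = 0; EN is ignored. */
bool is_ucna(uint64_t mci_status)
{
    const uint64_t must_set   = MCI_STATUS_VAL | MCI_STATUS_UC;
    const uint64_t must_clear = MCI_STATUS_PCC | MCI_STATUS_S | MCI_STATUS_AR;
    return (mci_status & must_set) == must_set && (mci_status & must_clear) == 0;
}

/* Index of the first bank holding a valid uncorrected error that is not UCNA,
 * or -1 if none. UCNA banks are skipped (left to the CMCI/CMC polling handler)
 * and the scan continues through all remaining banks. */
long first_non_ucna_uncorrected(const uint64_t *mci_status, size_t nbanks)
{
    const uint64_t val_uc = MCI_STATUS_VAL | MCI_STATUS_UC;
    for (size_t i = 0; i < nbanks; i++) {
        if ((mci_status[i] & val_uc) != val_uc)
            continue; /* no valid uncorrected error in this bank */
        if (is_ucna(mci_status[i]))
            continue; /* not an MCE source; keep searching */
        return (long)i;
    }
    return -1;
}
```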

When the S flag in the IA32_MCi_STATUS register is set for the UCR error (VAL=1, UC=1, EN=1 and PCC=0),
the error in this bank is software recoverable and was signaled through a machine-check exception. The AR
flag in the IA32_MCi_STATUS register further classifies the type of the software recoverable error.
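Once S = 1 identifies a software recoverable error, AR alone selects between the two recovery classes defined in the following paragraphs. A minimal C sketch, assuming the architectural positions VAL = 63, UC = 61, EN = 60, PCC = 57, S = 56 and AR = 55; the enum and function names are hypothetical.

```c
#include <stdint.h>

#define MCI_STATUS_VAL (1ULL << 63)
#define MCI_STATUS_UC  (1ULL << 61)
#define MCI_STATUS_EN  (1ULL << 60)
#define MCI_STATUS_PCC (1ULL << 57)
#define MCI_STATUS_S   (1ULL << 56)
#define MCI_STATUS_AR  (1ULL << 55)

enum sr_kind { SR_NONE, SR_SRAO, SR_SRAR };

/* Software recoverable: VAL = 1, UC = 1, EN = 1, PCC = 0, S = 1.
 * AR = 0 means action optional (SRAO); AR = 1 means action required (SRAR). */
enum sr_kind classify_sw_recoverable(uint64_t mci_status)
{
    const uint64_t checked = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
                             MCI_STATUS_PCC | MCI_STATUS_S;
    const uint64_t wanted  = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
                             MCI_STATUS_S;
    if ((mci_status & checked) != wanted)
        return SR_NONE;
    return (mci_status & MCI_STATUS_AR) ? SR_SRAR : SR_SRAO;
}
```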

When the AR flag in the IA32_MCi_STATUS register is clear for the software recoverable error (VAL=1, UC=1,
EN=1, PCC=0 and S=1), the error in this bank is a software recoverable action optional (SRAO) error. The MCE
handler and the operating system can analyze the MCA error code in IA32_MCi_STATUS[15:0] to implement an
error-code-specific recovery action, but this recovery action is optional. System software can resume program
execution from the instruction pointer saved on the stack for the machine-check exception when the RIPV flag
in the IA32_MCG_STATUS register is set.

Even if the OVER flag in the IA32_MCi_STATUS register is set for the SRAO error (VAL=1, UC=1, EN=1, PCC=0,
S=1 and AR=0), the MCE handler can take recovery action for the SRAO error logged in the IA32_MCi_STATUS
register. Since the recovery action for SRAO errors is optional, restarting program execution from the
instruction pointer saved on the stack for the machine-check exception is still possible for the overflowed SRAO
error if the RIPV flag in the IA32_MCG_STATUS register is set.
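The SRAO rules reduce to three steps: optionally act on the MCA error code, ignore OVER, and resume only if RIPV is set. A minimal C sketch under these assumptions: the bank has already been classified as SRAO, RIPV is bit 0 of IA32_MCG_STATUS, OVER is bit 62 of IA32_MCi_STATUS, and `srao_hook` is a hypothetical callback standing in for whatever optional action the OS implements. This passage does not say what to do when RIPV is clear; returning a reset request in that case is a conservative assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define MCG_STATUS_RIPV (1ULL << 0)
#define MCI_STATUS_OVER (1ULL << 62) /* deliberately not consulted below */

enum mce_outcome { MCE_RESUME, MCE_RESET };

/* Hypothetical callback for an optional, error-code-specific action. */
typedef void (*srao_hook)(uint16_t mca_error_code);

/* Precondition: mci_status is already classified as SRAO
 * (VAL = 1, UC = 1, EN = 1, PCC = 0, S = 1, AR = 0). */
enum mce_outcome handle_srao(uint64_t mcg_status, uint64_t mci_status, srao_hook hook)
{
    uint16_t mca_error_code = (uint16_t)(mci_status & 0xFFFF); /* bits 15:0 */

    /* OVER does not matter here: an overflowed SRAO error can still be acted on. */
    if (hook != NULL)
        hook(mca_error_code);

    /* Assumption: RIPV clear means the saved IP cannot be trusted, so reset. */
    return (mcg_status & MCG_STATUS_RIPV) ? MCE_RESUME : MCE_RESET;
}
```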

When the AR flag in the IA32_MCi_STATUS register is set for the software recoverable error (VAL=1, UC=1,
EN=1, PCC=0 and S=1), the error in this bank is a software recoverable action required (SRAR) error. The MCE
handler and the operating system must take recovery action in order to continue execution after the
machine-check exception. The MCA handler and the operating system need to analyze the MCA error code in
IA32_MCi_STATUS[15:0] to determine the error-code-specific recovery action. If no recovery action can be
performed, the operating system must reset the system.

When the OVER flag in the IA32_MCi_STATUS register is set for the SRAR error (VAL=1, UC=1, EN=1, PCC=0,
S=1 and AR=1), the MCE handler cannot take recovery action, because the SRAR error information in the
IA32_MCi_STATUS register may have been lost due to the overflow condition. Since recovery action for
SRAR errors is mandatory, the MCE handler must signal the operating system to reset the system.
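In contrast to SRAO, an SRAR error needs an overflow check first and then a recovery action that must succeed. A minimal C sketch, assuming OVER is bit 62 of IA32_MCi_STATUS and the bank has already been classified as SRAR; `srar_recover_fn` is a hypothetical stand-in for the OS's error-code-specific recovery routine, not an existing API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MCI_STATUS_OVER (1ULL << 62)

enum mce_outcome { MCE_RESUME, MCE_RESET };

/* Hypothetical OS recovery routine: returns true if it handled the error. */
typedef bool (*srar_recover_fn)(uint16_t mca_error_code);

/* Precondition: mci_status is already classified as SRAR
 * (VAL = 1, UC = 1, EN = 1, PCC = 0, S = 1, AR = 1). */
enum mce_outcome handle_srar(uint64_t mci_status, srar_recover_fn recover)
{
    /* Overflow: the logged SRAR information may be lost, so no recovery is attempted. */
    if (mci_status & MCI_STATUS_OVER)
        return MCE_RESET;

    /* Recovery is mandatory; without a successful action, the system is reset. */
    uint16_t mca_error_code = (uint16_t)(mci_status & 0xFFFF); /* bits 15:0 */
    if (recover == NULL || !recover(mca_error_code))
        return MCE_RESET;
    return MCE_RESUME;
}
```

Checking OVER before calling the recovery routine matches the text: with overflow, the handler should not act on possibly stale error information at all.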

When the MCE handler cannot find any uncorrected error (VAL=1, UC=1 and EN=1) or any software recoverable
error (VAL=1, UC=1, EN=1, PCC=0 and S=1) in any of the IA32_MCi banks of the processor, this is an
unexpected condition for the MCE handler, and the handler should signal the operating system to reset the
system.
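Because a software recoverable error (VAL=1, UC=1, EN=1, PCC=0, S=1) also matches the uncorrected-enabled pattern (VAL=1, UC=1, EN=1), one mask test per bank is enough to detect the "no source found" case. A minimal C sketch, assuming VAL = bit 63, UC = bit 61 and EN = bit 60, and that the caller has already read each bank's IA32_MCi_STATUS; the names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MCI_STATUS_VAL (1ULL << 63)
#define MCI_STATUS_UC  (1ULL << 61)
#define MCI_STATUS_EN  (1ULL << 60)

/* True if any bank holds an enabled uncorrected error (VAL = 1, UC = 1, EN = 1).
 * Software recoverable errors are a subset of this pattern, so they are covered too. */
bool found_mce_source(const uint64_t *mci_status, size_t nbanks)
{
    const uint64_t pattern = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN;
    for (size_t i = 0; i < nbanks; i++) {
        if ((mci_status[i] & pattern) == pattern)
            return true;
    }
    return false; /* unexpected: the caller should have the OS reset the system */
}
```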

Before returning from the machine-check exception handler, software must clear the MCIP flag in the
IA32_MCG_STATUS register. The MCIP flag is used to detect recursion, which the machine-check architecture does
not support: if the processor receives a machine check while MCIP is set, it automatically enters the
shutdown state.
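The exit step and the recursion rule can be expressed on the raw IA32_MCG_STATUS value (MSR 17AH), where RIPV is bit 0 and MCIP is bit 2. A minimal C sketch; it only computes the value to write back and clears just MCIP, leaving the privileged WRMSR to the caller. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCG_STATUS_RIPV (1ULL << 0)
#define MCG_STATUS_MCIP (1ULL << 2)

/* Value to write back to IA32_MCG_STATUS just before the handler returns:
 * MCIP cleared, all other bits unchanged. */
uint64_t mcg_status_before_return(uint64_t mcg_status)
{
    return mcg_status & ~MCG_STATUS_MCIP;
}

/* While MCIP is set, a new machine check drives the processor into the
 * shutdown state, because machine-check recursion is not supported. */
bool mce_would_shut_down(uint64_t mcg_status)
{
    return (mcg_status & MCG_STATUS_MCIP) != 0;
}
```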

Example 15-4 gives pseudocode for an MC exception handler that supports recovery of UCR.