
15-16 Vol. 3B

MACHINE-CHECK ARCHITECTURE

15.6.1 Detection of Software Error Recovery Support

Software must use bit 24 of IA32_MCG_CAP (MCG_SER_P) to detect the presence of software error recovery 
support (see Figure 15-2). When IA32_MCG_CAP[24] is set, this indicates that the processor supports software 
error recovery. When this bit is clear, this indicates that there is no support for error recovery from the processor 
and the primary responsibility of the machine check handler is logging the machine check error information and 
shutting down the system. 
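
The detection step above can be sketched in C as follows. This is a minimal illustration, not a definitive implementation: the helper name mca_supports_software_recovery and the macro names are hypothetical, and the caller is assumed to have already read IA32_MCG_CAP (MSR address 179H) with RDMSR at privilege level 0, which is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural MSR address of IA32_MCG_CAP (macro name is illustrative). */
#define MSR_IA32_MCG_CAP 0x179u

/* MCG_SER_P: bit 24 of IA32_MCG_CAP. */
#define MCG_SER_P (UINT64_C(1) << 24)

/*
 * Returns true when the processor reports software error recovery support.
 * mcg_cap is the 64-bit value previously read from IA32_MCG_CAP; obtaining
 * it (RDMSR in ring 0) is assumed to be done elsewhere.
 */
static inline bool mca_supports_software_recovery(uint64_t mcg_cap)
{
    return (mcg_cap & MCG_SER_P) != 0;
}
```

When the helper returns false, the handler in this sketch would fall back to logging the machine check error information and shutting down the system, as described above.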
The new class of architectural MCA errors from which system software can attempt recovery is called Uncorrected 
Recoverable (UCR) Errors. UCR errors are uncorrected errors that have been detected and signaled but have not 
corrupted the processor context. For certain UCR errors, this means that once system software has performed a 
certain recovery action, it is possible to continue execution on this processor. UCR error reporting provides an error 
containment mechanism for data poisoning. The machine check handler will use the error log information from the 
error reporting registers to analyze and implement specific error recovery actions for UCR errors. 

15.6.2 UCR Error Reporting and Logging

The IA32_MCi_STATUS MSR is used for reporting UCR errors as well as existing corrected and uncorrected errors. The 
definition of IA32_MCi_STATUS, including the bit fields that identify UCR errors, is shown in Figure 15-6. UCR errors 
can be signaled through either the corrected machine check interrupt (CMCI) path or the machine check exception 
(MCE) path, depending on the type of UCR error. 
When IA32_MCG_CAP[24] is set, a UCR error is indicated by the following bit settings in the IA32_MCi_STATUS 
register: 

Valid (bit 63) = 1

UC (bit 61) = 1

PCC (bit 57) = 0

Additional information about the UCR error is available from the IA32_MCi_MISC and IA32_MCi_ADDR registers 
when the MISCV and ADDRV flags, respectively, are set in the IA32_MCi_STATUS register (see Section 15.3.2.4). The 
MCA error code field of the IA32_MCi_STATUS register indicates the type of UCR error. System software can interpret 
the MCA error code field to analyze and identify the necessary recovery action for the given UCR error.
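
The bit test described above can be sketched in C as follows. This is a hedged illustration: all function and macro names are hypothetical, and the bit positions of MISCV (bit 59), ADDRV (bit 58), and the MCA error code field (bits 15:0) are not stated in this section and are taken here from the architectural IA32_MCi_STATUS layout (see Figure 15-6).

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_MCi_STATUS flags (macro names are illustrative). */
#define MCI_STATUS_VAL   (UINT64_C(1) << 63)  /* Valid */
#define MCI_STATUS_UC    (UINT64_C(1) << 61)  /* Uncorrected error */
#define MCI_STATUS_MISCV (UINT64_C(1) << 59)  /* IA32_MCi_MISC valid */
#define MCI_STATUS_ADDRV (UINT64_C(1) << 58)  /* IA32_MCi_ADDR valid */
#define MCI_STATUS_PCC   (UINT64_C(1) << 57)  /* Processor context corrupt */

/*
 * A bank reports a UCR error when software error recovery is supported
 * (IA32_MCG_CAP[24] = 1) and Valid = 1, UC = 1, PCC = 0.
 */
static inline bool mci_status_is_ucr(uint64_t status, bool ser_supported)
{
    const uint64_t mask = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_PCC;
    return ser_supported &&
           (status & mask) == (MCI_STATUS_VAL | MCI_STATUS_UC);
}

/* True when IA32_MCi_ADDR holds additional information for this error. */
static inline bool mci_status_addr_valid(uint64_t status)
{
    return (status & MCI_STATUS_ADDRV) != 0;
}

/* True when IA32_MCi_MISC holds additional information for this error. */
static inline bool mci_status_misc_valid(uint64_t status)
{
    return (status & MCI_STATUS_MISCV) != 0;
}

/* Extracts the MCA error code field, bits 15:0 of IA32_MCi_STATUS. */
static inline uint16_t mci_status_mca_error_code(uint64_t status)
{
    return (uint16_t)(status & 0xFFFFu);
}
```

In this sketch a handler would call mci_status_is_ucr first and consult the address, miscellaneous, and error code helpers only for banks where it returns true.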
In addition, bits 56:55 of the IA32_MCi_STATUS register are defined (see Figure 15-6) to provide additional 
information that helps system software identify the necessary recovery action for the UCR error:

S (Signaling) flag, bit 56 - Indicates (when set) that a machine check exception was generated for the UCR 
error reported in this MC bank and system software needs to check the AR flag and the MCA error code fields in 
the IA32_MCi_STATUS register to identify the necessary recovery action for this error. When the S flag in the 
IA32_MCi_STATUS register is clear, this UCR error was not signaled via a machine check exception and instead 
was reported as a corrected machine check (CMC). System software is not required to take any recovery action 
when the S flag in the IA32_MCi_STATUS register is clear. 

AR (Action Required) flag, bit 55 - Indicates (when set) that MCA error code specific recovery action must be 
performed by system software at the time this error was signaled. This recovery action must be completed 
successfully before any additional work is scheduled for this processor. When the RIPV flag in the 
IA32_MCG_STATUS register is clear, an alternative execution stream needs to be provided; when the MCA error 
code specific recovery action cannot be successfully completed, system software must shut down the system. 
When the AR flag in the IA32_MCi_STATUS register is clear, system software may still take MCA error code 
specific recovery action, but this is optional; system software can safely resume program execution at the 
instruction pointer saved on the stack from the machine check exception when the RIPV flag in the 
IA32_MCG_STATUS register is set. 

Both the S and the AR flags in the IA32_MCi_STATUS register are defined to be sticky bits, which means that once 
set, the processor does not clear them. Only software and a good power-on reset can clear the S and AR flags. 
The S and AR flags are set only when the processor reports UCR errors (IA32_MCG_CAP[24] is set).
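
The handling rules for the S and AR flags described above can be sketched in C as follows. This is an illustrative sketch under stated assumptions, not a definitive handler: the type and function names are hypothetical, the caller is assumed to have already established that the bank holds a UCR error, and RIPV is taken to be bit 0 of IA32_MCG_STATUS. The text above does not explicitly cover the case where AR is clear and RIPV is clear; the sketch applies the RIPV rule uniformly to every signaled error, which is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCI_STATUS_S    (UINT64_C(1) << 56)  /* Signaling flag */
#define MCI_STATUS_AR   (UINT64_C(1) << 55)  /* Action Required flag */
#define MCG_STATUS_RIPV (UINT64_C(1) << 0)   /* Restart IP valid */

/* Hypothetical summary of what system software has to do for a UCR error. */
enum ucr_recovery {
    UCR_NO_ACTION_REQUIRED,  /* S = 0: reported as a corrected machine check */
    UCR_ACTION_OPTIONAL,     /* S = 1, AR = 0: recovery action is optional */
    UCR_ACTION_REQUIRED      /* S = 1, AR = 1: recovery must succeed, or the
                                system must be shut down */
};

struct ucr_decision {
    enum ucr_recovery recovery;
    /* True when execution cannot resume at the saved instruction pointer
       (RIPV = 0). Only meaningful when a machine check exception was
       signaled (S = 1). */
    bool needs_alternate_stream;
};

/* mci_status must describe a UCR error (Valid = 1, UC = 1, PCC = 0). */
static inline struct ucr_decision ucr_decide(uint64_t mci_status,
                                             uint64_t mcg_status)
{
    struct ucr_decision d = { UCR_NO_ACTION_REQUIRED, false };

    if ((mci_status & MCI_STATUS_S) == 0)
        return d;  /* Not signaled via MCE; no recovery action required. */

    d.recovery = (mci_status & MCI_STATUS_AR) ? UCR_ACTION_REQUIRED
                                              : UCR_ACTION_OPTIONAL;
    /* Assumption: the RIPV rule is applied to every signaled error. */
    d.needs_alternate_stream = (mcg_status & MCG_STATUS_RIPV) == 0;
    return d;
}
```

Because the S and AR flags are sticky, a handler built along these lines would also need to clear the bank's status register in software after logging; that step is omitted from the sketch.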

15.6.3 UCR Error Classification

With the S and AR flag encoding in the IA32_MCi_STATUS register, UCR errors can be classified as: