Vol. 3B 15-17
MACHINE-CHECK ARCHITECTURE
• Uncorrected no action required (UCNA) - a UCR error that is not signaled via a machine check exception and
is instead reported to system software as a corrected machine check error. UCNA errors indicate that some
data in the system is corrupted, but the data has not been consumed, the processor state is valid, and
execution may continue on this processor. UCNA errors require no action from system software to continue
execution. A UCNA error is indicated with UC=1, PCC=0, S=0 and AR=0 in the IA32_MCi_STATUS register.
• Software recoverable action optional (SRAO) - a UCR error that is signaled either via a machine check
exception or CMCI. System software recovery action is optional and not required to continue execution from
this machine check exception. SRAO errors indicate that some data in the system is corrupted, but the data
has not been consumed and the processor state is valid. SRAO errors provide additional error information
that system software can use to perform a recovery action. An SRAO error signaled as a machine check
exception is indicated with UC=1, PCC=0, S=1, EN=1 and AR=0 in the IA32_MCi_STATUS register. When an SRAO
error is signaled via CMCI, the error signature is UC=1, PCC=0, S=0. Recovery actions for SRAO errors are
MCA error code specific. The MISCV and ADDRV flags in the IA32_MCi_STATUS register are set when additional
error information is available from the IA32_MCi_MISC and IA32_MCi_ADDR registers. System software needs
to inspect the MCA error code fields in the IA32_MCi_STATUS register to identify the specific recovery
action for a given SRAO error. If MISCV and ADDRV are not set, it is recommended that system software
perform no error recovery; however, system software can resume execution.
• Software recoverable action required (SRAR) - a UCR error that requires system software to take a recovery
action on this processor before scheduling another stream of execution on this processor. SRAR errors
indicate that the error was detected and raised at the point of consumption in the execution flow. An SRAR
error is indicated with UC=1, PCC=0, S=1, EN=1 and AR=1 in the IA32_MCi_STATUS register. Recovery actions
are MCA error code specific. The MISCV and ADDRV flags in the IA32_MCi_STATUS register are set when
additional error information is available from the IA32_MCi_MISC and IA32_MCi_ADDR registers. System
software needs to inspect the MCA error code fields in the IA32_MCi_STATUS register to identify the specific
recovery action for a given SRAR error. If MISCV and ADDRV are not set, it is recommended that system
software shut down the system.
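The flag signatures given in the bullets above can be turned into a small decoder. The following is a minimal sketch, not a reference implementation: the names `mc_classify`, `mc_has_recovery_info`, and `enum mc_class` are hypothetical, the bit positions are assumed to follow the IA32_MCi_STATUS and IA32_MCG_CAP layouts defined elsewhere in this chapter, and the handling of flag combinations that the text does not describe (returned as `MC_UNKNOWN`, or treated as a plain uncorrected error when MCG_SER_P is clear) is a conservative assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions in IA32_MCi_STATUS. S and AR are only
 * meaningful when IA32_MCG_CAP[24] (MCG_SER_P) is set. */
#define MCI_STATUS_VAL   (1ULL << 63)
#define MCI_STATUS_UC    (1ULL << 61)
#define MCI_STATUS_EN    (1ULL << 60)
#define MCI_STATUS_MISCV (1ULL << 59)
#define MCI_STATUS_ADDRV (1ULL << 58)
#define MCI_STATUS_PCC   (1ULL << 57)
#define MCI_STATUS_S     (1ULL << 56)
#define MCI_STATUS_AR    (1ULL << 55)

#define MCG_CAP_SER_P    (1ULL << 24)

/* Hypothetical classification labels for this sketch. */
enum mc_class {
    MC_NONE,    /* VAL=0: nothing logged in this bank */
    MC_CE,      /* corrected error */
    MC_UCNA,    /* also matches SRAO signaled via CMCI (same flags) */
    MC_SRAO,    /* signaled via machine check exception */
    MC_SRAR,
    MC_UC,      /* uncorrected, processor context corrupted */
    MC_UNKNOWN  /* combination not described in the text */
};

/* Classify one bank's IA32_MCi_STATUS value. */
enum mc_class mc_classify(uint64_t mcg_cap, uint64_t status)
{
    if (!(status & MCI_STATUS_VAL))
        return MC_NONE;
    if (!(status & MCI_STATUS_UC))
        return MC_CE;
    if (status & MCI_STATUS_PCC)
        return MC_UC;

    /* UC=1, PCC=0 is a UCR error only if the processor supports
     * software error recovery. Assumption: otherwise treat it as a
     * plain uncorrected error. */
    if (!(mcg_cap & MCG_CAP_SER_P))
        return MC_UC;

    bool en = (status & MCI_STATUS_EN) != 0;
    bool s  = (status & MCI_STATUS_S)  != 0;
    bool ar = (status & MCI_STATUS_AR) != 0;

    if (!s)
        return ar ? MC_UNKNOWN : MC_UCNA;  /* UC=1, PCC=0, S=0, AR=0 */
    if (!en)
        return MC_UNKNOWN;                 /* S=1 is described with EN=1 only */
    return ar ? MC_SRAR : MC_SRAO;         /* S=1, EN=1: AR selects the class */
}

/* True when both IA32_MCi_MISC and IA32_MCi_ADDR hold valid data,
 * which the text treats as the precondition for attempting recovery. */
bool mc_has_recovery_info(uint64_t status)
{
    return (status & MCI_STATUS_MISCV) && (status & MCI_STATUS_ADDRV);
}
```

A caller would typically read IA32_MCG_CAP once, then pass each bank's IA32_MCi_STATUS value to `mc_classify`. For an `MC_SRAR` result where `mc_has_recovery_info` returns false, the text recommends shutting down the system; for `MC_SRAO` in the same situation, it recommends skipping recovery and resuming execution. Because UCNA and CMCI-signaled SRAO share the same flag signature, telling them apart requires inspecting the MCA error code, which this sketch does not attempt.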
Table 15-6 summarizes UCR, corrected, and uncorrected errors.
Table 15-6. MC Error Classifications

| Type of Error¹ | UC | EN | PCC | S | AR | Signaling | Software Action | Example |
|---|---|---|---|---|---|---|---|---|
| Uncorrected Error (UC) | 1 | 1 | 1 | x | x | MCE | If EN=1, reset the system, else log and OK to keep the system running. | |
| SRAR | 1 | 1 | 0 | 1 | 1 | MCE | For known MCACOD, take specific recovery action; for unknown MCACOD, must bugcheck. If OVER=1, reset system, else take specific recovery action. | Cache to processor load error. |
| SRAO | 1 | x² | 0 | x² | 0 | MCE/CMC | For known MCACOD, take specific recovery action; for unknown MCACOD, OK to keep the system running. | Patrol scrub and explicit writeback poison errors. |
| UCNA | 1 | x | 0 | 0 | 0 | CMC | Log the error and OK to keep the system running. | Poison detection error. |
| Corrected Error (CE) | 0 | x | x | x | x | CMC | Log the error and no corrective action required. | ECC in caches and memory. |

NOTES:
1. SRAR, SRAO and UCNA errors are supported by the processor only when IA32_MCG_CAP[24] (MCG_SER_P) is set.
2. EN=1, S=1 when signaled via MCE. EN=x, S=0 when signaled via CMC.

15.6.4 UCR Error Overwrite Rules
In general, the overwrite rules are as follows: