If IA32_MCi_CTL2[30] = 0, proceed to step c.

c. Check whether writing a 1 into IA32_MCi_CTL2[30] returns 1 on a subsequent read, to determine whether 
this bank supports CMCI.

If IA32_MCi_CTL2[30] = 0, this bank does not support CMCI. This thread cannot own bank i and should 
proceed to step b. and examine the next machine check bank until all of the machine check banks are 
exhausted.

If IA32_MCi_CTL2[30] = 1, modify the per-thread data structure to indicate that this thread claims 
ownership of the MC bank; proceed to initialize the error threshold count (bits 15:0) of that bank as 
described in Section 15.5.2.2, “CMCI Threshold Management”. Then proceed to step b. and examine the next 
machine check bank until all of the machine check banks are exhausted.
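
The bank scan in steps b. and c. can be modeled as the following minimal sketch. It is a simulation, not driver code: `FakeMsrs` is a hypothetical stand-in for RDMSR/WRMSR (which require ring 0), and `claim_banks` is an illustrative name. The sketch assumes that a bank whose bit 30 already reads 1 is skipped as claimed elsewhere, and it leaves threshold initialization out.

```python
CMCI_EN = 1 << 30          # IA32_MCi_CTL2[30]
IA32_MC0_CTL2 = 0x280      # IA32_MCi_CTL2 is at MSR address 0x280 + i


class FakeMsrs:
    """Hypothetical stand-in for RDMSR/WRMSR, used only to run the sketch."""

    def __init__(self, cmci_capable_banks, bank_count):
        self.cmci_capable = set(cmci_capable_banks)
        self.regs = {IA32_MC0_CTL2 + i: 0 for i in range(bank_count)}

    def rdmsr(self, addr):
        return self.regs[addr]

    def wrmsr(self, addr, value):
        bank = addr - IA32_MC0_CTL2
        if bank not in self.cmci_capable:
            value &= ~CMCI_EN  # bit 30 does not stick on banks without CMCI
        self.regs[addr] = value


def claim_banks(msrs, bank_count):
    """Return the set of bank indexes this thread claims for CMCI servicing."""
    owned = set()
    for i in range(bank_count):
        addr = IA32_MC0_CTL2 + i
        ctl2 = msrs.rdmsr(addr)
        if ctl2 & CMCI_EN:
            continue  # assumption: bit 30 already set means claimed elsewhere
        msrs.wrmsr(addr, ctl2 | CMCI_EN)   # step c: try to set bit 30
        if msrs.rdmsr(addr) & CMCI_EN:     # it stuck: bank supports CMCI
            owned.add(i)                   # record ownership per thread
    return owned
```

In this model, a second thread that shares the same registers and runs `claim_banks` afterwards claims nothing, because every CMCI-capable bank already reads back bit 30 as 1.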

After the thread has examined all of the machine check banks, it checks whether it owns any MC banks to service 
CMCI. If any bank has been claimed by this thread:
— Ensure that the CMCI interrupt handler has been set up as described in Section 15.5.2.3, “CMCI Interrupt 
Handler”.

— Initialize the CMCI LVT entry, as described in Section 15.5.1, “CMCI Local APIC Interface”.
— Log and clear all of the IA32_MCi_STATUS registers for the banks that this thread owns. This allows new 
errors to be logged.
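
The ordering of these final steps can be sketched as follows. All three callbacks (`install_handler`, `init_lvt`, `log_and_clear`) are hypothetical placeholders for platform-specific code; the sketch only shows that nothing is set up when the thread owns no banks, and that the handler and LVT entry are prepared before the owned banks are logged and cleared.

```python
def finish_cmci_init(owned_banks, install_handler, init_lvt, log_and_clear):
    """Run the post-scan steps; return True if this thread will service CMCI.

    The three callables are hypothetical hooks supplied by the caller.
    """
    if not owned_banks:
        return False              # no bank claimed: nothing to set up
    install_handler()             # CMCI interrupt handler is in place
    init_lvt()                    # CMCI LVT entry is initialized
    for bank in sorted(owned_banks):
        log_and_clear(bank)       # log, then clear IA32_MCi_STATUS
    return True
```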

15.5.2.2   CMCI Threshold Management

The Corrected MC error threshold field, IA32_MCi_CTL2[15:0], is architecturally defined. Specifically, all these bits 
are writable by software, but different processor implementations may choose to implement fewer than 15 bits as 
the threshold for the overflow comparison with IA32_MCi_STATUS[52:38]. The following describes techniques that 
software can use to manage the CMCI threshold so that it remains compatible with changes in implementation 
characteristics:

Software can set the initial threshold value to 1 by writing 1 to IA32_MCi_CTL2[15:0]. This causes an overflow 
condition on every corrected MC error, and each one generates a CMCI interrupt.

To increase the threshold and reduce the frequency of CMCI servicing:
a. Find the maximum threshold value a given processor implementation supports. The steps are:

Write 7FFFH to IA32_MCi_CTL2[15:0].

Read back IA32_MCi_CTL2[15:0]; the lower 15 bits (14:0) hold the maximum threshold supported by the 
processor.

b. Increase the threshold to a value below the maximum value discovered using step a.
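
Steps a. and b. can be modeled as below. This is a sketch under stated assumptions: `FakeCtl2` is a hypothetical register model that silently drops unimplemented threshold bits, the function names are illustrative, and "below the maximum" is interpreted here as strictly less than the discovered maximum, with a floor of 1.

```python
THRESHOLD_MASK = 0x7FFF    # threshold bits 14:0 of IA32_MCi_CTL2


class FakeCtl2:
    """Hypothetical model of one IA32_MCi_CTL2 register."""

    def __init__(self, implemented_bits):
        self.impl_mask = (1 << implemented_bits) - 1
        self.value = 0

    def read(self):
        return self.value

    def write(self, value):
        upper = value & ~THRESHOLD_MASK          # bits above the threshold
        self.value = upper | (value & self.impl_mask)


def max_threshold(ctl2):
    """Step a: write 7FFFH to the threshold field and read it back."""
    saved = ctl2.read()
    ctl2.write((saved & ~THRESHOLD_MASK) | THRESHOLD_MASK)
    maximum = ctl2.read() & THRESHOLD_MASK
    ctl2.write(saved)                            # restore the previous value
    return maximum


def set_threshold(ctl2, wanted):
    """Step b: program a threshold below the discovered maximum."""
    limit = max_threshold(ctl2)
    chosen = max(1, min(wanted, limit - 1))      # assumption: strictly below
    ctl2.write((ctl2.read() & ~THRESHOLD_MASK) | chosen)
    return chosen
```

The read-modify-write in both functions preserves the other bits of the register, such as bit 30, while the threshold field changes.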

15.5.2.3   CMCI Interrupt Handler

The following describes techniques system software may consider to implement a CMCI service routine:

The service routine examines its private per-thread data structure to determine which MC banks it owns. If the 
thread does not have ownership of a given MC bank, it proceeds to the next MC bank. Ownership is determined 
at initialization time, as described in Section 15.5.2.1.

If the thread has claimed ownership of an MC bank, it processes that bank. Because each thread services only 
the banks it owns, this technique allows each logical processor to handle corrected MC errors independently and 
requires no synchronization to access shared MSR resources. Consult Example 15-5 for guidelines on logging 
when processing CMCI.
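
A service routine along these lines might look like the following sketch. It is not Example 15-5: the `rdmsr`, `wrmsr`, and `log` parameters are hypothetical hooks, and the logging is reduced to recording the raw status and the corrected error count from IA32_MCi_STATUS[52:38].

```python
IA32_MC0_STATUS = 0x401    # IA32_MCi_STATUS is at MSR address 0x401 + 4*i
STATUS_VAL = 1 << 63       # IA32_MCi_STATUS[63]: register holds a valid error


def cmci_handler(owned_banks, rdmsr, wrmsr, log):
    """Log and clear valid errors in the banks owned by this thread only."""
    for bank in sorted(owned_banks):
        addr = IA32_MC0_STATUS + 4 * bank
        status = rdmsr(addr)
        if not status & STATUS_VAL:
            continue                       # nothing logged in this bank
        count = (status >> 38) & 0x7FFF    # corrected error count, bits 52:38
        log.append((bank, status, count))
        wrmsr(addr, 0)                     # clear so new errors can be logged
```

Banks that are not in `owned_banks` are never read or written, which is why no cross-thread locking appears in the sketch.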

15.6   RECOVERY OF UNCORRECTED RECOVERABLE (UCR) ERRORS

Recovery of uncorrected recoverable machine check errors is an enhancement in machine-check architecture. The 
first processor that supports this feature is the 45 nm Intel 64 processor on which CPUID reports 
DisplayFamily_DisplayModel as 06H_2EH (see CPUID instruction in Chapter 3, “Instruction Set Reference, A-L” in 
the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A). This allows system software to 
perform recovery actions on certain classes of uncorrected errors and continue execution.