
33-10 Vol. 3C

HANDLING BOUNDARY CONDITIONS IN A VIRTUAL MACHINE MONITOR

33.4.3.1   VMM Error Handling Strategies

Broadly speaking, there are two strategies that VMMs may take for error handling: 

Basic error handling: in this approach the guest VM is treated as any other thread of execution. If the error 
recovery action does not support restarting the thread after handling the error, the guest VM should be 
terminated.

MCA virtualization: in this approach, the VMM virtualizes the MCA events and hardware. This enables the VMM 
to intercept MCA events and inject an MCA into the guest VM. The guest VM then has the opportunity to attempt 
error recovery actions, rather than being terminated by the VMM.

Details of these approaches and implementation considerations for hosted and native VMMs are discussed below.

33.4.3.2   Basic VMM MCA Error Recovery Handling

The simplest approach is for the VMM to treat the guest VM as any other thread of execution:

MCEs that occur outside the stream of execution of a virtual machine guest will cause an MCE abort and may 
be handled by the MCA error handler following the recovery actions and guidelines described in Section 15.9 
and Section 15.10. This includes logging the error and taking appropriate recovery actions when necessary. The 
VMM must not resume the interrupted thread of execution or another VM until it has taken the appropriate 
recovery action or, in the case of fatal MCAs, reset the system.

MCEs that occur while executing in the context of a virtual machine will be intercepted by the VMM. The MCA 
intercept handler may follow the error handling guidelines listed in Section 15.9 and Section 15.10 for SRAO 
and SRAR errors. For SRAR errors, terminating the thread of execution will involve terminating the affected 
guest VM. For fatal errors, the MCA handler should log the error and reset the system; the VMM should not 
resume execution of the interrupted VM.
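
The decision logic of the basic model can be sketched as a small C function. This is an illustrative sketch only, not a definitive implementation: the names `mce_severity`, `vmm_action`, `basic_mce_action`, and `may_resume_interrupted` are hypothetical, and the `restartable` flag is assumed to come from earlier error analysis performed under the guidelines in Section 15.9 and Section 15.10. The sketch also assumes, as a simplification, that an SRAO error never forces termination of the interrupted context.

```c
#include <assert.h>
#include <stdbool.h>

/* Severity classes named after the error types mentioned in the text. */
enum mce_severity { MCE_SRAO, MCE_SRAR, MCE_FATAL };

/* Hypothetical outcomes a basic-model handler can choose between. */
enum vmm_action {
    ACT_RECOVER_AND_RESUME,  /* log, recover, then resume the interrupted context */
    ACT_TERMINATE_THREAD,    /* end the affected thread outside any guest VM */
    ACT_TERMINATE_GUEST,     /* end the affected guest VM */
    ACT_RESET_SYSTEM         /* fatal: log and reset, never resume */
};

/*
 * in_guest:    the MCE was intercepted while a guest VM was executing.
 * restartable: the recovery action supports restarting the interrupted
 *              thread (assumed to be decided by earlier error analysis).
 */
enum vmm_action basic_mce_action(enum mce_severity sev, bool in_guest,
                                 bool restartable)
{
    assert(sev == MCE_SRAO || sev == MCE_SRAR || sev == MCE_FATAL);

    if (sev == MCE_FATAL)
        return ACT_RESET_SYSTEM;

    /* Assumption: SRAO recovery is optional, so the context can continue. */
    if (sev == MCE_SRAO || restartable)
        return ACT_RECOVER_AND_RESUME;

    /* SRAR, not restartable: the guest VM is treated like any other thread. */
    return in_guest ? ACT_TERMINATE_GUEST : ACT_TERMINATE_THREAD;
}

/* Only a completed recovery lets the interrupted context run again. */
bool may_resume_interrupted(enum vmm_action act)
{
    return act == ACT_RECOVER_AND_RESUME;
}
```

A handler built on this sketch would call `basic_mce_action()` once per event and check `may_resume_interrupted()` before re-entering the interrupted thread or guest VM.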

33.4.3.3   Implementation Considerations for the Basic Model

For hosted VMMs, the host OS MCA error handling code will perform error analysis and initiate the appropriate 
recovery actions. For the basic model, this flow does not change when terminating a guest VM, although the specific 
actions needed to terminate a guest VM may differ from those needed to terminate an application or user process.

For native, hypervisor-based VMMs, MCA errors will either be delivered directly to the VMM MCA handler (when the 
error is signaled while in the VMM context) or cause a VM exit from a guest VM and be delivered to the MCA intercept 
handler. There are two general approaches the hypervisor can use to handle the MCA error: forwarding the error to 
the control OS, or handling it within the hypervisor itself. These approaches are described in the following 
paragraphs.

The hypervisor may forward the error to the control OS for handling. This approach simplifies the hypervisor 
error handling since it relies on the control OS to implement the basic error handling model. The control OS error 
handling code will be similar to the error handling code in the hosted VMM. Errors can be forwarded to the control 
OS via an OS callback or by injecting an MCE event into the control OS. Injecting an MCE will cause the control OS 
MCA error handler to be invoked. The control OS is responsible for terminating the affected guest VM, if necessary, 
which may require cooperation from the hypervisor.

Alternatively, the error may be handled completely in the hypervisor. In this case, the hypervisor error handler 
is enhanced to implement the basic error handling model, so that it can fully analyze the error information and 
take recovery actions based on the guidelines. The error handling steps in the hypervisor are then similar to those 
for the hosted VMM described above (where the hypervisor replaces the host OS actions). The hypervisor is 
responsible for terminating the affected guest VM, if necessary.

In all cases, if a fatal error is detected, the VMM error handler should log the error and reset the system. The VMM 
error handler must not resume guest VMs after a fatal error is detected, so that error containment is maintained.
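
The routing choices described above for a native hypervisor can be sketched in C as follows. This is a hedged illustration rather than a prescribed design: `hv_mce_policy`, `hv_mce_route`, `hv_mce_dispatch`, and `hv_route_mce` are hypothetical names, and the `fatal` flag is assumed to have been determined by prior analysis of the error information.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical build-time or boot-time choice made by the hypervisor. */
enum hv_mce_policy {
    HV_FORWARD_BY_CALLBACK,   /* forward to the control OS via an OS callback */
    HV_FORWARD_BY_INJECTION,  /* forward by injecting an MCE into the control OS */
    HV_HANDLE_LOCALLY         /* handle completely within the hypervisor */
};

/* Where the error ends up being handled. */
enum hv_mce_route {
    ROUTE_LOG_AND_RESET,
    ROUTE_CONTROL_OS_CALLBACK,
    ROUTE_CONTROL_OS_MCE,
    ROUTE_HYPERVISOR_HANDLER
};

struct hv_mce_dispatch {
    enum hv_mce_route route;
    bool guest_resume_blocked;         /* true: guest VMs must never be resumed */
    bool control_os_terminates_guest;  /* who terminates the guest VM if needed */
};

struct hv_mce_dispatch hv_route_mce(enum hv_mce_policy policy, bool fatal)
{
    /* Safe default: fatal handling, with error containment preserved. */
    struct hv_mce_dispatch d = { ROUTE_LOG_AND_RESET, true, false };

    assert(policy == HV_FORWARD_BY_CALLBACK ||
           policy == HV_FORWARD_BY_INJECTION ||
           policy == HV_HANDLE_LOCALLY);

    if (fatal)
        return d;  /* in all cases: log, reset, do not resume guests */

    /* Non-fatal: resumption is still deferred until recovery completes. */
    d.guest_resume_blocked = false;
    if (policy == HV_HANDLE_LOCALLY) {
        d.route = ROUTE_HYPERVISOR_HANDLER;
    } else {
        d.route = (policy == HV_FORWARD_BY_CALLBACK)
                      ? ROUTE_CONTROL_OS_CALLBACK
                      : ROUTE_CONTROL_OS_MCE;
        d.control_os_terminates_guest = true;
    }
    return d;
}
```

Checking the fatal case before the policy keeps the reset path independent of the forwarding mechanism, which mirrors the requirement that guest VMs are never resumed after a fatal error.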

33.4.3.4   MCA Virtualization

A more sophisticated approach for handling errors is to virtualize the MCA. This involves virtualizing the MCA 
hardware and intercepting the MCA event in the VMM when a guest VM is interrupted by an MCA. After analyzing the