
Vol. 3A 13-7

SYSTEM PROGRAMMING FOR INSTRUCTION SET EXTENSIONS AND PROCESSOR EXTENDED STATES

The operating system can take the responsibility for saving the states as part of the task switch process and 
restoring the state of the registers when a suspended task is resumed. This approach is appropriate for 
preemptive multitasking operating systems, where the application cannot know when it is going to be 
preempted and cannot prepare in advance for task switching. 

The operating system can take the responsibility for saving the states as part of the task switch process, but 
delay the restoring of the states until an instruction operating on the states is actually executed by the new 
task. See Section 13.4.1, “Using the TS Flag to Control the Saving of the x87 FPU and SSE State,” for more 
information. This approach is called lazy restore.
The use of the lazy restore mechanism in context switches is not recommended when the XSAVE feature set is used
to save and restore states, for the following reasons:
— With the XSAVE feature set, Intel processors have optimizations in place to avoid saving the state components
that are in their initial configurations or that have not been modified since they were last restored.
These optimizations eliminate the need for lazy restore. See Section 13.5.4 in the Intel® 64 and IA-32
Architectures Software Developer’s Manual, Volume 1.

— Intel processors have power optimizations when state components are in their initial configurations. Use of
lazy restore retains the non-initial configuration of the last thread and is not power efficient.

— Not all extended states support lazy restore mechanisms. When one or more such states are enabled, lazy
restore becomes very inefficient because it results in two separate state restores: one in the context
switch for the states that do not support lazy restore, and one in the #NM handler for the states that
support lazy restore.
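
The first reason above can be illustrated with a small model. The following is a simplified sketch, not a description of the actual hardware logic: the function name `components_written` and the `modified` bitmap are assumptions made for illustration, while RFBM (the requested-feature bitmap) and XINUSE follow the terminology of Volume 1. It computes which state components an optimized save would write, given which components are requested, which are in use (not in their initial configuration), and which have changed since the last restore.

```c
#include <stdint.h>

/* State-component bitmap positions: bit 0 = x87, bit 1 = SSE, bit 2 = AVX. */
#define XFEATURE_X87 (UINT64_C(1) << 0)
#define XFEATURE_SSE (UINT64_C(1) << 1)
#define XFEATURE_AVX (UINT64_C(1) << 2)

/*
 * Hypothetical model of an optimized save.
 *   rfbm      requested-feature bitmap (components the caller asks to save)
 *   xinuse    components that are NOT in their initial configuration
 *   modified  components changed since the last restore (assumed to be
 *             tracked internally by the processor; modeled as an input here)
 *   use_modified_opt  nonzero if the modified optimization applies
 * Returns the bitmap of components that are actually written.
 */
static uint64_t components_written(uint64_t rfbm, uint64_t xinuse,
                                   uint64_t modified, int use_modified_opt)
{
    uint64_t written = rfbm & xinuse;   /* init optimization */
    if (use_modified_opt)
        written &= modified;            /* modified optimization */
    return written;
}
```

In this model, a task that requests x87, SSE, and AVX but has only touched SSE since its last restore causes a single component to be written, which is the saving that lazy restore was originally meant to provide.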

13.4.1 Using the TS Flag to Control the Saving of the x87 FPU and SSE State

The TS flag in control register CR0 is provided to allow the operating system to delay saving/restoring the x87 FPU
and SSE state until an instruction that actually accesses this state is encountered in a new task. When the TS flag
is set, the processor monitors the instruction stream for x87 FPU, MMX, and SSE instructions. When the processor
detects one of these instructions, it raises a device-not-available exception (#NM) prior to executing the
instruction. The #NM exception handler can then be used to save the x87 FPU and SSE state for the previous task
(using an FXSAVE, XSAVE, or XSAVEOPT instruction) and load the x87 FPU and SSE state for the current task (using an
FXRSTOR or XRSTOR instruction). If the task never encounters an x87 FPU, MMX, or SSE instruction, the
device-not-available exception will not be raised and the task state will not be saved/restored unnecessarily.
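
The flow described above can be sketched as a user-space simulation. This is a minimal model under stated assumptions, not operating-system code: `struct cpu`, `struct task`, `context_switch`, `nm_handler`, and `exec_fp_instruction` are hypothetical names, the `ts` field stands in for CR0.TS, and `memcpy` stands in for the FXSAVE and FXRSTOR instructions, which cannot be demonstrated meaningfully outside ring 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FP_AREA_BYTES 512   /* size of the legacy FXSAVE image */

struct task {
    unsigned char fp_area[FP_AREA_BYTES];   /* per-task save area */
};

struct cpu {
    bool ts;                                /* models CR0.TS */
    unsigned char fp_regs[FP_AREA_BYTES];   /* models live x87/SSE state */
    struct task *fp_owner;                  /* task whose state is live */
    struct task *current;                   /* task now running */
    unsigned nm_count;                      /* #NM exceptions raised */
    unsigned save_count;                    /* state saves performed */
};

/* Task switch: set TS instead of saving or restoring anything. */
static void context_switch(struct cpu *c, struct task *next)
{
    c->current = next;
    c->ts = true;
}

/* Models the #NM handler: swap state only if another task owns it. */
static void nm_handler(struct cpu *c)
{
    assert(c->current != NULL);
    c->nm_count++;
    c->ts = false;                          /* models CLTS in this sketch */
    if (c->fp_owner != c->current) {
        if (c->fp_owner != NULL) {          /* models FXSAVE for old owner */
            memcpy(c->fp_owner->fp_area, c->fp_regs, FP_AREA_BYTES);
            c->save_count++;
        }
        /* models FXRSTOR for the current task */
        memcpy(c->fp_regs, c->current->fp_area, FP_AREA_BYTES);
        c->fp_owner = c->current;
    }
}

/* Models one x87/MMX/SSE instruction: fault first if TS is set. */
static void exec_fp_instruction(struct cpu *c, unsigned char value)
{
    if (c->ts)
        nm_handler(c);
    c->fp_regs[0] = value;                  /* stand-in for a state update */
}
```

In this model, a task that is switched in and out without calling `exec_fp_instruction` never triggers `nm_handler`, so no copy is made on its behalf. A copy happens only when a different task next touches the floating-point state.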

NOTE

The CRC32 and POPCNT instructions do not operate on the x87 FPU or SSE state. They operate on 
the general-purpose registers and are not involved with the techniques described above.

The TS flag can be set either explicitly (by executing a MOV instruction to control register CR0) or implicitly (using 
the IA-32 architecture’s native task switching mechanism). When the native task switching mechanism is used, the 
processor automatically sets the TS flag on a task switch. After the device-not-available handler has saved the x87 
FPU and SSE state, it should execute the CLTS instruction to clear the TS flag.
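
As a small illustration of the explicit path, the helpers below compute the CR0 value to write back. TS is bit 3 of CR0; the helper names are assumptions made for this sketch. Reading and writing CR0 itself requires privilege level 0, so the actual MOV instructions are only indicated in a comment.

```c
#include <stdint.h>

#define CR0_TS (UINT64_C(1) << 3)   /* CR0 bit 3: task switched */

/* Hypothetical helpers operating on a CR0 value already read into a variable.
 * In ring-0 code the value would come from "mov %cr0, reg" and be written
 * back with "mov reg, %cr0"; clearing TS can use CLTS instead. */
static inline uint64_t cr0_set_ts(uint64_t cr0)
{
    return cr0 | CR0_TS;
}

static inline uint64_t cr0_clear_ts(uint64_t cr0)
{
    return cr0 & ~CR0_TS;
}

static inline int cr0_ts_is_set(uint64_t cr0)
{
    return (cr0 & CR0_TS) != 0;
}
```

All other CR0 bits pass through unchanged, so a read-modify-write with these helpers affects only the TS flag.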

13.5 THE XSAVE FEATURE SET AND PROCESSOR EXTENDED STATE MANAGEMENT

The architecture of the XSAVE feature set is described in Chapter 13 of the Intel® 64 and IA-32 Architectures
Software Developer’s Manual, Volume 1. The XSAVE feature set includes the following:

An extensible data layout for existing and future processor state extensions. The layout of the XSAVE area
extends from the 512-byte FXSAVE/FXRSTOR layout to provide compatibility and a migration path from
managing the legacy FXSAVE/FXRSTOR area. The XSAVE area is described in more detail in Section 13.4 of the
Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.

CPUID enhancements for feature enumeration. See Section 13.2 of the Intel® 64 and IA-32 Architectures
Software Developer’s Manual, Volume 1.