Vol. 3B 14-31

POWER AND THERMAL MANAGEMENT

Passive cooling (frequency throttling) should be driven by measuring either (a) both the core and package 
temperatures, or (b) only the package temperature. If the measured package temperature is what leads the power 
management agent to apply passive cooling, the agent cannot single out individual cores, and all cores need to 
execute passive cooling. Core temperature is measured using the IA32_THERMAL_STATUS and 
IA32_THERMAL_INTERRUPT MSRs. The exact implementation details depend on the platform firmware; possible 
solutions include defining two different thermal zones (one for core temperature and passive cooling, the other 
for package temperature and active cooling).
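
The selection rule in the paragraph above can be illustrated with a minimal sketch. It is not a firmware or OS implementation: the function name, the trip-point parameters, and the use of degrees Celsius are all hypothetical, and it assumes that a package-level trip applies to every core while a core-level trip applies only to the cores that exceed it.

```python
def cores_to_throttle(core_temps_c, package_temp_c, core_trip_c, package_trip_c):
    """Return the set of core IDs that should execute passive cooling.

    core_temps_c maps a core ID to its measured temperature. All names and
    thresholds here are illustrative assumptions, not values from the manual.
    """
    # A package-level reading cannot identify a specific core, so a package
    # trip means every core executes passive cooling.
    if package_temp_c >= package_trip_c:
        return set(core_temps_c)
    # Otherwise only cores at or above their own trip point are throttled.
    return {core for core, temp in core_temps_c.items() if temp >= core_trip_c}


# Core 1 is hot, package is below its trip point: only core 1 is selected.
hot_core_only = cores_to_throttle({0: 70, 1: 95}, 80, core_trip_c=90, package_trip_c=100)

# Package is above its trip point: every core is selected.
package_trip = cores_to_throttle({0: 70, 1: 75}, 101, core_trip_c=90, package_trip_c=100)
```

A platform that defines two thermal zones, as suggested above, would typically keep the two checks in separate policies rather than in one function.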

14.9 PLATFORM SPECIFIC POWER MANAGEMENT SUPPORT

This section covers power management interfaces that are not architectural but address the power management 
needs of several platform-specific components. Specifically, RAPL (Running Average Power Limit) interfaces 
provide mechanisms to enforce power consumption limits. Power limiting has specific usages in client and 
server platforms. 
For client platform power limit control and for server platforms used in a data center, the following power and 
thermal related usages are desirable:

Platform Thermal Management: Robust mechanisms to manage component, platform, and group-level 
thermals, either proactively or reactively (e.g., in response to a platform-level thermal trip point).

Platform Power Limiting: More deterministic control over the system's power consumption, for example to 
meet battery life targets or rack-level or container-level power consumption goals within a datacenter. 

Power/Performance Budgeting: Efficient means to control the power consumed (and therefore the sustained 
performance delivered) within and across platforms.

The server and client usage models are addressed by RAPL interfaces, which expose multiple domains of power 
rationing within each processor socket. Generally, these RAPL domains can be viewed as a hierarchy that includes:

Package domain is the processor die. 

Memory domain includes the directly-attached DRAM; an additional power plane may constitute a separate 
domain. 

To manage the power consumed across multiple sockets via RAPL, individual limits must be programmed 
for each processor complex. Programming a specific RAPL domain across multiple sockets is not supported.

14.9.1 RAPL Interfaces

RAPL interfaces consist of non-architectural MSRs. Each RAPL domain supports the following set of capabilities, 
some of which are optional as stated below.

Power limit - MSR interfaces to specify the power limit, time window, lock bit, clamp bit, etc.

Energy Status - Power metering interface providing energy consumption information.

Perf Status (Optional) - Interface providing information on the performance effects (regression) due to power 
limits. It is defined as a duration metric that measures the power limit effect in the respective domain. The 
meaning of duration is domain specific.

Power Info (Optional) - Interface providing information on the range of parameters for a given domain, 
such as minimum power and maximum power.

Policy (Optional) - 4-bit priority information that is a hint to hardware for dividing budget between sub-domains 
in a parent domain.
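
As an illustration of how the Energy Status capability is typically used for power metering, the sketch below derives average power from two samples of an energy counter. It assumes the counter is a 32-bit value that wraps around and that the energy unit (Joules per count) has already been obtained from the platform; the function and parameter names are hypothetical, and the MSR read itself is left out.

```python
COUNTER_BITS = 32  # assumed width of the energy status counter


def average_power_watts(prev_raw, curr_raw, energy_unit_joules, interval_seconds):
    """Average power between two raw energy-counter samples.

    prev_raw and curr_raw are raw counter values; energy_unit_joules is the
    number of Joules represented by one count. Names are illustrative.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # Modular subtraction handles a single wraparound of the counter.
    delta_counts = (curr_raw - prev_raw) % (1 << COUNTER_BITS)
    return delta_counts * energy_unit_joules / interval_seconds


# 65536 counts of 1/65536 J each over one second is 1 W.
steady = average_power_watts(0, 65536, 2.0 ** -16, 1.0)

# The counter wrapped between samples: 0x200 counts elapsed in total.
wrapped = average_power_watts(0xFFFFFF00, 0x00000100, 2.0 ** -16, 1.0)
```

The sampling interval has to be short enough that the counter wraps at most once between samples; otherwise the computed delta is too small.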

Each of the above capabilities requires specific units in order to describe it. Power is expressed in Watts, time 
is expressed in Seconds, and energy is expressed in Joules. A scaling factor is supplied for each unit so that the 
information can be represented meaningfully in a finite number of bits. Units for power, energy, and time are 
exposed in the read-only MSR_RAPL_POWER_UNIT MSR.
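
A minimal sketch of decoding the scaling factors from a raw MSR_RAPL_POWER_UNIT value follows. The field positions (power units in bits 3:0, energy status units in bits 12:8, time units in bits 19:16, each unit being 1/2^N of the base unit) are not defined in this section; they are taken from the register description given elsewhere in the manual and should be checked against it for the target processor. The function name and the sample value are hypothetical, and reading the MSR is outside the scope of the sketch.

```python
def decode_rapl_power_unit(raw):
    """Decode a raw MSR_RAPL_POWER_UNIT value into unit sizes.

    The bit positions are an assumption based on the register layout
    described elsewhere in the manual, not on this section.
    """
    power_exp = raw & 0xF            # bits 3:0, power unit exponent
    energy_exp = (raw >> 8) & 0x1F   # bits 12:8, energy unit exponent
    time_exp = (raw >> 16) & 0xF     # bits 19:16, time unit exponent
    return {
        "power_watts": 1.0 / (1 << power_exp),
        "energy_joules": 1.0 / (1 << energy_exp),
        "time_seconds": 1.0 / (1 << time_exp),
    }


# Illustrative raw value: exponents 3 (power), 16 (energy), 10 (time).
units = decode_rapl_power_unit(0x000A1003)
# units["power_watts"] is 0.125, i.e. power fields count in 1/8 W steps.
```

Software would multiply the raw field of a power limit, energy status, or time window register by the matching unit to obtain Watts, Joules, or Seconds.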