Vol. 3B 14-31
POWER AND THERMAL MANAGEMENT
Passive cooling (frequency throttling) should be driven by measuring (a) the core and package temperatures, or
(b) only the package temperature. If measured package temperature led the power management agent to choose
which core to execute passive cooling, then all cores need to execute passive cooling. Core temperature is
measured using the IA32_THERMAL_STATUS and IA32_THERMAL_INTERRUPT MSRs. The exact implementation
details depend on the platform firmware and possible solutions include defining two different thermal zones (one
for core temperature and passive cooling and the other for package temperature and active cooling).
14.9
PLATFORM SPECIFIC POWER MANAGEMENT SUPPORT
This section covers power management interfaces that are not architectural but addresses the power management
needs of several platform specific components. Specifically, RAPL (Running Average Power Limit) interfaces
provide mechanisms to enforce power consumption limit. Power limiting usages have specific usages in client and
server platforms.
For client platform power limit control and for server platforms used in a data center, the following power and
thermal related usages are desirable:
•
Platform Thermal Management: Robust mechanisms to manage component, platform, and group-level
thermals, either proactively or reactively (e.g., in response to a platform-level thermal trip point).
•
Platform Power Limiting: More deterministic control over the system's power consumption, for example to
meet battery life targets on rack-level or container-level power consumption goals within a datacenter.
•
Power/Performance Budgeting: Efficient means to control the power consumed (and therefore the sustained
performance delivered) within and across platforms.
The server and client usage models are addressed by RAPL interfaces, which expose multiple domains of power
rationing within each processor socket. Generally, these RAPL domains may be viewed to include hierarchically:
•
Package domain is the processor die.
•
Memory domain includes the directly-attached DRAM; an additional power plane may constitute a separate
domain.
In order to manage the power consumed across multiple sockets via RAPL, individual limits must be programmed
for each processor complex. Programming specific RAPL domain across multiple sockets is not supported.
14.9.1 RAPL
Interfaces
RAPL interfaces consist of non-architectural MSRs. Each RAPL domain supports the following set of capabilities,
some of which are optional as stated below.
•
Power limit - MSR interfaces to specify power limit, time window; lock bit, clamp bit etc.
•
Energy Status - Power metering interface providing energy consumption information.
•
Perf Status (Optional) - Interface providing information on the performance effects (regression) due to power
limits. It is defined as a duration metric that measures the power limit effect in the respective domain. The
meaning of duration is domain specific.
•
Power Info (Optional) - Interface providing information on the range of parameters for a given domain,
minimum power, maximum power etc.
•
Policy (Optional) - 4-bit priority information that is a hint to hardware for dividing budget between sub-domains
in a parent domain.
Each of the above capabilities requires specific units in order to describe them. Power is expressed in Watts, Time
is expressed in Seconds, and Energy is expressed in Joules. Scaling factors are supplied to each unit to make the
information presented meaningful in a finite number of bits. Units for power, energy, and time are exposed in the
read-only MSR_RAPL_POWER_UNIT MSR.