17-52 Vol. 3B

DEBUG, BRANCH PROFILE, TSC, AND RESOURCE MONITORING FEATURES

shows various non-overlapped partitioning schemes. As a matter of software policy for extensibility, COS0 should 
typically be considered and configured as the highest-priority COS, followed by COS1, and so on, though no 
hardware restriction enforces this mapping. When the system boots, all threads are initialized to COS0, which 
has full access to the cache by default.
Though the representation of the CBMs looks similar to a way-based mapping, CBMs are independent of any specific 
enforcement implementation (e.g., way partitioning). Rather, this is a convenient way to represent capacity, 
overlap, and isolation of cache space. For example, executing a POPCNT instruction (population count of set bits) on 
the capacity bitmask and dividing the result by the mask length gives the fraction of cache space that a class of 
service can allocate into. In addition to the fraction, the exact location of the bits also shows whether the class of 
service overlaps with other classes of service or is entirely isolated in terms of cache space used. 
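
The capacity and overlap reasoning above can be sketched in a few lines of Python. This is an illustration only: the helper names `cbm_fraction` and `cbm_overlap`, the 8-bit mask length, and the example masks are assumptions chosen for the sketch, not values taken from this manual.

```python
def cbm_fraction(cbm: int, cbm_len: int) -> float:
    """Fraction of cache capacity a class of service may allocate into.

    Equivalent to POPCNT(cbm) divided by the mask length.
    """
    if cbm_len <= 0 or cbm < 0 or (cbm >> cbm_len) != 0:
        raise ValueError("mask does not fit in cbm_len bits")
    return bin(cbm).count("1") / cbm_len


def cbm_overlap(a: int, b: int) -> int:
    """Bits shared by two capacity bitmasks; 0 means the classes are isolated."""
    return a & b


# Hypothetical 8-bit masks for three classes of service.
COS0_MASK = 0xF0  # upper half of the capacity
COS1_MASK = 0x0F  # lower half, isolated from COS0
COS2_MASK = 0x3C  # middle bits, overlaps both COS0 and COS1

print(cbm_fraction(COS0_MASK, 8))             # 0.5
print(cbm_overlap(COS0_MASK, COS1_MASK))      # 0 (isolated)
print(hex(cbm_overlap(COS0_MASK, COS2_MASK))) # 0x30 (shared capacity)
```

A nonzero result from `cbm_overlap` identifies exactly which capacity bits two classes share, which is the information the bit positions carry beyond the bare fraction.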

Figure 17-28 shows how the Cache Capacity Bitmasks and the per-logical-processor Class of Service are logically 
used to enable Cache Allocation Technology. Only CBMs whose 1 bits are contiguous are permitted, and every such 
contiguous combination is allowed. The length of a 
CBM may vary from resource to resource or between processor generations and can be enumerated using CPUID. 
From the available mask set, and based on the goals of the OS/VMM (shared or isolated cache, etc.), bitmasks are 
selected and associated with different classes of service. For the available Classes of Service the associated CBMs 
can be programmed via the global set of CAT configuration registers (in the case of L3 CAT, via the 
IA32_L3_MASK_n MSRs, where “n” is the Class of Service, starting from zero). In all architectural implementations 
supporting CPUID it is possible to change the CBMs dynamically, during program execution, unless stated 
otherwise by Intel. 
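
As a rough sketch of the enumeration and validation steps described above, the following Python helpers decode raw CPUID register values and check a candidate mask. The bit layout used here (mask length minus one in EAX[4:0] and highest COS number in EDX[15:0] of CPUID leaf 10H, sub-leaf 1) and the base address 0xC90 for IA32_L3_MASK_0 come from the CPUID and MSR descriptions elsewhere in the manual, not from this section, and should be verified there. The function names are hypothetical, and treating an all-zero mask as invalid is an assumption of the sketch.

```python
L3_MASK_BASE = 0xC90  # assumed address of IA32_L3_MASK_0; verify against the MSR listing


def decode_l3_cat_cpuid(eax: int, edx: int):
    """Return (cbm_len, num_cos) from CPUID.(EAX=10H, ECX=1) register values."""
    cbm_len = (eax & 0x1F) + 1    # EAX[4:0] holds the mask length minus one
    num_cos = (edx & 0xFFFF) + 1  # EDX[15:0] holds the highest COS number
    return cbm_len, num_cos


def is_valid_cbm(cbm: int, cbm_len: int) -> bool:
    """True if cbm fits in cbm_len bits and its 1 bits form one contiguous run."""
    if cbm <= 0 or (cbm >> cbm_len) != 0:
        return False  # assumption: an empty mask is rejected
    lowest = cbm & -cbm                # isolate the lowest set bit
    return (cbm + lowest) & cbm == 0   # adding it clears a single contiguous run


def l3_mask_msr(cos: int, num_cos: int) -> int:
    """MSR address of IA32_L3_MASK_n for class of service n = cos."""
    if not 0 <= cos < num_cos:
        raise ValueError("class of service out of range")
    return L3_MASK_BASE + cos


cbm_len, num_cos = decode_l3_cat_cpuid(eax=0x13, edx=0x0F)  # example raw values
print(cbm_len, num_cos)               # 20 16
print(is_valid_cbm(0b0110, cbm_len))  # True
print(is_valid_cbm(0b0101, cbm_len))  # False: the 1 bits are not contiguous
print(hex(l3_mask_msr(2, num_cos)))   # 0xc92
```

Actually reading CPUID or writing the MSRs requires privileged, platform-specific access that is deliberately left out; the sketch only covers the arithmetic an OS/VMM would perform before programming a mask.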
The currently running application's Class of Service is communicated to the hardware through the per-logical-
processor PQR MSR (IA32_PQR_ASSOC MSR). When the OS schedules an application thread on a logical processor, 
the application thread is associated with a specific COS (i.e., the corresponding COS in the PQR) and all requests to 
the CAT-capable resource from that logical processor are tagged with that COS (in other words, the application 
thread is configured to belong to a specific COS). The cache subsystem uses this tagged request information to 
enforce QoS. The capacity bitmask may be mapped into a way bitmask (or a similar enforcement entity based on 
the implementation) at the cache before it is applied to the allocation policy. For example, the capacity bitmask can 
be an 8-bit mask and the enforcement may be accomplished using a 16-way bitmask for a cache enforcement 
implementation based on way partitioning.
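
The tagging and enforcement flow in the preceding paragraph can be modeled as a small simulation. This is a sketch under stated assumptions, not a description of any real cache: it assumes a way-partitioned enforcement in which each capacity bit expands to an equal group of ways (8 capacity bits onto 16 ways, as in the example above), and the names `pqr_cos`, `cos_masks`, `context_switch`, and `way_mask_for_request` are hypothetical.

```python
def expand_cbm_to_ways(cbm: int, cbm_len: int, num_ways: int) -> int:
    """Map a capacity bitmask onto a way bitmask, assuming equal ways per capacity bit."""
    if num_ways % cbm_len != 0:
        raise ValueError("num_ways must be a multiple of cbm_len in this sketch")
    ways_per_bit = num_ways // cbm_len
    group = (1 << ways_per_bit) - 1  # e.g. 0b11 when each bit covers 2 ways
    way_mask = 0
    for bit in range(cbm_len):
        if (cbm >> bit) & 1:
            way_mask |= group << (bit * ways_per_bit)
    return way_mask


# Hypothetical per-COS capacity bitmasks (8 bits each).
cos_masks = {0: 0xFF, 1: 0x0F, 2: 0xC0, 3: 0x30}

# Models the per-logical-processor PQR: logical processor id -> current COS.
pqr_cos = {}


def context_switch(cpu: int, cos: int) -> None:
    """The OS records the incoming thread's COS for this logical processor."""
    pqr_cos[cpu] = cos


def way_mask_for_request(cpu: int) -> int:
    """Ways a request from this logical processor may allocate into."""
    cos = pqr_cos.get(cpu, 0)  # threads start in COS0 after boot
    return expand_cbm_to_ways(cos_masks[cos], cbm_len=8, num_ways=16)


context_switch(cpu=0, cos=2)
print(hex(way_mask_for_request(0)))  # 0xf000: COS 2 may fill ways 12 to 15
print(hex(way_mask_for_request(1)))  # 0xffff: untouched processor stays in COS0
```

The even two-ways-per-bit expansion is only one possible mapping; as the text notes, the enforcement entity depends on the implementation.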

Figure 17-28.  Class of Service and Cache Capacity Bitmasks

[Figure not reproduced. Labels recovered from the diagram: the OS/VMM enumerates the enforcement capability and configures a CBM for each Class of Service (COS 0 through COS 3, each with its own capacity bitmask); on an OS context switch, the Class of Service association is set in IA32_PQR; an application memory request is tagged with its Class of Service (COS = 2 in the example); the cache subsystem applies the corresponding enforce mask across way 1 through way 16 of Set 1 through Set n.]