background image

Vol. 3B 18-39

PERFORMANCE MONITORING

Performance counters (fixed and general-purpose) are prioritized in index order. That is, counter IA32_PMC0 takes 
precedence over all other counters. Counter IA32_PMC1 takes precedence over counters IA32_PMC2 and 
IA32_PMC3, and so on. This means that if simultaneous overflows or PEBS assists occur, the appropriate action will 
be taken for the highest priority performance counter. For example, if IA32_PMC1 cause an overflow interrupt and 
IA32_PMC2 causes an PEBS assist simultaneously, then the overflow interrupt will be serviced first. 
The PEBS threshold interrupt is triggered by the PEBS assist, and is by definition prioritized lower than the PEBS 
assist. Hardware will not generate separate interrupts for each counter that simultaneously overflows. General-
purpose performance counters are prioritized over fixed counters.
If a counter is programmed with a precise (PEBS-enabled) event and programmed to generate a counter overflow 
interrupt, the PEBS assist is serviced before the counter overflow interrupt is serviced. If in addition the PEBS inter-
rupt threshold is met, the
threshold interrupt is generated after the PEBS assist completes, followed by the counter overflow interrupt (two 
separate interrupts are generated).
Uncore counters may be programmed to interrupt one or more processor cores (see Section 18.8.2). It is possible 
for interrupts posted from the uncore facility to occur coincident with counter overflow interrupts from the 
processor core. Software must check core and uncore status registers to determine the exact origin of counter 
overflow interrupts.

18.8.1.2   Load Latency Performance Monitoring Facility

The load latency facility provides software a means to characterize the average load latency to different levels of 
cache/memory hierarchy. This facility requires processor supporting enhanced PEBS record format in the PEBS 
buffer, see Table 18-23. This field measures the load latency from load's first dispatch of till final data writeback 
from the memory subsystem. The latency is reported for retired demand load operations and in core cycles (it 
accounts for re-dispatches).
To use this feature software must assure:

One of the IA32_PERFEVTSELx MSR is programmed to specify the event unit MEM_INST_RETIRED, and the 
LATENCY_ABOVE_THRESHOLD event mask must be specified (IA32_PerfEvtSelX[15:0] = 100H). The corre-
sponding counter IA32_PMCx will accumulate event counts for architecturally visible loads which exceed the 
programmed latency threshold specified separately in a MSR. Stores are ignored when this event is 
programmed. The CMASK or INV fields of the IA32_PerfEvtSelX register used for counting load latency must be 
0. Writing other values will result in undefined behavior. 

The MSR_PEBS_LD_LAT_THRESHOLD MSR is programmed with the desired latency threshold in core clock 
cycles. Loads with latencies greater than this value are eligible for counting and latency data reporting. The 
minimum value that may be programmed in this register is 3 (the minimum detectable load latency is 4 core 
clock cycles).

The PEBS enable bit in the IA32_PEBS_ENABLE register is set for the corresponding IA32_PMCx counter 
register. This means that both the PEBS_EN_CTRX and LL_EN_CTRX bits must be set for the counter(s) of 
interest. For example, to enable load latency on counter IA32_PMC0, the IA32_PEBS_ENABLE register must be 
programmed with the 64-bit value 00000001_00000001H.

When the load-latency facility is enabled, load operations are randomly selected by hardware and tagged to carry 
information related to data source locality and latency. Latency and data source information of tagged loads are 
updated internally. 
When a PEBS assist occurs, the last update of latency and data source information are captured by the assist and 
written as part of the PEBS record. The PEBS sample after value (SAV), specified in PEBS CounterX Reset, operates 
orthogonally to the tagging mechanism. Loads are randomly tagged to collect latency data. The SAV controls the 
number of tagged loads with latency information that will be written into the PEBS record field by the PEBS assists. 
The load latency data written to the PEBS record will be for the last tagged load operation which retired just before 
the PEBS assist was invoked.
The load-latency information written into a PEBS record (see Table 18-23, bytes AFH:98H) consists of:

Data Linear Address: This is the linear address of the target of the load operation.

Latency Value: This is the elapsed cycles of the tagged load operation between dispatch to GO, measured in 
processor core clock domain.