Vol. 3B 18-57

PERFORMANCE MONITORING

Complete the PEBS configuration steps.

Program the MEM_TRANS_RETIRED.PRECISE_STORE event in IA32_PERFEVTSEL3. Only counter 3 
(IA32_PMC3) supports collection of precise store information. 

Set IA32_PEBS_ENABLE[3] and IA32_PEBS_ENABLE[63]. This enables IA32_PMC3 as a PEBS counter and 
enables the precise store facility, respectively.
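
As a rough illustration of the last step, the sketch below computes the IA32_PEBS_ENABLE value with bits 3 and 63 set. The helper name `pebs_enable_for_precise_store` is hypothetical, and the code only builds the value; it does not write the MSR, which requires ring 0 or an OS-provided MSR interface. The event select encoding for IA32_PERFEVTSEL3 is left out because the event code is not given in this section.

```python
# Minimal sketch: build the IA32_PEBS_ENABLE value for precise store.
# Assumption: the caller writes the result to the MSR by some privileged
# mechanism that is outside the scope of this example.

IA32_PEBS_ENABLE = 0x3F1          # MSR address of IA32_PEBS_ENABLE

PEBS_EN_PMC3 = 1 << 3             # bit 3: IA32_PMC3 acts as a PEBS counter
PRECISE_STORE_EN = 1 << 63        # bit 63: precise store facility enabled

MASK_64 = 0xFFFFFFFFFFFFFFFF      # MSRs are 64 bits wide


def pebs_enable_for_precise_store(current: int = 0) -> int:
    """Return `current` with the two precise-store bits set (hypothetical helper)."""
    return (current | PEBS_EN_PMC3 | PRECISE_STORE_EN) & MASK_64


value = pebs_enable_for_precise_store()
print(hex(value))  # 0x8000000000000008
```

Passing the current MSR contents as `current` preserves any other enable bits that are already set.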

The precise store information written into a PEBS record affects entries at offsets 98H, A0H, and A8H of Table 18-23. 
The specificity of the Data Source entry at offset A0H has been enhanced to report three pieces of information. 

18.9.4.4   Precise Distribution of Instructions Retired (PDIR) 

Upon triggering a PEBS assist, there is a finite delay between the time the counter overflows and the time the 
microcode starts to carry out its data collection obligations. INST_RETIRED is a very common event used to 
sample where a performance bottleneck occurred and to help identify its location in instruction address space. Even 
if the delay is constant in core clock space, it invariably manifests as variable “skids” in instruction address space. 
This makes it difficult for programmers to profile a workload and pinpoint the location of bottlenecks.
The core PMU in processors based on Intel microarchitecture code name Sandy Bridge includes a facility referred to 
as Precise Distribution of Instructions Retired (PDIR). 
The PDIR facility mitigates the “skid” problem by providing an early indication of when the INST_RETIRED counter 
is about to overflow, allowing the machine to trap more precisely on the instruction that actually caused the 
counter overflow, thus eliminating skid.
PDIR applies only to the INST_RETIRED.ALL precise event, and must use IA32_PMC1 with PerfEvtSel1 properly 
configured and bit 1 in IA32_PEBS_ENABLE set to 1. INST_RETIRED.ALL is a non-architectural performance 
event; it is not supported in prior generation microarchitectures. Additionally, on processors with CPUID 
DisplayFamily_DisplayModel signatures of 06_2A and 06_2D, the tool that programs PDIR should quiesce the rest 
of the programmable counters in the core when PDIR is active. 
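
The model check described above can be sketched as follows. The example assumes the standard CPUID leaf 01H EAX layout for deriving DisplayFamily_DisplayModel; the function names are hypothetical, and the sample signatures in the usage lines are illustrative inputs, not values taken from this section.

```python
# Minimal sketch: decide whether a PDIR tool should quiesce the other
# programmable counters, based on the CPUID.01H:EAX signature.

PDIR_PEBS_ENABLE_BIT = 1 << 1                   # bit 1 of IA32_PEBS_ENABLE (IA32_PMC1)
QUIESCE_MODELS = {(0x06, 0x2A), (0x06, 0x2D)}   # 06_2A and 06_2D, per the text


def display_family_model(eax: int) -> tuple:
    """Derive (DisplayFamily, DisplayModel) from CPUID.01H:EAX."""
    model = (eax >> 4) & 0xF
    family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF
    # The extended family is added only when the base family is 0FH.
    display_family = family + ext_family if family == 0xF else family
    # The extended model is used only when the base family is 06H or 0FH.
    display_model = (ext_model << 4) | model if family in (0x6, 0xF) else model
    return display_family, display_model


def must_quiesce_other_counters(eax: int) -> bool:
    """True if the rest of the programmable counters should be quiesced for PDIR."""
    return display_family_model(eax) in QUIESCE_MODELS


print(must_quiesce_other_counters(0x000206A7))  # True  (06_2A)
print(must_quiesce_other_counters(0x000306A9))  # False (06_3A)
```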

18.9.5   Off-core Response Performance Monitoring 

The core PMU in processors based on Intel microarchitecture code name Sandy Bridge provides an off-core response 
facility similar to that of the prior generation. Off-core response can be programmed only with a specific pair of event 
select and counter MSRs, and with specific event codes and predefined mask bit values in a dedicated MSR to specify 
attributes of the off-core transaction. Two event codes are dedicated to off-core response event programming. Each 
event code for off-core response monitoring requires programming an associated configuration MSR, 
MSR_OFFCORE_RSP_x. Table 18-35 lists the event code, mask value, and additional off-core configuration MSR 
that must be programmed to count off-core response events using IA32_PMCx. 
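
The pairing of an event select value with its configuration MSR can be sketched as below. The event codes, mask values, and MSR addresses in `OFFCORE_EVENTS` are recalled from outside this section and should be verified against Table 18-35; the helper names are hypothetical, and the MSR_OFFCORE_RSP_x value is supplied by the caller because its bit layout is not described here.

```python
# Minimal sketch: list the MSR writes needed to count an off-core response event.
# Assumption: the table entries below match Table 18-35; check before relying on them.

IA32_PERFEVTSEL0 = 0x186  # IA32_PERFEVTSELx is at 0x186 + x

OFFCORE_EVENTS = {
    0: {"event": 0xB7, "umask": 0x01, "rsp_msr": 0x1A6},  # MSR_OFFCORE_RSP_0
    1: {"event": 0xBB, "umask": 0x01, "rsp_msr": 0x1A7},  # MSR_OFFCORE_RSP_1
}


def perfevtsel(event: int, umask: int, usr: bool = True,
               kernel: bool = True, enable: bool = True) -> int:
    """Encode an IA32_PERFEVTSELx value from its architectural fields."""
    value = (event & 0xFF) | ((umask & 0xFF) << 8)
    if usr:
        value |= 1 << 16   # USR: count in user mode
    if kernel:
        value |= 1 << 17   # OS: count in kernel mode
    if enable:
        value |= 1 << 22   # EN: enable the counter
    return value


def offcore_writes(which: int, counter: int, rsp_value: int) -> list:
    """Return (msr_address, value) pairs: configuration MSR first, then event select."""
    entry = OFFCORE_EVENTS[which]
    return [
        (entry["rsp_msr"], rsp_value),
        (IA32_PERFEVTSEL0 + counter, perfevtsel(entry["event"], entry["umask"])),
    ]


for msr, val in offcore_writes(which=0, counter=2, rsp_value=0x10001):
    print(hex(msr), hex(val))
```

Writing the configuration MSR before enabling the counter avoids counting with a stale request/response mask.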

Table 18-34.  Layout of Precise Store Information In PEBS Record

Field                        Offset   Description

Store Data Linear Address    98H      The linear address of the destination of the store.

Store Status                 A0H      L1D Hit (bit 0): The store hit the data cache closest to the core
                                      (lowest latency cache) if this bit is set, otherwise the store
                                      missed the data cache.
                                      STLB Miss (bit 4): The store missed the STLB if set, otherwise
                                      the store hit the STLB.
                                      Locked Access (bit 5): The store was part of a locked access if
                                      set, otherwise the store was not part of a locked access.

Reserved                     A8H      Reserved
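
To make the layout concrete, the following sketch decodes the precise store fields from a raw PEBS record. It assumes each field is a 64-bit little-endian value at the offsets listed in the table; the function name and the synthetic record are hypothetical and exist only for illustration.

```python
# Minimal sketch: decode precise store information from a raw PEBS record.
# Assumption: fields are 64-bit little-endian values at offsets 98H and A0H.
import struct

OFF_LINEAR_ADDRESS = 0x98
OFF_STORE_STATUS = 0xA0

L1D_HIT = 1 << 0         # bit 0
STLB_MISS = 1 << 4       # bit 4
LOCKED_ACCESS = 1 << 5   # bit 5


def decode_precise_store(record: bytes) -> dict:
    """Extract the store linear address and the three status flags."""
    address, status = struct.unpack_from("<QQ", record, OFF_LINEAR_ADDRESS)
    return {
        "linear_address": address,
        "l1d_hit": bool(status & L1D_HIT),
        "stlb_miss": bool(status & STLB_MISS),
        "locked_access": bool(status & LOCKED_ACCESS),
    }


# Synthetic record: a locked store that hit the L1D and hit the STLB.
record = bytearray(0xB0)
struct.pack_into("<QQ", record, OFF_LINEAR_ADDRESS, 0x7FFD1000, L1D_HIT | LOCKED_ACCESS)
info = decode_precise_store(bytes(record))
print(hex(info["linear_address"]), info["l1d_hit"], info["stlb_miss"], info["locked_access"])
# 0x7ffd1000 True False True
```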