Vol. 3B 18-57
PERFORMANCE MONITORING
•
Complete the PEBS configuration steps.
•
Program the MEM_TRANS_RETIRED.PRECISE_STORE event in IA32_PERFEVTSEL3. Only counter 3
(IA32_PMC3) supports collection of precise store information.
•
Set IA32_PEBS_ENABLE[3] and IA32_PEBS_ENABLE[63]. This enables IA32_PMC3 as a PEBS counter and
enables the precise store facility, respectively.
The precise store information written into a PEBS record affects entries at offset 98H, A0H and A8H of Table 18-23.
The specificity of Data Source entry at offset A0H has been enhanced to report three piece of information.
18.9.4.4 Precise Distribution of Instructions Retired (PDIR)
Upon triggering a PEBS assist, there will be a finite delay between the time the counter overflows and when the
microcode starts to carry out its data collection obligations. INST_RETIRED is a very common event that is used to
sample where performance bottleneck happened and to help identify its location in instruction address space. Even
if the delay is constant in core clock space, it invariably manifest as variable “skids” in instruction address space.
This creates a challenge for programmers to profile a workload and pinpoint the location of bottlenecks.
The core PMU in processors based on Intel microarchitecture code name Sandy Bridge include a facility referred to
as precise distribution of Instruction Retired (PDIR).
The PDIR facility mitigates the “skid” problem by providing an early indication of when the INST_RETIRED counter
is about to overflow, allowing the machine to more precisely trap on the instruction that actually caused the
counter overflow thus eliminating skid.
PDIR applies only to the INST_RETIRED.ALL precise event, and must use IA32_PMC1 with PerfEvtSel1 property
configured and bit 1 in the IA32_PEBS_ENABLE set to 1. INST_RETIRED.ALL is a non-architectural performance
event, it is not supported in prior generation microarchitectures. Additionally, on processors with CPUID
DisplayFamily_DisplayModel signatures of 06_2A and 06_2D, the tool that programs PDIR should quiesce the rest
of the programmable counters in the core when PDIR is active.
18.9.5
Off-core Response Performance Monitoring
The core PMU in processors based on Intel microarchitecture code name Sandy Bridge provides off-core response
facility similar to prior generation. Off-core response can be programmed only with a specific pair of event select
and counter MSR, and with specific event codes and predefine mask bit value in a dedicated MSR to specify attri-
butes of the off-core transaction. Two event codes are dedicated for off-core response event programming. Each
event code for off-core response monitoring requires programming an associated configuration MSR,
MSR_OFFCORE_RSP_x. Table 18-35 lists the event code, mask value and additional off-core configuration MSR
that must be programmed to count off-core response events using IA32_PMCx.
Table 18-34. Layout of Precise Store Information In PEBS Record
Field Offset
Description
Store Data
Linear Address
98H
The linear address of the destination of the store.
Store Status
A0H
L1D Hit (Bit 0): The store hit the data cache closest to the core (lowest latency cache) if this bit is set,
otherwise the store missed the data cache.
STLB Miss (bit 4): The store missed the STLB if set, otherwise the store hit the STLB
Locked Access (bit 5): The store was part of a locked access if set, otherwise the store was not part of a
locked access.
Reserved
A8H
Reserved