18-32 Vol. 3B
PERFORMANCE MONITORING
With PEBS record format encoding 0011b, offset 90H reports the “applicable counter” field, which is a
multi-counter PEBS resolution index allowing software to correlate the PEBS record entry with the eventing PEBS
overflow when multiple counters are configured to record PEBS records. Additionally, offset C0H captures a snapshot
of the TSC that provides a timeline annotation for each PEBS record entry.
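As an illustration of the two offsets named above, the following is a minimal sketch of extracting both fields from a raw PEBS record. It assumes that each field is an 8-byte little-endian value and that the record has already been copied into a byte buffer; the function name `parse_pebs_fields` and the dictionary keys are hypothetical, and the "applicable counter" value is returned uninterpreted.

```python
import struct

APPLICABLE_COUNTER_OFFSET = 0x90  # offset 90H, from the text above
TSC_OFFSET = 0xC0                 # offset C0H, from the text above
# Assumption: each field is 8 bytes wide, so the record spans at least C8H bytes.
MIN_RECORD_SIZE = TSC_OFFSET + 8


def parse_pebs_fields(record: bytes) -> dict:
    """Return the raw 'applicable counter' and TSC fields of one PEBS record."""
    if len(record) < MIN_RECORD_SIZE:
        raise ValueError("record too short for format encoding 0011b")
    # "<Q" reads one unsigned 64-bit little-endian integer.
    (applicable,) = struct.unpack_from("<Q", record, APPLICABLE_COUNTER_OFFSET)
    (tsc,) = struct.unpack_from("<Q", record, TSC_OFFSET)
    return {"applicable_counter": applicable, "tsc": tsc}
```

Sorting parsed records by the `tsc` value is one way software might place entries from several counters on a common timeline.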
18.7.1.1 PEBS Data Linear Address Profiling
Goldmont supports the Data Linear Address field introduced in Haswell. It does not support the Data Source
Encoding or Latency Value fields that are also part of Data Address Profiling. The fields are present in the record but
are reserved.
For Goldmont, the Data Linear Address field will record the linear address of memory accesses in the previous
instruction (e.g. the one that triggered a precise event that caused the PEBS record to be generated).
18.7.1.2 Reduced Skid PEBS
For precise events, upon triggering a PEBS assist, there will be a finite delay between the time the counter
overflows and when the microcode starts to carry out its data collection obligations. The Reduced Skid mechanism
mitigates the “skid” problem by providing an early indication of when the counter is about to overflow, allowing the
machine to trap more precisely on the instruction that actually caused the counter overflow, thus greatly reducing
skid.
This mechanism is a superset of the PDIR mechanism available in the Sandy Bridge microarchitecture. See Section
18.9.4.4.
In the Goldmont microarchitecture, the mechanism applies to all precise events including INST_RETIRED.
However, the Reduced Skid mechanism is disabled for any counter when the INV, ANY, E, or CMASK fields are set.
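The eligibility rule above can be sketched as a check on an IA32_PERFEVTSELx value. The bit positions used here (E at bit 18, ANY at bit 21, INV at bit 23, CMASK at bits 31:24) come from the architectural IA32_PERFEVTSELx layout rather than from this section, and the function name `reduced_skid_eligible` is hypothetical.

```python
# Field positions per the architectural IA32_PERFEVTSELx layout
# (an assumption relative to this section, which names the fields only).
E_BIT = 1 << 18          # edge detect
ANY_BIT = 1 << 21        # any thread
INV_BIT = 1 << 23        # invert counter mask
CMASK_MASK = 0xFF << 24  # counter mask, bits 31:24

DISQUALIFYING_BITS = E_BIT | ANY_BIT | INV_BIT | CMASK_MASK


def reduced_skid_eligible(evtsel: int) -> bool:
    """True if none of INV, ANY, E, or CMASK is set in the event select value."""
    return (evtsel & DISQUALIFYING_BITS) == 0
```

For example, a value with only the event select, USR, OS, and EN fields set passes the check, while the same value with a non-zero CMASK does not.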
To ensure the Reduced Skid mechanism operates correctly, disable PEBS via the IA32_PEBS_ENABLE or
IA32_PERF_GLOBAL_CTRL MSRs before writing to the configuration registers (IA32_PERFEVTSELx) or to the
counters (IA32_PMCx and IA32_A_PMCx).
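The ordering requirement above might look like the following sketch. The `rdmsr` and `wrmsr` callables are hypothetical stand-ins for whatever privileged MSR access the platform provides; the MSR addresses (IA32_PEBS_ENABLE at 3F1H, IA32_PERFEVTSEL0 at 186H, IA32_PMC0 at C1H) are the architectural ones and are not stated in this section. Restoring the previous IA32_PEBS_ENABLE value afterwards is a design choice of this sketch, not a requirement taken from the text.

```python
IA32_PEBS_ENABLE = 0x3F1  # architectural address (assumption relative to this section)
IA32_PERFEVTSEL0 = 0x186  # IA32_PERFEVTSELx = 186H + x
IA32_PMC0 = 0xC1          # IA32_PMCx = C1H + x


def reprogram_counter(rdmsr, wrmsr, index: int, evtsel: int, count: int) -> None:
    """Disable PEBS, rewrite one counter's configuration and value, then restore.

    rdmsr(addr) -> int and wrmsr(addr, value) are hypothetical callables
    supplied by the caller.
    """
    saved = rdmsr(IA32_PEBS_ENABLE)
    wrmsr(IA32_PEBS_ENABLE, 0)  # PEBS off before touching the counter
    try:
        wrmsr(IA32_PERFEVTSEL0 + index, evtsel)
        wrmsr(IA32_PMC0 + index, count)
    finally:
        wrmsr(IA32_PEBS_ENABLE, saved)  # restore the caller's PEBS enables
```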
18.7.1.3 Enhancements to IA32_PERF_GLOBAL_STATUS.OvfDSBuffer[62]
In addition to IA32_PERF_GLOBAL_STATUS.OvfDSBuffer[62] being set when PEBS_Index reaches the
PEBS_Interrupt_Threshold, the bit is also set when PEBS_Index is out of bounds. That is, the bit will be set when
PEBS_Index < PEBS_Buffer_Base or PEBS_Index > PEBS_Absolute_Maximum. Note that when an out-of-bounds
condition is encountered, the overflow bits in IA32_PERF_GLOBAL_STATUS will be cleared according to Applicable
Counters; however, the IA32_PMCx values will not be reloaded with the Reset values stored in the DS_AREA.
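The two conditions above can be modeled as a small predicate. This is only a software model of the described behavior, with a hypothetical function name; it assumes that "reaches" the threshold means PEBS_Index is greater than or equal to PEBS_Interrupt_Threshold.

```python
def ovf_ds_buffer_set(pebs_index: int, buffer_base: int,
                      absolute_maximum: int, interrupt_threshold: int) -> bool:
    """Model of when OvfDSBuffer[62] would be set, per the text above."""
    # Out-of-bounds condition, exactly as stated in the text.
    out_of_bounds = pebs_index < buffer_base or pebs_index > absolute_maximum
    # Assumption: "reaches" is interpreted as >= the interrupt threshold.
    reached_threshold = pebs_index >= interrupt_threshold
    return out_of_bounds or reached_threshold
```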
18.7.2 Offcore Response Event
Event number 0B7H supports offcore response monitoring using an associated configuration MSR:
MSR_OFFCORE_RSP0 (address 1A6H) in conjunction with umask value 01H, or MSR_OFFCORE_RSP1 (address
1A7H) in conjunction with umask value 02H. Table 18-14 lists the event code, mask value, and additional offcore
configuration MSR that must be programmed to count offcore response events using IA32_PMCx.
The Goldmont microarchitecture provides unique pairs of MSR_OFFCORE_RSPx registers per core.
The layout of MSR_OFFCORE_RSP0 and MSR_OFFCORE_RSP1 is organized as follows:
• Bits 15:0 specify the request type of a transaction request to the uncore. This is described in Table 18-21.
• Bits 30:16 specify common supplier information or an L2 hit, and are described in Table 18-16.
• If L2 misses, bits 37:31 can be used to specify snoop response information; these are described in
  Table 18-22.
• For outstanding requests, bit 38 can enable measurement of the average latency of a specific type of offcore
  transaction request using two programmable counters simultaneously; see Section 18.6.3 for details.
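The field layout and the event/umask pairing described in this section can be sketched as a pair of helpers. The function names are hypothetical, each field argument is taken relative to its own field (not pre-shifted), and the meaning of individual bits within each field is left to the referenced tables.

```python
OFFCORE_EVENT = 0xB7      # event number 0B7H
MSR_OFFCORE_RSP0 = 0x1A6  # paired with umask 01H
MSR_OFFCORE_RSP1 = 0x1A7  # paired with umask 02H
UMASK_FOR_MSR = {MSR_OFFCORE_RSP0: 0x01, MSR_OFFCORE_RSP1: 0x02}


def encode_offcore_rsp(request_type: int, supplier: int,
                       snoop: int = 0, avg_latency: bool = False) -> int:
    """Pack the four fields into an MSR_OFFCORE_RSPx value."""
    if not 0 <= request_type < (1 << 16):
        raise ValueError("request_type must fit in bits 15:0")
    if not 0 <= supplier < (1 << 15):
        raise ValueError("supplier must fit in bits 30:16")
    if not 0 <= snoop < (1 << 7):
        raise ValueError("snoop must fit in bits 37:31")
    return (request_type
            | (supplier << 16)
            | (snoop << 31)
            | (int(avg_latency) << 38))


def event_and_umask(msr_address: int) -> tuple:
    """Return the (event number, umask) pair that selects the given MSR."""
    return OFFCORE_EVENT, UMASK_FOR_MSR[msr_address]
```

A caller would write the encoded value to the chosen MSR_OFFCORE_RSPx register and program IA32_PERFEVTSELx with the matching event number and umask.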