background image

18-32 Vol. 3B

PERFORMANCE MONITORING

With PEBS record format encoding 0011b, offset 90H reports the “applicable counter” field, which is a multi-
counter PEBS resolution index allowing software to correlate the PEBS record entry with the eventing PEBS over-
flow when multiple counters are configured to record PEBS records. Additionally, offset C0H captures a snapshot of 
the TSC that provides a time line annotation for each PEBS record entry.

18.7.1.1   PEBS Data Linear Address Profiling

Goldmont supports the Data Linear Address field introduced in Haswell. It does not support the Data Source 
Encoding or Latency Value fields that are also part of Data Address Profiling. The fields are present in the record but 
are reserved. 
For Goldmont, the Data Linear Address field will record the linear address of memory accesses in the previous 
instruction (e.g. the one that triggered a precise event that caused the PEBS record to be generated).

18.7.1.2   Reduced Skid PEBS

For precise events, upon triggering a PEBS assist, there will be a finite delay between the time the counter over-
flows and when the microcode starts to carry out its data collection obligations. The Reduced Skid mechanism miti-
gates the “skid” problem by providing an early indication of when the counter is about to overflow, allowing the 
machine to more precisely trap on the instruction that actually caused the counter overflow thus greatly reducing 
skid.
This mechanism is a superset of the PDIR mechanism available in the Sandy Bridge microarchitecture. See Section 
18.9.4.4
In the Goldmont microarchitecture, the mechanism applies to all precise events including INST_RETIRED. 
However, the Reduced Skid mechanism is disabled for any counter when the INV, ANY, E, or CMASK fields are set.
To ensure the Reduced Skid mechanism operates correctly, disable PEBS via the IA32_PEBS_ENABLE or 
IA32_PERF_GLOBAL_CTRL MSRs before writing to the configuration registers (IA32_PERFEVTSELx) or to the coun-
ters (IA32_PMCx and IA32_A_PMCx).

18.7.1.3   Enhancements to IA32_PERF_GLOBAL_STATUS.OvfDSBuffer[62] 

In addition to IA32_PERF_GLOBAL_STATUS.OvfDSBuffer[62] being set when PEBS_Index reaches the 
PEBS_Interrupt_Theshold, the bit is also set when PEBS_Index is out of bounds. That is, the bit will be set when 
PEBS_Index < PEBS_Buffer_Base or PEBS_Index > PEBS_Absolute_Maximum. Note that when an out of bound 
condition is encountered, the overflow bits in IA32_PERF_GLOBAL_STATUS will be cleared according to Applicable 
Counters, however the IA32_PMCx values will not be reloaded with the Reset values stored in the DS_AREA.

18.7.2 

Offcore Response Event

Event number 0B7H support offcore response monitoring using an associated configuration MSR, 
MSR_OFFCORE_RSP0 (address 1A6H) in conjunction with umask value 01H or MSR_OFFCORE_RSP1 (address 
1A7H) in conjunction with umask value 02H. Table 18-14 lists the event code, mask value and additional off-core 
configuration MSR that must be programmed to count off-core response events using IA32_PMCx. 
The Goldmont microarchitecture provides unique pairs of MSR_OFFCORE_RSPx registers per core.
The layout of MSR_OFFCORE_RSP0 and MSR_OFFCORE_RSP1 are organized as follows:

Bits 15:0 specifies the request type of a transaction request to the uncore. This is described in Table 18-21.

Bits 30:16 specifies common supplier information or an L2 Hit, and is described in Table 18-16

If L2 misses, then Bits 37:31 can be used to specify snoop response information and is described in 
Table 18-22. 

For outstanding requests, bit 38 can enable measurement of average latency of specific type of offcore 
transaction requests using two programmable counter simultaneously; see Section 18.6.3 for details.