18.6.3 Average Offcore Request Latency Measurement
Average latency for offcore transactions can be determined by using both MSR_OFFCORE_RSP registers. Using two
performance monitoring counters, program the two OFFCORE_RESPONSE event encodings into the corresponding
IA32_PERFEVTSELx MSRs. Count the weighted cycles via MSR_OFFCORE_RSP0 by programming a request type in
MSR_OFFCORE_RSP0.[15:0] and setting MSR_OFFCORE_RSP0.OUTSTANDING[38] to 1, while setting the
remaining bits to 0. Count the number of requests via MSR_OFFCORE_RSP1 by programming the same request
type into MSR_OFFCORE_RSP1.[15:0] and setting MSR_OFFCORE_RSP1.ANY_RESPONSE[16] to 1, while setting the
remaining bits to 0. The average latency is obtained by dividing the value of the IA32_PMCx register that
counted weighted cycles by the value of the IA32_PMCx register that counted requests.
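The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a driver: it only computes the values that would be written to the two MSRs and the final ratio. The function names and the sample request-type value 0x0001 are assumptions made for this example. The bit positions (request type in bits 15:0, ANY_RESPONSE at bit 16, OUTSTANDING at bit 38) come from the text.

```python
# Bit positions taken from the description above.
REQUEST_TYPE_MASK = 0xFFFF      # bits 15:0: request type
ANY_RESPONSE_BIT = 1 << 16      # MSR_OFFCORE_RSP1.ANY_RESPONSE[16]
OUTSTANDING_BIT = 1 << 38       # MSR_OFFCORE_RSP0.OUTSTANDING[38]


def offcore_rsp0_value(request_type):
    """Value for MSR_OFFCORE_RSP0: request type plus OUTSTANDING, all else 0."""
    if request_type == 0 or request_type & ~REQUEST_TYPE_MASK:
        raise ValueError("request type must be a non-zero value in bits 15:0")
    return request_type | OUTSTANDING_BIT


def offcore_rsp1_value(request_type):
    """Value for MSR_OFFCORE_RSP1: same request type plus ANY_RESPONSE, all else 0."""
    if request_type == 0 or request_type & ~REQUEST_TYPE_MASK:
        raise ValueError("request type must be a non-zero value in bits 15:0")
    return request_type | ANY_RESPONSE_BIT


def average_latency(weighted_cycles, requests):
    """Average latency in cycles, or None if no requests were counted."""
    if requests == 0:
        return None
    return weighted_cycles / requests


# Hypothetical request-type selection (bit 0 set); consult the request-type
# table for the actual encodings.
request_type = 0x0001
print(hex(offcore_rsp0_value(request_type)))   # value for MSR_OFFCORE_RSP0
print(hex(offcore_rsp1_value(request_type)))   # value for MSR_OFFCORE_RSP1
# Counter readings below are made-up sample numbers, not measurements.
print(average_latency(weighted_cycles=1200, requests=10))
```

The zero-request check avoids a division by zero when the measured interval contains no matching offcore requests.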
18.7 PERFORMANCE MONITORING FOR GOLDMONT MICROARCHITECTURE
Next generation Intel Atom processors are based on the Goldmont microarchitecture. They report architectural
performance monitoring version ID = 4 and support the non-architectural monitoring capabilities described in
this section. Architectural performance monitoring version 4 capabilities are described in Section 18.2.4.
The bit fields (except bit 21) within each IA32_PERFEVTSELx MSR are defined in Figure 18-6 and described in
Section 18.2.1.1 and Section 18.2.3. Architectural and non-architectural performance monitoring events in the
Goldmont microarchitecture ignore the AnyThread qualification regardless of its setting in the IA32_PERFEVTSELx
MSR.
The core PMU’s capability is similar to that of the Silvermont microarchitecture described in Section 18.6, with
some differences and enhancements summarized in Table 18-18.
Table 18-17. MSR_OFFCORE_RSPx Snoop Info Field Definition
Subtype      Bit Name      Offset   Description
-----------  ------------  ------   ------------------------------------------------------------
Snoop Info   SNP_NONE      31       (R/W). No details on snoop-related information.
             Reserved      32       Reserved.
             SNOOP_MISS    33       (R/W). Counts the number of snoop misses when L2 misses.
             SNOOP_HIT     34       (R/W). Counts the number of snoops that hit in the other
                                    module where no modified copies were found.
             Reserved      35       Reserved.
             HITM          36       (R/W). Counts the number of snoops that hit in the other
                                    module where modified copies were found in the other
                                    core's L1 cache.
             NON_DRAM      37       (R/W). Target was a non-DRAM system address. This includes
                                    MMIO transactions.
             AVG_LATENCY   38       (R/W). Enables average latency measurement by counting
                                    weighted cycles of outstanding offcore requests of the
                                    request type specified in bits 15:0 and any response
                                    (bits 37:16 cleared to 0).
                                    This bit is available in MSR_OFFCORE_RSP0. The weighted
                                    cycles are accumulated in the specified programmable
                                    counter IA32_PMCx, and the occurrences of the specified
                                    requests are counted in the other programmable counter.
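The bit offsets in Table 18-17 can be captured as a small lookup for decoding a 64-bit MSR_OFFCORE_RSPx value. The sketch below is illustrative only: the dictionary and function names are assumptions made for this example, and it covers only bits 38:31 as listed in the table.

```python
# Bit offsets and names as listed in Table 18-17 (bits 32 and 35 are reserved).
UPPER_FIELD_BITS = {
    31: "SNP_NONE",
    33: "SNOOP_MISS",
    34: "SNOOP_HIT",
    36: "HITM",
    37: "NON_DRAM",
    38: "AVG_LATENCY",
}
RESERVED_BITS = (32, 35)


def decode_upper_bits(value):
    """Return (names of set fields, reserved bits that are set) for bits 38:31."""
    names = [name for bit, name in sorted(UPPER_FIELD_BITS.items())
             if (value >> bit) & 1]
    reserved = [bit for bit in RESERVED_BITS if (value >> bit) & 1]
    return names, reserved


# Example: a value with SNOOP_HIT (bit 34) and HITM (bit 36) set.
names, reserved = decode_upper_bits((1 << 34) | (1 << 36))
print(names, reserved)
```

Reporting reserved bits separately makes it easy to reject a value before it is written to the MSR.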