18-28 Vol. 3B

PERFORMANCE MONITORING

18.6.3 Average Offcore Request Latency Measurement

The average latency of offcore transactions can be determined by using both MSR_OFFCORE_RSP registers together 
with two performance monitoring counters. Program the two OFFCORE_RESPONSE event encodings into the 
corresponding IA32_PERFEVTSELx MSRs. Count the weighted cycles via MSR_OFFCORE_RSP0 by programming a 
request type in MSR_OFFCORE_RSP0.[15:0] and setting MSR_OFFCORE_RSP0.OUTSTANDING[38] to 1, while setting 
the remaining bits to 0. Count the number of requests via MSR_OFFCORE_RSP1 by programming the same request 
type from MSR_OFFCORE_RSP0 into MSR_OFFCORE_RSP1.[15:0] and setting 
MSR_OFFCORE_RSP1.ANY_RESPONSE[16] to 1, while setting the remaining bits to 0. The average latency is 
obtained by dividing the value of the IA32_PMCx register that counted weighted cycles by the value of the 
register that counted requests.
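The register values and the final division described above can be sketched as follows. This is a minimal illustration, not a driver: the bit positions (request type in bits 15:0, ANY_RESPONSE at bit 16, OUTSTANDING at bit 38) come from the text, while the function names and the example request-type value 0x0001 are hypothetical. Writing the MSRs and reading the counters requires privileged, platform-specific code that is not shown.

```python
# Bit positions as stated in the text above.
REQUEST_TYPE_MASK = 0xFFFF      # bits 15:0
ANY_RESPONSE_BIT = 1 << 16      # MSR_OFFCORE_RSP1.ANY_RESPONSE[16]
OUTSTANDING_BIT = 1 << 38       # MSR_OFFCORE_RSP0.OUTSTANDING[38]


def _check_request_type(request_type):
    # The request type must fit entirely within bits 15:0.
    if request_type & ~REQUEST_TYPE_MASK:
        raise ValueError("request type must fit in bits 15:0")


def offcore_rsp0_value(request_type):
    """Value for MSR_OFFCORE_RSP0: count weighted cycles (hypothetical helper)."""
    _check_request_type(request_type)
    return request_type | OUTSTANDING_BIT  # all remaining bits stay 0


def offcore_rsp1_value(request_type):
    """Value for MSR_OFFCORE_RSP1: count requests (hypothetical helper)."""
    _check_request_type(request_type)
    return request_type | ANY_RESPONSE_BIT  # all remaining bits stay 0


def average_latency(weighted_cycles, requests):
    """Weighted-cycle count divided by request count; None if no requests."""
    if requests == 0:
        return None
    return weighted_cycles / requests


# Example with a placeholder request-type value (an assumption, not from the text).
REQ = 0x0001
rsp0 = offcore_rsp0_value(REQ)
rsp1 = offcore_rsp1_value(REQ)
```

Both registers must carry the same request type, otherwise the numerator and denominator describe different sets of transactions and the quotient is not a meaningful latency.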

18.7 PERFORMANCE MONITORING FOR GOLDMONT MICROARCHITECTURE

Next generation Intel Atom processors are based on the Goldmont microarchitecture. They report architectural 
performance monitoring versionID = 4 and support the non-architectural monitoring capabilities described in 
this section.
Architectural performance monitoring version 4 capabilities are described in Section 18.2.4.
The bit fields (except bit 21) within each IA32_PERFEVTSELx MSR are defined in Figure 18-6 and described in 
Section 18.2.1.1 and Section 18.2.3. Architectural and non-architectural performance monitoring events in the 
Goldmont microarchitecture ignore the AnyThread qualification regardless of its setting in the IA32_PERFEVTSELx 
MSR. 
The core PMU’s capability is similar to that of the Silvermont microarchitecture described in Section 18.6, with 
some differences and enhancements summarized in Table 18-18.

Table 18-17.  MSR_OFFCORE_RSPx Snoop Info Field Definition

Subtype     Bit Name     Offset  Description
----------  -----------  ------  ------------------------------------------------------------
Snoop Info  SNP_NONE     31      (R/W). No details on snoop-related information.
            Reserved     32      Reserved.
            SNOOP_MISS   33      (R/W). Counts the number of snoop misses when L2 misses.
            SNOOP_HIT    34      (R/W). Counts the number of snoops hit in the other module
                                 where no modified copies were found.
            Reserved     35      Reserved.
            HITM         36      (R/W). Counts the number of snoops hit in the other module
                                 where modified copies were found in the other core's L1
                                 cache.
            NON_DRAM     37      (R/W). Target was a non-DRAM system address. This includes
                                 MMIO transactions.
            AVG_LATENCY  38      (R/W). Enable average latency measurement by counting
                                 weighted cycles of outstanding offcore requests of the
                                 request type specified in bits 15:0 and any response
                                 (bits 37:16 cleared to 0).
                                 This bit is available in MSR_OFFCORE_RSP0. The weighted
                                 cycles are accumulated in the specified programmable
                                 counter IA32_PMCx, and the occurrences of the specified
                                 requests are counted in the other programmable counter.
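
As an aid to reading Table 18-17, the sketch below decodes which of the listed bits are set in a given MSR_OFFCORE_RSPx value. The bit offsets are taken from the table; the dictionary and function names are hypothetical, and reserved bits 32 and 35 are deliberately not decoded.

```python
# Bit offsets from Table 18-17 (reserved bits 32 and 35 are omitted).
SNOOP_INFO_BITS = {
    31: "SNP_NONE",
    33: "SNOOP_MISS",
    34: "SNOOP_HIT",
    36: "HITM",
    37: "NON_DRAM",
    38: "AVG_LATENCY",
}


def decode_snoop_info(msr_value):
    """Return the names of the Table 18-17 bits set in msr_value, lowest bit first."""
    return [
        name
        for bit, name in sorted(SNOOP_INFO_BITS.items())
        if msr_value & (1 << bit)
    ]


# Example: a value with SNOOP_MISS (bit 33) and HITM (bit 36) set.
example = (1 << 33) | (1 << 36)
names = decode_snoop_info(example)
```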