
18-42 Vol. 3B

PERFORMANCE MONITORING

18.8.2   Performance Monitoring Facility in the Uncore

The “uncore” in Intel microarchitecture code name Nehalem refers to subsystems in the physical processor 
package that are shared by multiple processor cores. Subsystems in the uncore include the L3 cache, 
Intel QuickPath Interconnect link logic, and the integrated memory controller. The performance monitoring 
facilities inside the uncore operate in the same clock domain as the uncore (U-clock domain), which is usually 
different from the processor core clock domain. The uncore performance monitoring facilities described in this 
section apply to Intel Xeon processor 5500 series and processors with the following CPUID signatures: 06_1AH, 
06_1EH, 06_1FH (see Chapter 35). An overview of the uncore performance monitoring facilities is described separately. 
The performance monitoring facilities available in the U-clock domain consist of:

• Eight general-purpose counters (MSR_UNCORE_PerfCntr0 through MSR_UNCORE_PerfCntr7). The counters 
  are 48 bits wide. Each counter is associated with a configuration MSR, MSR_UNCORE_PerfEvtSelx, that specifies 
  the event code, event mask, and other event qualification fields. A set of global uncore performance counter 
  enabling/overflow/status control MSRs is also provided for software.

• An address/opcode match MSR that provides event qualification control based on address value or QPI 
  command opcode.

• One fixed-function counter, MSR_UNCORE_FixedCntr0. The fixed-function uncore counter increments at the 
  rate of the U-clock when enabled.
  The frequency of the uncore clock domain can be determined from the uncore clock ratio, which is available in 
  the PCI configuration space register at offset C0H under device number 0 and function 0. 
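Because the general-purpose uncore counters are 48 bits wide, software that samples a counter twice has to allow for the value wrapping past bit 47. The following is a minimal sketch of that arithmetic, not a definitive implementation. The helper name uncore_cntr_delta and the macro names are hypothetical, and the sketch assumes the counter wrapped at most once between the two reads. Reading the MSR itself is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* The general-purpose uncore counters are 48 bits wide. */
#define UNCORE_CNTR_WIDTH 48
#define UNCORE_CNTR_MASK  ((UINT64_C(1) << UNCORE_CNTR_WIDTH) - 1)

/*
 * Hypothetical helper: number of events between two raw counter reads.
 * Assumption: the counter wrapped at most once between 'before' and 'after'.
 */
static uint64_t uncore_cntr_delta(uint64_t before, uint64_t after)
{
    /* Both samples must fit in 48 bits. */
    assert((before & ~UNCORE_CNTR_MASK) == 0);
    assert((after & ~UNCORE_CNTR_MASK) == 0);

    /* Subtract modulo 2^64, then reduce modulo 2^48. */
    return (after - before) & UNCORE_CNTR_MASK;
}
```

For example, a read of 100 followed by a read of 250 yields a delta of 150, and a read of the maximum 48-bit value followed by a read of 4 yields a delta of 5.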

18.8.2.1   Uncore Performance Monitoring Management Facility

MSR_UNCORE_PERF_GLOBAL_CTRL provides bit fields to enable/disable general-purpose and fixed-function 
counters in the uncore. Figure 18-25 shows the layout of MSR_UNCORE_PERF_GLOBAL_CTRL for an uncore that is 
shared by four processor cores in a physical package. 

EN_PCn (bit n, n = 0 through 7): When set, enables counting for the general-purpose uncore counter 
MSR_UNCORE_PerfCntrn.

EN_FC0 (bit 32): When set, enables counting for the fixed-function uncore counter MSR_UNCORE_FixedCntr0.
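The two enable fields described above can be combined into a single 64-bit value. The following is a minimal sketch under stated assumptions: the function and macro names are hypothetical, only the EN_PCn and EN_FC0 fields are covered, and all other bits of the register are left at zero. Writing the value to the MSR requires a privileged WRMSR and is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define UNCORE_NUM_PC      8   /* general-purpose counters 0 through 7 */
#define UNCORE_EN_FC0_BIT 32   /* EN_FC0 is bit 32 */

/* Hypothetical helper: EN_PCn mask for general-purpose counter n. */
static uint64_t uncore_en_pc(unsigned n)
{
    assert(n < UNCORE_NUM_PC);
    return UINT64_C(1) << n;
}

/*
 * Hypothetical helper: build a value for MSR_UNCORE_PERF_GLOBAL_CTRL.
 * pc_mask:      bit n set means "enable MSR_UNCORE_PerfCntr n".
 * enable_fixed: nonzero means "enable MSR_UNCORE_FixedCntr0".
 * Only EN_PCn and EN_FC0 are set; every other bit stays zero.
 */
static uint64_t uncore_global_ctrl(uint8_t pc_mask, int enable_fixed)
{
    uint64_t value = 0;

    for (unsigned n = 0; n < UNCORE_NUM_PC; n++) {
        if (pc_mask & (1u << n))
            value |= uncore_en_pc(n);
    }
    if (enable_fixed)
        value |= UINT64_C(1) << UNCORE_EN_FC0_BIT;

    return value;
}
```

Enabling counters 0 and 2 together with the fixed-function counter, uncore_global_ctrl(0x05, 1), gives 0x100000005.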

Table 18-26.  MSR_OFFCORE_RSP_0 and MSR_OFFCORE_RSP_1 Bit Field Definition (Contd.)

Bit Name              Offset   Description

PF_IFETCH             6        (R/W). Counts the number of code reads generated by L2 prefetchers.

OTHER                 7        (R/W). Counts one of the following transaction types, including L3 invalidate,
                               I/O, full or partial writes, WC or non-temporal stores, CLFLUSH, Fences, lock,
                               unlock, split lock.

UNCORE_HIT            8        (R/W). L3 Hit: local or remote home requests that hit L3 cache in the uncore
                               with no coherency actions required (snooping).

OTHER_CORE_HIT_SNP    9        (R/W). L3 Hit: local or remote home requests that hit L3 cache in the uncore
                               and were serviced by another core with a cross core snoop where no modified
                               copies were found (clean).

OTHER_CORE_HITM       10       (R/W). L3 Hit: local or remote home requests that hit L3 cache in the uncore
                               and were serviced by another core with a cross core snoop where modified
                               copies were found (HITM).

Reserved              11       Reserved

REMOTE_CACHE_FWD      12       (R/W). L3 Miss: local homed requests that missed the L3 cache and were
                               serviced by forwarded data following a cross package snoop where no modified
                               copies were found. (Remote home requests are not counted.)

REMOTE_DRAM           13       (R/W). L3 Miss: remote home requests that missed the L3 cache and were
                               serviced by remote DRAM.

LOCAL_DRAM            14       (R/W). L3 Miss: local home requests that missed the L3 cache and were
                               serviced by local DRAM.

NON_DRAM              15       (R/W). Non-DRAM requests that were serviced by IOH.
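
The bit offsets listed in this part of Table 18-26 can be turned into masks for MSR_OFFCORE_RSP_0 or MSR_OFFCORE_RSP_1. The following is a minimal sketch that covers only offsets 6 through 15 as listed here. The enum and function names are hypothetical, and the grouping into "L3 hit" and "L3 miss" masks simply follows the wording of the descriptions. Fields defined in other parts of the table are not represented.

```c
#include <assert.h>
#include <stdint.h>

/* Bit offsets taken from the table rows listed here (bit 11 is reserved). */
enum offcore_rsp_offset {
    OFFCORE_PF_IFETCH          = 6,
    OFFCORE_OTHER              = 7,
    OFFCORE_UNCORE_HIT         = 8,
    OFFCORE_OTHER_CORE_HIT_SNP = 9,
    OFFCORE_OTHER_CORE_HITM    = 10,
    OFFCORE_REMOTE_CACHE_FWD   = 12,
    OFFCORE_REMOTE_DRAM        = 13,
    OFFCORE_LOCAL_DRAM         = 14,
    OFFCORE_NON_DRAM           = 15
};

/* Hypothetical helper: single-bit mask for one of the offsets above. */
static uint64_t offcore_rsp_bit(unsigned offset)
{
    /* Reject offsets outside this part of the table and the reserved bit. */
    assert(offset >= 6 && offset <= 15 && offset != 11);
    return UINT64_C(1) << offset;
}

/* All rows whose description starts with "L3 Hit" (bits 8, 9, 10). */
static uint64_t offcore_rsp_l3_hit_mask(void)
{
    return offcore_rsp_bit(OFFCORE_UNCORE_HIT)
         | offcore_rsp_bit(OFFCORE_OTHER_CORE_HIT_SNP)
         | offcore_rsp_bit(OFFCORE_OTHER_CORE_HITM);
}

/* All rows whose description starts with "L3 Miss" (bits 12, 13, 14). */
static uint64_t offcore_rsp_l3_miss_mask(void)
{
    return offcore_rsp_bit(OFFCORE_REMOTE_CACHE_FWD)
         | offcore_rsp_bit(OFFCORE_REMOTE_DRAM)
         | offcore_rsp_bit(OFFCORE_LOCAL_DRAM);
}
```

With these definitions the L3 hit mask is 0x700 and the L3 miss mask is 0x7000.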