18-42 Vol. 3B
PERFORMANCE MONITORING
18.8.2 Performance Monitoring Facility in the Uncore
The “uncore” in Intel microarchitecture code name Nehalem refers to subsystems in the physical processor
package that are shared by multiple processor cores. Some of the subsystems in the uncore are the L3 cache, the
Intel QuickPath Interconnect link logic, and the integrated memory controller. The performance monitoring facilities
inside the uncore operate in the same clock domain as the uncore (U-clock domain), which is usually different
from the processor core clock domain. The uncore performance monitoring facilities described in this section apply
to the Intel Xeon processor 5500 series and to processors with the following CPUID signatures: 06_1AH, 06_1EH,
06_1FH (see Chapter 35). An overview of the uncore performance monitoring facilities is provided separately.
The performance monitoring facilities available in the U-clock domain consist of:
• Eight general-purpose counters (MSR_UNCORE_PerfCntr0 through MSR_UNCORE_PerfCntr7). The counters
are 48 bits wide. Each counter is associated with a configuration MSR, MSR_UNCORE_PerfEvtSelx, that specifies
the event code, event mask, and other event qualification fields. A set of global uncore performance counter
enable, overflow, and status control MSRs is also provided for software.
• An address/opcode match MSR that provides event qualification control based on an address value or a QPI
command opcode.
• One fixed-function counter, MSR_UNCORE_FixedCntr0. When enabled, the fixed-function uncore counter increments
at the rate of the U-clock.
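Because the general-purpose uncore counters are 48 bits wide, software that samples a counter periodically has to compute deltas modulo 2^48. The following is a minimal C sketch of that arithmetic; the helper name uncore_ctr_delta is hypothetical, and reading the raw counter value (for example with RDMSR from kernel code) is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* MSR_UNCORE_PerfCntr0-7 are 48 bits wide. */
#define UNCORE_CTR_BITS 48
#define UNCORE_CTR_MASK ((UINT64_C(1) << UNCORE_CTR_BITS) - 1)

/* Hypothetical helper: number of events counted between two raw reads of the
 * same 48-bit counter. Masking the 64-bit difference to 48 bits gives the right
 * answer when the counter wrapped past 2^48 - 1 at most once between reads. */
static uint64_t uncore_ctr_delta(uint64_t before, uint64_t after)
{
    assert((before & ~UNCORE_CTR_MASK) == 0); /* raw values must fit in 48 bits */
    assert((after & ~UNCORE_CTR_MASK) == 0);
    return (after - before) & UNCORE_CTR_MASK;
}
```

A wrap that happens more than once between two reads cannot be detected from the values alone, so the sampling interval has to be short enough that the counter cannot advance by 2^48 events within it.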
The frequency of the uncore clock domain can be determined from the uncore clock ratio, which is available in
the PCI configuration space register at offset C0H of device 0, function 0.
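Since MSR_UNCORE_FixedCntr0 increments once per U-clock while enabled, the U-clock frequency can also be estimated empirically by sampling that counter across a timed interval. The C sketch below shows both calculations. The helper names are hypothetical, and the ratio-based formula assumes that the U-clock equals the ratio multiplied by a platform reference clock supplied by the caller; that relationship, and how the ratio field is extracted from the register at offset C0H, are assumptions not stated in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: estimate the U-clock frequency in Hz from the number of
 * MSR_UNCORE_FixedCntr0 increments observed over a known wall-clock interval.
 * The counter advances once per U-clock while enabled, so Hz = ticks / seconds.
 * The caller is responsible for computing the tick delta (including any wrap). */
static double uclk_hz_from_fixed_ctr(uint64_t fixed_ctr_ticks, double interval_s)
{
    assert(interval_s > 0.0);
    return (double)fixed_ctr_ticks / interval_s;
}

/* Hypothetical helper: derive the U-clock frequency from the uncore clock ratio
 * read at PCI configuration offset C0H (device 0, function 0).
 * ASSUMPTION: U-clock = ratio * ref_clk_hz, where ref_clk_hz is a platform
 * reference clock that the caller must supply. */
static double uclk_hz_from_ratio(uint32_t ratio, double ref_clk_hz)
{
    assert(ref_clk_hz > 0.0);
    return (double)ratio * ref_clk_hz;
}
```

Comparing the two results is a simple sanity check: if they disagree substantially, the assumed reference clock or the interval timing is likely wrong.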
18.8.2.1 Uncore Performance Monitoring Management Facility
MSR_UNCORE_PERF_GLOBAL_CTRL provides bit fields to enable or disable the general-purpose and fixed-function
counters in the uncore. Figure 18-25 shows the layout of MSR_UNCORE_PERF_GLOBAL_CTRL for an uncore that is
shared by four processor cores in a physical package.
• EN_PCn (bit n, n = 0 to 7): When set, enables counting for the corresponding general-purpose uncore counter,
MSR_UNCORE_PerfCntr0 through MSR_UNCORE_PerfCntr7.
• EN_FC0 (bit 32): When set, enables counting for the fixed-function uncore counter MSR_UNCORE_FixedCntr0.
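As a sketch of how the enable fields of MSR_UNCORE_PERF_GLOBAL_CTRL combine, the following C helpers update and query a register value. Only EN_PC0 through EN_PC7 (bits 0-7) and EN_FC0 (bit 32) come from the field list; the read-modify-write approach, which leaves every other bit unchanged, is a defensive choice assumed here rather than a documented requirement, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Enable fields of MSR_UNCORE_PERF_GLOBAL_CTRL. */
#define UNCORE_EN_PC_MASK UINT64_C(0xFF)      /* EN_PC0..EN_PC7: bits 0-7 */
#define UNCORE_EN_FC0     (UINT64_C(1) << 32) /* EN_FC0: bit 32 */

/* Hypothetical helper: return `current` with the enable fields replaced.
 * Bit n of gp_mask enables MSR_UNCORE_PerfCntrn; enable_fixed controls
 * MSR_UNCORE_FixedCntr0. All other bits of `current` are preserved. */
static uint64_t uncore_global_ctrl_update(uint64_t current, uint8_t gp_mask,
                                          bool enable_fixed)
{
    uint64_t val = current & ~(UNCORE_EN_PC_MASK | UNCORE_EN_FC0);
    val |= (uint64_t)gp_mask;  /* 8-bit mask lands exactly in bits 0-7 */
    if (enable_fixed)
        val |= UNCORE_EN_FC0;
    return val;
}

/* Hypothetical helper: true if EN_PCn is set in a global control value. */
static bool uncore_gp_enabled(uint64_t ctrl, unsigned n)
{
    assert(n < 8); /* only MSR_UNCORE_PerfCntr0-7 exist */
    return (ctrl >> n) & 1u;
}
```

For example, enabling all eight general-purpose counters and the fixed counter from a zeroed register yields 0x1000000FF.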
Table 18-26. MSR_OFFCORE_RSP_0 and MSR_OFFCORE_RSP_1 Bit Field Definition (Contd.)
Bit Name            Offset  Description
PF_IFETCH           6       (R/W). Counts the number of code reads generated by L2 prefetchers.
OTHER               7       (R/W). Counts one of the following transaction types, including L3 invalidate,
                            I/O, full or partial writes, WC or non-temporal stores, CLFLUSH, fences, lock,
                            unlock, and split lock.
UNCORE_HIT          8       (R/W). L3 Hit: local or remote home requests that hit the L3 cache in the uncore
                            with no coherency actions required (snooping).
OTHER_CORE_HIT_SNP  9       (R/W). L3 Hit: local or remote home requests that hit the L3 cache in the uncore
                            and were serviced by another core with a cross-core snoop where no modified copies
                            were found (clean).
OTHER_CORE_HITM     10      (R/W). L3 Hit: local or remote home requests that hit the L3 cache in the uncore
                            and were serviced by another core with a cross-core snoop where modified copies
                            were found (HITM).
Reserved            11      Reserved.
REMOTE_CACHE_FWD    12      (R/W). L3 Miss: local home requests that missed the L3 cache and were serviced by
                            forwarded data following a cross-package snoop where no modified copies were
                            found. (Remote home requests are not counted.)
REMOTE_DRAM         13      (R/W). L3 Miss: remote home requests that missed the L3 cache and were serviced by
                            remote DRAM.
LOCAL_DRAM          14      (R/W). L3 Miss: local home requests that missed the L3 cache and were serviced by
                            local DRAM.
NON_DRAM            15      (R/W). Non-DRAM requests that were serviced by the IOH.
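Each field in Table 18-26 is a single-bit flag, so a register value is built by OR-ing the selected bits together. The C sketch below defines the offsets listed in this portion of the table and groups them by the "L3 Hit" and "L3 Miss" labels in their descriptions. The macro and function names are hypothetical, and bits outside offsets 6-15 are not covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit offsets 6-15 of MSR_OFFCORE_RSP_0 / MSR_OFFCORE_RSP_1 (Table 18-26). */
#define RSP_PF_IFETCH          (UINT64_C(1) << 6)
#define RSP_OTHER              (UINT64_C(1) << 7)
#define RSP_UNCORE_HIT         (UINT64_C(1) << 8)
#define RSP_OTHER_CORE_HIT_SNP (UINT64_C(1) << 9)
#define RSP_OTHER_CORE_HITM    (UINT64_C(1) << 10)
#define RSP_RESERVED_11        (UINT64_C(1) << 11) /* reserved: never set */
#define RSP_REMOTE_CACHE_FWD   (UINT64_C(1) << 12)
#define RSP_REMOTE_DRAM        (UINT64_C(1) << 13)
#define RSP_LOCAL_DRAM         (UINT64_C(1) << 14)
#define RSP_NON_DRAM           (UINT64_C(1) << 15)

/* Groupings taken from the "L3 Hit" / "L3 Miss" labels in the descriptions. */
#define RSP_L3_HIT_ANY  (RSP_UNCORE_HIT | RSP_OTHER_CORE_HIT_SNP | RSP_OTHER_CORE_HITM)
#define RSP_L3_MISS_ANY (RSP_REMOTE_CACHE_FWD | RSP_REMOTE_DRAM | RSP_LOCAL_DRAM)

/* Hypothetical helper: true if a proposed value leaves reserved bit 11 clear.
 * Bits outside offsets 6-15 are not examined. */
static bool offcore_rsp_reserved_clear(uint64_t value)
{
    return (value & RSP_RESERVED_11) == 0;
}

/* Hypothetical helper: combine flags into a register value, asserting that
 * the reserved bit is not set. */
static uint64_t offcore_rsp_make(uint64_t flags)
{
    assert(offcore_rsp_reserved_clear(flags));
    return flags;
}
```

For example, offcore_rsp_make(RSP_PF_IFETCH | RSP_L3_MISS_ANY) produces 0x7040, which combines PF_IFETCH with the three L3 Miss bits.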