Vol. 3B 18-35
PERFORMANCE MONITORING
18.8.1 Enhancements of Performance Monitoring in the Processor Core
The notable enhancements in the monitoring of performance events in the processor core include:
• Four general-purpose performance counters, IA32_PMCx, with associated counter configuration MSRs,
IA32_PERFEVTSELx, and a global counter control MSR that supports simplified control of all four counters.
Each of the four performance counters can support processor event based sampling (PEBS) and
thread-qualification of architectural and non-architectural performance events. The width of IA32_PMCx
supported by hardware has been increased; the counter width reported by CPUID.0AH:EAX[23:16] is 48 bits.
The PEBS facility in Intel microarchitecture code name Nehalem has been enhanced with a new data format
that captures additional information, such as load latency.
• Load latency sampling facility. The average latency of memory load operations can be sampled using the
load-latency facility in processors based on Intel microarchitecture code name Nehalem. This facility
measures load latency from the load's first dispatch until final data writeback from the memory subsystem.
The latency is reported in core cycles for retired demand load operations, and it accounts for
re-dispatches. This facility is used in conjunction with the PEBS facility.
• Off-core response counting facility. This facility in the processor core allows software to count certain
transaction responses between the processor core and sub-systems outside the processor core (uncore).
Counting off-core responses requires an additional event qualification configuration facility used in
conjunction with IA32_PERFEVTSELx. Two off-core response MSRs are provided for use with specific event
codes that must be specified in IA32_PERFEVTSELx.
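The counter parameters named in the first bullet can be recovered from CPUID leaf 0AH. The following is a
minimal sketch that decodes an EAX value obtained elsewhere. The function names, the dataclass, and the
sample value 0x07300403 are illustrative assumptions; the positions of the version ID (EAX[7:0]) and the
counter count (EAX[15:8]) are taken from the architectural performance monitoring leaf, not from this
section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerfMonLeaf:
    version_id: int        # CPUID.0AH:EAX[7:0]
    num_gp_counters: int   # CPUID.0AH:EAX[15:8]
    gp_counter_width: int  # CPUID.0AH:EAX[23:16]


def decode_cpuid_0a_eax(eax: int) -> PerfMonLeaf:
    """Split a raw CPUID.0AH:EAX value into its bit fields."""
    if not 0 <= eax <= 0xFFFFFFFF:
        raise ValueError("EAX must be a 32-bit unsigned value")
    return PerfMonLeaf(
        version_id=eax & 0xFF,
        num_gp_counters=(eax >> 8) & 0xFF,
        gp_counter_width=(eax >> 16) & 0xFF,
    )


def counter_max(width_bits: int) -> int:
    """Largest value a counter of the given width holds before it wraps."""
    return (1 << width_bits) - 1


# Hypothetical EAX value: version 3, four counters, 48-bit counter width.
leaf = decode_cpuid_0a_eax(0x07300403)
print(leaf.num_gp_counters, leaf.gp_counter_width)
print(hex(counter_max(leaf.gp_counter_width)))
```

Keeping the decode separate from the CPUID instruction itself lets the same helper run on values captured
from any machine.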
18.8.1.1 Processor Event Based Sampling (PEBS)
All four general-purpose performance counters, IA32_PMCx, can be used for PEBS if the performance event
supports PEBS. Software uses IA32_MISC_ENABLE[7] and IA32_MISC_ENABLE[12] to detect whether the
performance monitoring facility and PEBS functionality are supported in the processor. The MSR
IA32_PEBS_ENABLE provides 4 bits that software must use to select which IA32_PMCx overflow conditions
cause a PEBS record to be captured.
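The detection step can be sketched as follows. This is an illustration only: it assumes the
IA32_MISC_ENABLE value has already been read by privileged code, and it assumes the bit polarity given in
the MSR listing (bit 7 set means performance monitoring is available, bit 12 set means PEBS is
unavailable), which this section does not state. The function and constant names are hypothetical.

```python
PERFMON_AVAILABLE_BIT = 7  # IA32_MISC_ENABLE[7]: 1 = performance monitoring available
PEBS_UNAVAILABLE_BIT = 12  # IA32_MISC_ENABLE[12]: 1 = PEBS not supported (assumed polarity)


def perfmon_available(misc_enable: int) -> bool:
    """True if the performance monitoring facility is reported as available."""
    return bool((misc_enable >> PERFMON_AVAILABLE_BIT) & 1)


def pebs_supported(misc_enable: int) -> bool:
    """True if performance monitoring is available and PEBS is not flagged unavailable."""
    pebs_unavailable = bool((misc_enable >> PEBS_UNAVAILABLE_BIT) & 1)
    return perfmon_available(misc_enable) and not pebs_unavailable


# Hypothetical register images, not values read from real hardware.
print(pebs_supported(0x0080))  # bit 7 set, bit 12 clear
print(pebs_supported(0x1080))  # bit 7 set, bit 12 set
```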
Additionally, the PEBS record is expanded to allow latency information to be captured. The MSR
IA32_PEBS_ENABLE provides 4 additional bits that software must use to enable latency data recording in the PEBS
record upon the respective IA32_PMCx overflow condition. The layout of IA32_PEBS_ENABLE for processors based
on Intel microarchitecture code name Nehalem is shown in Figure 18-21.
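As a rough sketch of how the two groups of enable bits combine, the helper below builds a value for
IA32_PEBS_ENABLE. It assumes the PEBS enable bit for IA32_PMCx sits at bit x and the latency enable bit at
bit 32 + x; those positions come from the figure referenced above and are not spelled out in this text.
The function name and the rule that latency recording requires PEBS on the same counter are choices made
for this example.

```python
NUM_COUNTERS = 4
PEBS_EN_SHIFT = 0  # assumed: PEBS enable for IA32_PMCx at bit x
LL_EN_SHIFT = 32   # assumed: latency enable for IA32_PMCx at bit 32 + x


def pebs_enable_value(pebs_counters, latency_counters=()):
    """Build an IA32_PEBS_ENABLE image from counter indices (0..3)."""
    pebs = set(pebs_counters)
    latency = set(latency_counters)
    for idx in pebs | latency:
        if not 0 <= idx < NUM_COUNTERS:
            raise ValueError(f"counter index {idx} out of range")
    # Example policy: latency data only makes sense in a PEBS record,
    # so reject latency enables on counters without PEBS enabled.
    if not latency <= pebs:
        raise ValueError("latency recording requires PEBS on the same counter")
    value = 0
    for idx in pebs:
        value |= 1 << (PEBS_EN_SHIFT + idx)
    for idx in latency:
        value |= 1 << (LL_EN_SHIFT + idx)
    return value


# PEBS on counters 0 and 3, with latency recording on counter 3 only.
print(hex(pebs_enable_value([0, 3], [3])))
```

Writing the resulting value to the MSR requires ring 0 access and is outside the scope of this sketch.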
Figure 18-20. IA32_PERF_GLOBAL_STATUS MSR
[Figure: bit-field layout of the 64-bit register. Fields shown: CHG (R/W); OVF_PMI (R/W); OVF_FC2,
OVF_FC1, OVF_FC0 (R/O); OVF_PC7 through OVF_PC4 (R/O, each present only if CCNT exceeds its index);
OVF_PC3 through OVF_PC0 (R/O); remaining bits Reserved. RESET Value: 00000000_00000000H.
CCNT: CPUID.AH:EAX[15:8].]
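The counter overflow flags in Figure 18-20 can be decoded with a small helper. This sketch assumes that
OVF_PCn occupies bit n and that OVF_FC0 through OVF_FC2 occupy bits 32 through 34. It leaves out CHG and
OVF_PMI because their bit positions cannot be read reliably from the figure as reproduced here. The
function name and sample value are hypothetical.

```python
FIXED_COUNTER_BASE = 32  # assumed: OVF_FC0..OVF_FC2 at bits 32..34
NUM_FIXED_COUNTERS = 3
MAX_GP_COUNTERS = 8      # the figure shows OVF_PC0 through OVF_PC7


def overflowed_counters(status: int, ccnt: int):
    """Return (general-purpose, fixed) counter indices whose overflow bit is set.

    ccnt is the general-purpose counter count from CPUID leaf 0AH, EAX[15:8];
    OVF_PC bits at or above ccnt are not implemented and are ignored.
    """
    if not 0 <= ccnt <= MAX_GP_COUNTERS:
        raise ValueError("ccnt must be between 0 and 8")
    gp = [i for i in range(ccnt) if (status >> i) & 1]
    fixed = [
        i for i in range(NUM_FIXED_COUNTERS)
        if (status >> (FIXED_COUNTER_BASE + i)) & 1
    ]
    return gp, fixed


# Hypothetical status image: OVF_PC0, OVF_PC2 and OVF_FC1 set.
gp, fixed = overflowed_counters(0x0000_0002_0000_0005, ccnt=4)
print(gp, fixed)
```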