background image

18-86 Vol. 3B

PERFORMANCE MONITORING

When setting up an ESCR, the event select field is used to select a specific class of events to count, such as retired 
branches. The event mask field is then used to select one or more of the specific events within the class to be 
counted. For example, when counting retired branches, four different events can be counted: branch not taken 
predicted, branch not taken mispredicted, branch taken predicted, and branch taken mispredicted. The OS and 
USR flags allow counts to be enabled for events that occur when operating system code and/or application code are 
being executed. If neither the OS nor USR flag is set, no events will be counted.
The ESCRs are initialized to all 0s on reset. The flags and fields of an ESCR are configured by writing to the ESCR 
using the WRMSR instruction. Table 18-63 gives the addresses of the ESCR MSRs. 
Writing to an ESCR MSR does not enable counting with its associated performance counter; it only selects the event 
or events to be counted. The CCCR for the selected performance counter must also be configured. Configuration of 
the CCCR includes selecting the ESCR and enabling the counter.

18.15.2 Performance 

Counters

The performance counters in conjunction with the counter configuration control registers (CCCRs) are used for 
filtering and counting the events selected by the ESCRs. Processors based on Intel NetBurst microarchitecture 
provide 18 performance counters organized into 9 pairs. A pair of performance counters is associated with a partic-
ular subset of events and ESCR’s (see Table 18-63). The counter pairs are partitioned into four groups:

The BPU group, includes two performance counter pairs:
— MSR_BPU_COUNTER0 and MSR_BPU_COUNTER1.
— MSR_BPU_COUNTER2 and MSR_BPU_COUNTER3.

The MS group, includes two performance counter pairs:
— MSR_MS_COUNTER0 and MSR_MS_COUNTER1.
— MSR_MS_COUNTER2 and MSR_MS_COUNTER3.

The FLAME group, includes two performance counter pairs:
— MSR_FLAME_COUNTER0 and MSR_FLAME_COUNTER1.
— MSR_FLAME_COUNTER2 and MSR_FLAME_COUNTER3.

The IQ group, includes three performance counter pairs:
— MSR_IQ_COUNTER0 and MSR_IQ_COUNTER1.
— MSR_IQ_COUNTER2 and MSR_IQ_COUNTER3.
— MSR_IQ_COUNTER4 and MSR_IQ_COUNTER5.

The MSR_IQ_COUNTER4 counter in the IQ group provides support for the PEBS. 
Alternate counters in each group can be cascaded: the first counter in one pair can start the first counter in the 
second pair and vice versa. A similar cascading is possible for the second counters in each pair. For example, within 
the BPU group of counters, MSR_BPU_COUNTER0 can start MSR_BPU_COUNTER2 and vice versa, and 
MSR_BPU_COUNTER1 can start MSR_BPU_COUNTER3 and vice versa (see Section 18.15.5.6, “Cascading Coun-
ters”). The cascade
 flag in the CCCR register for the performance counter enables the cascading of counters.
Each performance counter is 40-bits wide (see Figure 18-44). The RDPMC instruction is intended to allow reading 
of either the full counter-width (40-bits) or the low 32-bits of the counter. Reading the low 32-bits is faster than 
reading the full counter width and is appropriate in situations where the count is small enough to be contained in 
32 bits.
The RDPMC instruction can be used by programs or procedures running at any privilege level and in virtual-8086 
mode to read these counters. The PCE flag in control register CR4 (bit 8) allows the use of this instruction to be 
restricted to only programs and procedures running at privilege level 0.