
18-102 Vol. 3B

PERFORMANCE MONITORING

Cascade flag, bit 30 — When set, enables counting on one counter of a counter pair when its alternate 
counter in the other counter pair in the same counter group overflows (see Section 18.15.2, “Performance 
Counters,” for further details); when clear, disables cascading of counters.

OVF flag, bit 31 — Indicates that the counter has overflowed when set. This flag is a sticky flag that must be 
explicitly cleared by software.
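
As a minimal sketch of how software might test and clear these two flags, the Python fragment below models a CCCR value as a plain integer. The constant and function names are illustrative and not taken from any Intel header; the actual MSR read and write (RDMSR/WRMSR in kernel mode) is outside the scope of the sketch.

```python
# Bit positions as given in the text above; names are hypothetical.
CCCR_CASCADE = 1 << 30  # Cascade flag, bit 30
CCCR_OVF = 1 << 31      # OVF flag, bit 31 (sticky)


def cascade_enabled(cccr: int) -> bool:
    """Return True when the Cascade flag is set in the CCCR value."""
    return bool(cccr & CCCR_CASCADE)


def has_overflowed(cccr: int) -> bool:
    """Return True when the sticky OVF flag is set in the CCCR value."""
    return bool(cccr & CCCR_OVF)


def clear_overflow(cccr: int) -> int:
    """Return the CCCR value with OVF cleared and all other bits unchanged.

    Because OVF is sticky, software has to write this value back explicitly.
    """
    return cccr & ~CCCR_OVF
```

A caller would read the CCCR, check `has_overflowed`, and write back the result of `clear_overflow` before relying on the flag again.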

18.16.3 IA32_PEBS_ENABLE MSR

In a processor supporting Intel Hyper-Threading Technology and based on the Intel NetBurst microarchitecture, 
PEBS is enabled and qualified with two bits in the MSR_PEBS_ENABLE MSR: bit 25 (ENABLE_PEBS_MY_THR) and 
bit 26 (ENABLE_PEBS_OTH_THR). These bits do not explicitly identify a specific logical processor by logical 
processor ID (T0 or T1); instead, they allow a software agent to enable PEBS for subsequent threads of execution 
on the same logical processor on which the agent is running (“my thread”) or for the other logical processor in the 
physical package on which the agent is not running (“other thread”).
PEBS is supported for only a subset of the at-retirement events: Execution_event, Front_end_event, and 
Replay_event. Also, PEBS can be carried out only with two performance counters: MSR_IQ_CCCR4 (MSR address 
370H) for logical processor 0 and MSR_IQ_CCCR5 (MSR address 371H) for logical processor 1.
Performance monitoring tools should use a processor affinity mask to bind the kernel mode components that need 
to modify the ENABLE_PEBS_MY_THR and ENABLE_PEBS_OTH_THR bits in the MSR_PEBS_ENABLE MSR to a 
specific logical processor. This is to prevent these kernel mode components from migrating between different 
logical processors due to OS scheduling.   
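
The bit assignments and counter addresses described in this section can be sketched as follows. This is an illustration only: the function names are hypothetical, the MSR value is modeled as a plain integer, and the privileged read and write of the MSR itself is assumed to happen elsewhere, on a component already bound to one logical processor.

```python
# Bit positions and MSR addresses as stated in the text; names of the
# helper functions are hypothetical.
ENABLE_PEBS_MY_THR = 1 << 25   # PEBS for the agent's own logical processor
ENABLE_PEBS_OTH_THR = 1 << 26  # PEBS for the other logical processor
MSR_IQ_CCCR4 = 0x370           # PEBS-capable counter control, logical processor 0
MSR_IQ_CCCR5 = 0x371           # PEBS-capable counter control, logical processor 1


def pebs_enable_value(current: int, my_thread: bool, other_thread: bool) -> int:
    """Return a new MSR_PEBS_ENABLE value with the two thread bits updated.

    All other bits of `current` are preserved.
    """
    value = current & ~(ENABLE_PEBS_MY_THR | ENABLE_PEBS_OTH_THR)
    if my_thread:
        value |= ENABLE_PEBS_MY_THR
    if other_thread:
        value |= ENABLE_PEBS_OTH_THR
    return value


def pebs_cccr_address(logical_processor: int) -> int:
    """Return the address of the CCCR usable for PEBS on a logical processor."""
    if logical_processor == 0:
        return MSR_IQ_CCCR4
    if logical_processor == 1:
        return MSR_IQ_CCCR5
    raise ValueError("logical processor must be 0 or 1")
```

Since "my thread" and "other thread" are relative to the logical processor executing the write, the same value has a different effect depending on where the code runs, which is why the binding described above matters.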

18.16.4  Performance Monitoring Events

All of the events listed in Tables 19-28 and 19-29 are available in an Intel Xeon processor MP. When Intel 
Hyper-Threading Technology is active, many performance monitoring events can be qualified by the logical 
processor ID, which corresponds to bit 0 of the initial APIC ID. This allows for counting an event in any or all of the 
logical processors. However, not all events have this logical processor specificity (thread specificity). 
Each event falls into one of two categories: 

Thread specific (TS) — The event can be qualified as occurring on a specific logical processor.

Thread independent (TI) — The event cannot be qualified as being associated with a specific logical 
processor. 
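
The relationship between the initial APIC ID and the logical processor ID can be sketched as below. The sketch assumes the initial APIC ID has been obtained from CPUID leaf 1, where it is reported in EBX bits 31:24; how the CPUID instruction is executed is not shown, and the function names are hypothetical.

```python
def initial_apic_id(cpuid1_ebx: int) -> int:
    """Extract the initial APIC ID from the EBX value returned by CPUID leaf 1.

    Assumption: the ID occupies EBX bits 31:24.
    """
    return (cpuid1_ebx >> 24) & 0xFF


def logical_processor_id(apic_id: int) -> int:
    """Return 0 for T0 or 1 for T1, taken from bit 0 of the initial APIC ID."""
    return apic_id & 1
```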

Table 19-34 gives logical processor specific information (TS or TI) for each of the events described in Tables 19-28 
and 19-29. If, for example, a TS event occurred in logical processor T0, the counting of the event (as shown in 
Table 18-66) depends only on the setting of the T0_USR and T0_OS flags in the ESCR being used to set up the event 
counter. The T1_USR and T1_OS flags have no effect on the count.

Table 18-66.  Effect of Logical Processor and CPL Qualification for Logical-Processor-Specific (TS) Events

| | T1_OS/T1_USR = 00 | T1_OS/T1_USR = 01 | T1_OS/T1_USR = 11 | T1_OS/T1_USR = 10 |
|---|---|---|---|---|
| T0_OS/T0_USR = 00 | Zero count | Counts while T1 in USR | Counts while T1 in OS or USR | Counts while T1 in OS |
| T0_OS/T0_USR = 01 | Counts while T0 in USR | Counts while T0 in USR or T1 in USR | Counts while (a) T0 in USR or (b) T1 in OS or (c) T1 in USR | Counts while (a) T0 in USR or (b) T1 in OS |
| T0_OS/T0_USR = 11 | Counts while T0 in OS or USR | Counts while (a) T0 in OS or (b) T0 in USR or (c) T1 in USR | Counts irrespective of CPL, T0, T1 | Counts while (a) T0 in OS or (b) T0 in USR or (c) T1 in OS |
| T0_OS/T0_USR = 10 | Counts while T0 in OS | Counts while T0 in OS or T1 in USR | Counts while (a) T0 in OS or (b) T1 in OS or (c) T1 in USR | Counts while (a) T0 in OS or (b) T1 in OS |
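
The qualification rule for TS events can be expressed as a small predicate. The sketch below is a model for reasoning about the flag combinations, not a description of the hardware implementation: the four ESCR flags are passed as booleans rather than decoded from register bits, and the function name and parameters are hypothetical.

```python
def ts_event_counts(t0_os: bool, t0_usr: bool, t1_os: bool, t1_usr: bool,
                    thread: int, in_os: bool) -> bool:
    """Return True if a TS event is counted under the given ESCR flags.

    thread: logical processor on which the event occurred (0 for T0, 1 for T1).
    in_os:  True if that logical processor is at the OS privilege level,
            False if it is at the USR privilege level.
    Only the flags belonging to the originating logical processor are
    consulted; the other processor's flags have no effect.
    """
    if thread == 0:
        return t0_os if in_os else t0_usr
    if thread == 1:
        return t1_os if in_os else t1_usr
    raise ValueError("thread must be 0 or 1")
```

For example, with only T0_USR set, `ts_event_counts(False, True, False, False, thread=0, in_os=False)` evaluates to `True`, while the same flags with `thread=1` evaluate to `False` at either privilege level.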