18-102 Vol. 3B
PERFORMANCE MONITORING
• Cascade flag, bit 30 — When set, enables counting on one counter of a counter pair when its alternate
counter in the other counter pair in the same counter group overflows (see Section 18.15.2, “Performance
Counters,” for further details); when clear, disables cascading of counters.
• OVF flag, bit 31 — Indicates that the counter has overflowed when set. This flag is a sticky flag that must be
explicitly cleared by software.
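As an illustration of the two flag descriptions above, the following Python sketch tests and clears the Cascade and OVF bits in a CCCR value held as a plain integer. The function names are hypothetical, and the sketch does not read or write any MSR; it only shows the bit arithmetic.

```python
CCCR_CASCADE = 1 << 30     # Cascade flag, bit 30
CCCR_OVF = 1 << 31         # OVF flag, bit 31 (sticky)
MSR_MASK = (1 << 64) - 1   # keep results within a 64-bit MSR value


def cascade_enabled(cccr: int) -> bool:
    """True if the Cascade flag is set in the given CCCR value."""
    return bool(cccr & CCCR_CASCADE)


def has_overflowed(cccr: int) -> bool:
    """True if the sticky OVF flag is set in the given CCCR value."""
    return bool(cccr & CCCR_OVF)


def clear_overflow(cccr: int) -> int:
    """Return the CCCR value with OVF cleared; software must write it back."""
    return cccr & ~CCCR_OVF & MSR_MASK
```

Because OVF is sticky, a tool would typically check `has_overflowed()`, record the overflow, and then write back the value returned by `clear_overflow()`.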
18.16.3 IA32_PEBS_ENABLE MSR
In a processor supporting Intel Hyper-Threading Technology and based on the Intel NetBurst microarchitecture,
PEBS is enabled and qualified with two bits in the MSR_PEBS_ENABLE MSR: bit 25 (ENABLE_PEBS_MY_THR) and
bit 26 (ENABLE_PEBS_OTH_THR). These bits do not explicitly identify a specific logical processor by logical
processor ID (T0 or T1); instead, they allow a software agent to enable PEBS for subsequent threads of execution
on the same logical processor on which the agent is running (“my thread”) or for the other logical processor in the
physical package on which the agent is not running (“other thread”).
PEBS is supported for only a subset of the at-retirement events: Execution_event, Front_end_event, and
Replay_event. Also, PEBS can be carried out only with two performance counters: MSR_IQ_CCCR4 (MSR address
370H) for logical processor 0 and MSR_IQ_CCCR5 (MSR address 371H) for logical processor 1.
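The bit positions and counter assignments described above can be sketched in Python as follows. The helper names are hypothetical and nothing here touches hardware; the sketch only composes the bit pattern for the two enable bits and looks up which CCCR address applies to a given logical processor.

```python
ENABLE_PEBS_MY_THR = 1 << 25    # bit 25: enable PEBS for "my thread"
ENABLE_PEBS_OTH_THR = 1 << 26   # bit 26: enable PEBS for the "other thread"

# CCCRs usable for PEBS, keyed by logical processor number.
PEBS_CCCR_ADDRESS = {
    0: 0x370,  # MSR_IQ_CCCR4
    1: 0x371,  # MSR_IQ_CCCR5
}


def pebs_enable_bits(my_thread: bool, other_thread: bool) -> int:
    """Compose the two PEBS enable bits; all other bits are left at zero."""
    bits = 0
    if my_thread:
        bits |= ENABLE_PEBS_MY_THR
    if other_thread:
        bits |= ENABLE_PEBS_OTH_THR
    return bits


def pebs_cccr_address(logical_processor: int) -> int:
    """Return the MSR address of the CCCR that supports PEBS on this logical processor."""
    if logical_processor not in PEBS_CCCR_ADDRESS:
        raise ValueError("logical processor must be 0 or 1")
    return PEBS_CCCR_ADDRESS[logical_processor]
```

For example, `pebs_enable_bits(True, False)` yields a value with only bit 25 set, which corresponds to enabling PEBS for the logical processor on which the agent itself runs.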
Performance monitoring tools should use a processor affinity mask to bind the kernel mode components that need
to modify the ENABLE_PEBS_MY_THR and ENABLE_PEBS_OTH_THR bits in the MSR_PEBS_ENABLE MSR to a
specific logical processor. This is to prevent these kernel mode components from migrating between different
logical processors due to OS scheduling.
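The affinity binding recommended above applies to kernel mode components, whose mechanism depends on the operating system. As a loose user-space analogy only, the following Python sketch builds a single-processor affinity mask and pins the calling process with `os.sched_setaffinity`, which is available on Linux. The helper names are hypothetical.

```python
import os


def affinity_mask(logical_processor: int) -> int:
    """Bit mask with only the bit for the chosen logical processor set."""
    if logical_processor < 0:
        raise ValueError("logical processor number must be non-negative")
    return 1 << logical_processor


def pin_current_process(logical_processor: int) -> None:
    """Restrict the calling process to one logical processor (Linux only)."""
    # pid 0 means "the calling process"; the second argument is a set of CPU numbers.
    os.sched_setaffinity(0, {logical_processor})
```

Once pinned, the meaning of “my thread” and “other thread” stays fixed for the lifetime of the binding, because the scheduler can no longer move the code to the other logical processor.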
18.16.4 Performance Monitoring Events
All of the events listed in Tables 19-28 and 19-29 are available in an Intel Xeon processor MP. When Intel
Hyper-Threading Technology is active, many performance monitoring events can be qualified by the logical
processor ID, which corresponds to bit 0 of the initial APIC ID. This allows for counting an event in any or all of the
logical processors. However, not all events have this logical processor specificity, or thread specificity.
Here, each event falls into one of two categories:
• Thread specific (TS) — The event can be qualified as occurring on a specific logical processor.
• Thread independent (TI) — The event cannot be qualified as being associated with a specific logical
processor.
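The logical processor ID used for this qualification is bit 0 of the initial APIC ID. A minimal Python sketch of that mapping follows; the function names are hypothetical, and obtaining the initial APIC ID itself is outside the scope of the sketch.

```python
def logical_processor_id(initial_apic_id: int) -> int:
    """Return 0 or 1: bit 0 of the initial APIC ID."""
    return initial_apic_id & 1


def logical_processor_name(initial_apic_id: int) -> str:
    """Return the T0/T1 label used in this chapter for the logical processor."""
    return f"T{logical_processor_id(initial_apic_id)}"
```

Under this mapping, initial APIC IDs 6 and 7 identify T0 and T1, respectively.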
Table 19-34 gives logical-processor-specific information (TS or TI) for each of the events described in Tables 19-28
and 19-29. If, for example, a TS event occurs in logical processor T0, the counting of the event (as shown in Table
18-66) depends only on the setting of the T0_USR and T0_OS flags in the ESCR being used to set up the event
counter. The T1_USR and T1_OS flags have no effect on the count.
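Table 18-66. Effect of Logical Processor and CPL Qualification for Logical-Processor-Specific (TS) Events
+-------------------+------------------------+------------------------+------------------------+------------------------+
|                   | T1_OS/T1_USR = 00      | T1_OS/T1_USR = 01      | T1_OS/T1_USR = 11      | T1_OS/T1_USR = 10      |
+-------------------+------------------------+------------------------+------------------------+------------------------+
| T0_OS/T0_USR = 00 | Zero count             | Counts while T1 in USR | Counts while T1 in OS  | Counts while T1 in OS  |
|                   |                        |                        | or USR                 |                        |
+-------------------+------------------------+------------------------+------------------------+------------------------+
| T0_OS/T0_USR = 01 | Counts while T0 in USR | Counts while T0 in USR | Counts while (a) T0 in | Counts while (a) T0 in |
|                   |                        | or T1 in USR           | USR or (b) T1 in OS or | USR or (b) T1 in OS    |
|                   |                        |                        | (c) T1 in USR          |                        |
+-------------------+------------------------+------------------------+------------------------+------------------------+
| T0_OS/T0_USR = 11 | Counts while T0 in OS  | Counts while (a) T0 in | Counts irrespective of | Counts while (a) T0 in |
|                   | or USR                 | OS or (b) T0 in USR or | CPL, T0, T1            | OS or (b) T0 in USR or |
|                   |                        | (c) T1 in USR          |                        | (c) T1 in OS           |
+-------------------+------------------------+------------------------+------------------------+------------------------+
| T0_OS/T0_USR = 10 | Counts while T0 in OS  | Counts while T0 in OS  | Counts while (a) T0 in | Counts while (a) T0 in |
|                   |                        | or T1 in USR           | OS or (b) T1 in OS or  | OS or (b) T1 in OS     |
|                   |                        |                        | (c) T1 in USR          |                        |
+-------------------+------------------------+------------------------+------------------------+------------------------+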
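
The rule underlying Table 18-66 is that a TS event occurring on a given logical processor at a given privilege level is counted only if the matching ESCR flag is set. The following Python sketch models that rule with four booleans. The type and function names are hypothetical, and the actual bit positions of the flags within the ESCR are deliberately not modeled.

```python
from typing import List, NamedTuple


class EscrQualifiers(NamedTuple):
    """The four logical-processor/CPL qualification flags of an ESCR."""
    t0_os: bool
    t0_usr: bool
    t1_os: bool
    t1_usr: bool


def ts_event_is_counted(q: EscrQualifiers, logical_processor: int, privilege: str) -> bool:
    """True if a TS event on T<logical_processor> at the given privilege is counted."""
    flags = {
        (0, "OS"): q.t0_os,
        (0, "USR"): q.t0_usr,
        (1, "OS"): q.t1_os,
        (1, "USR"): q.t1_usr,
    }
    if (logical_processor, privilege) not in flags:
        raise ValueError("expected logical processor 0 or 1 and privilege 'OS' or 'USR'")
    return flags[(logical_processor, privilege)]


def counted_conditions(q: EscrQualifiers) -> List[str]:
    """List the states in which counting occurs; an empty list means zero count."""
    return [
        f"T{lp} in {priv}"
        for lp in (0, 1)
        for priv in ("OS", "USR")
        if ts_event_is_counted(q, lp, priv)
    ]
```

For example, with T0_OS/T0_USR = 01 and T1_OS/T1_USR = 01, `counted_conditions(EscrQualifiers(False, True, False, True))` lists "T0 in USR" and "T1 in USR", matching the corresponding table cell.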