background image

18-104 Vol. 3B

PERFORMANCE MONITORING

The first two methods use performance counters and can be set up to cause an interrupt upon overflow (for 
sampling). They may also be useful where it is easier for a tool to read a performance counter than to use a time 
stamp counter (the timestamp counter is accessed using the RDTSC instruction). 
For applications with a significant amount of I/O, there are two ratios of interest:

Non-halted CPI — Non-halted clockticks/instructions retired measures the CPI for phases where the CPU was 
being used. This ratio can be measured on a logical-processor basis when Intel Hyper-Threading Technology is 
enabled.

Nominal CPI — Time-stamp counter ticks/instructions retired measures the CPI over the duration of a 
program, including those periods when the machine halts while waiting for I/O.

18.17.1 Non-Halted 

Clockticks

Use the following procedure to program ESCRs and CCCRs to obtain non-halted clockticks on processors based on 
Intel NetBurst microarchitecture: 
1. Select an ESCR for the global_power_events and specify the RUNNING sub-event mask and the desired 

T0_OS/T0_USR/T1_OS/T1_USR bits for the targeted processor.

2. Select an appropriate counter.
3. Enable counting in the CCCR for that counter by setting the enable bit.

18.17.2 Non-Sleep 

Clockticks

Performance monitoring counters can be configured to count clockticks whenever the performance monitoring 
hardware is not powered-down. To count Non-sleep Clockticks with a performance-monitoring counter, do the 
following:
1. Select one of the 18 counters.
2. Select any of the ESCRs whose events the selected counter can count. Set its event select to anything other 

than no_event. This may not seem necessary, but the counter may be disabled if this is not done.

3. Turn threshold comparison on in the CCCR by setting the compare bit to 1.
4. Set the threshold to 15 and the complement to 1 in the CCCR. Since no event can exceed this threshold, the 

threshold condition is met every cycle and the counter counts every cycle. Note that this overrides any qualifi-
cation (e.g. by CPL) specified in the ESCR.

5. Enable counting in the CCCR for the counter by setting the enable bit.
In most cases, the counts produced by the non-halted and non-sleep metrics are equivalent if the physical package 
supports one logical processor and is not placed in a power-saving state. Operating systems may execute an HLT 
instruction and place a physical processor in a power-saving state.
On processors that support Intel Hyper-Threading Technology (Intel HT Technology), each physical package can 
support two or more logical processors. Current implementation of Intel HT Technology provides two logical proces-
sors for each physical processor. While both logical processors can execute two threads simultaneously, one logical 
processor may halt to allow the other logical processor to execute without sharing execution resources between 
two logical processors. 
Non-halted Clockticks can be set up to count the number of processor clock cycles for each logical processor when-
ever the logical processor is not halted (the count may include some portion of the clock cycles for that logical 
processor to complete a transition to a halted state). Physical processors that support Intel HT Technology enter 
into a power-saving state if all logical processors halt.
The Non-sleep Clockticks mechanism uses a filtering mechanism in CCCRs. The mechanism will continue to incre-
ment as long as one logical processor is not halted or in a power-saving state. Applications may cause a processor 
to enter into a power-saving state by using an OS service that transfers control to an OS’s idle loop. The idle loop 
then may place the processor into a power-saving state after an implementation-dependent period if there is no 
work for the processor.