background image

18-94 Vol. 3B

PERFORMANCE MONITORING

18.15.5.7   EXTENDED CASCADING 

Extended cascading is a model-specific feature in the Intel NetBurst microarchitecture with CPUID 
DisplayFamily_DisplayModel 0F_02, 0F_03, 0F_04, 0F_06. This feature uses bit 11 in CCCRs associated with the IQ 
block. See Table 18-65. 

The extended cascading feature can be adapted to the Interrupt based sampling usage model for performance 
monitoring. However, it is known that performance counters do not generate PMI in cascade mode or extended 
cascade mode due to an erratum. This erratum applies to processors with CPUID DisplayFamily_DisplayModel 
signature of 0F_02. For processors with CPUID DisplayFamily_DisplayModel signature of 0F_00 and 0F_01, the 
erratum applies to processors with stepping encoding greater than 09H. 
Counters 16 and 17 in the IQ block are frequently used in processor event-based sampling or at-retirement 
counting of events indicating a stalled condition in the pipeline. Neither counter 16 or 17 can initiate the cascading 
of counter pairs using the cascade bit in a CCCR.
Extended cascading permits performance monitoring tools to use counters 16 and 17 to initiate cascading of two 
counters in the IQ block. Extended cascading from counter 16 and 17 is conceptually similar to cascading other 
counters, but instead of using CASCADE bit of a CCCR, one of the four CASCNTxINTOy bits is used. 

Example 18-2.  Scenario for Extended Cascading

A usage scenario for extended cascading is to sample instructions retired on logical processor 1 after the first 4096 
instructions retired on logical processor 0. A procedure to program extended cascading in this scenario is outlined 
below:
1. Write the value 0 to counter 12. 
2. Write the value 04000603H to MSR_CRU_ESCR0 (corresponding to selecting the NBOGNTAG and NBOGTAG 

event masks with qualification restricted to logical processor 1).

3. Write the value 04038800H to MSR_IQ_CCCR0. This enables CASCNT4INTO0 and OVF_PMI. An ISR can sample 

on instruction addresses in this case (do not set ENABLE, or CASCADE).

4. Write the value FFFFF000H into counter 16.1.
5. Write the value 0400060CH to MSR_CRU_ESCR2 (corresponding to selecting the NBOGNTAG and NBOGTAG 

event masks

 

with qualification restricted to logical processor 0).

6. Write the value 00039000H to MSR_IQ_CCCR4 (set ENABLE bit, but not OVF_PMI).
Another use for cascading is to locate stalled execution in a multithreaded application. Assume MOB replays in 
thread B cause thread A to stall. Getting a sample of the stalled execution in this scenario could be accomplished 
by:
1. Set up counter B to count MOB replays on thread B.
2. Set up counter A to count resource stalls on thread A; set its force overflow bit and the appropriate CASCNTx-

INTOy bit.

3. Use the performance monitoring interrupt to capture the program execution data of the stalled thread.

Table 18-65.  CCR Names and Bit Positions 

CCCR Name:Bit Position

Bit Name

Description

MSR_IQ_CCCR1|2:11

Reserved

MSR_IQ_CCCR0:11

CASCNT4INTO0

Allow counter 4 to cascade into counter 0

MSR_IQ_CCCR3:11

CASCNT5INTO3

Allow counter 5 to cascade into counter 3

MSR_IQ_CCCR4:11

CASCNT5INTO4

Allow counter 5 to cascade into counter 4

MSR_IQ_CCCR5:11

CASCNT4INTO5

Allow counter 4 to cascade into counter 5