18-94 Vol. 3B
PERFORMANCE MONITORING
18.15.5.7 EXTENDED CASCADING
Extended cascading is a model-specific feature in the Intel NetBurst microarchitecture with CPUID
DisplayFamily_DisplayModel 0F_02, 0F_03, 0F_04, 0F_06. This feature uses bit 11 in CCCRs associated with the IQ
block. See Table 18-65.
The extended cascading feature can be adapted to the Interrupt based sampling usage model for performance
monitoring. However, it is known that performance counters do not generate PMI in cascade mode or extended
cascade mode due to an erratum. This erratum applies to processors with CPUID DisplayFamily_DisplayModel
signature of 0F_02. For processors with CPUID DisplayFamily_DisplayModel signature of 0F_00 and 0F_01, the
erratum applies to processors with stepping encoding greater than 09H.
Counters 16 and 17 in the IQ block are frequently used in processor event-based sampling or at-retirement
counting of events indicating a stalled condition in the pipeline. Neither counter 16 or 17 can initiate the cascading
of counter pairs using the cascade bit in a CCCR.
Extended cascading permits performance monitoring tools to use counters 16 and 17 to initiate cascading of two
counters in the IQ block. Extended cascading from counter 16 and 17 is conceptually similar to cascading other
counters, but instead of using CASCADE bit of a CCCR, one of the four CASCNTxINTOy bits is used.
Example 18-2. Scenario for Extended Cascading
A usage scenario for extended cascading is to sample instructions retired on logical processor 1 after the first 4096
instructions retired on logical processor 0. A procedure to program extended cascading in this scenario is outlined
below:
1. Write the value 0 to counter 12.
2. Write the value 04000603H to MSR_CRU_ESCR0 (corresponding to selecting the NBOGNTAG and NBOGTAG
event masks with qualification restricted to logical processor 1).
3. Write the value 04038800H to MSR_IQ_CCCR0. This enables CASCNT4INTO0 and OVF_PMI. An ISR can sample
on instruction addresses in this case (do not set ENABLE, or CASCADE).
4. Write the value FFFFF000H into counter 16.1.
5. Write the value 0400060CH to MSR_CRU_ESCR2 (corresponding to selecting the NBOGNTAG and NBOGTAG
event masks
with qualification restricted to logical processor 0).
6. Write the value 00039000H to MSR_IQ_CCCR4 (set ENABLE bit, but not OVF_PMI).
Another use for cascading is to locate stalled execution in a multithreaded application. Assume MOB replays in
thread B cause thread A to stall. Getting a sample of the stalled execution in this scenario could be accomplished
by:
1. Set up counter B to count MOB replays on thread B.
2. Set up counter A to count resource stalls on thread A; set its force overflow bit and the appropriate CASCNTx-
INTOy bit.
3. Use the performance monitoring interrupt to capture the program execution data of the stalled thread.
Table 18-65. CCR Names and Bit Positions
CCCR Name:Bit Position
Bit Name
Description
MSR_IQ_CCCR1|2:11
Reserved
MSR_IQ_CCCR0:11
CASCNT4INTO0
Allow counter 4 to cascade into counter 0
MSR_IQ_CCCR3:11
CASCNT5INTO3
Allow counter 5 to cascade into counter 3
MSR_IQ_CCCR4:11
CASCNT5INTO4
Allow counter 5 to cascade into counter 4
MSR_IQ_CCCR5:11
CASCNT4INTO5
Allow counter 4 to cascade into counter 5