PERFORMANCE-MONITORING EVENTS

Table 19-19.  Non-Architectural Performance Events In the Processor Core for Processors Based on Intel® Microarchitecture Code Name Westmere (Contd.)

| Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment |
|---|---|---|---|---|
| CCH | 02H | FP_MMX_TRANS.TO_MMX | Counts the first MMX instruction following a floating-point instruction. You can use this event to estimate the penalties for the transitions between floating-point and MMX technology states. | |
| CCH | 03H | FP_MMX_TRANS.ANY | Counts all transitions from floating-point to MMX instructions and from MMX instructions to floating-point instructions. You can use this event to estimate the penalties for the transitions between floating-point and MMX technology states. | |
| D0H | 01H | MACRO_INSTS.DECODED | Counts the number of instructions decoded (but not necessarily executed or retired). | |
| D1H | 01H | UOPS_DECODED.STALL_CYCLES | Counts the cycles of decoder stalls. INV=1, Cmask=1. | |
| D1H | 02H | UOPS_DECODED.MS | Counts the number of uops decoded by the Microcode Sequencer (MS). The MS delivers uops when the instruction is more than 4 uops long or a microcode assist is occurring. | |
| D1H | 04H | UOPS_DECODED.ESP_FOLDING | Counts the number of stack pointer (ESP) instructions decoded: push, pop, call, ret, etc. ESP instructions do not generate a uop to increment or decrement ESP. Instead, they update an ESP_Offset register that keeps track of the delta to the current value of the ESP register. | |
| D1H | 08H | UOPS_DECODED.ESP_SYNC | Counts the number of stack pointer (ESP) sync operations where an ESP instruction is corrected by adding the ESP offset register to the current value of the ESP register. | |
| D2H | 01H | RAT_STALLS.FLAGS | Counts the number of cycles during which execution stalled for several reasons, one of which is a partial flag register stall. A partial register stall may occur when two conditions are met: 1) an instruction modifies some, but not all, of the flags in the flag register and 2) the next instruction, which depends on flags, depends on flags that were not modified by this instruction. | |
| D2H | 02H | RAT_STALLS.REGISTERS | Counts the number of cycles in which instruction execution latency became longer than the defined latency because the instruction used a register that was partially written by a previous instruction. | |
| D2H | 04H | RAT_STALLS.ROB_READ_PORT | Counts the number of cycles when ROB read port stalls occurred, which did not allow new micro-ops to enter the out-of-order pipeline. Note that, at this stage in the pipeline, additional stalls may occur in the same cycle and prevent the stalled micro-ops from entering the pipe. In such a case, micro-ops retry entering the execution pipe in the next cycle and the ROB read port stall is counted again. | |
| D2H | 08H | RAT_STALLS.SCOREBOARD | Counts the cycles stalled due to microarchitecturally required serialization (microcode scoreboarding stalls). | |
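
The Event Num. and Umask Value columns are the values a tool programs into a performance event select register, and the UOPS_DECODED.STALL_CYCLES entry additionally requires INV=1 and Cmask=1. The following Python sketch packs those fields into a single register value, assuming the architectural IA32_PERFEVTSELx bit layout (event select in bits 7:0, unit mask in bits 15:8, USR bit 16, OS bit 17, EN bit 22, INV bit 23, counter mask in bits 31:24). The helper names `perfevtsel` and `perf_raw` are hypothetical and exist only for this illustration; they are not part of any Intel or Linux API.

```python
# Illustrative sketch: pack table values into an IA32_PERFEVTSELx-style word.
# Assumed bit layout (architectural performance monitoring):
#   7:0 event select, 15:8 unit mask, 16 USR, 17 OS, 18 edge detect,
#   22 EN, 23 INV, 31:24 counter mask.
USR_BIT = 1 << 16
OS_BIT = 1 << 17
EDGE_BIT = 1 << 18
EN_BIT = 1 << 22
INV_BIT = 1 << 23
UMASK_SHIFT = 8
CMASK_SHIFT = 24


def perfevtsel(event, umask, *, user=True, kernel=True, edge=False,
               enable=True, inv=False, cmask=0):
    """Return the event select value for one table row (hypothetical helper)."""
    for name, field in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits, got {field:#x}")
    value = event | (umask << UMASK_SHIFT) | (cmask << CMASK_SHIFT)
    if user:
        value |= USR_BIT
    if kernel:
        value |= OS_BIT
    if edge:
        value |= EDGE_BIT
    if enable:
        value |= EN_BIT
    if inv:
        value |= INV_BIT
    return value


def perf_raw(event, umask):
    """Format event and umask as an rUUEE raw event string (hypothetical helper)."""
    return f"r{umask:02x}{event:02x}"


# UOPS_DECODED.STALL_CYCLES: event D1H, umask 01H, with INV=1 and Cmask=1.
stall_cycles = perfevtsel(0xD1, 0x01, inv=True, cmask=1)
print(hex(stall_cycles))      # 0x1c301d1

# FP_MMX_TRANS.ANY: event CCH, umask 03H, no INV or Cmask needed.
print(hex(perfevtsel(0xCC, 0x03)))  # 0x4303cc

# UOPS_DECODED.MS as a raw event string.
print(perf_raw(0xD1, 0x02))   # r02d1
```

The raw string form is the kind of value Linux `perf` accepts with `-e`. Whether a given event from this table is actually countable depends on the processor and kernel in use, so treat the printed values as encodings to verify, not as a guarantee of support.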