19-60 Vol. 3B

PERFORMANCE-MONITORING EVENTS

Event Num. 14H, Umask Value 01H: ARITH.CYCLES_DIV_BUSY
Description: Counts the number of cycles the divider is busy executing divide or square root operations. The divide can be integer, X87 or Streaming SIMD Extensions (SSE). The square root operation can be either X87 or SSE. Set 'edge=1, invert=1, cmask=1' to count the number of divides.
Comment: Count may be incorrect when SMT is on.
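The 'edge=1, invert=1, cmask=1' setting can be illustrated by packing the fields into an event-select value. This is a minimal sketch, assuming the IA32_PERFEVTSELx bit layout (event select in bits 7:0, unit mask in bits 15:8, edge detect in bit 18, invert in bit 23, counter mask in bits 31:24). The helper name `perfevtsel_config` is hypothetical, and the USR, OS, and EN bits are deliberately left clear because they depend on how the counter is programmed.

```python
def perfevtsel_config(event, umask, edge=0, inv=0, cmask=0):
    """Pack event fields into an IA32_PERFEVTSELx-style value (hypothetical helper).

    USR (bit 16), OS (bit 17), and EN (bit 22) are not set here.
    """
    if not (0 <= event <= 0xFF and 0 <= umask <= 0xFF and 0 <= cmask <= 0xFF):
        raise ValueError("event, umask, and cmask must each fit in 8 bits")
    return (
        event                        # bits 7:0   event select
        | (umask << 8)               # bits 15:8  unit mask
        | (int(bool(edge)) << 18)    # bit 18     edge detect
        | (int(bool(inv)) << 23)     # bit 23     invert counter mask
        | (cmask << 24)              # bits 31:24 counter mask
    )


# ARITH.CYCLES_DIV_BUSY as listed: event 14H, umask 01H (counts busy cycles).
cycles_div_busy = perfevtsel_config(0x14, 0x01)

# Same event with edge=1, invert=1, cmask=1 (counts divides per the description).
divides = perfevtsel_config(0x14, 0x01, edge=1, inv=1, cmask=1)

print(hex(cycles_div_busy))  # 0x114
print(hex(divides))          # 0x1840114
```

The same packing applies to any Event Num. and Umask Value pair in this table; only the modifier bits differ between the cycle count and the divide count.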

Event Num. 14H, Umask Value 02H: ARITH.MUL
Description: Counts the number of multiply operations executed. This includes integer as well as floating point multiply operations but excludes DPPS mul and MPSAD.
Comment: Count may be incorrect when SMT is on.

Event Num. 17H, Umask Value 01H: INST_QUEUE_WRITES
Description: Counts the number of instructions written into the instruction queue every cycle.

Event Num. 18H, Umask Value 01H: INST_DECODED.DEC0
Description: Counts number of instructions that require decoder 0 to be decoded. Usually, this means that the instruction maps to more than 1 uop.

Event Num. 19H, Umask Value 01H: TWO_UOP_INSTS_DECODED
Description: An instruction that generates two uops was decoded.

Event Num. 1EH, Umask Value 01H: INST_QUEUE_WRITE_CYCLES
Description: This event counts the number of cycles during which instructions are written to the instruction queue. Dividing the number of instructions written to the instruction queue (INST_QUEUE_WRITES) by this counter yields the average number of instructions decoded each cycle. If this number is less than four and the pipe stalls, this indicates that the decoder is failing to decode enough instructions per cycle to sustain the 4-wide pipeline.
Comment: If SSE* instructions that are 6 bytes or longer arrive one after another, then front end throughput may limit execution speed.
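The decode-rate check described for INST_QUEUE_WRITE_CYCLES can be sketched as a small calculation. Since the result is an average number of instructions per cycle, the sketch divides the INST_QUEUE_WRITES count by the INST_QUEUE_WRITE_CYCLES count. The function name and the sample counts are hypothetical, chosen only for illustration; real values would come from whatever tool reads the two counters.

```python
PIPELINE_WIDTH = 4  # the 4-wide pipeline named in the event description


def avg_instructions_per_write_cycle(inst_queue_writes, inst_queue_write_cycles):
    """Average instructions written to the instruction queue per write cycle."""
    if inst_queue_write_cycles == 0:
        return 0.0  # no write cycles observed, so no meaningful average
    return inst_queue_writes / inst_queue_write_cycles


# Hypothetical sample counts, not measured data.
writes = 9_000_000        # INST_QUEUE_WRITES (event 17H, umask 01H)
write_cycles = 3_000_000  # INST_QUEUE_WRITE_CYCLES (event 1EH, umask 01H)

avg = avg_instructions_per_write_cycle(writes, write_cycles)
print(f"{avg:.2f} instructions per write cycle")

if avg < PIPELINE_WIDTH:
    # Only a hint: the description ties this to the pipe also stalling.
    print("decode rate is below the pipeline width; check for pipeline stalls")
```

A value below four is only meaningful together with evidence of stalls, so this ratio is best read alongside a stall-related counter rather than on its own.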

Event Num. 20H, Umask Value 01H: LSD_OVERFLOW
Description: Counts number of loops that can’t stream from the instruction queue.

Event Num. 24H, Umask Value 01H: L2_RQSTS.LD_HIT
Description: Counts number of loads that hit the L2 cache. L2 loads include both L1D demand misses as well as L1D prefetches. L2 loads can be rejected for various reasons. Only non-rejected loads are counted.

Event Num. 24H, Umask Value 02H: L2_RQSTS.LD_MISS
Description: Counts the number of loads that miss the L2 cache. L2 loads include both L1D demand misses as well as L1D prefetches.

Event Num. 24H, Umask Value 03H: L2_RQSTS.LOADS
Description: Counts all L2 load requests. L2 loads include both L1D demand misses as well as L1D prefetches.

Event Num. 24H, Umask Value 04H: L2_RQSTS.RFO_HIT
Description: Counts the number of store RFO requests that hit the L2 cache. L2 RFO requests include both L1D demand RFO misses as well as L1D RFO prefetches. Count includes WC memory requests, where the data is not fetched but the permission to write the line is required.

Event Num. 24H, Umask Value 08H: L2_RQSTS.RFO_MISS
Description: Counts the number of store RFO requests that miss the L2 cache. L2 RFO requests include both L1D demand RFO misses as well as L1D RFO prefetches.

Table 19-17. Non-Architectural Performance Events In the Processor Core for Intel® Core™ i7 Processor and Intel® Xeon® Processor 5500 Series (Contd.)

Columns: Event Num., Umask Value, Event Mask Mnemonic, Description, Comment
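The L2_RQSTS entries in this table share event number 24H and differ only in their umask bits, and the LOADS umask (03H) is the bitwise OR of the LD_HIT (01H) and LD_MISS (02H) values. The sketch below illustrates that relationship and derives miss ratios from hit and miss counts. The dictionary, the function name, and the sample counts are hypothetical; on real hardware the hit and miss counts may not sum exactly to a separately collected total, for example when counters are read at different times.

```python
L2_RQSTS_EVENT = 0x24  # Event Num. shared by all L2_RQSTS entries above

# Umask values as listed in the table.
UMASK = {
    "LD_HIT": 0x01,
    "LD_MISS": 0x02,
    "LOADS": 0x03,
    "RFO_HIT": 0x04,
    "RFO_MISS": 0x08,
}

# LOADS selects both load bits at once.
loads_umask = UMASK["LD_HIT"] | UMASK["LD_MISS"]
print(hex(loads_umask))  # 0x3


def miss_ratio(hits, misses):
    """Fraction of requests that missed, given separate hit and miss counts."""
    total = hits + misses
    return misses / total if total else 0.0


# Hypothetical sample counts, not measured data.
load_miss_ratio = miss_ratio(hits=750, misses=250)  # LD_HIT, LD_MISS
rfo_miss_ratio = miss_ratio(hits=90, misses=10)     # RFO_HIT, RFO_MISS

print(f"L2 load miss ratio: {load_miss_ratio:.2f}")
print(f"L2 RFO miss ratio:  {rfo_miss_ratio:.2f}")
```

Computing the ratio from the hit and miss counts, rather than from LOADS, keeps the numerator and denominator consistent even if the combined event is not collected.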