19-60 Vol. 3B
PERFORMANCE-MONITORING EVENTS
14H
01H
ARITH.CYCLES_DIV_BUSY
Counts the number of cycles the divider is busy
executing divide or square root operations. The
divide can be integer, X87 or Streaming SIMD
Extensions (SSE). The square root operation can be
either X87 or SSE.
Set 'edge=1, invert=1, cmask=1' to count the
number of divides.
Count may be incorrect when SMT is on.
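The 'edge=1, invert=1, cmask=1' setting above can be sketched as a raw event-select value. This is a minimal illustration, assuming the architectural IA32_PERFEVTSELx bit layout (event select in bits 7:0, unit mask in bits 15:8, USR bit 16, OS bit 17, edge detect bit 18, enable bit 22, invert bit 23, counter mask in bits 31:24). The helper name `perfevtsel` and its defaults (count in both user and kernel mode, counter enabled) are assumptions made for the example, not part of the table; writing the value to the MSR is not shown.

```python
def perfevtsel(event, umask, *, usr=True, os=True, edge=False,
               inv=False, cmask=0, enable=True):
    """Compose a raw event-select value (hypothetical helper)."""
    if not 0 <= event <= 0xFF:
        raise ValueError("event select must fit in 8 bits")
    if not 0 <= umask <= 0xFF:
        raise ValueError("unit mask must fit in 8 bits")
    if not 0 <= cmask <= 0xFF:
        raise ValueError("counter mask must fit in 8 bits")

    value = event                 # bits 7:0   event select
    value |= umask << 8           # bits 15:8  unit mask
    value |= int(usr) << 16       # bit 16     count in user mode
    value |= int(os) << 17        # bit 17     count in kernel mode
    value |= int(edge) << 18      # bit 18     edge detect
    value |= int(enable) << 22    # bit 22     enable counter
    value |= int(inv) << 23       # bit 23     invert counter-mask comparison
    value |= cmask << 24          # bits 31:24 counter mask
    return value


# ARITH.CYCLES_DIV_BUSY: event 14H, umask 01H.
cycles_div_busy = perfevtsel(0x14, 0x01)

# Same event with edge=1, invert=1, cmask=1, the setting the table
# gives for counting the number of divides.
divide_count = perfevtsel(0x14, 0x01, edge=True, inv=True, cmask=1)

print(hex(cycles_div_busy), hex(divide_count))
```

Only the edge, invert, and cmask fields differ between the two values; the event number and umask are the ones listed in the row.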
14H
02H
ARITH.MUL
Counts the number of multiply operations executed.
This includes integer as well as floating point
multiply operations but excludes DPPS mul and
MPSAD.
Count may be incorrect when SMT is on.
17H
01H
INST_QUEUE_WRITES
Counts the number of instructions written into the
instruction queue every cycle.
18H
01H
INST_DECODED.DEC0
Counts the number of instructions that require
decoder 0 to be decoded. Usually, this means that
the instruction maps to more than 1 uop.
19H
01H
TWO_UOP_INSTS_DECODED
An instruction that generates two uops was
decoded.
1EH
01H
INST_QUEUE_WRITE_CYCLES
This event counts the number of cycles during
which instructions are written to the instruction
queue. Dividing the number of instructions written
to the instruction queue (INST_QUEUE_WRITES) by
this counter yields the average number of
instructions decoded each cycle. If this number is
less than four and the pipe stalls, this indicates that
the decoder is failing to decode enough instructions
per cycle to sustain the 4-wide pipeline.
If SSE* instructions that are 6 bytes or longer arrive one after another,
then front end throughput may limit execution speed.
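The decode-rate check in the INST_QUEUE_WRITE_CYCLES row can be sketched as follows. The ratio is taken as instructions written (INST_QUEUE_WRITES) per write cycle (INST_QUEUE_WRITE_CYCLES), since that is the direction that yields instructions per cycle. The function names and the sample counts are hypothetical; real counts would come from whatever tool reads the counters.

```python
PIPELINE_WIDTH = 4  # the 4-wide pipeline mentioned in the description


def avg_instructions_per_write_cycle(inst_queue_writes, inst_queue_write_cycles):
    """Average instructions written to the instruction queue per write cycle."""
    if inst_queue_write_cycles == 0:
        return 0.0  # no write cycles observed; avoid dividing by zero
    return inst_queue_writes / inst_queue_write_cycles


def below_pipeline_width(inst_queue_writes, inst_queue_write_cycles):
    """True when the average decode rate is under the pipeline width.

    This covers only the ratio; the description also requires that the
    pipe is actually stalling before blaming the decoder.
    """
    avg = avg_instructions_per_write_cycle(inst_queue_writes,
                                           inst_queue_write_cycles)
    return avg < PIPELINE_WIDTH


# Hypothetical sample counts, for illustration only.
writes, cycles = 3000, 1000
print(avg_instructions_per_write_cycle(writes, cycles))
print(below_pipeline_width(writes, cycles))
```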
20H
01H
LSD_OVERFLOW
Counts the number of loops that can’t stream from
the instruction queue.
24H
01H
L2_RQSTS.LD_HIT
Counts number of loads that hit the L2 cache. L2
loads include both L1D demand misses as well as
L1D prefetches. L2 loads can be rejected for various
reasons. Only non-rejected loads are counted.
24H
02H
L2_RQSTS.LD_MISS
Counts the number of loads that miss the L2 cache.
L2 loads include both L1D demand misses as well as
L1D prefetches.
24H
03H
L2_RQSTS.LOADS
Counts all L2 load requests. L2 loads include both
L1D demand misses as well as L1D prefetches.
24H
04H
L2_RQSTS.RFO_HIT
Counts the number of store RFO requests that hit
the L2 cache. L2 RFO requests include both L1D
demand RFO misses as well as L1D RFO prefetches.
Count includes WC memory requests, where the
data is not fetched but the permission to write the
line is required.
24H
08H
L2_RQSTS.RFO_MISS
Counts the number of store RFO requests that miss
the L2 cache. L2 RFO requests include both L1D
demand RFO misses as well as L1D RFO prefetches.
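The L2_RQSTS umask values listed above are bit flags: the LOADS umask (03H) is the bitwise OR of LD_HIT (01H) and LD_MISS (02H). A minimal sketch of that relationship, plus a hit-ratio calculation from the two load counters, is shown here. The constant and function names and the sample counts are hypothetical and chosen for the example.

```python
L2_RQSTS_EVENT = 0x24  # event number shared by all L2_RQSTS rows

# Umask values as listed in the table.
LD_HIT = 0x01
LD_MISS = 0x02
LOADS = 0x03
RFO_HIT = 0x04
RFO_MISS = 0x08

# LOADS selects both load sub-events at once.
assert LOADS == LD_HIT | LD_MISS


def l2_load_hit_ratio(ld_hit_count, ld_miss_count):
    """Fraction of counted L2 loads that hit, from LD_HIT and LD_MISS counts."""
    total = ld_hit_count + ld_miss_count
    if total == 0:
        return 0.0  # no loads counted
    return ld_hit_count / total


# Hypothetical sample counts, for illustration only.
print(l2_load_hit_ratio(900, 100))
```

Because only non-rejected loads are counted as hits, the ratio describes loads the L2 actually serviced, and it includes L1D prefetches as well as demand misses.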
Table 19-17. Non-Architectural Performance Events In the Processor Core for Intel® Core™ i7 Processor and Intel® Xeon® Processor 5500 Series (Contd.)
Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment