19-34 Vol. 3B
PERFORMANCE-MONITORING EVENTS
Table 19-11. Non-Architectural Performance Events In the Processor Core of
3rd Generation Intel® Core™ i7, i5, i3 Processors
| Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment |
|---|---|---|---|---|
| 03H | 02H | LD_BLOCKS.STORE_FORWARD | Loads blocked because they overlap with a store buffer entry that cannot be forwarded. | |
| 03H | 08H | LD_BLOCKS.NO_SR | The number of times that split load operations are temporarily blocked because all resources for handling the split accesses are in use. | |
| 05H | 01H | MISALIGN_MEM_REF.LOADS | Speculative cache-line split load uops dispatched to L1D. | |
| 05H | 02H | MISALIGN_MEM_REF.STORES | Speculative cache-line split store-address uops dispatched to L1D. | |
| 07H | 01H | LD_BLOCKS_PARTIAL.ADDRESS_ALIAS | False dependencies in the MOB due to a partial compare on address. | |
| 08H | 81H | DTLB_LOAD_MISSES.MISS_CAUSES_A_WALK | Misses in all TLB levels that cause a page walk of any page size from demand loads. | |
| 08H | 82H | DTLB_LOAD_MISSES.WALK_COMPLETED | Misses in all TLB levels, from demand loads, that caused a completed page walk of any page size. | |
| 08H | 84H | DTLB_LOAD_MISSES.WALK_DURATION | Cycles the PMH is busy with a page walk due to demand loads. | |
| 08H | 88H | DTLB_LOAD_MISSES.LARGE_PAGE_WALK_DURATION | Page walk for a large page completed for a demand load. | |
| 0EH | 01H | UOPS_ISSUED.ANY | Increments each cycle by the number of uops issued by the RAT to the RS. Set Cmask = 1, Inv = 1, Any = 1 to count stalled cycles of this core. | Set Cmask = 1, Inv = 1 to count stalled cycles. |
| 0EH | 10H | UOPS_ISSUED.FLAGS_MERGE | Number of flags-merge uops allocated. Such uops add delay. | |
| 0EH | 20H | UOPS_ISSUED.SLOW_LEA | Number of slow LEA or similar uops allocated. Such a uop has 3 sources (e.g., 2 sources + immediate), regardless of whether it results from an LEA instruction. | |
| 0EH | 40H | UOPS_ISSUED.SINGLE_MUL | Number of multiply packed/scalar single-precision uops allocated. | |
| 10H | 01H | FP_COMP_OPS_EXE.X87 | Counts number of X87 uops executed. | |
| 10H | 10H | FP_COMP_OPS_EXE.SSE_FP_PACKED_DOUBLE | Counts number of SSE* or AVX-128 double-precision FP packed uops executed. | |
| 10H | 20H | FP_COMP_OPS_EXE.SSE_FP_SCALAR_SINGLE | Counts number of SSE* or AVX-128 single-precision FP scalar uops executed. | |
| 10H | 40H | FP_COMP_OPS_EXE.SSE_PACKED_SINGLE | Counts number of SSE* or AVX-128 single-precision FP packed uops executed. | |
| 10H | 80H | FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE | Counts number of SSE* or AVX-128 double-precision FP scalar uops executed. | |
| 11H | 01H | SIMD_FP_256.PACKED_SINGLE | Counts 256-bit packed single-precision floating-point instructions. | |
| 11H | 02H | SIMD_FP_256.PACKED_DOUBLE | Counts 256-bit packed double-precision floating-point instructions. | |
| 14H | 01H | ARITH.FPU_DIV_ACTIVE | Cycles in which the divider is active, including INT and FP. Set edge = 1, cmask = 1 to count the number of divides. | |
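
Several entries in the table above ask for extra qualifier bits (Cmask, Inv, Edge, Any) on top of the event number and umask. The sketch below shows one way these fields could be packed into a single event-select value, assuming the architectural IA32_PERFEVTSELx bit layout (event select in bits 7:0, umask in bits 15:8, USR bit 16, OS bit 17, Edge bit 18, AnyThread bit 21, EN bit 22, INV bit 23, Cmask in bits 31:24). The function name `perfevtsel` and its parameters are hypothetical and chosen only for illustration; the sketch computes a value and does not program any hardware register.

```python
def perfevtsel(event, umask, *, cmask=0, inv=False, edge=False,
               any_thread=False, usr=True, os=True, enable=True):
    """Pack fields into an IA32_PERFEVTSELx-style value.

    Hypothetical helper: assumes the architectural bit layout and
    only builds the integer; writing the MSR is out of scope here.
    """
    for name, field in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits, got {field:#x}")

    value = event                    # bits 7:0   event select
    value |= umask << 8              # bits 15:8  unit mask
    value |= int(usr) << 16          # bit 16     count in user mode
    value |= int(os) << 17           # bit 17     count in kernel mode
    value |= int(edge) << 18         # bit 18     edge detect
    value |= int(any_thread) << 21   # bit 21     any logical processor on the core
    value |= int(enable) << 22       # bit 22     enable counter
    value |= int(inv) << 23          # bit 23     invert the cmask comparison
    value |= cmask << 24             # bits 31:24 counter mask
    return value


# UOPS_ISSUED.ANY (0EH/01H) with Cmask = 1, Inv = 1: stalled cycles.
stalled_cycles = perfevtsel(0x0E, 0x01, cmask=1, inv=True)

# ARITH.FPU_DIV_ACTIVE (14H/01H) with edge = 1, cmask = 1: number of divides.
divide_count = perfevtsel(0x14, 0x01, cmask=1, edge=True)

print(f"{stalled_cycles:#x}")  # 0x1c3010e
print(f"{divide_count:#x}")    # 0x1470114
```

With Cmask = 1 and Inv = 1, the counter increments on cycles in which fewer than one uop is issued, which is why the table describes this setting as counting stalled cycles. With Edge = 1 and Cmask = 1, it increments once per transition of the divider from idle to active, which approximates the number of divide operations.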