19.3 PERFORMANCE MONITORING EVENTS FOR THE INTEL® CORE™ M AND 5TH GENERATION INTEL® CORE™ PROCESSORS
The Intel® Core™ M processors, the 5th generation Intel® Core™ processors, and the Intel Xeon processor E3-1200 v4 product family are based on the Broadwell microarchitecture. They support the architectural performance-monitoring events listed in Table 19-1. Non-architectural performance-monitoring events in the processor core are listed in Table 19-5; these events apply to processors whose CPUID signature has a DisplayFamily_DisplayModel encoding of 06_3DH or 06_47H. Table 19-8 lists performance events supporting Intel TSX (see Section 18.11.5); these events are available on processors based on the Broadwell microarchitecture. Fixed counters in the core PMU support the architectural events defined in Table 19-2.
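The DisplayFamily_DisplayModel check described above can be sketched in a few lines. This is a minimal illustration, assuming the caller has already obtained the EAX value returned by CPUID leaf 01H; the function names and the sample EAX values are hypothetical and chosen only for demonstration.

```python
# Sketch only: decode DisplayFamily_DisplayModel from the EAX value that
# CPUID leaf 01H returns. Function names here are illustrative assumptions.

def display_family_model(eax):
    """Return (DisplayFamily, DisplayModel) decoded from CPUID.01H:EAX."""
    family_id = (eax >> 8) & 0xF      # bits 11:8
    model_id = (eax >> 4) & 0xF       # bits 7:4
    ext_model = (eax >> 16) & 0xF     # bits 19:16
    ext_family = (eax >> 20) & 0xFF   # bits 27:20

    # The extended family is added only when the base family is 0FH.
    family = family_id + ext_family if family_id == 0xF else family_id
    # The extended model is prepended only for base family 06H or 0FH.
    if family_id in (0x6, 0xF):
        model = (ext_model << 4) | model_id
    else:
        model = model_id
    return family, model


# Signatures this section names for Table 19-5.
TABLE_19_5_SIGNATURES = {(0x06, 0x3D), (0x06, 0x47)}


def uses_table_19_5(eax):
    """True if the decoded signature is 06_3DH or 06_47H."""
    return display_family_model(eax) in TABLE_19_5_SIGNATURES


def signature_string(eax):
    """Format the signature in the manual's notation, e.g. '06_3DH'."""
    family, model = display_family_model(eax)
    return f"{family:02X}_{model:02X}H"


if __name__ == "__main__":
    # Illustrative EAX values; real values come from executing CPUID.
    for sample in (0x000306D4, 0x00040671, 0x000306C3):
        print(signature_string(sample), uses_table_19_5(sample))
```

Keeping the decode separate from the table lookup means the same helper can be reused for the other DisplayFamily_DisplayModel values that appear elsewhere in this chapter.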
Non-architectural performance-monitoring events located in the uncore sub-system are implementation specific: they differ between platforms that use processors based on the Broadwell microarchitecture but have different DisplayFamily_DisplayModel signatures. Processors with CPUID signatures of DisplayFamily_DisplayModel 06_3DH and 06_47H support the uncore performance events listed in Table 19-9.
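To show how the Event Num. and Umask Value columns of Table 19-5, together with the Cmask, Inv, Edge, and Any settings mentioned in its descriptions, combine into a single counter-control value, the sketch below packs them using the IA32_PERFEVTSELx bit layout. The helper name and its defaults (counting in both user and kernel mode, counter enabled) are assumptions made for illustration; the code only computes the value and does not write any MSR.

```python
# Sketch only: build an IA32_PERFEVTSELx value from table fields.
# The function name and default flags are illustrative assumptions.

def perfevtsel(event, umask, *, user=True, kernel=True, edge=False,
               any_thread=False, inv=False, cmask=0, enable=True):
    """Pack event-select fields into an IA32_PERFEVTSELx value."""
    for name, field in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")

    value = event                     # bits 7:0   Event Select
    value |= umask << 8               # bits 15:8  Unit Mask
    value |= int(user) << 16          # bit 16     USR
    value |= int(kernel) << 17        # bit 17     OS
    value |= int(edge) << 18          # bit 18     Edge detect
    value |= int(any_thread) << 21    # bit 21     AnyThread
    value |= int(enable) << 22        # bit 22     Enable counter
    value |= int(inv) << 23           # bit 23     Invert counter mask
    value |= cmask << 24              # bits 31:24 Counter mask
    return value


if __name__ == "__main__":
    # LD_BLOCKS.STORE_FORWARD: event 03H, umask 02H, no modifiers.
    print(hex(perfevtsel(0x03, 0x02)))
    # UOPS_ISSUED.ANY with Cmask = 1, Inv = 1 (stalled cycles).
    print(hex(perfevtsel(0x0E, 0x01, cmask=1, inv=True)))
    # INT_MISC.RECOVERY_CYCLES with Cmask = 1 and Edge set (occurrences).
    print(hex(perfevtsel(0x0D, 0x03, cmask=1, edge=True)))
```

Writing the resulting value to a counter-control MSR requires privileged access, which is outside the scope of this sketch.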
Table 19-5. Non-Architectural Performance Events of the Processor Core Supported by Broadwell Microarchitecture
| Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment |
|---|---|---|---|---|
| 03H | 02H | LD_BLOCKS.STORE_FORWARD | Loads blocked by overlapping with store buffer that cannot be forwarded. | |
| 03H | 08H | LD_BLOCKS.NO_SR | The number of times that split load operations are temporarily blocked because all resources for handling the split accesses are in use. | |
| 05H | 01H | MISALIGN_MEM_REF.LOADS | Speculative cache-line split load uops dispatched to L1D. | |
| 05H | 02H | MISALIGN_MEM_REF.STORES | Speculative cache-line split store-address uops dispatched to L1D. | |
| 07H | 01H | LD_BLOCKS_PARTIAL.ADDRESS_ALIAS | False dependencies in MOB due to partial compare on address. | |
| 08H | 01H | DTLB_LOAD_MISSES.MISS_CAUSES_A_WALK | Load misses in all TLB levels that cause a page walk of any page size. | |
| 08H | 02H | DTLB_LOAD_MISSES.WALK_COMPLETED_4K | Completed page walks due to demand load misses that caused 4K page walks in any TLB levels. | |
| 08H | 10H | DTLB_LOAD_MISSES.WALK_DURATION | Cycle PMH is busy with a walk. | |
| 08H | 20H | DTLB_LOAD_MISSES.STLB_HIT_4K | Load misses that missed DTLB but hit STLB (4K). | |
| 0DH | 03H | INT_MISC.RECOVERY_CYCLES | Cycles waiting to recover after Machine Clears except JEClear. Set Cmask = 1. | Set Edge to count occurrences. |
| 0EH | 01H | UOPS_ISSUED.ANY | Increments each cycle the # of uops issued by the RAT to RS. Set Cmask = 1, Inv = 1, Any = 1 to count stalled cycles of this core. | Set Cmask = 1, Inv = 1 to count stalled cycles. |
| 0EH | 10H | UOPS_ISSUED.FLAGS_MERGE | Number of flags-merge uops allocated. Such uops add delay. | |
| 0EH | 20H | UOPS_ISSUED.SLOW_LEA | Number of slow LEA or similar uops allocated. Such uop has 3 sources (for example, 2 sources + immediate) regardless of whether it is a result of LEA instruction or not. | |
| 0EH | 40H | UOPS_ISSUED.SINGLE_MUL | Number of multiply packed/scalar single precision uops allocated. | |