Vol. 3B 19-11
PERFORMANCE-MONITORING EVENTS
C6H
01H
FRONTEND_RETIRED.LATENCY_GE_16
Retired instructions that are fetched after an interval where the front end delivered no uops for at least 16 cycles. Specify the following fields in MSR_PEBS_FRONTEND: EVTSEL = 16H, IDQ_Bubble_Length = 16, IDQ_Bubble_Width = 4.
PS
C6H
01H
FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_m
Retired instructions that are fetched after an interval where the front end had at least ‘m’ IDQ slots with no uops delivered for at least 2 cycles. Specify the following fields in MSR_PEBS_FRONTEND: EVTSEL = 16H, IDQ_Bubble_Length = 2, IDQ_Bubble_Width = m.
PS, m = 1, 2, 3
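The field settings in the two FRONTEND_RETIRED rows above can be packed into a single MSR_PEBS_FRONTEND value. The following is a minimal sketch, not a definitive encoding: the helper name `pebs_frontend_value` is hypothetical, and the bit positions (EVTSEL in the low byte, IDQ_Bubble_Length in bits 19:8, IDQ_Bubble_Width in bits 22:20) are assumptions that should be checked against the MSR_PEBS_FRONTEND layout for the target processor.

```python
# Assumed field positions; verify against the MSR_PEBS_FRONTEND layout.
EVTSEL_SHIFT = 0           # assumed: event select occupies the low byte
BUBBLE_LENGTH_SHIFT = 8    # assumed: IDQ_Bubble_Length in bits 19:8
BUBBLE_WIDTH_SHIFT = 20    # assumed: IDQ_Bubble_Width in bits 22:20


def pebs_frontend_value(evtsel, bubble_length, bubble_width):
    """Pack the three fields named in the table into one integer."""
    if not 0 <= evtsel <= 0xFF:
        raise ValueError("evtsel must fit in 8 bits")
    if not 0 <= bubble_length <= 0xFFF:
        raise ValueError("bubble_length must fit in 12 bits")
    if not 0 <= bubble_width <= 0x7:
        raise ValueError("bubble_width must fit in 3 bits")
    return (
        (evtsel << EVTSEL_SHIFT)
        | (bubble_length << BUBBLE_LENGTH_SHIFT)
        | (bubble_width << BUBBLE_WIDTH_SHIFT)
    )


# FRONTEND_RETIRED.LATENCY_GE_16: EVTSEL = 16H, length = 16, width = 4
latency_ge_16 = pebs_frontend_value(0x16, 16, 4)

# FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_m for m = 1, 2, 3
bubbles = {m: pebs_frontend_value(0x16, 2, m) for m in (1, 2, 3)}

print(hex(latency_ge_16))
print({m: hex(v) for m, v in bubbles.items()})
```

Keeping the shifts as named constants makes it easy to adjust the sketch if the layout on a given processor differs from the one assumed here.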
C7H
01H
FP_ARITH_INST_RETIRED.SCALAR_DOUBLE
Number of double-precision, floating-point, scalar SSE/AVX computational instructions that are retired. Each scalar FMA instruction counts as 2.
Software may treat each count as one DP FLOP.
C7H
02H
FP_ARITH_INST_RETIRED.SCALAR_SINGLE
Number of single-precision, floating-point, scalar SSE/AVX computational instructions that are retired. Each scalar FMA instruction counts as 2.
Software may treat each count as one SP FLOP.
C7H
04H
FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE
Number of double-precision, floating-point, 128-bit SSE/AVX computational instructions that are retired. Each 128-bit FMA or (V)DPPD instruction counts as 2.
Software may treat each count as two DP FLOPs.
C7H
08H
FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE
Number of single-precision, floating-point, 128-bit SSE/AVX computational instructions that are retired. Each 128-bit FMA or (V)DPPS instruction counts as 2.
Software may treat each count as four SP FLOPs.
C7H
10H
FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE
Number of double-precision, floating-point, 256-bit AVX computational instructions that are retired. Each 256-bit FMA instruction counts as 2.
Software may treat each count as four DP FLOPs.
C7H
20H
FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE
Number of single-precision, floating-point, 256-bit AVX computational instructions that are retired. Each 256-bit FMA or VDPPS instruction counts as 2.
Software may treat each count as eight SP FLOPs.
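The Comment column of the six FP_ARITH_INST_RETIRED rows above gives a FLOP weight per count (1, 1, 2, 4, 4, 8). A minimal sketch of how software might combine raw counter readings into DP and SP FLOP estimates follows; the function name `estimate_flops` and the sample counter values are hypothetical and exist only for illustration.

```python
# (precision, FLOPs per count), taken from the Comment column of the table.
FLOP_WEIGHTS = {
    "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE": ("dp", 1),
    "FP_ARITH_INST_RETIRED.SCALAR_SINGLE": ("sp", 1),
    "FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE": ("dp", 2),
    "FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE": ("sp", 4),
    "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE": ("dp", 4),
    "FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE": ("sp", 8),
}


def estimate_flops(counts):
    """Return estimated DP and SP FLOP totals from raw event counts.

    Events missing from `counts` are treated as zero.
    """
    totals = {"dp": 0, "sp": 0}
    for event, (precision, weight) in FLOP_WEIGHTS.items():
        totals[precision] += weight * counts.get(event, 0)
    return totals


# Hypothetical counter readings, for illustration only.
sample_counts = {
    "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE": 1000,
    "FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE": 500,
    "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE": 250,
    "FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE": 100,
}

print(estimate_flops(sample_counts))
```

Because FMA and dot-product instructions already count as 2 in hardware, the sketch applies no extra correction for them.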
CAH
1EH
FP_ASSIST.ANY
Cycles with any input/output SSE* or FP assists.
CMSK1
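Each row's Event Num. and Umask Value, plus a counter mask where the Comment column lists one, can be combined into a raw event-select value. The sketch below assumes the architectural IA32_PERFEVTSELx field positions (event select in bits 7:0, unit mask in bits 15:8, counter mask in bits 31:24) and reads the "CMSK1" comment as "counter mask = 1". The helper name `raw_event_select` is hypothetical, and control bits such as USR, OS, and EN are deliberately left clear.

```python
def raw_event_select(event, umask, cmask=0):
    """Combine event number, umask, and counter mask into one value.

    Only the three fields listed in the table are set; enable and
    privilege-level bits are left at zero for the caller to add.
    """
    for name, value in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    return (cmask << 24) | (umask << 8) | event


# FP_ASSIST.ANY: event CAH, umask 1EH, comment CMSK1
fp_assist_any = raw_event_select(0xCA, 0x1E, cmask=1)

# HW_INTERRUPTS.RECEIVED: event CBH, umask 01H, no counter mask
hw_interrupts = raw_event_select(0xCB, 0x01)

print(hex(fp_assist_any))
print(hex(hw_interrupts))
```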
CBH
01H
HW_INTERRUPTS.RECEIVED
Number of hardware interrupts received by the processor.
CDH
01H
MEM_TRANS_RETIRED.LOAD_LATENCY
Randomly sampled loads whose latency is above a user-defined threshold. Only a small fraction of all loads is sampled, due to randomization.
Specify threshold in MSR 3F6H.
PSDLA
D0H
11H
MEM_INST_RETIRED.STLB_MISS_LOADS
Retired load instructions that miss the STLB.
PSDLA
D0H
12H
MEM_INST_RETIRED.STLB_MISS_STORES
Retired store instructions that miss the STLB.
PSDLA
D0H
21H
MEM_INST_RETIRED.LOCK_LOADS
Retired load instructions with locked access.
PSDLA
D0H
41H
MEM_INST_RETIRED.SPLIT_LOADS
Number of load instructions retired with cache-line splits that may impact performance.
PSDLA
D0H
42H
MEM_INST_RETIRED.SPLIT_STORES
Number of store instructions retired with a cache-line split.
PSDLA
D0H
81H
MEM_INST_RETIRED.ALL_LOADS
All retired load instructions.
PSDLA
D0H
82H
MEM_INST_RETIRED.ALL_STORES
All retired store instructions.
PSDLA
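The umask values of the seven MEM_INST_RETIRED rows above (11H, 12H, 21H, 41H, 42H, 81H, 82H) follow a visible pattern: the low nibble appears to select loads (01H) or stores (02H), and the high nibble appears to select the condition. The sketch below decodes a umask on that basis. The bit-to-name mapping is inferred from the listed rows only, not from a documented field definition, and the function name `decode_mem_inst_umask` is hypothetical.

```python
# Inferred from the umask values listed in the table; an assumption,
# not a documented decomposition.
ACCESS_BITS = {0x01: "LOADS", 0x02: "STORES"}
CONDITION_BITS = {0x10: "STLB_MISS", 0x20: "LOCK", 0x40: "SPLIT", 0x80: "ALL"}


def decode_mem_inst_umask(umask):
    """Split a MEM_INST_RETIRED umask into condition and access names."""
    conditions = [name for bit, name in CONDITION_BITS.items() if umask & bit]
    accesses = [name for bit, name in ACCESS_BITS.items() if umask & bit]
    return conditions, accesses


def mnemonic(umask):
    """Rebuild the event mnemonic for umasks with one condition and one access."""
    conditions, accesses = decode_mem_inst_umask(umask)
    if len(conditions) != 1 or len(accesses) != 1:
        raise ValueError(f"umask {umask:#04x} does not match a listed row")
    return f"MEM_INST_RETIRED.{conditions[0]}_{accesses[0]}"


for value in (0x11, 0x12, 0x21, 0x41, 0x42, 0x81, 0x82):
    print(f"{value:02X}H -> {mnemonic(value)}")
```

Combinations that the table does not list, such as 22H, still decode under this pattern, but nothing in the table confirms that the hardware supports them.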
Table 19-3. Non-Architectural Performance Events of the Processor Core Supported by Skylake Microarchitecture
Event Num.
Umask Value
Event Mask Mnemonic
Description
Comment