Vol. 3B 19-11

PERFORMANCE-MONITORING EVENTS

Table 19-3.  Non-Architectural Performance Events of the Processor Core Supported by Skylake Microarchitecture

| Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment |
|---|---|---|---|---|
| C6H | 01H | FRONTEND_RETIRED.LATENCY_GE_16 | Retired instructions that are fetched after an interval where the front end delivered no uops for at least 16 cycles. Specify the following fields in MSR_PEBS_FRONTEND: EVTSEL = 16H, IDQ_Bubble_Length = 16, IDQ_Bubble_Width = 4. | PS |
| C6H | 01H | FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_m | Retired instructions that are fetched after an interval where the front end had 'm' IDQ slots delivered, no uops for at least 2 cycles. Specify the following fields in MSR_PEBS_FRONTEND: EVTSEL = 16H, IDQ_Bubble_Length = 2, IDQ_Bubble_Width = m. | PS, m = 1, 2, 3 |
| C7H | 01H | FP_ARITH_INST_RETIRED.SCALAR_DOUBLE | Number of double-precision, floating-point, scalar SSE/AVX computational instructions that are retired. Each scalar FMA instruction counts as 2. | Software may treat each count as one DP FLOP. |
| C7H | 02H | FP_ARITH_INST_RETIRED.SCALAR_SINGLE | Number of single-precision, floating-point, scalar SSE/AVX computational instructions that are retired. Each scalar FMA instruction counts as 2. | Software may treat each count as one SP FLOP. |
| C7H | 04H | FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE | Number of double-precision, floating-point, 128-bit SSE/AVX computational instructions that are retired. Each 128-bit FMA or (V)DPPD instruction counts as 2. | Software may treat each count as two DP FLOPs. |
| C7H | 08H | FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE | Number of single-precision, floating-point, 128-bit SSE/AVX computational instructions that are retired. Each 128-bit FMA or (V)DPPS instruction counts as 2. | Software may treat each count as four SP FLOPs. |
| C7H | 10H | FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE | Number of double-precision, floating-point, 256-bit SSE/AVX computational instructions that are retired. Each 256-bit FMA instruction counts as 2. | Software may treat each count as four DP FLOPs. |
| C7H | 20H | FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE | Number of single-precision, floating-point, 256-bit SSE/AVX computational instructions that are retired. Each 256-bit FMA or VDPPS instruction counts as 2. | Software may treat each count as eight SP FLOPs. |
| CAH | 1EH | FP_ASSIST.ANY | Cycles with any input/output SSE* or FP assists. | CMSK1 |
| CBH | 01H | HW_INTERRUPTS.RECEIVED | Number of hardware interrupts received by the processor. | |
| CDH | 01H | MEM_TRANS_RETIRED.LOAD_LATENCY | Randomly sampled loads whose latency is above a user defined threshold. A small fraction of the overall loads are sampled due to randomization. | Specify threshold in MSR 3F6H. PSDLA |
| D0H | 11H | MEM_INST_RETIRED.STLB_MISS_LOADS | Retired load instructions that miss the STLB. | PSDLA |
| D0H | 12H | MEM_INST_RETIRED.STLB_MISS_STORES | Retired store instructions that miss the STLB. | PSDLA |
| D0H | 21H | MEM_INST_RETIRED.LOCK_LOADS | Retired load instructions with locked access. | PSDLA |
| D0H | 41H | MEM_INST_RETIRED.SPLIT_LOADS | Number of load instructions retired with cache-line splits that may impact performance. | PSDLA |
| D0H | 42H | MEM_INST_RETIRED.SPLIT_STORES | Number of store instructions retired with line-split. | PSDLA |
| D0H | 81H | MEM_INST_RETIRED.ALL_LOADS | All retired load instructions. | PSDLA |
| D0H | 82H | MEM_INST_RETIRED.ALL_STORES | All retired store instructions. | PSDLA |
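
The Comment column for the FP_ARITH_INST_RETIRED events gives a FLOP weight per count (one, two, four, or eight). As an illustrative sketch only, the weights can be combined into a FLOP estimate as shown below. The function name `estimate_flops` and the sample counter values are hypothetical and are not taken from any tool or measurement.

```python
# FLOPs credited per count, copied from the Comment column of the table.
FLOPS_PER_COUNT = {
    "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE": 1,
    "FP_ARITH_INST_RETIRED.SCALAR_SINGLE": 1,
    "FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE": 2,
    "FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE": 4,
    "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE": 4,
    "FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE": 8,
}


def estimate_flops(counts):
    """Return the weighted FLOP total for a dict of {mnemonic: raw count}.

    FMA instructions already count as 2 in hardware, so no extra
    adjustment is applied here.
    """
    unknown = set(counts) - set(FLOPS_PER_COUNT)
    if unknown:
        raise KeyError(f"no FLOP weight for: {sorted(unknown)}")
    return sum(FLOPS_PER_COUNT[name] * value for name, value in counts.items())


# Hypothetical counter readings, for illustration only.
sample = {
    "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE": 1000,
    "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE": 250,
}
print(estimate_flops(sample))
```

Single-precision and double-precision totals are summed together in this sketch; software that reports them separately would keep two accumulators instead.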
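
The Event Num. and Umask Value columns, together with a CMSK comment where present, are the fields software programs into a performance event select register. The sketch below packs them under the assumption of the usual IA32_PERFEVTSELx layout (event select in bits 7:0, unit mask in bits 15:8, counter mask in bits 31:24); that layout is not stated in this table, and the helper name `perfevtsel_bits` is hypothetical. Control bits such as USR, OS, and EN are deliberately left clear.

```python
def perfevtsel_bits(event, umask, cmask=0):
    """Pack event select, unit mask, and counter mask into one integer.

    Assumed layout (not given in the table itself):
      bits 7:0   event select
      bits 15:8  unit mask
      bits 31:24 counter mask (CMSK)
    """
    for name, value in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits, got {value:#x}")
    return event | (umask << 8) | (cmask << 24)


# MEM_INST_RETIRED.ALL_LOADS: event D0H, umask 81H.
all_loads = perfevtsel_bits(0xD0, 0x81)

# FP_ASSIST.ANY: event CAH, umask 1EH, comment CMSK1 (counter mask = 1).
fp_assist_any = perfevtsel_bits(0xCA, 0x1E, cmask=1)

print(f"{all_loads:#x} {fp_assist_any:#x}")
```

The low 16 bits of the result correspond to the umask-then-event hex form that some profiling tools accept as a raw event code.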
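
The FRONTEND_RETIRED rows require three fields to be written to MSR_PEBS_FRONTEND. The following sketch packs the values listed in the table into a single integer. The bit positions used (EVTSEL in bits 7:0, IDQ_Bubble_Length in bits 19:8, IDQ_Bubble_Width in bits 22:20) are an assumption and do not appear in this table, so they should be checked against the MSR definition before use. The helper name `pebs_frontend_value` is hypothetical.

```python
def pebs_frontend_value(evtsel, bubble_length, bubble_width):
    """Pack the three MSR_PEBS_FRONTEND fields named in the table.

    Assumed field positions (verify against the MSR definition):
      bits 7:0   EVTSEL
      bits 19:8  IDQ_Bubble_Length
      bits 22:20 IDQ_Bubble_Width
    """
    if not 0 <= evtsel <= 0xFF:
        raise ValueError("EVTSEL must fit in 8 bits")
    if not 0 <= bubble_length <= 0xFFF:
        raise ValueError("IDQ_Bubble_Length must fit in 12 bits")
    if not 0 <= bubble_width <= 0x7:
        raise ValueError("IDQ_Bubble_Width must fit in 3 bits")
    return evtsel | (bubble_length << 8) | (bubble_width << 20)


# FRONTEND_RETIRED.LATENCY_GE_16, using the field values given in the table.
ge_16 = pebs_frontend_value(evtsel=0x16, bubble_length=16, bubble_width=4)

# FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_m for each permitted m (1, 2, 3).
ge_2_bubbles = {
    m: pebs_frontend_value(evtsel=0x16, bubble_length=2, bubble_width=m)
    for m in (1, 2, 3)
}

print(f"{ge_16:#x}", {m: f"{v:#x}" for m, v in ge_2_bubbles.items()})
```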