19-34 Vol. 3B

PERFORMANCE-MONITORING EVENTS

Table 19-11.  Non-Architectural Performance Events In the Processor Core of 3rd Generation Intel® Core™ i7, i5, i3 Processors

| Event Num. | Umask Value | Event Mask Mnemonic | Description | Comment |
|---|---|---|---|---|
| 03H | 02H | LD_BLOCKS.STORE_FORWARD | Loads blocked by overlapping with store buffer that cannot be forwarded. | |
| 03H | 08H | LD_BLOCKS.NO_SR | The number of times that split load operations are temporarily blocked because all resources for handling the split accesses are in use. | |
| 05H | 01H | MISALIGN_MEM_REF.LOADS | Speculative cache-line split load uops dispatched to L1D. | |
| 05H | 02H | MISALIGN_MEM_REF.STORES | Speculative cache-line split store-address uops dispatched to L1D. | |
| 07H | 01H | LD_BLOCKS_PARTIAL.ADDRESS_ALIAS | False dependencies in MOB due to partial compare on address. | |
| 08H | 81H | DTLB_LOAD_MISSES.MISS_CAUSES_A_WALK | Misses in all TLB levels that cause a page walk of any page size from demand loads. | |
| 08H | 82H | DTLB_LOAD_MISSES.WALK_COMPLETED | Misses in all TLB levels that caused a completed page walk of any page size by demand loads. | |
| 08H | 84H | DTLB_LOAD_MISSES.WALK_DURATION | Cycles the PMH is busy with a walk due to demand loads. | |
| 08H | 88H | DTLB_LOAD_MISSES.LARGE_PAGE_WALK_DURATION | Page walk for a large page completed for a demand load. | |
| 0EH | 01H | UOPS_ISSUED.ANY | Increments each cycle by the number of uops issued by the RAT to the RS. Set Cmask = 1, Inv = 1, Any = 1 to count stalled cycles of this core. | Set Cmask = 1, Inv = 1 to count stalled cycles. |
| 0EH | 10H | UOPS_ISSUED.FLAGS_MERGE | Number of flags-merge uops allocated. Such uops add delay. | |
| 0EH | 20H | UOPS_ISSUED.SLOW_LEA | Number of slow LEA or similar uops allocated. Such a uop has 3 sources (e.g., 2 sources + immediate), regardless of whether it results from an LEA instruction. | |
| 0EH | 40H | UOPS_ISSUED.SINGLE_MUL | Number of multiply packed/scalar single-precision uops allocated. | |
| 10H | 01H | FP_COMP_OPS_EXE.X87 | Counts number of x87 uops executed. | |
| 10H | 10H | FP_COMP_OPS_EXE.SSE_FP_PACKED_DOUBLE | Counts number of SSE* or AVX-128 double-precision FP packed uops executed. | |
| 10H | 20H | FP_COMP_OPS_EXE.SSE_FP_SCALAR_SINGLE | Counts number of SSE* or AVX-128 single-precision FP scalar uops executed. | |
| 10H | 40H | FP_COMP_OPS_EXE.SSE_PACKED_SINGLE | Counts number of SSE* or AVX-128 single-precision FP packed uops executed. | |
| 10H | 80H | FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE | Counts number of SSE* or AVX-128 double-precision FP scalar uops executed. | |
| 11H | 01H | SIMD_FP_256.PACKED_SINGLE | Counts 256-bit packed single-precision floating-point instructions. | |
| 11H | 02H | SIMD_FP_256.PACKED_DOUBLE | Counts 256-bit packed double-precision floating-point instructions. | |
| 14H | 01H | ARITH.FPU_DIV_ACTIVE | Cycles that the divider is active, including INT and FP. Set 'edge = 1, cmask = 1' to count the number of divides. | |
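
Several entries above ask for qualifier bits (Cmask, Inv, Edge, Any) in addition to the event number and umask. The following Python sketch shows one way those fields might be packed into a value for an IA32_PERFEVTSELx register. It assumes the architectural field layout (event select in bits 7:0, umask in bits 15:8, USR 16, OS 17, Edge 18, AnyThread 21, EN 22, INV 23, Cmask in bits 31:24). The helper name perfevtsel and its parameters are hypothetical and chosen only for illustration. USR, OS and EN default to off because tools that program the counters usually set them themselves.

```python
# Bit positions assumed from the architectural IA32_PERFEVTSELx layout.
USR_BIT, OS_BIT, EDGE_BIT, ANY_BIT, EN_BIT, INV_BIT = 16, 17, 18, 21, 22, 23
UMASK_SHIFT, CMASK_SHIFT = 8, 24


def perfevtsel(event, umask, *, cmask=0, inv=False, edge=False,
               any_thread=False, usr=False, os=False, enable=False):
    """Pack event-select fields into a 32-bit value (hypothetical helper)."""
    for name, field in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits, got {field:#x}")

    value = event | (umask << UMASK_SHIFT) | (cmask << CMASK_SHIFT)
    flags = ((usr, USR_BIT), (os, OS_BIT), (edge, EDGE_BIT),
             (any_thread, ANY_BIT), (enable, EN_BIT), (inv, INV_BIT))
    for is_set, bit in flags:
        if is_set:
            value |= 1 << bit
    return value


# UOPS_ISSUED.ANY (0EH/01H) with Cmask = 1, Inv = 1: stalled cycles.
stalled_cycles = perfevtsel(0x0E, 0x01, cmask=1, inv=True)

# Same event with Any = 1 as well: stalled cycles of this core.
stalled_core_cycles = perfevtsel(0x0E, 0x01, cmask=1, inv=True, any_thread=True)

# ARITH.FPU_DIV_ACTIVE (14H/01H) with edge = 1, cmask = 1: number of divides.
divide_count = perfevtsel(0x14, 0x01, cmask=1, edge=True)

for label, raw in (("stalled cycles", stalled_cycles),
                   ("stalled core cycles", stalled_core_cycles),
                   ("divides", divide_count)):
    print(f"{label}: {raw:#010x}")
```

With Cmask = 1 and Inv = 1 the counter increments in cycles where fewer than one uop is issued, which is why that combination counts stalls. Edge = 1 with Cmask = 1 counts transitions from inactive to active, so each divider activation is counted once. On Linux, perf generally accepts the same fields by name in a raw event specification. Treat the exact syntax as something to check against the installed perf version.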