
Vol. 3B 19-119

PERFORMANCE-MONITORING EVENTS

• The load is from bytes written by the preceding store, the store is misaligned, and the load is not aligned on the beginning of the store.
• The load is split over an eight-byte boundary (excluding 16-byte loads).
• The load and store have the same offset relative to the beginning of different 4-KByte pages. This case is also called 4-KByte aliasing.
• In all these cases, the load is blocked until after the blocking store retires and the stored data is committed to the cache hierarchy.
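The 4-KByte aliasing and eight-byte split conditions listed above come down to simple address arithmetic. The sketch below is illustrative only. The function names are hypothetical, addresses are plain integers, and it does not model how the processor actually detects these cases.

```python
PAGE_SIZE = 4096  # 4-KByte pages, as in the aliasing case


def is_4k_aliased(load_addr: int, store_addr: int) -> bool:
    """True if the load and store share a page offset but lie on different 4-KByte pages."""
    same_offset = (load_addr % PAGE_SIZE) == (store_addr % PAGE_SIZE)
    different_page = (load_addr // PAGE_SIZE) != (store_addr // PAGE_SIZE)
    return same_offset and different_page


def splits_8_byte_boundary(addr: int, size: int) -> bool:
    """True if a load of `size` bytes at `addr` crosses an eight-byte boundary.

    16-byte loads are excluded, mirroring the exception stated in the text.
    """
    if size == 16:
        return False
    return (addr % 8) + size > 8
```

For example, a load from 0x2010 after a store to 0x1010 is flagged as aliased, because both addresses sit at offset 0x010 of different pages.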

03H 10H LOAD_BLOCK.UNTIL_RETIRE: Loads blocked until retirement.

This event indicates that load operations were blocked until retirement. The number of events is greater than or equal to the number of load operations that were blocked.

This mainly includes uncacheable loads and split loads (loads that cross a cache line boundary), but may also include other cases where loads are blocked until retirement.
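A split load, as described above, is one whose bytes span two cache lines. The check below is a sketch that assumes a 64-byte cache line, which is passed as a parameter. The function name is hypothetical.

```python
def is_cache_line_split(addr: int, size: int, line_size: int = 64) -> bool:
    """True if a load of `size` bytes (size >= 1) starting at `addr` touches two cache lines."""
    first_line = addr // line_size
    last_line = (addr + size - 1) // line_size  # line holding the final byte
    return first_line != last_line
```

An 8-byte load at offset 60 of a line reaches byte 67, so it crosses into the next line. The same load at offset 56 ends at byte 63 and stays within one line.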

03H 20H LOAD_BLOCK.L1D: Loads blocked by the L1 data cache.

This event indicates that loads are blocked for one or more reasons. Some triggers for this event are:
• The number of L1 data cache misses exceeds the maximum number of outstanding misses supported by the processor. This includes misses generated as a result of demand fetches, software prefetches, or hardware prefetches.
• Cache line split loads.
• Partial reads, such as reads to uncacheable memory, I/O instructions, and more.
• A locked load operation is in progress.

The number of events is greater than or equal to the number of load operations that were blocked.
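To count an event from this table, software writes its event number and umask into an IA32_PERFEVTSELx MSR. The sketch below builds such a value for LOAD_BLOCK.L1D (event 03H, umask 20H). It uses the architectural field layout:

- event select in bits 7:0
- unit mask in bits 15:8
- USR at bit 16
- OS at bit 17
- EN at bit 22

The helper name is hypothetical. Writing the MSR itself requires ring 0 and is not shown.

```python
USR = 1 << 16  # count while at privilege levels 1, 2, or 3
OS = 1 << 17   # count while at privilege level 0
EN = 1 << 22   # enable the counter


def perfevtsel(event: int, umask: int, user: bool = True, kernel: bool = True) -> int:
    """Build an IA32_PERFEVTSELx value from an event number and a umask."""
    if not (0 <= event <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event and umask must each fit in 8 bits")
    value = event | (umask << 8) | EN
    if user:
        value |= USR
    if kernel:
        value |= OS
    return value


# LOAD_BLOCK.L1D: event 03H, umask 20H
print(hex(perfevtsel(0x03, 0x20)))  # 0x432003
```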

04H 01H SB_DRAIN_CYCLES: Cycles while stores are blocked due to store buffer drain.

This event counts every cycle during which the store buffer is draining. This includes:
• Serializing operations such as CPUID
• Synchronizing operations such as XCHG
• Interrupt acknowledgment
• Other conditions, such as cache flushing
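On Linux, an event from this table can be requested by number rather than by name. The sketch below formats two forms accepted by the `perf` tool for SB_DRAIN_CYCLES (event 04H, umask 01H):

- the raw `rNNN` form, with the umask in bits 15:8 and the event in bits 7:0
- the `cpu/event=...,umask=.../` form

The helper is hypothetical. A non-zero count is only meaningful on a processor based on Intel Core microarchitecture.

```python
def perf_event_specs(event: int, umask: int) -> tuple[str, str]:
    """Return (raw, symbolic) event specifiers for `perf stat -e`."""
    raw = f"r{(umask << 8) | event:x}"
    symbolic = f"cpu/event={event:#04x},umask={umask:#04x}/"
    return raw, symbolic


# SB_DRAIN_CYCLES: event 04H, umask 01H
raw, symbolic = perf_event_specs(0x04, 0x01)
print(raw)       # r104
print(symbolic)  # cpu/event=0x04,umask=0x01/
```

Either string can be passed to `perf stat -e` when measuring a workload.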

04H 02H STORE_BLOCK.ORDER: Cycles while a store is waiting for a preceding store to be globally observed.

This event counts the total duration, in cycles, during which stores are waiting for a preceding stored cache line to be observed by other cores.

This situation happens as a result of the strong store ordering behavior, as defined in “Memory Ordering,” Chapter 8, Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A.

The stall may occur and be noticeable if there are many cases when a store either misses the L1 data cache or hits a cache line in the Shared state. If the store requires a bus transaction to read the cache line, the stall ends when the snoop response for the bus transaction arrives.
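The stall described above can be illustrated with a toy model. In it, stores become globally observed strictly in program order, so a store that would complete quickly must still wait for an earlier store that needs a bus transaction. Everything in the sketch is a hypothetical simplification, not the processor's actual store pipeline: the latencies, the issue cycles, and the function name. Each stalled cycle is counted once, even when several stores wait at the same time.

```python
def store_order_wait_cycles(issue_cycles: list[int], latencies: list[int]) -> int:
    """Cycles during which at least one store waits for a preceding store (toy model).

    Store i is ready at issue_cycles[i] + latencies[i], but cannot become
    globally observed before store i-1 has been observed.
    """
    intervals = []
    prev_observed = 0
    for issue, latency in zip(issue_cycles, latencies):
        ready = issue + latency
        observed = max(ready, prev_observed)  # in-order global observation
        if observed > ready:
            intervals.append((ready, observed))  # this store is stalled
        prev_observed = observed

    # Merge overlapping stall intervals so each cycle is counted only once.
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total
```

With latencies of 100, 1, and 1 cycles (the first store missing the L1 data cache), the two later stores stall until cycle 100. When every store hits, nothing stalls.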

STORE_BLOCK.SNOOP: A store is blocked due to a conflict with an external or internal snoop.

This event counts the number of cycles during which the store port was used for snooping the L1 data cache and a store was stalled by the snoop. The store is typically resubmitted one cycle later.

Table 19-23. Non-Architectural Performance Events in Processors Based on Intel® Core™ Microarchitecture (Contd.)

Event Num | Umask Value | Event Name | Definition | Description and Comment