PERFORMANCE-MONITORING EVENTS
• The load is from bytes written by the preceding store,
the store is misaligned and the load is not aligned on the
beginning of the store.
• The load is split over an eight byte boundary (excluding
16-byte loads).
• The load and store have the same offset relative to the
beginning of different 4-KByte pages. This case is also
called 4-KByte aliasing.
In all these cases, the load is blocked until after the
blocking store retires and the stored data is committed to
the cache hierarchy.
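The two address conditions in the list above (the eight-byte split and 4-KByte aliasing) can be expressed as simple predicates. The following is a minimal sketch that works on plain integer addresses; the function names are hypothetical, and it only restates the conditions as worded here rather than modeling the processor's actual detection logic.

```python
PAGE_SIZE = 4096  # 4-KByte page, as named in the text


def is_4k_alias(load_addr: int, store_addr: int) -> bool:
    """True when both addresses have the same page offset but lie in different 4-KByte pages."""
    same_offset = (load_addr % PAGE_SIZE) == (store_addr % PAGE_SIZE)
    different_page = (load_addr // PAGE_SIZE) != (store_addr // PAGE_SIZE)
    return same_offset and different_page


def splits_eight_byte_boundary(load_addr: int, size: int) -> bool:
    """True when a load spans an eight-byte boundary; 16-byte loads are excluded."""
    if size == 16:
        return False
    first_chunk = load_addr // 8
    last_chunk = (load_addr + size - 1) // 8
    return first_chunk != last_chunk
```

A check like this may help when auditing buffer layouts, for example to see whether two hot buffers start exactly a multiple of 4 KBytes apart.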
Event Num: 03H
Umask Value: 10H
Event Name: LOAD_BLOCK.UNTIL_RETIRE
Definition: Loads blocked until retirement.
Description and Comment:
This event indicates that load operations were blocked until
retirement. The number of events is greater than or equal to the
number of load operations that were blocked.
This includes mainly uncacheable loads and split loads (loads
that cross the cache line boundary) but may include other
cases where loads are blocked until retirement.
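Split loads can often be predicted from the data layout. The sketch below counts how many element loads in a strided array would cross a cache line boundary. It assumes a 64-byte cache line (an assumption, not stated in this table), and the function names and parameters are hypothetical.

```python
CACHE_LINE_SIZE = 64  # assumption: 64-byte cache lines


def crosses_cache_line(addr: int, size: int, line_size: int = CACHE_LINE_SIZE) -> bool:
    """True when the bytes [addr, addr + size) fall in more than one cache line."""
    return addr // line_size != (addr + size - 1) // line_size


def count_split_loads(base: int, stride: int, size: int, count: int,
                      line_size: int = CACHE_LINE_SIZE) -> int:
    """Count element loads of `size` bytes, spaced `stride` bytes apart, that split a line."""
    return sum(
        crosses_cache_line(base + i * stride, size, line_size)
        for i in range(count)
    )
```

For instance, `count_split_loads(0, 12, 8, 16)` inspects sixteen 8-byte loads placed 12 bytes apart; padding the stride to a multiple of the load size usually removes such splits.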
Event Num: 03H
Umask Value: 20H
Event Name: LOAD_BLOCK.L1D
Definition: Loads blocked by the L1 data cache.
Description and Comment:
This event indicates that loads are blocked due to one or
more reasons. Some triggers for this event are:
• The number of L1 data cache misses exceeds the
maximum number of outstanding misses supported by
the processor. This includes misses generated as a result of
demand fetches, software prefetches, or hardware
prefetches.
• Cache line split loads.
• Partial reads, such as reads to uncacheable memory, I/O
instructions, and more.
• A locked load operation is in progress.
The number of events is greater than or equal to the number of load
operations that were blocked.
Event Num: 04H
Umask Value: 01H
Event Name: SB_DRAIN_CYCLES
Definition: Cycles while stores are blocked due to store buffer drain.
Description and Comment:
This event counts every cycle during which the store buffer
is draining. Conditions that cause a drain include:
• Serializing operations such as CPUID
• Synchronizing operations such as XCHG
• Interrupt acknowledgment
• Other conditions, such as cache flushing
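Because this event counts cycles, it is usually easier to interpret as a fraction of elapsed core cycles. The helper below is a minimal sketch of that normalization; the function name is hypothetical, and `total_cycles` is assumed to come from an unhalted core cycle counter collected over the same interval.

```python
def stall_fraction(event_cycles: int, total_cycles: int) -> float:
    """Return the share of elapsed cycles attributed to a cycle-counting event."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    if event_cycles < 0:
        raise ValueError("event_cycles must not be negative")
    return event_cycles / total_cycles
```

The same normalization applies to other cycle-based events in this table, but not to events that count occurrences.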
Event Num: 04H
Umask Value: 02H
Event Name: STORE_BLOCK.ORDER
Definition: Cycles while store is waiting for a preceding store to be globally observed.
Description and Comment:
This event counts the total duration, in number of cycles,
for which stores are waiting for a preceding stored cache line to
be observed by other cores.
This situation happens as a result of the strong store
ordering behavior, as defined in “Memory Ordering,” Chapter
8, Intel® 64 and IA-32 Architectures Software Developer’s Manual.
The stall may occur and be noticeable if there are many
cases in which a store either misses the L1 data cache or hits a
cache line in the Shared state. If the store requires a bus
transaction to read the cache line, then the stall ends when
the snoop response for the bus transaction arrives.
Event Num: 04H
Umask Value: 08H
Event Name: STORE_BLOCK.SNOOP
Definition: A store is blocked due to a conflict with an external or internal snoop.
Description and Comment:
This event counts the number of cycles the store port was
used for snooping the L1 data cache and a store was stalled
by the snoop. The store is typically resubmitted one cycle
later.
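To program one of these events, the event number and umask value are typically combined into a single code. The sketch below assumes the usual IA32_PERFEVTSELx layout (event select in bits 0-7, unit mask in bits 8-15, USR in bit 16, OS in bit 17, enable in bit 22) and the `r<hex>` raw-event spelling accepted by Linux perf. The function names are hypothetical, and the target processor's documentation should be checked before relying on the layout.

```python
def raw_event_code(event_num: int, umask: int) -> int:
    """Combine an event number and umask value into a 16-bit raw code."""
    if not (0 <= event_num <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event_num and umask must each fit in 8 bits")
    return (umask << 8) | event_num


def perfevtsel_value(event_num: int, umask: int, usr: bool = True,
                     os: bool = True, enable: bool = True) -> int:
    """Build an event-select register value under the assumed bit layout."""
    value = raw_event_code(event_num, umask)
    if usr:
        value |= 1 << 16  # count while in user mode
    if os:
        value |= 1 << 17  # count while in kernel mode
    if enable:
        value |= 1 << 22  # enable the counter
    return value


def perf_raw_name(event_num: int, umask: int) -> str:
    """Format the raw code in the r<hex> style used for raw events by Linux perf."""
    return f"r{raw_event_code(event_num, umask):04x}"
```

With these helpers, LOAD_BLOCK.L1D (event 03H, umask 20H) would be requested as `perf_raw_name(0x03, 0x20)`.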
Table 19-23. Non-Architectural Performance Events in Processors Based on Intel® Core™ Microarchitecture (Contd.)
Event Num | Umask Value | Event Name | Definition | Description and Comment