18-96 Vol. 3B

PERFORMANCE MONITORING

Tagging — Tagging is a means of marking μops that have encountered a particular performance event so they
can be counted at retirement. During the course of execution, the same event can happen more than once per
μop, so a direct count of the event would not indicate how many μops encountered that event.
The tagging mechanisms allow a μop to be tagged once during its lifetime and thus counted once at retirement.
The retired suffix is used for performance metrics that increment a count once per μop, rather than once per
event. For example, a μop may encounter a cache miss more than once during its lifetime, but a “Miss Retired”
metric (that counts the number of retired μops that encountered a cache miss) will increment only once for that
μop. A “Miss Retired” metric would be useful for characterizing the performance of the cache hierarchy for a
particular instruction sequence. Details of various performance metrics and how these can be constructed using
the Pentium 4 and Intel Xeon processors' performance events are provided in the Intel Pentium 4 Processor
Optimization Reference Manual (see Section 1.4, “Related Literature”).

Replay — To maximize performance for the common case, the Intel NetBurst microarchitecture aggressively
schedules μops for execution before all the conditions for correct execution are guaranteed to be satisfied. If
any of these conditions is not satisfied, μops must be reissued. The mechanism that the Pentium
4 and Intel Xeon processors use for this reissuing of μops is called replay. Some examples of replay causes are
cache misses, dependence violations, and unforeseen resource constraints. In normal operation, some number
of replays is common and unavoidable. An excessive number of replays is an indication of a performance
problem.

Assist — When the hardware needs the assistance of microcode to deal with some event, the machine takes 
an assist. One example of this is an underflow condition in the input operands of a floating-point operation. The 
hardware must internally modify the format of the operands in order to perform the computation. Assists clear 
the entire machine of μops before they begin and are costly.
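
The difference between counting events and counting tagged μops, as described under Tagging, can be illustrated with a small software model. This is only a sketch: the `Uop` class, its fields, and the helper functions are hypothetical names invented for illustration and do not correspond to any hardware structure or Intel interface.

```python
from dataclasses import dataclass


@dataclass
class Uop:
    """Toy model of a μop (hypothetical; not real hardware state)."""
    name: str
    miss_events: int = 0   # how many times this μop encountered the event
    tagged: bool = False   # set at most once during the μop's lifetime

    def encounter_miss(self) -> None:
        # Every occurrence bumps the raw event count, but tagging is
        # idempotent: a μop that is already tagged simply stays tagged.
        self.miss_events += 1
        self.tagged = True


def count_events(uops):
    """Direct event count: one increment per occurrence of the event."""
    return sum(u.miss_events for u in uops)


def count_tagged_at_retirement(uops):
    """'Miss Retired' style count: one increment per tagged μop."""
    return sum(1 for u in uops if u.tagged)


uops = [Uop("load_a"), Uop("load_b"), Uop("add_c")]
for _ in range(3):
    uops[0].encounter_miss()   # load_a misses three times
uops[1].encounter_miss()       # load_b misses once; add_c never misses

event_count = count_events(uops)                  # 4 events in total
miss_retired = count_tagged_at_retirement(uops)   # 2 μops saw a miss
```

In this model the direct event count is 4, while the at-retirement count is 2, because each μop contributes at most once no matter how many times it encountered the event.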

18.15.6.1   Using At-Retirement Counting

Processors based on Intel NetBurst microarchitecture allow counting both events and μops that encountered a
specified event. For a subset of the at-retirement events listed in Table 19-29, a μop may be tagged when it
encounters that event. The tagging mechanisms can be used in interrupt-based event sampling, and a subset of
these mechanisms can be used in PEBS. There are four independent tagging mechanisms, and each mechanism
uses a different event to count μops tagged with that mechanism:

Front-end tagging — This mechanism pertains to the tagging of μops that encountered front-end events (for
example, trace cache and instruction counts) and are counted with the Front_end_event event.

Execution tagging — This mechanism pertains to the tagging of μops that encountered execution events (for
example, instruction types) and are counted with the Execution_event event.

Replay tagging — This mechanism pertains to tagging of μops whose retirement is replayed (for example, a
cache miss) and are counted with the Replay_event event. Branch mispredictions are also tagged with this
mechanism.

No tags — This mechanism does not use tags. It uses the Instr_retired and the Uops_retired events.

Each tagging mechanism is independent from all others; that is, a μop that has been tagged using one mechanism
will not be detected with another mechanism’s tagged-μop detector. For example, if μops are tagged using the
front-end tagging mechanism, the Replay_event will not count those as tagged μops unless they are also tagged
using the replay tagging mechanism. However, execution tagging allows up to four different types of μops to be
counted at retirement.
The independence of tagging mechanisms does not hold when using PEBS. When using PEBS, only one tagging 
mechanism should be used at a time. 
Certain kinds of μops cannot be tagged, including I/O, uncacheable and locked accesses, returns, and far
transfers.
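
The independence of the tagging mechanisms (outside of PEBS) can be pictured as each μop carrying a separate tag bit per mechanism, with each counting event looking only at its own bit. The following is a toy model under that assumption; the `Tag` flag and `count_tagged` helper are hypothetical names for illustration and are not how the hardware or any Intel interface represents tags.

```python
from enum import Flag, auto


class Tag(Flag):
    """Hypothetical per-mechanism tag bits carried by a retired μop."""
    NONE = 0
    FRONT_END = auto()
    EXECUTION = auto()
    REPLAY = auto()


def count_tagged(retired_tags, mechanism):
    """Count retired μops whose tag set includes the given mechanism.

    Only the bit belonging to `mechanism` is examined, so tags set by
    the other mechanisms have no effect on the result.
    """
    return sum(1 for tags in retired_tags if mechanism in tags)


# Tag sets of four retired μops in this toy example.
retired = [
    Tag.FRONT_END,               # front-end tag only
    Tag.FRONT_END | Tag.REPLAY,  # tagged by both mechanisms
    Tag.REPLAY,                  # replay tag only
    Tag.NONE,                    # untagged
]

front_end_count = count_tagged(retired, Tag.FRONT_END)   # 2
replay_count = count_tagged(retired, Tag.REPLAY)         # 2
execution_count = count_tagged(retired, Tag.EXECUTION)   # 0
```

The first μop is counted by the front-end detector but not by the replay detector; only the second μop, which carries both tags, is seen by both.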
Table 19-29 lists the performance monitoring events that support at-retirement counting: specifically the
Front_end_event, Execution_event, Replay_event, Instr_retired and Uops_retired events. The following sections
describe the tagging mechanisms for using these events to tag μops and count tagged μops.