
Vol. 3B 19-145

PERFORMANCE-MONITORING EVENTS

86H

02H

FETCH_STALL.ICACHE_FILL_PENDING_CYCLES

Counts cycles in which an ICache miss is outstanding and instruction fetch is stalled. That is, the decoder queue is able to accept bytes, but the fetch unit is unable to provide them while an ICache miss is outstanding. Note that this event is not the same as the cycles needed to retrieve an instruction due to an ICache miss. Rather, it is the part of the Instruction Cache (ICache) miss time during which no bytes are available for the decoder.

9CH

00H

UOPS_NOT_DELIVERED.ANY

This event is used to measure front-end inefficiencies, i.e., when the front end of the machine is not delivering uops to the back end and the back end has not stalled. This event can be used to identify whether the machine is truly front-end bound. When this event occurs, it indicates that the front end of the machine is operating at less than its theoretical peak performance.
Background: We can think of the processor pipeline as being divided into 2 broader parts: the front end and the back end. The front end is responsible for fetching instructions, decoding them into uops in a machine-understandable format, and putting them into a uop queue to be consumed by the back end. The back end then takes these uops and allocates the required resources. When all resources are ready, uops are executed. If the back end is not ready to accept uops from the front end, then we do not want to count these as front-end bottlenecks. However, whenever we have bottlenecks in the back end, we will have allocation unit stalls and eventually force the front end to wait until the back end is ready to receive more uops. This event counts only when the back end is requesting more uops and the front end is not able to provide them. When 3 uops are requested and no uops are delivered, the event counts 3. When 3 are requested and only 1 is delivered, the event counts 2. When only 2 are delivered, the event counts 1. Alternatively stated, the event will not count if 3 uops are delivered, or if the back end is stalled and not requesting any uops at all. Counts indicate missed opportunities for the front end to deliver a uop to the back end. Some examples of conditions that cause front-end inefficiencies are: ICache misses, ITLB misses, and decoder restrictions that limit the front-end bandwidth.
Known Issues: Some uops require multiple allocation slots. These uops will not be charged as a front-end 'not delivered' opportunity, and will be regarded as a back-end problem. For example, the INC instruction has one uop that requires 2 issue slots. A stream of INC instructions will not count as UOPS_NOT_DELIVERED, even though only one instruction can be issued per clock. The low uop issue rate for a stream of INC instructions is considered to be a back-end issue.
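The counting rule in the UOPS_NOT_DELIVERED.ANY description can be illustrated with a small model. This is only a sketch of the rule as stated in the text, not a description of the hardware: the function name `uops_not_delivered`, the constant `SLOTS_PER_CYCLE`, and the per-cycle trace format are hypothetical, and the width of 3 is taken from the examples in the description.

```python
# Hypothetical model of the counting rule described above.
# Assumption: the back end requests up to 3 uops per cycle, as in the
# examples given in the event description.
SLOTS_PER_CYCLE = 3


def uops_not_delivered(cycles):
    """Sum missed delivery opportunities over a trace.

    cycles: iterable of (backend_requesting, delivered) pairs, where
    backend_requesting is a bool and delivered is the number of uops
    the front end supplied in that cycle.
    """
    count = 0
    for backend_requesting, delivered in cycles:
        if not backend_requesting:
            # Back end is stalled and not requesting uops: nothing counted.
            continue
        count += SLOTS_PER_CYCLE - min(delivered, SLOTS_PER_CYCLE)
    return count


trace = [
    (True, 0),   # 3 requested, 0 delivered: counts 3
    (True, 1),   # 3 requested, 1 delivered: counts 2
    (True, 2),   # 3 requested, 2 delivered: counts 1
    (True, 3),   # 3 requested, 3 delivered: counts 0
    (False, 0),  # back end stalled: counts 0
]
total = uops_not_delivered(trace)
print(total)
```

The model deliberately ignores the Known Issues case: a uop that needs multiple allocation slots would have to be represented as a fully delivered cycle for the model to match the described behavior.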

B7H

01H, 02H

OFFCORE_RESPONSE

Requires MSR_OFFCORE_RESP[0,1] to specify request type and response. (Duplicated for both MSRs.)

C0H

00H

INST_RETIRED.ANY_P

Counts the number of instructions that retire execution. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. The event continues counting during hardware interrupts, traps, and inside interrupt handlers. This is an architectural performance event. This event uses a programmable general purpose performance counter. *This event is a Precise Event: the EventingRIP field in the PEBS record is precise to the address of the instruction which caused the event.
Note: Because PEBS records can be collected only on IA32_PMC0, only one event can use the PEBS facility at a time.

Precise Event

Table 19-24. Non-Architectural Performance Events for the Goldmont Microarchitecture (Contd.)

Event Num.

Umask Value

Event Name

Description

Comment
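The Event Num. and Umask Value columns are the values that software places in the event-select and unit-mask fields of an IA32_PERFEVTSELx register. The sketch below shows one way to pack them, assuming the architectural field layout (event select in bits 7:0, unit mask in bits 15:8, USR at bit 16, OS at bit 17, EN at bit 22). The helper name `perfevtsel` and the `EVENTS` dictionary are hypothetical, and writing the resulting value to an MSR is outside the scope of the example.

```python
# Hypothetical helper: pack a table row into an IA32_PERFEVTSELx value.
# Assumed layout: event select bits 7:0, unit mask bits 15:8,
# USR bit 16, OS bit 17, EN bit 22.
def perfevtsel(event_num, umask, usr=True, os_mode=True, enable=True):
    value = (event_num & 0xFF) | ((umask & 0xFF) << 8)
    if usr:
        value |= 1 << 16  # count while running at CPL > 0
    if os_mode:
        value |= 1 << 17  # count while running at CPL 0
    if enable:
        value |= 1 << 22  # enable the counter
    return value


# (Event Num., Umask Value) pairs taken from the rows of this table.
EVENTS = {
    "FETCH_STALL.ICACHE_FILL_PENDING_CYCLES": (0x86, 0x02),
    "UOPS_NOT_DELIVERED.ANY": (0x9C, 0x00),
    "INST_RETIRED.ANY_P": (0xC0, 0x00),
}

encoded = {name: perfevtsel(ev, um) for name, (ev, um) in EVENTS.items()}
for name, value in encoded.items():
    print(f"{name}: {value:#08x}")
```

OFFCORE_RESPONSE is left out of the dictionary because, as its row states, it additionally requires MSR_OFFCORE_RESP[0,1] to be programmed with the request type and response.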