
Vol. 3B 19-145

PERFORMANCE-MONITORING EVENTS

86H

02H

FETCH_STALL.ICACHE_FILL_PENDING_CYCLES

Counts cycles in which an Instruction Cache (ICache) miss is outstanding and instruction fetch is stalled. That is, the decoder queue is able to accept bytes, but the fetch unit is unable to provide bytes while an ICache miss is outstanding. Note that this event is not the same as the number of cycles needed to retrieve an instruction after an ICache miss. Rather, it is the part of the ICache miss time during which no bytes are available for the decoder.

9CH

00H

UOPS_NOT_DELIVERED.ANY

This event is used to measure front-end inefficiencies, i.e., when the front end of the machine is not delivering uops to the back end and the back end has not stalled. This event can be used to identify whether the machine is truly front-end bound. When this event occurs, it is an indication that the front end of the machine is operating at less than its theoretical peak performance.
Background: We can think of the processor pipeline as being divided into two broad parts: the front end and the back end. The front end is responsible for fetching instructions, decoding them into uops in a machine-understandable format, and putting them into a uop queue to be consumed by the back end. The back end then takes these uops and allocates the required resources. When all resources are ready, the uops are executed. If the back end is not ready to accept uops from the front end, then we do not want to count these cycles as front-end bottlenecks. However, whenever there are bottlenecks in the back end, the allocation unit stalls and eventually forces the front end to wait until the back end is ready to receive more uops. This event counts only when the back end is requesting more uops and the front end is not able to provide them. When 3 uops are requested and no uops are delivered, the event counts 3. When 3 are requested and only 1 is delivered, the event counts 2. When only 2 are delivered, the event counts 1. Alternatively stated, the event will not count if 3 uops are delivered, or if the back end is stalled and not requesting any uops at all. Counts indicate missed opportunities for the front end to deliver a uop to the back end. Some examples of conditions that cause front-end inefficiencies are ICache misses, ITLB misses, and decoder restrictions that limit the front-end bandwidth.
Known Issues: Some uops require multiple allocation slots. These uops will not be charged as a front-end 'not delivered' opportunity, and will be regarded as a back-end problem. For example, the INC instruction has one uop that requires 2 issue slots. A stream of INC instructions will not count as UOPS_NOT_DELIVERED, even though only one instruction can be issued per clock. The low uop issue rate for a stream of INC instructions is considered to be a back-end issue.
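
The counting rule described for UOPS_NOT_DELIVERED.ANY can be illustrated with a small toy model. This is only a sketch of the arithmetic in the description, not of the hardware. The allocation width of 3 is taken from the "3 uops are requested" examples, and the names `not_delivered_slots` and `total_not_delivered` are hypothetical helpers introduced here for illustration.

```python
# Toy model of the UOPS_NOT_DELIVERED.ANY counting rule (illustrative only).
# Assumption: the back end requests up to 3 uops per cycle, as in the
# examples in the description above.
ALLOC_WIDTH = 3


def not_delivered_slots(delivered, backend_requesting=True, width=ALLOC_WIDTH):
    """Return the amount one cycle would add to the event count."""
    if not backend_requesting:
        # Back end is stalled and not requesting uops: nothing is counted.
        return 0
    if not 0 <= delivered <= width:
        raise ValueError("delivered must be between 0 and the allocation width")
    # Each requested slot the front end failed to fill is one missed opportunity.
    return width - delivered


def total_not_delivered(cycles):
    """Sum the per-cycle counts over (delivered, backend_requesting) pairs."""
    return sum(not_delivered_slots(d, req) for d, req in cycles)


# Five hypothetical cycles: 0, 1, 2, and 3 uops delivered while the back end
# is requesting, followed by one cycle in which the back end is stalled.
trace = [(0, True), (1, True), (2, True), (3, True), (0, False)]
print(total_not_delivered(trace))  # 3 + 2 + 1 + 0 + 0 = 6
```

Note that this model does not capture the Known Issues case: a uop that needs multiple allocation slots (such as the one for INC) is treated by the hardware as a back-end limitation and is not counted.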

B7H

01H, 02H

OFFCORE_RESPONSE

Requires MSR_OFFCORE_RESP[0,1] to specify request type and response. (Duplicated for both MSRs.)

C0H

00H

INST_RETIRED.ANY_P

Counts the number of instructions that retire execution. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. The event continues counting during hardware interrupts, traps, and inside interrupt handlers. This is an architectural performance event. This event uses a programmable general-purpose performance counter. *This event is a Precise Event: the EventingRIP field in the PEBS record is precise to the address of the instruction which caused the event.
Note: Because PEBS records can be collected only on IA32_PMC0, only one event can use the PEBS facility at a time.

Precise Event

Table 19-24. Non-Architectural Performance Events for the Goldmont Microarchitecture (Contd.)

Event Num.    Umask Value    Event Name    Description    Comment
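
As an illustration of how the Event Num. and Umask Value columns are typically used, the sketch below packs a few rows of this table into a value for an IA32_PERFEVTSELx register. It assumes the architectural layout: Event Select in bits 7:0, Unit Mask in bits 15:8, USR in bit 16, OS in bit 17, and EN in bit 22. The function name `perfevtsel` and the `EVENTS` dictionary are hypothetical names introduced for this example. OFFCORE_RESPONSE is left out because it also requires programming MSR_OFFCORE_RESP[0,1], which is not shown here.

```python
# Event number / umask pairs copied from the table rows above.
EVENTS = {
    "FETCH_STALL.ICACHE_FILL_PENDING_CYCLES": (0x86, 0x02),
    "UOPS_NOT_DELIVERED.ANY": (0x9C, 0x00),
    "INST_RETIRED.ANY_P": (0xC0, 0x00),
}


def perfevtsel(event, umask, user=True, kernel=True, enable=True):
    """Pack an event select and umask into an IA32_PERFEVTSELx-style value.

    Assumed bit positions: event 7:0, umask 15:8, USR 16, OS 17, EN 22.
    """
    value = (event & 0xFF) | ((umask & 0xFF) << 8)
    if user:
        value |= 1 << 16  # USR: count while running at CPL > 0
    if kernel:
        value |= 1 << 17  # OS: count while running at CPL 0
    if enable:
        value |= 1 << 22  # EN: enable the counter
    return value


for name, (event, umask) in EVENTS.items():
    print(f"{name}: {perfevtsel(event, umask):#08x}")
```

With the default flags, INST_RETIRED.ANY_P (event C0H, umask 00H) encodes to 4300C0H under this layout.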