
18-64 Vol. 3B

PERFORMANCE MONITORING

Details of the uncore performance monitoring facility of Intel Xeon Processor E5 v2 and Intel Xeon Processor E7 v2 
families are available in “Intel® Xeon® Processor E5 v2 and E7 v2 Uncore Performance Monitoring Programming 
Reference Manual”. The MSR-based uncore PMU interfaces are listed in Table 35-26.

18.11 4TH GENERATION INTEL® CORE PROCESSOR PERFORMANCE MONITORING FACILITY

The 4th generation Intel® Core™ processor and Intel® Xeon® processor E3-1200 v3 product family are based on
the Haswell microarchitecture. The core PMU supports architectural performance monitoring capability with version
ID 3 (see Section 18.2.3) and a host of non-architectural monitoring capabilities.
Architectural performance monitoring version 3 capabilities are described in Section 18.2.3. 
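
As an illustration of checking the version ID mentioned above, the following minimal sketch decodes the architectural performance monitoring fields reported by CPUID leaf 0AH. It assumes the caller has already obtained the raw EAX and EDX values by some platform-specific means that is not shown here. The names `ArchPerfmonInfo` and `decode_cpuid_0ah` are hypothetical, and the sample register values are illustrative rather than captured from hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchPerfmonInfo:
    """Fields of CPUID leaf 0AH relevant to the core PMU (hypothetical helper type)."""
    version_id: int
    gp_counters_per_thread: int
    gp_counter_width: int
    fixed_counters: int
    fixed_counter_width: int


def decode_cpuid_0ah(eax: int, edx: int) -> ArchPerfmonInfo:
    """Decode raw CPUID.0AH register values into named fields.

    EAX[7:0]   version ID
    EAX[15:8]  general-purpose counters per logical processor
    EAX[23:16] bit width of general-purpose counters
    EDX[4:0]   number of fixed-function counters (meaningful when version ID > 1)
    EDX[12:5]  bit width of fixed-function counters (meaningful when version ID > 1)
    """
    return ArchPerfmonInfo(
        version_id=eax & 0xFF,
        gp_counters_per_thread=(eax >> 8) & 0xFF,
        gp_counter_width=(eax >> 16) & 0xFF,
        fixed_counters=edx & 0x1F,
        fixed_counter_width=(edx >> 5) & 0xFF,
    )


# Illustrative register values only (assumption: not read from a real processor).
info = decode_cpuid_0ah(eax=0x07300403, edx=0x00000603)
supports_v3 = info.version_id >= 3
```

With the sample values, the decoded structure reports version ID 3, four general-purpose counters per logical processor, three fixed-function counters, and 48-bit counter widths. Software would normally gate use of version 3 features on `supports_v3` rather than on a processor model check.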
The core PMU’s capabilities are similar to those described in Section 18.9 through Section 18.9.5, with some
differences and enhancements summarized in Table 18-42. Additionally, the core PMU provides some enhancements to
support performance monitoring when the target workload contains instruction streams using Intel® Transactional
Synchronization Extensions (TSX); see Section 18.11.5. For details of Intel TSX, see Chapter 16, “Programming with
Intel® Transactional Synchronization Extensions,” of the Intel® 64 and IA-32 Architectures Software Developer’s
Manual, Volume 1.

Table 18-42. Core PMU Comparison

| Box | Intel® microarchitecture code name Haswell | Intel® microarchitecture code name Sandy Bridge | Comment |
|---|---|---|---|
| # of Fixed counters per thread | 3 | 3 | |
| # of general-purpose counters per core | 8 | 8 | |
| Counter width (R,W) | R:48, W: 32/48 | R:48, W: 32/48 | See Section 18.2.2. |
| # of programmable counters per thread | 4 or (8 if a core not shared by two threads) | 4 or (8 if a core not shared by two threads) | Use CPUID to enumerate # of counters. |
| PMI Overhead Mitigation | Freeze_Perfmon_on_PMI with legacy semantics; Freeze_on_LBR with legacy semantics for branch profiling; Freeze_while_SMM | Freeze_Perfmon_on_PMI with legacy semantics; Freeze_on_LBR with legacy semantics for branch profiling; Freeze_while_SMM | See Section 17.4.7. |
| Processor Event Based Sampling (PEBS) Events | See Table 18-32 and Section 18.11.5.1. | See Table 18-32. | IA32_PMC4-IA32_PMC7 do not support PEBS. |
| PEBS-Load Latency | See Section 18.9.4.2. | See Section 18.9.4.2. | |
| PEBS-Precise Store | No, replaced by Data Address profiling. | Section 18.9.4.3 | |
| PEBS-PDIR | Yes (using precise INST_RETIRED.ALL) | Yes (using precise INST_RETIRED.ALL) | |
| PEBS-EventingIP | Yes | No | |
| Data Address Profiling | Yes | No | |
| LBR Profiling | Yes | Yes | |
| Call Stack Profiling | Yes, see Section 17.9. | No | Use LBR facility. |
| Off-core Response Event | MSR 1A6H and 1A7H; extended request and response types. | MSR 1A6H and 1A7H; extended request and response types. | |
| Intel TSX support for Perfmon | See Section 18.11.5. | No | |
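
The counter width entry in Table 18-42 (R:48, W: 32/48) affects how software preloads a counter for sampling. The sketch below computes the value to write so that a 48-bit counter overflows after a chosen number of events. It rests on two assumptions not stated in this section: a legacy 32-bit write is sign-extended from bit 31 to the full counter width, and a full-width (48-bit) write is available only when the processor reports that capability. The function names are hypothetical, and no MSR access is performed.

```python
COUNTER_WIDTH = 48  # read width listed in Table 18-42


def preload_for_period(period: int, full_width_write: bool = False,
                       width: int = COUNTER_WIDTH) -> int:
    """Return the value to write so the counter overflows after `period` events."""
    write_bits = width if full_width_write else 32
    # A legacy 32-bit write must have bit 31 set so that sign extension fills
    # the upper bits with ones; that caps the period at 2**31 (assumption).
    max_period = (1 << write_bits) - 1 if full_width_write else 1 << 31
    if not 0 < period <= max_period:
        raise ValueError(f"period must be in 1..{max_period}")
    return (1 << write_bits) - period


def counter_after_write(value: int, full_width_write: bool = False,
                        width: int = COUNTER_WIDTH) -> int:
    """Model the counter contents that result from writing `value`."""
    mask = (1 << width) - 1
    if full_width_write:
        return value & mask
    low = value & 0xFFFFFFFF
    if low & 0x80000000:
        low |= mask & ~0xFFFFFFFF  # sign-extend bit 31 up to the counter width
    return low


# Legacy 32-bit write: sample every 100,000 events.
legacy_value = preload_for_period(100_000)
legacy_counter = counter_after_write(legacy_value)

# Full-width write: periods larger than 2**31 become expressible.
wide_value = preload_for_period(1 << 40, full_width_write=True)
```

In both cases the modeled counter ends up exactly `period` events short of wrapping at 2^48. The legacy path rejects periods above 2^31 because the sign-extended value would no longer represent the intended count.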