background image

36-8 Vol. 3C

INTEL® PROCESSOR TRACE

as PSB, will be generated normally regardless. Further, PIP and VMCS continue to be generated, as indicators of 
what software is running.

36.2.5.5   Filter Enable (FilterEn)

Filter Enable indicates that the Instruction Pointer (IP) is within the range of IPs that Intel PT is configured to watch. 
Software can get the state of Filter Enable by a RDMSR of IA32_RTIT_STATUS.FilterEn. For details on configuration 
and use of IP filtering, see Section 36.2.4.3.
On clearing of FilterEn that also clears PacketEn, a Packet Generation Disable (TIP.PGD) will be generated, but 
unlike the ContextEn case, the IP payload may not be suppressed. For direct, unconditional branches, as well as for 
indirect branches (including RETs), the PGD generated by leaving the tracing region and clearing FilterEn will 
contain the target IP. This means that IPs from outside the configured range can be exposed in the trace, as long 
as they are within context. 
When FilterEn is 0, control flow packets are not generated (e.g., TNT, TIP). However, some packets, such as PIP, 
MTC, and PSB, may still be generated while FilterEn is clear. For details on packet enable dependencies, see Section 
36.4.1.
After TraceEn is set, FilterEn is set to 1 at all times if there is no IP filter range configured by software 
(IA32_RTIT_CTL.ADDRn_CFG != 1, for all n), or if the processor does not support IP filtering (i.e., 
CPUID.(EAX=14H, ECX=0):EBX.IPFILT_WRSTPRSV[bit 2] = 0). FilterEn will toggle only when TraceEn=1 and 
ContextEn=1, and when at least one range is configured for IP filtering.

36.2.6 Trace 

Output

Intel PT output should be viewed independently from trace content and filtering mechanisms. The options available 
for trace output can vary across processor generations and platforms. 
Trace output is written out using one of the following output schemes, as configured by the ToPA and FabricEn bit 
fields of IA32_RTIT_CTL (see Section 36.2.7.2):

A single, contiguous region of physical address space. 

A collection of variable-sized regions of physical memory. These regions are linked together by tables of 
pointers to those regions, referred to as Table of Physical Addresses (ToPA). The trace output stores bypass 
the caches and the TLBs, but are not serializing. This is intended to minimize the performance impact of the 
output.

A platform-specific trace transport subsystem.

Regardless of the output scheme chosen, Intel PT stores bypass the processor caches by default. This ensures that 
they don't consume precious cache space, but they do not have the serializing aspects associated with un-cache-
able (UC) stores. Software should avoid using MTRRs to mark any portion of the Intel PT output region as UC, as 
this may override the behavior described above and force Intel PT stores to UC, thereby incurring severe perfor-
mance impact.
There is no guarantee that a packet will be written to memory or other trace endpoint after some fixed number of 
cycles after a packet-producing instruction executes. The only way to assure that all packets generated have 
reached their endpoint is to clear TraceEn and follow that with a store, fence, or serializing instruction; doing so 
ensures that all buffered packets are flushed out of the processor. 

36.2.6.1   Single Range Output

When IA32_RTIT_CTL.ToPA and IA32_RTIT_CTL.FabricEn bits are clear, trace packet output is sent to a single, 
contiguous memory (or MMIO if DRAM is not available) range defined by a base address in 
IA32_RTIT_OUTPUT_BASE (Section 36.2.7.7) and mask value in IA32_RTIT_OUTPUT_MASK_PTRS (Section 
36.2.7.8). 
The current write pointer in this range is also stored in IA32_RTIT_OUTPUT_MASK_PTRS. This output 
range is circular, meaning that when the writes wrap around the end of the buffer they begin again at the base 
address.
This output method is best suited for cases where Intel PT output is either:

Configured to be directed to a sufficiently large contiguous region of DRAM.