
Vol. 3A 11-5

MEMORY CACHE CONTROL

line can be filled from memory with an 8-transfer burst transaction. The caches do not support partially-filled cache 
lines, so caching even a single doubleword requires caching an entire line.
The L1 and L2 cache lines in the P6 family and Pentium processors are 32 bytes wide, with cache line reads from 
system memory beginning on a 32-byte boundary (the 5 least-significant bits of the memory address are clear). A cache line 
can be filled from memory with a 4-transfer burst transaction. Partially-filled cache lines are not supported.
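The alignment rule above can be sketched as simple mask arithmetic. This is a minimal illustration, not code from the manual: the helper names `cache_line_base` and `cache_line_offset` are hypothetical, and the line size is passed in as a parameter (32 for the P6 family and Pentium processors, per the text).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round an address down to the start of its cache line.
   line_size must be a power of two (32 for P6 family/Pentium per the text). */
static uintptr_t cache_line_base(uintptr_t addr, uintptr_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    return addr & ~(line_size - 1);   /* clears the low log2(line_size) bits */
}

/* Hypothetical helper: byte offset of an address within its cache line. */
static uintptr_t cache_line_offset(uintptr_t addr, uintptr_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    return addr & (line_size - 1);    /* keeps only the low bits */
}
```

With a 32-byte line, an access to address 0x1234 falls in the line that starts at 0x1220, at byte offset 0x14; reading that one operand therefore brings in all 32 bytes from 0x1220 through 0x123F.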
The trace cache in processors based on Intel NetBurst microarchitecture is available in all execution modes: 
protected mode, system management mode (SMM), and real-address mode. The L1, L2, and L3 caches are also 
available in all execution modes; however, use of them must be handled carefully in SMM (see Section 34.4.2, 
“SMRAM Caching”).
The TLBs store the most recently used page-directory and page-table entries. They speed up memory accesses 
when paging is enabled by reducing the number of memory accesses that are required to read the page tables 
stored in system memory. The TLBs are divided into four groups: instruction TLBs for 4-KByte pages, data TLBs for 
4-KByte pages, instruction TLBs for large pages (2-MByte, 4-MByte, or 1-GByte pages), and data TLBs for large 
pages. The TLBs are normally active only in protected mode with paging enabled. When paging is disabled or the 
processor is in real-address mode, the TLBs maintain their contents until explicitly or implicitly flushed (see Section 
11.9, “Invalidating the Translation Lookaside Buffers (TLBs)”).
Processors based on Intel Core microarchitectures implement one level of instruction TLB and two levels of data 
TLB. The Intel Core i7 processor provides a second-level unified TLB. 
The store buffer is associated with the processor’s instruction execution units. It allows writes to system memory 
and/or the internal caches to be saved and in some cases combined to optimize the processor’s bus accesses. The 
store buffer is always enabled in all execution modes.
The processor’s caches are for the most part transparent to software. When enabled, instructions and data flow 
through these caches without the need for explicit software control. However, knowledge of the behavior of these 
caches may be useful in optimizing software performance. For example, knowledge of cache dimensions and 
replacement algorithms gives an indication of how large a data structure can be operated on at once without 
causing cache thrashing.
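One common way to act on this is to process a large structure in blocks small enough to stay cached while they are being used. The following is a minimal sketch under stated assumptions: the function name `transpose_tiled` and the `tile` parameter are hypothetical, and a suitable tile size depends on the cache dimensions of the target processor, which are not given here.

```c
#include <assert.h>
#include <stddef.h>

/* Transpose an n x n row-major matrix one tile x tile block at a time.
   Working on small blocks keeps the source and destination lines of the
   current block in use together, rather than sweeping whole rows and columns.
   The tile size is an assumption the caller must tune to the cache. */
static void transpose_tiled(const float *src, float *dst, size_t n, size_t tile)
{
    assert(tile > 0);
    for (size_t ii = 0; ii < n; ii += tile) {
        for (size_t jj = 0; jj < n; jj += tile) {
            /* Clamp the block edges when n is not a multiple of tile. */
            size_t i_end = (ii + tile < n) ? ii + tile : n;
            size_t j_end = (jj + tile < n) ? jj + tile : n;
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    dst[j * n + i] = src[i * n + j];
        }
    }
}
```

A rough starting point, offered only as a heuristic, is to pick `tile` so that one source block plus one destination block (2 * tile * tile * sizeof(float) bytes) is comfortably smaller than the first-level data cache, and then measure.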
In multiprocessor systems, maintenance of cache consistency may, in rare circumstances, require intervention by 
system software. For these rare cases, the processor provides privileged cache control instructions for use in 
flushing caches and forcing memory ordering.
There are several instructions that software can use to improve the performance of the L1, L2, and L3 caches, 
including the PREFETCHh, CLFLUSH, and CLFLUSHOPT instructions and the non-temporal move instructions 
(MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD). The use of these instructions is discussed in Section 
11.5.5, “Cache Management Instructions.”
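From C, several of these instructions are reachable through compiler intrinsics. The sketch below assumes an x86 compiler that provides `<xmmintrin.h>` and `<emmintrin.h>` (GCC, Clang, and MSVC do) and a target with SSE2. The function names, the prefetch distance of 16 elements, and the caller-supplied line size are illustrative assumptions, not recommendations from the manual.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch (PREFETCHh), _mm_sfence (SFENCE) */
#include <emmintrin.h>  /* _mm_stream_si32 (MOVNTI), _mm_clflush (CLFLUSH), _mm_mfence */

/* Sum an array while hinting that data a little further ahead will be needed.
   The distance of 16 elements is an arbitrary value chosen for illustration. */
static long sum_with_prefetch(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            _mm_prefetch((const char *)&data[i + 16], _MM_HINT_T0);
        total += data[i];
    }
    return total;
}

/* Fill a buffer with non-temporal stores, which hint that the data will not
   be reused soon and so need not displace other lines in the caches. */
static void fill_non_temporal(int *dst, size_t n, int value)
{
    for (size_t i = 0; i < n; i++)
        _mm_stream_si32(&dst[i], value);
    _mm_sfence();  /* order the weakly ordered stores before later stores */
}

/* Flush every cache line that overlaps [p, p + len). line_size is supplied
   by the caller; software would normally obtain it from CPUID. */
static void flush_buffer(const void *p, size_t len, size_t line_size)
{
    assert(line_size != 0);
    const char *bytes = (const char *)p;
    for (size_t off = 0; off < len; off += line_size)
        _mm_clflush(bytes + off);
    if (len != 0)
        _mm_clflush(bytes + len - 1);  /* covers the last line if p is unaligned */
    _mm_mfence();  /* order the flushes with respect to later memory accesses */
}
```

None of these calls changes program results; they only influence where data lives in the cache hierarchy, so any benefit has to be confirmed by measurement on the target processor.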

11.2 CACHING TERMINOLOGY

IA-32 processors (beginning with the Pentium processor) and Intel 64 processors use the MESI (modified, exclusive, 
shared, invalid) cache protocol to maintain consistency with internal caches and caches in other processors 
(see Section 11.4, “Cache Control Protocol”).
When the processor recognizes that an operand being read from memory is cacheable, the processor reads an 
entire cache line into the appropriate cache (L1, L2, L3, or all). This operation is called a cache line fill. If the 
memory location containing that operand is still cached the next time the processor attempts to access the 
operand, the processor can read the operand from the cache instead of going back to memory. This operation is 
called a cache hit.
When the processor attempts to write an operand to a cacheable area of memory, it first checks if a cache line for 
that memory location exists in the cache. If a valid cache line does exist, the processor (depending on the write 
policy currently in force) can write the operand into the cache instead of writing it out to system memory. This 
operation is called a write hit. If a write misses the cache (that is, a valid cache line is not present for the area of 
memory being written to), the processor performs a cache line fill (write allocation). Then it writes the operand into 
the cache line and (depending on the write policy currently in force) can also write it out to memory. If the operand 
is to be written out to memory, it is written first into the store buffer, and then written from the store buffer to 
memory when the system bus is available. (Note that for the Pentium processor, write misses do not result in a 
cache line fill; they always result in a write to memory. For this processor, only read misses result in cache line fills.)
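The terms defined in this section can be illustrated with a toy software model. This is a sketch only: the direct-mapped organization, the 8-line capacity, and every `toy_` name are assumptions made for illustration and do not describe any real processor's cache. The model tracks only which lines are present; it does not model write-through versus write-back policy, the store buffer, or the MESI states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32u  /* bytes per line (P6 family/Pentium value from the text) */
#define NUM_LINES 8u   /* hypothetical capacity of this toy direct-mapped cache */

typedef struct {
    bool     valid[NUM_LINES];
    uint32_t tag[NUM_LINES];
    bool     write_allocate;  /* false models the Pentium write-miss behavior */
    unsigned line_fills, read_hits, write_hits, memory_only_writes;
} toy_cache;

static void toy_cache_init(toy_cache *c, bool write_allocate)
{
    assert(c != NULL);
    memset(c, 0, sizeof *c);
    c->write_allocate = write_allocate;
}

/* Split an address into line index and tag; report whether a valid line is present. */
static bool toy_lookup(const toy_cache *c, uint32_t addr,
                       unsigned *index, uint32_t *tag)
{
    uint32_t line = addr / LINE_SIZE;
    *index = line % NUM_LINES;
    *tag   = line / NUM_LINES;
    return c->valid[*index] && c->tag[*index] == *tag;
}

/* Cache line fill: bring the whole line in, replacing whatever was there. */
static void toy_fill(toy_cache *c, unsigned index, uint32_t tag)
{
    c->valid[index] = true;
    c->tag[index]   = tag;
    c->line_fills++;
}

static void toy_read(toy_cache *c, uint32_t addr)
{
    unsigned index;
    uint32_t tag;
    if (toy_lookup(c, addr, &index, &tag))
        c->read_hits++;            /* cache hit: operand comes from the cache */
    else
        toy_fill(c, index, tag);   /* read miss: cache line fill */
}

static void toy_write(toy_cache *c, uint32_t addr)
{
    unsigned index;
    uint32_t tag;
    if (toy_lookup(c, addr, &index, &tag))
        c->write_hits++;           /* write hit: operand goes into the cached line */
    else if (c->write_allocate)
        toy_fill(c, index, tag);   /* write miss: fill the line, then write into it */
    else
        c->memory_only_writes++;   /* write miss without allocation: memory only */
}
```

Initializing the model with `write_allocate` set to true and issuing a read of 0x100, a read of 0x104, a write to 0x108, and a write to 0x200 produces one line fill, one read hit, one write hit, and then a second line fill for the write miss. Initializing it with `write_allocate` set to false, the same write miss is counted as a memory-only write and no line is filled, matching the Pentium behavior described in the note above.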