
Vol. 3A 11-5

MEMORY CACHE CONTROL

line can be filled from memory with an 8-transfer burst transaction. The caches do not support partially filled cache
lines, so caching even a single doubleword requires caching an entire line.
The L1 and L2 cache lines in the P6 family and Pentium processors are 32 bytes wide, with cache line reads from
system memory beginning on a 32-byte boundary (the 5 least-significant bits of the memory address are clear). A cache
line can be filled from memory with a 4-transfer burst transaction. Partially filled cache lines are not supported.
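Because line sizes are powers of two, the base address of the line that holds a given byte is found by clearing the low-order address bits (5 bits for 32-byte lines, 6 bits for 64-byte lines). The following is a minimal sketch; the helper names are hypothetical and not part of any Intel interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: split an address into its cache-line base and the
 * byte offset within the line, for a power-of-two line size (32 or 64). */
static uint64_t line_base(uint64_t addr, uint64_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    return addr & ~(line_size - 1);   /* clear log2(line_size) low bits */
}

static uint64_t line_offset(uint64_t addr, uint64_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    return addr & (line_size - 1);    /* keep only the low bits */
}

/* Number of distinct lines touched by an access of len (> 0) bytes at addr.
 * Since partially filled lines are not supported, each touched line is
 * filled in full; a doubleword that straddles a boundary needs two fills. */
static uint64_t lines_touched(uint64_t addr, uint64_t len, uint64_t line_size)
{
    assert(len > 0);
    uint64_t first = line_base(addr, line_size);
    uint64_t last  = line_base(addr + len - 1, line_size);
    return (last - first) / line_size + 1;
}
```

For example, address 0x1234 lies in the 32-byte line starting at 0x1220 (offset 0x14) and in the 64-byte line starting at 0x1200.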
The trace cache in processors based on Intel NetBurst microarchitecture is available in all execution modes:
protected mode, system management mode (SMM), and real-address mode. The L1, L2, and L3 caches are also
available in all execution modes; however, their use in SMM must be handled carefully (see Section 34.4.2,
“SMRAM Caching”).
The TLBs store the most recently used page-directory and page-table entries. They speed up memory accesses
when paging is enabled by reducing the number of memory accesses required to read the page tables stored in
system memory. The TLBs are divided into four groups: instruction TLBs for 4-KByte pages; data TLBs for 4-KByte
pages; instruction TLBs for large pages (2-MByte, 4-MByte, or 1-GByte pages); and data TLBs for large pages. The
TLBs are normally active only in protected mode with paging enabled. When paging is disabled or the processor is
in real-address mode, the TLBs maintain their contents until explicitly or implicitly flushed (see Section 11.9,
“Invalidating the Translation Lookaside Buffers (TLBs)”).
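Because a TLB entry holds the translation for a whole page, a lookup depends only on the linear-address bits above the page offset. The sketch below uses hypothetical helper names and deliberately ignores every field of a real TLB entry except the page number; it shows the split for each page size listed above and how many 4-KByte entries one large-page entry stands in for.

```c
#include <assert.h>
#include <stdint.h>

/* log2 of each page size named in the text. */
enum page_shift {
    PAGE_4K = 12,   /* 4-KByte */
    PAGE_2M = 21,   /* 2-MByte */
    PAGE_4M = 22,   /* 4-MByte */
    PAGE_1G = 30    /* 1-GByte */
};

/* Page number: the portion of the linear address a TLB entry is matched on
 * (simplified; real entries carry additional fields). */
static uint64_t page_number(uint64_t linear, enum page_shift shift)
{
    assert(shift < 64);
    return linear >> shift;
}

/* Offset within the page: passed through translation unchanged. */
static uint64_t page_offset(uint64_t linear, enum page_shift shift)
{
    assert(shift < 64);
    return linear & ((UINT64_C(1) << shift) - 1);
}

/* How many 4-KByte pages (and thus 4-KByte TLB entries) one page of the
 * given size covers. */
static uint64_t small_pages_per_large(enum page_shift shift)
{
    assert(shift >= PAGE_4K && shift < 64);
    return UINT64_C(1) << (shift - PAGE_4K);
}
```

For example, linear address 0x12345678 has page number 0x12345 and offset 0x678 with 4-KByte pages, but page number 0x91 with 2-MByte pages; a single 2-MByte entry covers what would otherwise take 512 4-KByte entries.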
Processors based on Intel Core microarchitectures implement one level of instruction TLB and two levels of data
TLB. The Intel Core i7 processor provides a second-level unified TLB.
The store buffer is associated with the processor’s instruction execution units. It allows writes to system memory
and/or the internal caches to be saved and, in some cases, combined to optimize the processor’s bus accesses. The
store buffer is always enabled in all execution modes.
The processor’s caches are for the most part transparent to software. When enabled, instructions and data flow
through these caches without the need for explicit software control. However, knowledge of the behavior of these
caches may be useful in optimizing software performance. For example, knowledge of cache dimensions and
replacement algorithms indicates how large a data structure can be operated on at once without causing cache
thrashing.
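To make the point about cache dimensions concrete: in a set-associative cache, the set a line maps to is determined by the address bits just above the line offset, so addresses that differ by a multiple of (number of sets × line size) compete for the same set. The sketch below assumes a hypothetical geometry (32 KBytes, 8-way, 64-byte lines); real values must be obtained from the processor, not from this example, and all names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cache geometry; substitute real values for a given part. */
struct cache_geometry {
    uint64_t size_bytes;   /* total capacity */
    uint64_t ways;         /* associativity */
    uint64_t line_size;    /* bytes per line */
};

static uint64_t num_sets(const struct cache_geometry *g)
{
    assert(g->ways > 0 && g->line_size > 0);
    assert(g->size_bytes % (g->ways * g->line_size) == 0);
    return g->size_bytes / (g->ways * g->line_size);
}

/* Set index = line number modulo the number of sets. */
static uint64_t set_index(const struct cache_geometry *g, uint64_t addr)
{
    return (addr / g->line_size) % num_sets(g);
}

/* Stride at which addresses alias to the same set. */
static uint64_t aliasing_stride(const struct cache_geometry *g)
{
    return num_sets(g) * g->line_size;
}

/* Largest number of lines that land in any one set when `count` lines,
 * spaced `stride` bytes apart starting at `base`, are accessed. If the
 * result exceeds g->ways, a loop over them cannot keep all of them
 * resident (ignoring other traffic), however large the cache is.
 * Not reentrant: uses a static per-set counter array. */
static uint64_t max_lines_per_set(const struct cache_geometry *g,
                                  uint64_t base, uint64_t stride, uint64_t count)
{
    enum { MAX_SETS = 65536 };
    static uint64_t per_set[MAX_SETS];
    uint64_t sets = num_sets(g), worst = 0;
    assert(sets <= MAX_SETS);
    assert(stride >= g->line_size);       /* each access is a distinct line */
    for (uint64_t s = 0; s < sets; s++)
        per_set[s] = 0;
    for (uint64_t i = 0; i < count; i++) {
        uint64_t s = set_index(g, base + i * stride);
        if (++per_set[s] > worst)
            worst = per_set[s];
    }
    return worst;
}
```

With the assumed geometry there are 64 sets and the aliasing stride is 4096 bytes: 16 lines spaced 4096 bytes apart all fall into set 0, twice the 8 ways available, whereas 512 consecutive lines spread evenly at 8 per set.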
In multiprocessor systems, maintenance of cache consistency may, in rare circumstances, require intervention by
system software. For these rare cases, the processor provides privileged cache control instructions for use in
flushing caches and forcing memory ordering.
There are several instructions that software can use to improve the performance of the L1, L2, and L3 caches,
including the PREFETCHh, CLFLUSH, and CLFLUSHOPT instructions and the non-temporal move instructions
(MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD). The use of these instructions is discussed in Section
11.5.5, “Cache Management Instructions.”
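As a minimal sketch of how C code commonly reaches some of these instructions, the compiler intrinsics from `<xmmintrin.h>` and `<emmintrin.h>` map to PREFETCHT0 (`_mm_prefetch` with `_MM_HINT_T0`), MOVNTI (`_mm_stream_si32`), and CLFLUSH (`_mm_clflush`). The function names, the prefetch distance of 16 elements, and the 64-byte flush granularity are illustrative assumptions rather than tuned or queried values; on non-x86 builds the sketch falls back to ordinary stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <xmmintrin.h>   /* _mm_prefetch, _mm_sfence */
#include <emmintrin.h>   /* _mm_stream_si32, _mm_clflush, _mm_mfence */
#define HAVE_X86_CACHE_INTRINSICS 1
#endif

#define PREFETCH_DISTANCE 16  /* elements ahead; illustrative, not tuned */
#define FLUSH_LINE_SIZE   64  /* assumed; real code should query the CPU */

/* Copy n ints, prefetching the source ahead of use and writing the
 * destination with non-temporal stores so it does not displace cached data. */
static void copy_nontemporal(int *dst, const int *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; i++) {
#ifdef HAVE_X86_CACHE_INTRINSICS
        if (i + PREFETCH_DISTANCE < n)   /* stay inside the array */
            _mm_prefetch((const char *)&src[i + PREFETCH_DISTANCE], _MM_HINT_T0);
        _mm_stream_si32(&dst[i], src[i]);  /* MOVNTI */
#else
        dst[i] = src[i];                    /* portable fallback */
#endif
    }
#ifdef HAVE_X86_CACHE_INTRINSICS
    _mm_sfence();  /* order the weakly ordered non-temporal stores */
#endif
}

/* Write back and invalidate every cache line overlapping [p, p + len). */
static void flush_range(const void *p, size_t len)
{
    if (len == 0)
        return;
#ifdef HAVE_X86_CACHE_INTRINSICS
    uintptr_t start = (uintptr_t)p & ~(uintptr_t)(FLUSH_LINE_SIZE - 1);
    uintptr_t end = (uintptr_t)p + len;
    for (uintptr_t a = start; a < end; a += FLUSH_LINE_SIZE)
        _mm_clflush((const void *)a);  /* CLFLUSH */
    _mm_mfence();  /* fence so later accesses are ordered after the flushes */
#else
    (void)p;  /* no portable equivalent of CLFLUSH */
#endif
}
```

The fences are included because non-temporal stores are weakly ordered; Section 11.5.5 and the instruction reference give the exact ordering rules.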

11.2 CACHING TERMINOLOGY

IA-32 processors (beginning with the Pentium processor) and Intel 64 processors use the MESI (modified, exclusive,
shared, invalid) cache protocol to maintain consistency with internal caches and caches in other processors
(see Section 11.4, “Cache Control Protocol”).
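The following is a simplified, textbook-style sketch of MESI state changes for one line in one cache, assuming write-back memory with write allocation. It is not the processor's exact protocol, which Section 11.4 specifies and which differs in details (for example, for the Pentium processor's write misses noted later in this section). The enum and function names are hypothetical.

```c
#include <assert.h>

/* Simplified MESI states for one cache line in one cache. */
enum mesi { MESI_I, MESI_S, MESI_E, MESI_M };

/* Events this cache observes for the line. */
enum mesi_event {
    LOCAL_READ,    /* this processor reads the line */
    LOCAL_WRITE,   /* this processor writes the line */
    SNOOP_READ,    /* another processor reads the line */
    SNOOP_WRITE    /* another processor writes (invalidates) the line */
};

/* Returns the next state. others_have_copy matters only on a read miss:
 * it selects S (another cache holds the line) or E (none does).
 * *writeback is set when a modified line must be written to memory. */
static enum mesi mesi_next(enum mesi s, enum mesi_event ev,
                           int others_have_copy, int *writeback)
{
    *writeback = 0;
    switch (ev) {
    case LOCAL_READ:
        if (s == MESI_I)                    /* read miss: line fill */
            return others_have_copy ? MESI_S : MESI_E;
        return s;                           /* read hit: unchanged */
    case LOCAL_WRITE:
        /* From S or I, other copies are invalidated first; from I the
         * line is also filled (write allocation assumed). */
        return MESI_M;
    case SNOOP_READ:
        if (s == MESI_M)
            *writeback = 1;                 /* supply the dirty data */
        return s == MESI_I ? MESI_I : MESI_S;
    case SNOOP_WRITE:
        if (s == MESI_M)
            *writeback = 1;
        return MESI_I;
    }
    assert(0 && "unknown event");
    return s;
}
```

For example, a line read by only this processor enters E and can later be written without a bus transaction (E to M), whereas a line in S must first invalidate the other copies.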
When the processor recognizes that an operand being read from memory is cacheable, the processor reads an
entire cache line into the appropriate cache (L1, L2, L3, or all). This operation is called a cache line fill. If the
memory location containing that operand is still cached the next time the processor attempts to access the
operand, the processor can read the operand from the cache instead of going back to memory. This operation is
called a cache hit.
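The distinction between a cache line fill and a cache hit can be modeled with a toy direct-mapped cache that tracks only which lines are present. The geometry (8 lines of 64 bytes) and all names are hypothetical and chosen only to keep the example small; real L1, L2, and L3 caches are set-associative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_LINE_SIZE 64u   /* hypothetical line size */
#define TOY_NUM_LINES 8u    /* hypothetical number of lines */

/* Tags-only model: records which line occupies each slot, not the data. */
struct toy_cache {
    uint64_t tag[TOY_NUM_LINES];
    int valid[TOY_NUM_LINES];
    unsigned fills, hits;
};

static void toy_init(struct toy_cache *c)
{
    memset(c, 0, sizeof *c);
}

/* Read one byte at addr: a hit if its line is present, otherwise a cache
 * line fill that brings in the whole line (evicting the previous one). */
static void toy_read(struct toy_cache *c, uint64_t addr)
{
    assert(c != NULL);
    uint64_t line = addr / TOY_LINE_SIZE;
    unsigned idx = (unsigned)(line % TOY_NUM_LINES);
    if (c->valid[idx] && c->tag[idx] == line) {
        c->hits++;
    } else {
        c->tag[idx] = line;
        c->valid[idx] = 1;
        c->fills++;
    }
}
```

Reading 128 consecutive bytes from address 0 produces 2 line fills and 126 hits, because each fill brings in an entire 64-byte line.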
When the processor attempts to write an operand to a cacheable area of memory, it first checks whether a cache line for
that memory location exists in the cache. If a valid cache line does exist, the processor (depending on the write
policy currently in force) can write the operand into the cache instead of writing it out to system memory. This
operation is called a write hit. If a write misses the cache (that is, a valid cache line is not present for the area of
memory being written to), the processor performs a cache line fill, called a write allocation. Then it writes the operand into
the cache line and (depending on the write policy currently in force) can also write it out to memory. If the operand
is to be written out to memory, it is written first into the store buffer, and then written from the store buffer to
memory when the system bus is available. (Note that for the Pentium processor, write misses do not result in a
cache line fill; they always result in a write to memory. For this processor, only read misses result in cache line fills.)
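The following toy model contrasts the two write-miss behaviors described above: write allocation (a miss triggers a line fill, and later writes to that line hit) and the Pentium processor's behavior (a miss goes straight to memory and fills nothing). It is a hypothetical, tags-only sketch; write-through versus write-back and the store buffer are not modeled, so `memory_writes` counts only writes that bypass the cache on a miss.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WLINE  64u   /* hypothetical line size */
#define WLINES 8u    /* hypothetical number of lines (direct-mapped) */

struct wcache {
    uint64_t tag[WLINES];
    int valid[WLINES];
    int write_allocate;   /* 1: fill on write miss; 0: write to memory */
    unsigned write_hits, fills, memory_writes;
};

static void wcache_init(struct wcache *c, int write_allocate)
{
    memset(c, 0, sizeof *c);
    c->write_allocate = write_allocate;
}

static void wcache_write(struct wcache *c, uint64_t addr)
{
    assert(c != NULL);
    uint64_t line = addr / WLINE;
    unsigned idx = (unsigned)(line % WLINES);
    if (c->valid[idx] && c->tag[idx] == line) {
        c->write_hits++;          /* write hit: update the cached line */
    } else if (c->write_allocate) {
        c->tag[idx] = line;       /* write allocation: fill, then write */
        c->valid[idx] = 1;
        c->fills++;
    } else {
        c->memory_writes++;       /* miss without allocation */
    }
}
```

Writing 64 consecutive bytes starting at address 0 costs one fill followed by 63 write hits with write allocation, but 64 separate memory writes without it.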