Vol. 3A 11-9
MEMORY CACHE CONTROL
Software can use page-level cache control, to assign appropriate effective memory types when software will not
access data structures in ways that benefit from write-back caching. For example, software may read a large data
structure once and not access the structure again until the structure is rewritten by another agent. Such a large
data structure should be marked as uncacheable, or reading it will evict cached lines that the processor will be
referencing again.
A similar example would be a write-only data structure that is written to (to export the data to another agent), but
never read by software. Such a structure can be marked as uncacheable, because software never reads the values
that it writes (though as uncacheable memory, it will be written using partial writes, while as write-back memory,
it will be written using line writes, which may not occur until the other agent reads the structure and triggers
implicit write-backs).
On the Pentium III, Pentium 4, and more recent processors, new instructions are provided that give software
greater control over the caching, prefetching, and the write-back characteristics of data. These instructions allow
software to use weakly ordered or processor ordered memory types to improve processor performance, but when
necessary to force strong ordering on memory reads and/or writes. They also allow software greater control over
the caching of data. For a description of these instructions and there intended use, see Section 11.5.5, “Cache
Management Instructions.”
11.3.3
Code Fetches in Uncacheable Memory
Programs may execute code from uncacheable (UC) memory, but the implications are different from accessing
data in UC memory. When doing code fetches, the processor never transitions from cacheable code to UC code
speculatively. It also never speculatively fetches branch targets that result in UC code.
The processor may fetch the same UC cache line multiple times in order to decode an instruction once. It may
decode consecutive UC instructions in a cacheline without fetching between each instruction. It may also fetch
additional cachelines from the same or a consecutive 4-KByte page in order to decode one non-speculative UC
instruction (this can be true even when the instruction is contained fully in one line).
Because of the above and because cacheline sizes may change in future processors, software should avoid placing
memory-mapped I/O with read side effects in the same page or in a subsequent page used to execute UC code.
11.4
CACHE CONTROL PROTOCOL
The following section describes the cache control protocol currently defined for the Intel 64 and IA-32 architec-
tures.
In the L1 data cache and in the L2/L3 unified caches, the MESI (modified, exclusive, shared, invalid) cache protocol
maintains consistency with caches of other processors. The L1 data cache and the L2/L3 unified caches have two
MESI status flags per cache line. Each line can be marked as being in one of the states defined in Table 11-4. In
general, the operation of the MESI protocol is transparent to programs.
The L1 instruction cache in P6 family processors implements only the “SI” part of the MESI protocol, because the
instruction cache is not writable. The instruction cache monitors changes in the data cache to maintain consistency
Table 11-4. MESI Cache Line States
Cache Line State
M (Modified)
E (Exclusive)
S (Shared)
I (Invalid)
This cache line is valid?
Yes
Yes
Yes
No
The memory copy is…
Out of date
Valid
Valid
—
Copies exist in caches of other
processors?
No
No
Maybe
Maybe
A write to this line …
Does not go to the
system bus.
Does not go to the
system bus.
Causes the processor to
gain exclusive ownership
of the line.
Goes directly to the
system bus.