Vol. 3B 22-33

ARCHITECTURE COMPATIBILITY

22.33.1 Segment Wraparound

On the 8086 processor, an attempt to access a memory operand that crosses offset 65,535 (0FFFFH) or offset 0 
(for example, moving a word to offset 65,535 or pushing a word when the stack pointer is set to 1) causes the 
offset to wrap around modulo 65,536 (010000H). With the Intel 286 processor, any base and offset combination 
that addresses beyond 16 MBytes wraps around to the first MByte of the address space. The P6 family, Pentium, 
Intel486, and Intel386 processors in real-address mode generate an exception in these cases: 

A general-protection exception (#GP) if the segment is a data segment (that is, if the CS, DS, ES, FS, or GS 
register is being used to address the segment).

A stack-fault exception (#SS) if the segment is a stack segment (that is, if the SS register is being used). 
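
The wraparound arithmetic described above can be sketched in C. This is only an illustrative model, not processor documentation: the helper names wrap_offset_8086 and crosses_64k_limit are hypothetical, and the sketch assumes a 64-KByte real-address-mode segment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: offset of byte `i` of an operand that starts at
 * `start` on an 8086. The result is reduced modulo 65,536 (010000H),
 * which is how the 8086 wraps a 16-bit offset. */
static uint16_t wrap_offset_8086(uint16_t start, uint16_t i)
{
    return (uint16_t)(start + i);
}

/* Hypothetical helper: true if an operand of `size` bytes starting at
 * `start` extends past offset 0FFFFH. This models the case in which the
 * later processors raise #GP (data segment) or #SS (stack segment)
 * instead of wrapping. */
static bool crosses_64k_limit(uint16_t start, uint16_t size)
{
    assert(size > 0);
    return (uint32_t)start + size - 1u > 0xFFFFu;
}
```

For a word stored at offset 65,535, wrap_offset_8086(0xFFFF, 1) gives 0, so the high byte lands at offset 0 on an 8086, while crosses_64k_limit(0xFFFF, 2) is true, which is the condition the later processors report as an exception.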

An exception to this behavior occurs when a stack access is data aligned and the stack pointer is pointing to the 
last aligned piece of data of that size at the top of the stack (ESP is FFFFFFFCH). When this data is popped, no 
segment limit violation occurs and the stack pointer wraps around to 0. 

The address space of the P6 family, Pentium, and Intel486 processors may wrap around at 1 MByte in real-address 
mode. An external A20M# pin forces wraparound if enabled. On Intel 8086 processors, it is possible to specify 
addresses greater than 1 MByte. For example, with a selector value of FFFFH and an offset of FFFFH, the effective 
address would be 10FFEFH (1 MByte plus 65,519 bytes). The 8086 processor, which can form addresses up to 20 
bits long, truncates the uppermost bit, which “wraps” this address to FFEFH. However, the P6 family, Pentium, and 
Intel486 processors do not truncate this bit if A20M# is not enabled. 
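
The address calculation in the example above can be checked with a short C sketch. The function names and the wrap_at_1mb parameter are hypothetical; the flag stands in for either an 8086 (20 address bits) or a later processor with A20M# enabled, and the model ignores everything except the selector and offset arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Real-address mode: linear address = selector * 16 + offset. */
static uint32_t real_mode_linear(uint16_t selector, uint16_t offset)
{
    return ((uint32_t)selector << 4) + offset;
}

/* Hypothetical model of the address that reaches memory.
 * wrap_at_1mb = true forces address bit 20 to 0, so addresses at or
 * above 1 MByte wrap into the first MByte. */
static uint32_t physical_address(uint16_t selector, uint16_t offset,
                                 bool wrap_at_1mb)
{
    uint32_t linear = real_mode_linear(selector, offset);

    /* The largest value a 16-bit selector and offset can form. */
    assert(linear <= UINT32_C(0x10FFEF));
    return wrap_at_1mb ? (linear & ~(UINT32_C(1) << 20)) : linear;
}
```

With selector FFFFH and offset FFFFH, real_mode_linear returns 10FFEFH; physical_address returns FFEFH when wrap_at_1mb is true and 10FFEFH when it is false.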

If a stack operation wraps around the address limit, shutdown occurs. (The 8086 processor does not have a 
shutdown mode or a limit.) 

The behavior when executing near the limit of a 4-GByte selector (limit = FFFFFFFFH) differs between the 
Pentium Pro and the Pentium 4 family of processors. On the Pentium Pro, an instruction that crosses the limit (for 
example, a two-byte instruction such as INC EAX, encoded as FFH C0H, starting exactly at the limit) faults with 
a segment violation; a one-byte instruction at FFFFFFFFH does not cause an exception. On the Pentium 4 
microprocessor family, neither of these situations causes a fault.

Segment wraparound and the functionality of A20M# are used primarily by older operating systems and are not 
used by modern operating systems. On newer Intel 64 processors, A20M# may be absent. 

22.34  STORE BUFFERS AND MEMORY ORDERING

The Pentium 4, Intel Xeon, and P6 family processors provide a store buffer for temporary storage of writes (stores) 
to memory (see Section 11.10, “Store Buffer”). Writes stored in the store buffer(s) are always written to memory 
in program order, with the exception of “fast string” store operations (see Section 8.2.4, “Fast-String Operation and 
Out-of-Order Stores”).

The Pentium processor has two store buffers, one corresponding to each of the pipelines. Writes in these buffers 
are always written to memory in the order they were generated by the processor core.

Only memory writes are buffered; I/O writes are not. The Pentium 4, Intel Xeon, P6 
family, Pentium, and Intel486 processors do not synchronize the completion of memory writes on the bus and 
instruction execution after a write. An I/O, locked, or serializing instruction needs to be executed to synchronize 
writes with the next instruction (see Section 8.3, “Serializing Instructions”).

The Pentium 4, Intel Xeon, and P6 family processors use processor ordering to maintain consistency in the order 
that data is read (loaded) and written (stored) in a program and the order the processor actually carries out the 
reads and writes. With this type of ordering, reads can be carried out speculatively and in any order, reads can pass 
buffered writes, and writes to memory are always carried out in program order. (See Section 8.2, “Memory 
Ordering,” for more information about processor ordering.) The Pentium III processor introduced a new instruction 
to serialize writes and make them globally visible. Memory ordering issues can arise between a producer and a 
consumer of data. The SFENCE instruction provides a performance-efficient way of ensuring ordering between 
routines that produce weakly-ordered results and routines that consume this data.
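
The producer and consumer pattern mentioned above might look like the following C sketch. It assumes an x86 compiler that provides the SSE/SSE2 intrinsics _mm_stream_si32 and _mm_sfence; the names payload, ready, produce, and consume_sum are hypothetical and exist only for this illustration. The non-temporal stores stand in for "weakly-ordered results", and the fence is placed before the flag is published.

```c
#include <assert.h>
#include <emmintrin.h>   /* _mm_stream_si32 (SSE2), _mm_sfence (SSE) */
#include <stdatomic.h>

#define N 4

static int payload[N];      /* filled with weakly-ordered stores */
static atomic_int ready;    /* hypothetical "data is published" flag */

/* Producer: write the data with non-temporal stores, then fence, then
 * publish the flag. Single-shot: the buffer must not be published yet. */
static void produce(int base)
{
    assert(!atomic_load_explicit(&ready, memory_order_relaxed));
    for (int i = 0; i < N; i++)
        _mm_stream_si32(&payload[i], base + i);
    _mm_sfence();           /* order the stores above before the flag */
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Consumer: returns -1 if nothing has been published, otherwise the sum
 * of the published values. */
static int consume_sum(void)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return -1;
    int sum = 0;
    for (int i = 0; i < N; i++)
        sum += payload[i];
    return sum;
}
```

In a real program produce and consume_sum would run on different threads; here they are plain functions so the ordering of the fence relative to the flag write is easy to see.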

No re-ordering of reads occurs on the Pentium processor, except under the condition noted in Section 8.2.1, 
“Memory Ordering in the Intel® Pentium® and Intel486 Processors,” and in the following paragraph describing 
the Intel486 processor.