DATA TYPES

NOTE

Section 4.8, “Real Numbers and Floating-Point Formats,” gives an overview of the IEEE Standard
754 floating-point formats and defines the terms integer bit, QNaN, SNaN, and denormal value.

Table 4-3 shows the floating-point encodings for zeros, denormalized finite numbers, normalized finite numbers,
infinities, and NaNs for each of the three floating-point data types. It also gives the format for the QNaN
floating-point indefinite value. (See Section 4.8.3.7, “QNaN Floating-Point Indefinite,” for a discussion of the use of the
QNaN floating-point indefinite value.)

For the single-precision and double-precision formats, only the fraction part of the significand is encoded. The
integer bit is assumed to be 1 for all numbers except zero and denormalized finite numbers. For the double
extended-precision format, the integer bit is contained in bit 63, and the most-significant fraction bit is bit 62. Here,
the integer bit is explicitly set to 1 for normalized numbers, infinities, and NaNs, and to 0 for zero and denormalized numbers.
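
The difference between an implied and an explicitly stored integer bit can be illustrated with a short Python sketch. It is not part of the architecture definition: the helper names (decode_double, decode_extended, classify_double) are hypothetical, and the sketch assumes that a Python float is an IEEE 754 double-precision value and that the 80-bit double extended-precision value is supplied as 10 little-endian bytes.

```python
import struct


def decode_double(x):
    """Split a double-precision value into (sign, biased exponent, integer bit, fraction)."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    sign = bits >> 63
    biased_exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    # The integer bit is not stored. It is implied to be 1, except for zeros
    # and denormals, which are the encodings with a biased exponent of 0.
    integer_bit = 0 if biased_exponent == 0 else 1
    return sign, biased_exponent, integer_bit, fraction


def decode_extended(raw):
    """Split a double extended-precision value given as 10 little-endian bytes."""
    if len(raw) != 10:
        raise ValueError("expected exactly 10 bytes")
    bits = int.from_bytes(raw, "little")
    sign = bits >> 79
    biased_exponent = (bits >> 64) & 0x7FFF
    integer_bit = (bits >> 63) & 1      # stored explicitly in bit 63
    fraction = bits & ((1 << 63) - 1)   # bits 62 through 0
    return sign, biased_exponent, integer_bit, fraction


def classify_double(x):
    """Name the class of a double-precision value from its encoded fields."""
    sign, biased_exponent, _, fraction = decode_double(x)
    prefix = "-" if sign else "+"
    if biased_exponent == 0x7FF:
        # All-ones exponent: infinity if the fraction is zero, otherwise a NaN.
        return (prefix + "infinity") if fraction == 0 else "NaN"
    if biased_exponent == 0:
        return prefix + ("zero" if fraction == 0 else "denormal")
    return prefix + "normal"
```

For example, decode_double(1.0) reports an integer bit of 1 even though no such bit is present in the 64-bit encoding, whereas decode_extended reads the bit directly from position 63 of the 80-bit encoding.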

Table 4-2. Length, Precision, and Range of Floating-Point Data Types

Data Type                   Length   Precision   Approximate Normalized Range
                            (Bits)   (Bits)      Binary                 Decimal

Half Precision              16       11          2^–14 to 2^15          3.1 × 10^–5 to 6.50 × 10^4
Single Precision            32       24          2^–126 to 2^127        1.18 × 10^–38 to 3.40 × 10^38
Double Precision            64       53          2^–1022 to 2^1023      2.23 × 10^–308 to 1.79 × 10^308
Double Extended Precision   80       64          2^–16382 to 2^16383    3.37 × 10^–4932 to 1.18 × 10^4932
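
The binary exponent limits in Table 4-2 follow from the width of the biased exponent field, and the decimal limits from the binary limits and the precision. The Python sketch below shows one way to derive them. It is only an illustration: the function names (normalized_range, as_scientific) are hypothetical, the exponent field widths of 5, 8, 11, and 15 bits are the standard IEEE 754 values rather than figures stated in the table, and the computation works in base-10 logarithms so that the double extended-precision limits do not overflow a Python float. The computed decimal values are approximations and need not agree digit for digit with the rounded figures in the table.

```python
import math


def normalized_range(exponent_bits, precision_bits):
    """Return (min_exp, max_exp, log10_min, log10_max) for a binary format.

    precision_bits counts the integer bit, whether it is stored or implied.
    """
    bias = (1 << (exponent_bits - 1)) - 1
    min_exp = 1 - bias   # smallest normalized value is 2**min_exp
    max_exp = bias       # largest value is just under 2**(max_exp + 1)
    log10_min = min_exp * math.log10(2)
    # The largest significand is 2 - 2**(1 - precision_bits).
    largest_significand = 2 - 2.0 ** (1 - precision_bits)
    log10_max = max_exp * math.log10(2) + math.log10(largest_significand)
    return min_exp, max_exp, log10_min, log10_max


def as_scientific(log10_value, digits=2):
    """Format 10**log10_value as 'm.mm × 10^e' without building the value itself."""
    exponent = math.floor(log10_value)
    mantissa = 10 ** (log10_value - exponent)
    return f"{mantissa:.{digits}f} × 10^{exponent}"


if __name__ == "__main__":
    formats = [
        ("Half Precision", 5, 11),
        ("Single Precision", 8, 24),
        ("Double Precision", 11, 53),
        ("Double Extended Precision", 15, 64),
    ]
    for name, exponent_bits, precision_bits in formats:
        lo, hi, log_lo, log_hi = normalized_range(exponent_bits, precision_bits)
        print(f"{name}: 2^{lo} to 2^{hi}, "
              f"{as_scientific(log_lo)} to {as_scientific(log_hi)}")
```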

Table 4-3. Floating-Point Number and NaN Encodings

Class            Sign   Biased Exponent   Significand
                                          Integer   Fraction

Positive
  +∞             0      11..11            1         00..00
  +Normals       0      11..10            1         11..11
                        to                to        to
                 0      00..01            1         00..00
  +Denormals     0      00..00            0         11..11
                        to                to        to
                 0      00..00            0         00..01
  +Zero          0      00..00            0         00..00
Negative
  −Zero          1      00..00            0         00..00
  −Denormals     1      00..00            0         00..01
                        to                to        to
                 1      00..00            0         11..11
  −Normals       1      00..01            1         00..00
                        to                to        to
                 1      11..10            1         11..11
  −∞             1      11..11            1         00..00