Vol. 1 4-5
DATA TYPES
NOTE
Section 4.8, “Real Numbers and Floating-Point Formats,” gives an overview of the IEEE Standard
754 floating-point formats and defines the terms integer bit, QNaN, SNaN, and denormal value.
Table 4-3 shows the floating-point encodings for zeros, denormalized finite numbers, normalized finite numbers,
infinities, and NaNs for each of the three floating-point data types. It also gives the format for the QNaN
floating-point indefinite value. (See Section 4.8.3.7, “QNaN Floating-Point Indefinite,” for a discussion of the
use of the QNaN floating-point indefinite value.)
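As an illustration of the encoding classes listed in Table 4-3, the following minimal Python sketch classifies a 32-bit single-precision bit pattern from its biased exponent and fraction fields. The function names classify_single and float_to_bits are hypothetical and chosen for this example only; the field widths (8 exponent bits, 23 stored fraction bits) are the usual single-precision layout and are assumed here rather than taken from this section. The sketch does not distinguish QNaNs from SNaNs.

```python
import struct

EXP_BITS = 8    # assumed width of the single-precision biased exponent
FRAC_BITS = 23  # assumed width of the stored single-precision fraction


def classify_single(bits: int) -> str:
    """Return the encoding class of a 32-bit single-precision bit pattern."""
    sign = "-" if (bits >> 31) & 1 else "+"
    exponent = (bits >> FRAC_BITS) & ((1 << EXP_BITS) - 1)
    fraction = bits & ((1 << FRAC_BITS) - 1)
    all_ones = (1 << EXP_BITS) - 1

    if exponent == 0:
        # Biased exponent 00..00: zero or denormalized finite number.
        kind = "zero" if fraction == 0 else "denormal"
    elif exponent == all_ones:
        # Biased exponent 11..11: infinity or NaN.
        kind = "infinity" if fraction == 0 else "NaN"
    else:
        # Biased exponent 00..01 through 11..10: normalized finite number.
        kind = "normal"
    return sign + kind


def float_to_bits(value: float) -> int:
    """Round a Python float to single precision and return its bit pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]
```

For example, classify_single(float_to_bits(1.0)) falls in the normalized class, while the pattern 0x00000001 (all-zero exponent, nonzero fraction) falls in the denormalized class.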
For the single-precision and double-precision formats, only the fraction part of the significand is encoded. The
integer bit is assumed to be 1 for all numbers except 0 and denormalized finite numbers. For the double
extended-precision format, the integer bit is contained in bit 63, and the most-significant fraction bit is bit 62.
Here, the integer bit is explicitly set to 1 for normalized numbers, infinities, and NaNs, and to 0 for zero and
denormalized numbers.
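The difference between an implied and an explicit integer bit can be sketched in Python as follows. The helper names single_fields and extended_fields are hypothetical. The sketch assumes the usual field positions: for single precision, sign in bit 31, biased exponent in bits 30 to 23, fraction in bits 22 to 0; for double extended precision, sign in bit 79, biased exponent in bits 78 to 64, integer bit in bit 63, fraction in bits 62 to 0. Only the positions of bits 63 and 62 are stated in the text above; the others are assumptions.

```python
def single_fields(bits: int):
    """Split a 32-bit single-precision pattern into its fields.

    The integer bit is not stored; it is inferred from the biased exponent.
    """
    sign = (bits >> 31) & 1
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    # Implied integer bit: 0 for zero and denormals, 1 otherwise.
    integer = 0 if exponent == 0 else 1
    return sign, exponent, integer, fraction


def extended_fields(bits: int):
    """Split an 80-bit double extended-precision pattern into its fields.

    The integer bit is stored explicitly in bit 63.
    """
    sign = (bits >> 79) & 1
    exponent = (bits >> 64) & 0x7FFF
    integer = (bits >> 63) & 1          # explicit integer bit
    fraction = bits & ((1 << 63) - 1)   # bits 62..0
    return sign, exponent, integer, fraction
```

Under these assumptions, the value 1.0 in double extended precision is the pattern (0x3FFF << 64) | (1 << 63), and extended_fields reads its integer bit directly from the encoding, whereas single_fields has to derive it from the exponent.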
Table 4-2. Length, Precision, and Range of Floating-Point Data Types

Data Type                   Length   Precision   Approximate Normalized Range
                            (Bits)   (Bits)      Binary                 Decimal
Half Precision              16       11          2^-14 to 2^15          3.1 × 10^-5 to 6.50 × 10^4
Single Precision            32       24          2^-126 to 2^127        1.18 × 10^-38 to 3.40 × 10^38
Double Precision            64       53          2^-1022 to 2^1023      2.23 × 10^-308 to 1.79 × 10^308
Double Extended Precision   80       64          2^-16382 to 2^16383    3.37 × 10^-4932 to 1.18 × 10^4932
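The single-precision and double-precision rows of Table 4-2 can be reproduced from the exponent width and the precision. The sketch below is illustrative only: normalized_range is a hypothetical helper, and the formulas (bias = 2^(e-1) - 1, smallest normal = 2^(1 - bias), largest normal = (2 - 2^(1 - p)) × 2^bias) are the conventional IEEE 754 relationships, assumed here rather than stated in this section. Exact rational arithmetic is used so that the function also accepts widths whose values exceed the range of a Python float.

```python
from fractions import Fraction


def normalized_range(exp_bits: int, precision: int):
    """Return (smallest, largest) positive normalized values as exact Fractions.

    exp_bits  -- width of the biased exponent field
    precision -- significand bits, counting the integer bit
    """
    bias = (1 << (exp_bits - 1)) - 1
    min_exp = 1 - bias   # biased exponent 00..01
    max_exp = bias       # biased exponent 11..10
    smallest = Fraction(2) ** min_exp
    # Largest significand is 1.11..11, that is 2 - 2^(1 - precision).
    largest = (2 - Fraction(1, 2 ** (precision - 1))) * Fraction(2) ** max_exp
    return smallest, largest


# Single precision: 8 exponent bits, 24 bits of precision.
single_min, single_max = normalized_range(8, 24)
# Double precision: 11 exponent bits, 53 bits of precision.
double_min, double_max = normalized_range(11, 53)
```

Converting the single-precision and double-precision results with float() gives values that agree with the decimal column to the digits shown. For the double extended-precision parameters (15 exponent bits, 64 bits of precision) the results should be kept as Fractions, since they lie outside the range of a Python float.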
Table 4-3. Floating-Point Number and NaN Encodings

Class                      Sign   Biased Exponent   Significand
                                                    Integer¹   Fraction
Positive   +∞              0      11..11            1          00..00
           +Normals        0      11..10            1          11..11
                           .      .                 .          .
                           0      00..01            1          00..00
           +Denormals      0      00..00            0          11..11
                           .      .                 .          .
                           0      00..00            0          00..01
           +Zero           0      00..00            0          00..00
Negative   −Zero           1      00..00            0          00..00
           −Denormals      1      00..00            0          00..01
                           .      .                 .          .
                           1      00..00            0          11..11
           −Normals        1      00..01            1          00..00
                           .      .                 .          .
                           1      11..10            1          11..11
           −∞              1      11..11            1          00..00