ion-rfc-07-real-numbers.nroff

.tm 7. Real Numbers ................................................... \n%
.ti 0
7. Real Numbers

Ion supports two types of real numbers: floats and decimals.

Both the text and binary representations of an Ion value stream may be
compressed in one or more GZIP [RFC 1952] members.

.tm _  7.1. Floats .................................................... \n%
.ti 0
7.1. Floats

Ion supports IEEE-754 binary floating point values using the IEEE-754
32-bit (binary32) and 64-bit (binary64) encodings. In the data model,
all floating point values are treated as though they are binary64 (all
binary32 encoded values can be represented exactly in binary64).


.tm _     7.1.1. Encoding Considerations  ............................. \n%
.ti 0
7.1.1. Encoding Considerations

In text, binary float is represented using familiar base-10 digits.
While this is convenient for human representation, there is no explicit
notation for expressing a particular floating point value as binary32 or
binary64. Furthermore, many base-10 real numbers are irrational with
respect to base-2 and cannot be expressed exactly in either binary
floating point encoding (e.g. 1.1e0).

Because of this asymmetry, the rules for Ion text float notation when
round-tripping to Ion binary MUST be observed:

.in 9
.ti 6
o Any text notation that can be exactly represented as binary32 MAY be
      encoded as either binary32 or binary64 in Ion binary.

.ti 6
o Any text notation that can only be exactly represented as binary64
      MUST be encoded as binary64 in Ion binary.

.ti 6
o Any text notation that has no exact representation (i.e. irrational
      in base-2 or more precision than the binary64 mantissa), MUST be
      encoded as binary64. This is to ensure that irrational numbers or
      truncated values are represented in the highest fidelity of the
      float data type.

.in 3
When encoding a decimal real number that is irrational in base-2 or has
more precision than can be stored in binary64, the exact binary64 value
is determined by using the IEEE-754 round-to-nearest mode with a
round-half-to-even as the tie-break. This mode/tie-break is the common
default used in most programming environments and is discussed in detail in
"Correctly Rounded Binary-Decimal and Decimal-Binary Conversions" (see
http://ampl.com/REFS/rounding.pdf). This conversion algorithm is illustrated
in a straightforward way in Clinger's Algorithm (see
http://www.cesura17.net/~will/professional/research/papers/howtoread.pdf).

When encoding a binary32 or binary64 value in text notation, an
implementation MAY want to consider the approach described in "Printing
Floating-Point Numbers Quickly and Accurately" (see
http://www.cs.indiana.edu/~dyb/pubs/FP-Printing-PLDI96.pdf).

.tm _     7.1.2. Special Values  ...................................... \n%
.ti 0
7.1.2. Special Values

The IEEE-754 binary floating point encoding supports special non-number
values. These are represented in the binary format as per the encoding
rules of the IEEE-754 specification, and are represented in text by the
following keywords:

.in 9
.ti 6
o nan - denotes the not a number (NaN) value.

.ti 6
o +inf - denotes positive infinity.

.ti 6
o -inf - denotes negative infinity.

.in 3
The Ion data model considers all encodings of positive infinity to be
equivalent to one another and all encodings of negative infinity to be
equivalent to one another. Thus, an implementation encoding +inf or -inf
in Ion binary MAY choose to encode it using the binary32 or
binary64 form.

The IEEE-754 specification has many encodings of NaN, but the Ion data
model considers all encodings of NaN (i.e. all forms of signaling or
quiet NaN) to be equivalent. Note that the text keyword nan does not map
to any particular encoding, the only requirement is that an
implementation emit a bit-pattern that represents an IEEE-754 NaN value
when converting to binary (e.g. the binary64 bit pattern of
0x7FF8000000000000).

An important consideration is that NaN is not treated in a consistent
manner between programming environments. For example, Java defines that
there is only one canonical NaN value and it happens to be signaling.
On C/C++, on the other hand, NaN is mostly platform defined, but on
platforms that support it, the NAN macro is a quiet NaN. In general,
common programming environments give testing routines for NaN, but no
consistent way to represent it.

.tm _     7.1.3. Examples  ............................................ \n%
.ti 0
7.1.2. Examples

To illustrate the text/binary round-tripping rules above, consider the
following examples.

The Ion text literal 2.147483647e9 overflows the 23-bits of significand
in binary32 and MUST be encoded in Ion binary as a binary64 value. The
Ion binary encoding for this text literal is as follows:

.KS
.nf
   0x48 0x41 0xDF 0xFF 0xFF 0xFF 0xC0 0x00 0x00
.KE

.in 3
The base-2 irrational literal 1.2e0 following the rounding and encoding
rules MUST be encoded in Ion binary as:

.KS
.nf
   0x48 0x3F 0xF3 0x33 0x33 0x33 0x33 0x33 0x33
.KE

.in 3
Although the textual representative of 1.2e0 itself is irrational, its
canonical form in the data model is not (based on the rounding rules),
thus the following text forms all map to the same binary64 value:

.KS
.nf
   // the most human-friendly representation
   1.2e0
   
   // the exact textual representation in base-10 for the binary64 value
   // 1.2e0 represents
   1.1999999999999999555910790149937383830547332763671875e0
   
   // a shortened, irrational version, but still the same value
   1.1999999999999999e0
   
   // a lengthened, irrational version that is still the same value
   1.19999999999999999999999999999999999999999999999999999999e0

.in 3

.tm _  7.2. Decimals .................................................. \n%
.ti 0
7.2. Decimals

Ion supports a decimal numeric type to allow accurate representation of
base-10 floating point values such as currency amounts. An Ion Decimal
has arbitrary precision and scale. This representation preserves
significant trailing zeros when converting between text and binary forms.

Decimals are supported in addition to the traditional base-2 floating
point type. This avoids the loss of exactness often incurred when storing
a decimal fraction as a binary fraction. Many common decimal numbers with
relatively few digits cannot be represented as a terminating
binary fraction.

.tm _     7.2.1. Data Model ........................................... \n%
.ti 0
7.2.1. Data Model

Ion decimals follow the IBM Hursley Lab General Decimal Arithmetic
Specification (see: http://speleotrove.com/decimal/decarith.html), which
defines an abstract decimal data model (see:
http://speleotrove.com/decimal/damodel.html) represented by the following
3-tuple:

.KS
.nf
   (<sign 0|1>, <coefficient: unsigned integer>, <exponent: integer>)
.KE

.in 3
Decimals should be considered equivalent if and only if their data model
tuples are equivalent, where exponents of +0 and -0 are considered
equivalent. All forms of positive zero are distinguished only by the
exponent. All forms of negative zero, which are distinct from all forms
of positive zero, also are distinguished only by the exponent.

.tm _     7.2.2. Text Format  ......................................... \n%
.ti 0
7.2.2. Text Format

The Hursley rules for describing a finite value converting from textual
notation must be followed. The Hursley rules for describing a special
value are not followed--the rules for

.in 9
.ti 6
o infinity - rule is not applicable for Ion Decimals

.ti 6
o nan - rule is not applicable for Ion Decimals

.in 3
Specifically, the rules for getting the integer coefficient from the
decimal-part (digits preceding the exponent) of the textual
representation are specified as follows.

.in 6
If the decimal-part included a decimal point the exponent is then
reduced by the count of digits following the decimal point (which may
be zero) and the decimal point is removed. The remaining string of
digits has any leading zeros removed (except for the rightmost digit)
and is then converted to form the coefficient which will be zero or
positive.

.in 3
Where X is any unsigned integer, all of the following formulae can be
demonstrated to be equivalent using the text conversion rules and the
data model.

.KS
.nf
   // Exponent implicitly zero
   X.
   // Exponent explicitly zero
   Xd0
   // Exponent explicitly negative zero (equivalent to zero).
   Xd-0
.KE

.in 3
Other equivalent representations include the following, where Y is the
number of digits in X.

.KS
.nf
   // There are Y digits past the decimal point in the
   // decimal-part, making the exponent zero. One leading zero
   // is removed.
   0.XdY
.KE

.in 3
For example, all of the following text Ion decimal representations are
equivalent to each other.

.KS
.nf
   0.
   0d0
   0d-0
   0.0d1
.KE

.in 3
Additionally, all of the following are equivalent to each other (but not
to any forms of positive zero).

.KS
   .nf
   -0.
   -0d0
   -0d-0
   -0.0d1
.KE

.in 3
Because all forms of zero are distinctly identified by the exponent, the
following are not equivalent to each other.

.KS
.nf
   // Exponent implicitly zero.
   0.
   // Exponent explicitly 5.
   0d5
.KE

.in 3
All of the following are equivalent to each other.

.KS
.nf
   42.
   42d0
   42d-0
   4.2d1
   0.42d2
.KE

.in 3
However, the following are not equivalent to each other.

.KS
   .nf
   // Text converted to 42.
   0.42d2
   // Text converted to 42.0
   0.420d2
.KE

.in 3

.tm _     7.2.3. Binary Format  ....................................... \n%
.ti 0
7.2.3. Binary Format

The encoding of Ion decimals, which follows the decimal data model
described above, is specified in [Ion Binary Encoding].

The following binary encodings of decimal values are all equivalent
to 0d0.

KS
.nf
   +-----------------+------------+-------------+
   | type descriptor |  exponent  | coefficient |
   |                 |  (VarInt)  |    (Int)    |
   +-----------------+------------+-------------+
   
   Most compact encoding of 0d0
   +-----------------+
   :      0x50       :
   +-----------------+
   
   Explicit encoding of 0d0
   +-----------------+------------+-------------+
   :      0x52       :    0x80    :    0x00     |
   +-----------------+------------+-------------+
   
   Explicit encoding of 0d(negative)0
   +-----------------+------------+-------------+
   :      0x52       :    0xC0    :    0x00     |
   +-----------------+------------+-------------+
   
   0d0 with overpadded coefficient
   +-----------------+------------+-------------+
   :      0x53       :    0x80    :  0x00 0x00  |
   +-----------------+------------+-------------+
   
   0d0 with overpadded exponent and coefficient
   +-----------------+------------+-------------+
   :      0x54       :  0x00 0x80 :  0x00 0x00  |
   +-----------------+------------+-------------+
.KE

.in 3
Note: The latter two examples demonstrate overpadded encodings of the
exponent and coefficient subfields. Overpadded encodings such as these
are possible for any decimal and are always equivalent to the
unpadded encoding.

The following binary encodings of decimal values are equivalent to -0d0
(but not to 0d0).

.KS
.nf
   +-----------------+------------+-------------+
   | type descriptor |  exponent  | coefficient |
   |                 |  (VarInt)  |    (Int)    |
   +-----------------+------------+-------------+
   
   Explicit encoding of (negative)0d0
   +-----------------+------------+-------------+
   :      0x52       :    0x80    :    0x80     |
   +-----------------+------------+-------------+
   
   Explicit encoding of (negative)0d(negative)0
   +-----------------+------------+-------------+
   :      0x52       :    0xC0    :    0x80     |
   +-----------------+------------+-------------+
.KE

.in 3
Finally, the following binary encodings of decimal values are equivalent
to 42d0.

.KS
.nf
   +-----------------+------------+-------------+
   | type descriptor |  exponent  | coefficient |
   |                 |  (VarInt)  |    (Int)    |
   +-----------------+------------+-------------+
   
   Explicit encoding of 42d0
   +-----------------+------------+-------------+
   :      0x52       :    0x80    :    0x2A     |
   +-----------------+------------+-------------+
   
   Explicit encoding of 42d(negative)0
   +-----------------+------------+-------------+
   :      0x52       :    0xC0    :    0x2A     |
   +-----------------+------------+-------------+
.KE

.in 3