General Decimal Arithmetic
Specification
25th March 2009
Mike Cowlishaw
IBM Fellow
IBM UK Laboratories
[email protected]
Version 1.70
Copyright © IBM Corporation 2009. All rights reserved.
Table of Contents
Introduction 5
Scope 7
Objectives 7
Inclusions 7
Exclusions 7
Restrictions 8
The Arithmetic Model 9
Abstract representation of numbers 9
Abstract representation of operations 12
Abstract representation of context 13
Default contexts 16
Conversions 17
Numeric string syntax 17
to-scientific-string – conversion to numeric string 19
to-engineering-string – conversion to numeric string 20
to-number – conversion from numeric string 21
Arithmetic operations 23
abs 26
add and subtract 26
compare 27
compare-signal 27
divide 27
divide-integer 29
exp 29
fused-multiply-add 30
ln 31
log10 31
max 32
max-magnitude 32
min 32
min-magnitude 33
minus and plus 33
multiply 33
next-minus 34
next-plus 34
next-toward 34
power 35
quantize 36
reduce 37
remainder 37
remainder-near 38
round-to-integral-exact 39
round-to-integral-value 39
square-root 39
Miscellaneous operations 41
and 41
canonical 42
class 42
compare-total 42
compare-total-magnitude 43
copy 43
copy-abs 44
copy-negate 44
copy-sign 44
invert 44
is-canonical 44
is-finite 45
is-infinite 45
is-NaN 45
is-normal 45
is-qNaN 46
is-signed 46
is-sNaN 46
is-subnormal 46
is-zero 46
logb 47
or 47
radix 47
rotate 47
same-quantum 48
scaleb 48
shift 49
xor 49
Exceptional conditions 51
Appendix A – The X3.274 subset 55
Appendix B – Design concepts 59
Appendix C – Changes 61
Index 69
Introduction
This document defines a general purpose decimal arithmetic for both limited precision floating-point
(as defined by the IEEE 754 standard¹ approved in June 2008) and for arbitrary precision floating-point
(following the same principles as IEEE 754 and the earlier IEEE 854-1987 standard).² In
addition to floating-point arithmetic, integer and unrounded floating-point arithmetic are included as
subsets.
The primary audience for this document is implementers, so examples and other explanatory material
are included. Explanatory material is identified as Notes, Examples, or footnotes, and is not part of
the formal specification.
Appendix A (see page 55) describes a simplified subset of the full arithmetic which implements the
decimal floating-point arithmetic defined in the ANSI standard X3.274-1996³ (this provides the model
for the unrounded floating-point rules). Appendix B (see page 59) summarizes the design concepts
behind the decimal arithmetic. Appendix C (see page 61) lists the changes to this specification.
This document in various softcopy formats, together with a reference implementation, testcases,
concrete representations (encodings), and background information may be found at
http://speleotrove.com/decimal
Comments on this draft are welcome. Please send any comments, suggestions, and corrections to the
author, Mike Cowlishaw ([email protected]).
Acknowledgements
Very many people have contributed to the arithmetic described in this document, especially the 1980
Rexx Language Committee, the IBM Rexx Architecture Review Board, the IBM Vienna Compiler
group, the X3 (now NCITS) J18 technical committee, the authors of the IEEE 854 standard, and the
members of the IEEE 754r (revision) committee. Special thanks for their contributions to the current
design and this document are due to Aahz, Merav Aharoni, Nelson Beebe, Joshua Bloch, Dirk
Bosmans, Paul-Georges Crismer, Joe Darcy, Gunnar Degnbol, Mark Dickinson, John Ehrman, Kit
George, Peter Golde, Michel Hack, Brian Marks, Ilan Nehama, Dave Raggett, Fred Ris, Eric Schwarz,
Ron Smith, and Phil Yeh.
1 IEEE 754-2008 – IEEE Standard for Floating-Point Arithmetic, The Institute of Electrical and Electronics Engineers,
Inc., New York, 2008. (In press.)
2 IEEE 854-1987 – IEEE Standard for Radix-Independent Floating-Point Arithmetic, The Institute of Electrical and
Electronics Engineers, Inc., New York, 1987.
3 American National Standard for Information Technology – Programming Language REXX, X3.274-1996, American
National Standards Institute, New York, 1996.
Scope
Objectives
This document defines a general purpose decimal arithmetic. A correct implementation of this
specification using appropriate parameters will conform to the decimal arithmetic defined in IEEE
standard 754-2008,⁴ except for some minor restrictions (see page 8), and will also provide unrounded
decimal arithmetic⁵ and integer arithmetic as proper subsets.
Inclusions
This specification defines the following:
• Constraints on the values of decimal numbers
• Operations on decimal numbers, including
◦ Required conversions between string and internal representations of numbers
◦ Arithmetical operations on decimal numbers (addition, subtraction, etc.)
• Context information which alters the results of operations, and default contexts.
• Exceptional conditions, such as overflow, underflow, undefined results, and other exceptional
situations which may occur during operations.
Exclusions
This specification does not define the following:
• Concrete representations (storage format) of decimal numbers⁶
• Concrete representations (storage format) of context information
• The means by which operations are effected
4 IEEE 754-2008 – IEEE Standard for Floating-Point Arithmetic, The Institute of Electrical and Electronics Engineers,
Inc., New York, 2008. (In press.)
5 Sometimes called “fixed-point” decimal arithmetic.
6 The IEEE 754 decimal encodings for interchange formats are described in:
http://speleotrove.com/decimal/decbits.pdf
Restrictions
This specification deviates from the requirements of IEEE 754 in the following respects:
1. The remainder-near operator is restricted to those values where the intermediate integer can be
represented in the current precision.⁷
2. The mathematical functions do not, in general, correspond to the recommended functions in
IEEE 754 with the same or similar names; in particular, the power function has some different
special cases, and most of the functions may be up to one unit wrong in the last place.
3. The square-root function is only specified here for one rounding algorithm (IEEE 754 requires it
to be supported for all rounding algorithms). However, it is defined to be correctly rounded.
This specification does not follow the requirements of IEEE 854 on the use of the terms single
precision and double precision because, since that standard was published, these terms have become
synonymous with particular sizes of encodings (32-bit and 64-bit respectively).
7 This is because the conventional implementation of this operator would be unacceptably long-running for the range of
numbers allowed by this specification (with up to nine digits of exponent). For restricted-range numbers, an
implementation can easily be made to conform to IEEE 754 in this respect.
The Arithmetic Model
This specification is based on a model of decimal arithmetic which is a formalization of the decimal
system of numeration (Algorism) as further defined and constrained by the relevant standards (IEEE
854, ANSI X3.274, and IEEE 754-2008).
There are three components to the model:
1. numbers – which represent the values which can be manipulated by, or be the results of, the core
operations defined in this specification
2. operations – the core operations (such as addition, multiplication, etc.) which can be carried out
on numbers
3. context – which represents the user-selectable parameters and rules which govern the results of
arithmetic operations (for example, the precision to be used).
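The role of context can be illustrated with a minimal sketch in Python, whose standard decimal module is based on this specification. The variable names are illustrative only, and the module's default rounding (round-half-even) is assumed; the same operation on the same numbers gives different results under contexts that differ only in precision.

```python
from decimal import Context, Decimal

# Two contexts that differ only in precision (significant digits).
ctx9 = Context(prec=9)
ctx4 = Context(prec=4)

one, three = Decimal(1), Decimal(3)

# Same numbers, same operation; only the context differs.
r9 = ctx9.divide(one, three)
r4 = ctx4.divide(one, three)

print(r9)  # 0.333333333
print(r4)  # 0.3333
```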
This specification defines these components in the abstract. It neither defines the way in which
operations are expressed (which might vary depending on the computer language or other interface
being used),⁸ nor does it define the concrete representation (specific layout in storage, or in a processor’s
register, for example) of numbers or context.
The remainder of this section describes the abstract model for each component.
Abstract representation of numbers
Numbers represent the values which can be manipulated by, or be the results of, the core operations
defined in this specification. Numbers may be finite numbers (numbers whose value can be represented
exactly) or they may be special values (infinities and other values which are not finite numbers).
Finite numbers
Finite numbers are defined by three integer parameters:
1. sign – a value which must be either 0 or 1, where 1 indicates that the number is negative or is the
negative zero and 0 indicates that the number is zero or positive.
2. coefficient – an integer which is zero or positive.
In the abstract, there is no upper limit on the maximum size of the coefficient. In practice, an
implementation may need to define a specific upper limit (for example, the length of the
maximum coefficient supported by the concrete representation). This limit must be expressed as
an integral number of decimal digits.⁹
8 Indeed, some variations of operations could be selected by using context settings outside the scope of this specification.
9 That is, the maximum value of the coefficient will be an integral power of ten, less one – for example,
99999999999999999999.
3. exponent – a signed integer which indicates the power of ten by which the coefficient is
multiplied.
In the abstract, there is no upper limit on the absolute value of the exponent. In practice there
may be some upper limit, Elimit, on the absolute value of the exponent.
If the coefficient has a maximum length then it is required¹⁰ that Elimit be greater than 5 ×
mlength, where mlength is the maximum length of the coefficient in decimal digits. It is
recommended that Elimit be greater than 10 × mlength.
The adjusted exponent is the value of the exponent of a number when that number is expressed as
though in scientific notation with one digit (non-zero unless the coefficient is 0) before any
decimal point. This is given by the value of the exponent+(clength-1), where clength is the
length of the coefficient in decimal digits.
When a limit to the exponent applies, it must result in a balanced range of positive or negative
numbers,¹¹ taking into account the magnitude of the coefficient. To achieve this balanced range,
the minimum and maximum values of the adjusted exponent (Emin and Emax respectively) must
have magnitudes which differ by no more than one, so Emin will be -Emax ±1. IEEE 754 further
constrains this so that Emin = 1-Emax.
Therefore, if the length of the coefficient is clength digits, the exponent may take any of the
values -Elimit-(clength-1)+1 through Elimit-(clength-1).
For example, if the coefficient had the value 123456789 (9 digits) and the exponent had an Elimit of
999 (3 digits), then the exponent could range from -1006 through +991. This would allow
positive values of the number to range from 1.23456789E-998 through 1.23456789E+999.
It is recommended that Emax be expressed as an integral number of decimal digits or be one of
the numbers 1, 5, or 25, multiplied by a positive integral power of ten and optionally reduced
by one (for example, 49 or 50).
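The exponent range and adjusted exponent described for the exponent parameter can be checked numerically. The following is a minimal sketch in Python; the function names are hypothetical and not part of this specification, and the values are those of the worked example above (an Elimit of 999 and a 9-digit coefficient).

```python
def exponent_range(elimit, clength):
    """Lowest and highest exponent for a coefficient of clength digits."""
    lowest = -elimit - (clength - 1) + 1
    highest = elimit - (clength - 1)
    return lowest, highest


def adjusted_exponent(exponent, clength):
    """Exponent when written with one digit before the decimal point."""
    return exponent + (clength - 1)


lowest, highest = exponent_range(999, 9)
print(lowest, highest)                # -1006 991
print(adjusted_exponent(lowest, 9))   # -998
print(adjusted_exponent(highest, 9))  # 999
```

The adjusted exponents at the two ends of the range match the positive values 1.23456789E-998 and 1.23456789E+999 given in the example.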
The numerical value of a finite number is given by: (-1)^sign × coefficient × 10^exponent.
The quantum of a finite number is given by: 1 × 10^exponent. This is the value of a unit in the least
significant position of the coefficient of a finite number.¹²
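As an illustration of these two definitions, the sketch below computes the numerical value and the quantum exactly from the three parameters. It is written in Python, whose decimal module exposes the sign, coefficient digits, and exponent of a number through as_tuple(); the helper names numerical_value and quantum are hypothetical.

```python
from decimal import Decimal
from fractions import Fraction


def numerical_value(sign, coefficient, exponent):
    """Exact value (-1)^sign * coefficient * 10^exponent as a Fraction."""
    return (-1) ** sign * coefficient * Fraction(10) ** exponent


def quantum(exponent):
    """Value of one unit in the least significant coefficient position."""
    return Fraction(10) ** exponent


# Decompose a number into the abstract triple (sign, coefficient, exponent).
t = Decimal("-1.23").as_tuple()
coefficient = int("".join(map(str, t.digits)))

value = numerical_value(t.sign, coefficient, t.exponent)
unit = quantum(t.exponent)

print(t.sign, coefficient, t.exponent)  # 1 123 -2
print(value)                            # -123/100
print(unit)                             # 1/100
```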
This abstract definition deliberately allows for multiple representations of values which are
numerically equal but are visually distinct (such as 1 and 1.00). However, there is a one-to-one
mapping between the abstract representation and the result of the primary conversion to string using
to-scientific-string (see page 19) on that abstract representation. In other words, if one number has a
different abstract representation to another, then the primary string conversion will also be different.
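A short sketch of this distinction, using Python's decimal module as an assumed stand-in for an implementation of this specification (its str() conversion is documented as following to-scientific-string): the numbers 1 and 1.00 compare equal, yet their abstract representations and their string conversions differ.

```python
from decimal import Decimal

a = Decimal("1")
b = Decimal("1.00")

# Numerically equal ...
print(a == b)  # True

# ... but with different abstract representations
# (sign, coefficient digits, exponent) ...
print(tuple(a.as_tuple()))  # (0, (1,), 0)
print(tuple(b.as_tuple()))  # (0, (1, 0, 0), -2)

# ... and therefore different string conversions.
print(str(a), str(b))  # 1 1.00
```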
Notes:
1. Many concrete representations for finite numbers have been used successfully. Typically, the
coefficient is represented in some form of binary coded or packed decimal, or is encoded using a
base which is a higher power of ten. It may also be expressed as a binary integer. The exponent
is typically represented by a two’s complement or biased binary integer. The IEEE 754
10 See IEEE 854 §3.1.
11 This rule, a requirement for both ANSI X3.274 and IEEE 854, constrains the number of values which would overflow or
underflow when inverted (divided into 1).
12 This is slightly different from an ulp (unit in last position), which is defined such that ulp(x) is the difference between the
two nearest bracketing representable values to x, and which if x is exactly representable and is an exact power of the base
gives the “ulp below”.