Difference between revisions of "Floating point math issues"
(→Infinity) |
(→Loss of significant digits under some circumstances) |
||
(18 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
+ | On this page we provide some general background information about how floating-point math is implemented on computers—and some common pitfalls that you may encounter. For information about specific bugs or numerical errors found and fixed in GEOS-Chem, please see our [[Numerical issues discovered in GEOS-Chem]] page. | ||
+ | |||
== Floating-point is an approximation to the real number system == | == Floating-point is an approximation to the real number system == | ||
Line 20: | Line 22: | ||
|- bgcolor="#CCCCCC" | |- bgcolor="#CCCCCC" | ||
! IDL Number Type | ! IDL Number Type | ||
− | ! Fortran | + | ! Fortran equivalent |
+ | ! ESMF equivalent | ||
! Number of bytes | ! Number of bytes | ||
! Approx. range | ! Approx. range | ||
− | |- | + | |- align="center" |
| byte | | byte | ||
− | | BYTE | + | | BYTE |
+ | | - | ||
| 1 | | 1 | ||
| 0 to 255 | | 0 to 255 | ||
− | |- | + | |- align="center" |
| fix | | fix | ||
− | | INTEGER*2 | + | | INTEGER*2 |
+ | | ESMF_KIND_I2 | ||
| 2 | | 2 | ||
| 0 - 32767 | | 0 - 32767 | ||
− | |- | + | |- align="center" |
| long | | long | ||
− | | INTEGER*4 | + | | INTEGER*4 |
+ | | ESMF_KIND_I4 | ||
| 4 | | 4 | ||
− | | | + | | -214700000 to -214700000 |
− | |- | + | |- align="center" |
| float | | float | ||
− | | REAL*4 | + | | REAL*4 |
+ | | ESMF_KIND_R4 | ||
| 4 | | 4 | ||
| -1e-38 to 1e+38 | | -1e-38 to 1e+38 | ||
− | |- | + | |- align="center" |
| double | | double | ||
− | | REAL*8 | + | | REAL*8 |
+ | | ESMF_KIND_R8 | ||
| 8 | | 8 | ||
| -1e-312 to 1e+312 | | -1e-312 to 1e+312 | ||
Line 331: | Line 339: | ||
== Testing for non-representable values == | == Testing for non-representable values == | ||
− | Recall the floating point number line from [[#Floating-point is an approximation to the real number system|the above discussion]]. Each number type (e.g. INTEGER, REAL*4, REAL*8) has a minimum and maximum representable value. If we are not careful, the result of a mathematical computation may result in a value that cannot be represented for the given number type. There are two common non-representable values: Not-A-Number | + | Recall the floating point number line from [[#Floating-point is an approximation to the real number system|the above discussion]]. Each number type (e.g. INTEGER, REAL*4, REAL*8) has a minimum and maximum representable value. If we are not careful, the result of a mathematical computation may result in a value that cannot be represented for the given number type. There are two common non-representable values: Not-A-Number (aka NaN), and Infinity. |
=== NaN (Not-a-Number) === | === NaN (Not-a-Number) === | ||
− | '''''NaN (Not-A-Number)''''' is the IEEE designation for a value that cannot be represented for a given number type (the result of a mathematical computation that is undefined). For example, some computations that may result in NaN are: | + | '''''NaN (Not-A-Number)''''' is the IEEE designation for a value that cannot be represented for a given number type (e.g. the result of a mathematical computation that is undefined). For example, some computations that may result in NaN are: |
SQRT( x ) for all x < 0 | SQRT( x ) for all x < 0 | ||
Line 352: | Line 360: | ||
any number / NaN = NaN | any number / NaN = NaN | ||
− | etc. Each compiler supplies a function to trap for NaN values. Here are the NaN-trapping functions for the various compilers that are supported by GEOS-Chem: | + | etc. |
+ | |||
+ | The easiest (and most portable) way to check for Nan values in the code is to implement one of the following tests: | ||
+ | |||
+ | ! Scan a single value | ||
+ | IF ( X .ne. X ) THEN | ||
+ | PRINT*, 'X is NaN!' | ||
+ | ! ... stop the code here ... | ||
+ | ENDIF | ||
+ | |||
+ | ! Scan an array | ||
+ | WHERE( ARRAY .ne. ARRAY ) | ||
+ | ARRAY = -9999d0 !Set array to a missing value of your choice | ||
+ | ENDWHERE | ||
+ | |||
+ | Each compiler also supplies a function to trap for NaN values. Here are the NaN-trapping functions for the various compilers that are supported by GEOS-Chem: | ||
{| border=1 cellpadding=5 cellspacing=0 | {| border=1 cellpadding=5 cellspacing=0 | ||
Line 362: | Line 385: | ||
| Intel Fortran Compiler (IFORT) | | Intel Fortran Compiler (IFORT) | ||
| <tt>ISNAN( x )</tt> | | <tt>ISNAN( x )</tt> | ||
− | | Returns T if x = NaN; F otherwise | + | | Returns T if <tt>x=NaN</tt>; F otherwise |
|- | |- | ||
| Sun Studio | | Sun Studio | ||
| <tt>IEEE_IS_NAN( x )</tt> | | <tt>IEEE_IS_NAN( x )</tt> | ||
− | | Returns T if x = NaN; F otherwise | + | | Returns T if <tt>x=NaN</tt>; F otherwise |
|- | |- | ||
| Portland Group (PGF90) | | Portland Group (PGF90) | ||
Line 374: | Line 397: | ||
| SGI-MIPS Fortran | | SGI-MIPS Fortran | ||
| <tt>IEEE_IS_NAN( x )</tt> | | <tt>IEEE_IS_NAN( x )</tt> | ||
− | | Returns T if x = NaN; F otherwise | + | | Returns T if <tt>x=NaN</tt>; F otherwise |
|- | |- | ||
| HP/Compaq Fortran | | HP/Compaq Fortran | ||
| <tt>ISNAN( x )</tt> | | <tt>ISNAN( x )</tt> | ||
− | | Returns T if x = NaN; F otherwise | + | | Returns T if <tt>x=NaN</tt>; F otherwise |
|} | |} | ||
NOTES: | NOTES: | ||
# The functions listed in the above table take INTEGER, REAL*4, and REAL*8 arguments. | # The functions listed in the above table take INTEGER, REAL*4, and REAL*8 arguments. | ||
− | # For PGF90 you must write a wrapper in C to call the Linux <tt> | + | # For PGF90 you must write a wrapper in C to call the Linux <tt>isnan( x )</tt> function. GEOS-Chem provides a file <tt>linux_err.c</tt> to do this. |
# The GEOS-Chem module <tt>error_mod.f</tt> contains a routine <tt>IT_IS_NAN( x )</tt> which will call the proper compiler-dependent NaN function. | # The GEOS-Chem module <tt>error_mod.f</tt> contains a routine <tt>IT_IS_NAN( x )</tt> which will call the proper compiler-dependent NaN function. | ||
=== Infinity === | === Infinity === | ||
− | The values '''''-Infinity''''' and '''''+Infinity''''' are the IEEE designation for values that are, respectively, smaller than the smallest representable value or larger than the largest representable value for a given number type. For example, the | + | The values '''''-Infinity''''' and '''''+Infinity''''' are the IEEE designation for values that are, respectively, smaller than the smallest representable value or larger than the largest representable value for a given number type. For example, the computations that may return +Infinity or -Infinity are as follows: |
x / y for y = 0 (or for extremely small y such as 1e-300) | x / y for y = 0 (or for extremely small y such as 1e-300) | ||
Line 432: | Line 455: | ||
|} | |} | ||
− | For HP/Compaq Fortran you can trap infinite numbers with the following code snippets: | + | NOTES: |
+ | #The GEOS-Chem module <tt>error_mod.f</tt> contains a routine called <tt>IT_IS_FINITE( x )</tt> which will call the proper infinity-checking routine for the compiler that you are using. | ||
+ | #For PGF90 you must write a snippet of C code to call the Linux function <tt>isinf( x )</tt>. This is done for you in GEOS-Chem file <tt>linux_err.c</tt>. | ||
+ | #For HP/Compaq Fortran you can trap infinite numbers with the following code snippets: | ||
! Trap REAL*4 infinity values | ! Trap REAL*4 infinity values | ||
Line 450: | Line 476: | ||
ENDIF | ENDIF | ||
− | + | --[[User:Bmy|Bmy]] 12:01, 15 April 2008 (EDT) | |
+ | |||
+ | == Loss of significant digits under some circumstances == | ||
+ | |||
+ | [https://en.wikipedia.org/wiki/Loss_of_significance This article] describes how significant digits may be lost in floating-point calcualations under certain circumstances. | ||
+ | |||
+ | In GEOS-Chem, we have observed this phenomenon when we try to archive data to a 2-D array from within a 3-D OpenMP parallel loop. This causes different results in diagnostic output in simulation where parallelization is turned compared to a simulation with parallelization turned off. | ||
− | --[[User:Bmy|Bmy]] | + | --[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 18:14, 7 December 2016 (UTC) |
Latest revision as of 18:14, 7 December 2016
On this page we provide some general background information about how floating-point math is implemented on computers—and some common pitfalls that you may encounter. For information about specific bugs or numerical errors found and fixed in GEOS-Chem, please see our Numerical issues discovered in GEOS-Chem page.
Contents
Floating-point is an approximation to the real number system
The real number system may be thought of as a number line with the following characteristics:
- Zero is the primary reference point
- From zero, the line extends infinitely in both directions towards + Infinity and -Infinity
- Between any two real numbers, there are an infinite number of other real numbers
A pictorial representation of the real number system looks like this:
-Infinity <---------- 0 ---A---B----> +Infinity
where between points A and B there are an infinite number of real numbered-values.
It is important to realize that floating-point mathematics (as implemented in all modern computer systems) is never exact but is only an approximation to the real number system. In most programming languages, floating-point real numbers are composed of groups of 4 or 8 bytes. This means that floating-point numbers are not infinitely precise, but have a maximum precision. As a consequence, floating-point math operations (especially multiplication and division) can often lead to different results than one would normally anticipate.
Here are the common number types used in both IDL and Fortran:
IDL Number Type | Fortran equivalent | ESMF equivalent | Number of bytes | Approx. range |
---|---|---|---|---|
byte | BYTE | - | 1 | 0 to 255 |
fix | INTEGER*2 | ESMF_KIND_I2 | 2 | 0 - 32767 |
long | INTEGER*4 | ESMF_KIND_I4 | 4 | -214700000 to -214700000 |
float | REAL*4 | ESMF_KIND_R4 | 4 | -1e-38 to 1e+38 |
double | REAL*8 | ESMF_KIND_R8 | 8 | -1e-312 to 1e+312 |
A pictorial representation of the floating point mathematics system would look like this:
-Min Representable Value < . . . . . . 0 . . A . B . . > +Max Representable Value
Because each floating-point number type is composed of a finite number of bytes, we can only represent numbers within a given range (see table above). Therefore, it is not possible to represent numbers to -Infinity or +Infinity with floating-point mathematics. Also, in on the floating-point number line pictured above, there are a finite number of possible floating-point values between points A and B.
Here are some excellent references about floating-point math and ways to avoid pitfalls:
- The Perils of Floating Point by Bruce M. Bush
- Fun with Floating Point Arithmetic
- Fortran common pitfalls (good reference for some floating-point issues)
- Avoiding floating point errors in Fortran
Safe floating-point division
Due to the approximate nature of floating-point mathematics, you need to take special precautions when dividing by small numbers. Division by zero is of course not allowed. However, under certain circumstances, the quotient of the division might not fall in the range of representable numbers.
For example, if you divided a very big number by a very small number:
REAL*4 :: A, B A = 1e20 B = 1e-20 PRINT*, A/B
then this will result in +Infinity because the order of magnitude of the answer (1e40) is not representable with a REAL*4 variable (max value ~ 1e38).
Therefore it is adviseable to use "safe division" in certain critical computations where you might have very small denominators. The algorithm is described below:
From http://groups.google.com/group/comp.lang.fortran/browse_thread/thread/8b367f44c419fa1d/
- Thank you to all that offered their suggestions for a "safe division" routine to prevent overflow. For those that are curious about the solution to the problem, I found useful to adopt a subroutine along the lines suggested by William Long:
if(exponent(a) - exponent (b) >= maxexponent(a) .OR. b==0)Then q=altv else q=a/b endif
- The = in the >= is to take into account the case when the fractional part of a is 1.111... and that of b is 1.
- It works very well.
- Thank you again,
- Julio.
In other words, we test if the order of magnitude of the result is in the allowable range for the given number type. If it is not, then instead of doing the division, we return the alternate value as the result.
NOTE: The alternate value that you substitute depends on the type of computation that you are doing...there is no "one-size-fits-all" substitute value.
We have implemented the above algorithm into GEOS-Chem. See routine SAFE_DIV in error_mod.f:
FUNCTION SAFE_DIV( N, D, ALTV ) RESULT( Q ) ! !****************************************************************************** ! Subroutine SAFE_DIV performs "safe division", that is to prevent overflow, ! underflow, NaN, or infinity errors. An alternate value is returned if the ! division cannot be performed. (bmy, 2/26/08) ! ! For more information, see the discussion on: http://groups.google.com/group/comp.lang.fortran/browse_thread/thread/8b367f44c419fa1d/ ! ! Arguments as Input: ! ============================================================================ ! (1 ) N : Numerator for the division ! (2 ) D : Divisor for the division ! (3 ) ALTV : Alternate value to be returned if the division can't be done ! ! NOTES: !****************************************************************************** ! ! Arguments REAL*8, INTENT(IN) :: N, D, ALTV ! Function value REAL*8 :: Q !================================================================== ! SAFE_DIV begins here! !================================================================== IF ( EXPONENT(N) - EXPONENT(D) >= MAXEXPONENT(N) .or. D==0 ) THEN Q = ALTV ELSE Q = N / D ENDIF ! Return to calling program END FUNCTION SAFE_DIV
Testing for equality
Exact-value testing
Integers
If you are working with integer-valued numbers, then you can use the equality operator
- In Fortran: == or .eq.
- In IDL: eq
directly to test if a variable is equal to a given value, e.g.
INTEGER :: A A = 1 IF ( A == 1 ) THEN PRINT*, 'A is 1' ELSE PRINT*, 'A is not 1' ENDIF
Then you would get the result
> A is 1
This is because integers are always exactly ... -3, -2, -1, 0, 1, 2, 3 ...
Real numbers
However, if you are working with floating-point real numbers (i.e. REAL*4 or REAL*8), then you have to be a little bit more careful. Recall from the previous section that floating-point numbers are not represented with infinite precision. Therefore, if you try to do the following equality test:
REAL*8 :: A A = 1d0 / 3d0 IF ( A == 0.333d0 ) THEN PRINT*, 'A is 1/3' ELSE PRINT*, 'A is not 1/3 but is ', A ENDIF
You will get the result:
> A is not 1/3 but is 0.333333333333333
A is a repeating decimal 0.333333333333333 up to the limit of the precision of the machine (usually about 15 or 16 decimal places). However, we tested for 0.333 and not 0.333333333333333. This is why the equality test failed.
TIP: You should never use a direct equality test with floating-point real numbers!
A better way to test for equality is to use the greater-than or less-than operators
- In Fortran: >, .gt., <, .lt.
- In IDL: gt, lt
in the following manner:
REAL*8 :: A A = 1d0/3d0 IF ( A > 1d0/3d0 .or. A < 1d0/3d0 ) THEN PRINT*, 'A is not 1d0/3d0 but is ', A ELSE PRINT*, 'A is exactly 1d0/3d0' ENDIF
This will reject all other values except exactly 1d0/3d0. Running the above example gives:
> A is exactly 1d0/3d0
Avoiding zero
In many instances it is necessary to take appropriate action if a number is zero (for example, to avoid a division by zero error). A simple test is as follows:
REAL*8 :: A A = 0d0 IF ( ABS( A ) > 0d0 ) THEN PRINT*, 'A is not 0 but is ', A ELSE PRINT*, 'A is exactly 0' ENDIF
This example gives the result:
> A is exactly 0
Note that we have used the mathematical relation ABS( x ) > a, which is true if x > a or x < -a. This test will reject all numbers that are not exactly 0.
Epsilon testing
The above algorithms are useful when you need to test if a variable is EXACTLY EQUAL to a given value. However, it is often better to test whether the variable is within some tolerance of a given value. Here are some examples:
Avoiding division by zero
For example, let's say we have the following division:
REAL*8 :: X, Y, Q ! Make X artificially large, Y artificially small for demo purposes X = 1.0d+75 Y = 1.0d-250 ! Do the division if Y is not exactly zero IF ( ABS( Y ) > 0d0 ) THEN Q = X / Y ENDIF PRINT*, 'Q is : ', Q
and we are testing to prevent the denominator Y from being zero. However, when we compile and run this code, we get the result
> Q is : Infinity
Although this is a rather contrived example, it illustrates the point that the denominator of the expression does not have to be exactly zero to cause a floating point error. In this case the error check on Y failed because we were testing for Y strictly equal to zero.
In this case it is preferable to do the division only if the magnitude of Y is greater than some specified minimum value (called an epsilon value). A good choice for the epsilon is usually 1e-30 or 1e-32. We can rewrite the above code as follows:
REAL*8 :: X, Y, Q ! Epsilon value REAL*8, PARAMETER :: EPSILON = 1d-30 ! Make X artificially large, Y artificially small for demo purposes X = 1.0d+75 Y = 1.0d-250 ! Only do the division if Y does not lie within +/- EPSILON of zero ! Otherwise return a default value IF ( ABS( Y ) > EPSILON ) THEN Q = X / Y ELSE Q = -999d0 ENDIF PRINT*, 'Q is : ', Q
Which gives the result:
> Q is : -999.000000000000
In this case, notice that the "epsilon test" determined that Y was too small to be placed in the denominator of the division, and thus prevented the division from taking place. The alternate value of -999.0 was returned.
As described in the previous section, the mathematical identity ABS( X ) > EPSILON is true only if X > +EPSILON or X < -EPSILON.
(Of course, you could also use the routine SAFE_DIV in GEOS-Chem error_mod.f which will perform a safe division, as described above.)
Testing for values close to zero
If you would like to test whether a given variable lies within a specified tolerance of zero, then we can just reverse the logic. We'll test for ABS( X ) < EPSILON, as this identity is true only if -EPSILON < X < +EPSILON. For example:
REAL*8 :: X ! Epsilon value REAL*8, PARAMETER :: EPSILON = 1d-3 X = 1d-5 IF ( ABS( X ) < EPSILON ) THEN PRINT*, 'X is within +/- EPSILON of zero.' ELSE PRINT*, 'X is not within +/- EPSILON of zero.' ENDIF
which results in:
> X is within +/- EPSILON of zero.
Testing for values close to a non-zero number
The procedure for testing if values lie within a specfied tolerance of a non-zero number is similar to the above example, just that we have to test for ABS( X - VALUE ) < EPSILON.
REAL*8 :: X, VALUE ! Epsilon value REAL*8, PARAMETER :: EPSILON = 1d-3 X = 20.00001d0 ! Value of X VALUE = 20.d0 ! Target value IF ( ABS( X - VALUE ) < EPSILON ) THEN PRINT*, 'X is within +/- EPSILON of 20.' ELSE PRINT*, 'X is not within +/- EPSILON of 20.' ENDIF
which gives
> X is within +/- EPSILON of 20.
Testing for non-representable values
Recall the floating point number line from the above discussion. Each number type (e.g. INTEGER, REAL*4, REAL*8) has a minimum and maximum representable value. If we are not careful, the result of a mathematical computation may result in a value that cannot be represented for the given number type. There are two common non-representable values: Not-A-Number (aka NaN), and Infinity.
NaN (Not-a-Number)
NaN (Not-A-Number) is the IEEE designation for a value that cannot be represented for a given number type (e.g. the result of a mathematical computation that is undefined). For example, some computations that may result in NaN are:
SQRT( x ) for all x < 0 LOG( x ) for all x <= 0 LOG10( x ) for all x <= 0 ACOS( x ) for all x > 1 and x < -1 ASIN( x ) for all x > 1 and x < -1
NOTE: in some cases, a function may return a NaN or it may return +/-Infinity. It may depend on the particular implementation.
It is important to trap for NaN values, otherwise they will propagate throughout your code. For example:
any number + NaN = NaN any number - NaN = NaN any number * NaN = NaN any number / NaN = NaN
etc.
The easiest (and most portable) way to check for Nan values in the code is to implement one of the following tests:
! Scan a single value IF ( X .ne. X ) THEN PRINT*, 'X is NaN!' ! ... stop the code here ... ENDIF ! Scan an array WHERE( ARRAY .ne. ARRAY ) ARRAY = -9999d0 !Set array to a missing value of your choice ENDWHERE
Each compiler also supplies a function to trap for NaN values. Here are the NaN-trapping functions for the various compilers that are supported by GEOS-Chem:
Compiler | Function | Result |
---|---|---|
Intel Fortran Compiler (IFORT) | ISNAN( x ) | Returns T if x=NaN; F otherwise |
Sun Studio | IEEE_IS_NAN( x ) | Returns T if x=NaN; F otherwise |
Portland Group (PGF90) | none | none |
SGI-MIPS Fortran | IEEE_IS_NAN( x ) | Returns T if x=NaN; F otherwise |
HP/Compaq Fortran | ISNAN( x ) | Returns T if x=NaN; F otherwise |
NOTES:
- The functions listed in the above table take INTEGER, REAL*4, and REAL*8 arguments.
- For PGF90 you must write a wrapper in C to call the Linux isnan( x ) function. GEOS-Chem provides a file linux_err.c to do this.
- The GEOS-Chem module error_mod.f contains a routine IT_IS_NAN( x ) which will call the proper compiler-dependent NaN function.
Infinity
The values -Infinity and +Infinity are the IEEE designation for values that are, respectively, smaller than the smallest representable value or larger than the largest representable value for a given number type. For example, the computations that may return +Infinity or -Infinity are as follows:
x / y for y = 0 (or for extremely small y such as 1e-300) LOG( x ) for all x <= 0 LOG10( x ) for all x <= 0
NOTE: in some cases, a function may return a NaN or it may return +/-Infinity. It may depend on the particular implementation.
Each compiler supplies a function to trap for +Infinity or -Infinity.
Compiler | Function | Result |
---|---|---|
Intel Fortran Compiler (IFORT) | FP_CLASS( x ) | Returns the following values POS_INF if x = +Infinity |
Sun Studio | IR_FINITE( x ) for REAL*4 x | Returns if x = +/-Infinity; F otherwise |
Sun Studio | ID_FINITE( x ) for REAL*8 x | Returns if x = +/-Infinity; F otherwise |
Portland Group (PGF90) | none | none |
SGI-MIPS Fortran | IEEE_FINITE( x ) | Returns F if x = +/-Infinity; T otherwise |
HP/Compaq Fortran | none | none |
NOTES:
- The GEOS-Chem module error_mod.f contains a routine called IT_IS_FINITE( x ) which will call the proper infinity-checking routine for the compiler that you are using.
- For PGF90 you must write a snippet of C code to call the Linux function isinf( x ). This is done for you in GEOS-Chem file linux_err.c.
- For HP/Compaq Fortran you can trap infinite numbers with the following code snippets:
! Trap REAL*4 infinity values IF ( VALUE == Z'7F800000' .or. & VALUE == Z'FF800000' ) THEN IT_IS_A_FINITE = .FALSE. ELSE IT_IS_A_FINITE = .TRUE. ENDIF ! Trap REAL*8 infinity values IF ( VALUE == Z'7FF0000000000000' .or. & VALUE == Z'FFF0000000000000') THEN IT_IS_A_FINITE = .FALSE. ELSE IT_IS_A_FINITE = .TRUE. ENDIF
--Bmy 12:01, 15 April 2008 (EDT)
Loss of significant digits under some circumstances
This article describes how significant digits may be lost in floating-point calcualations under certain circumstances.
In GEOS-Chem, we have observed this phenomenon when we try to archive data to a 2-D array from within a 3-D OpenMP parallel loop. This causes different results in diagnostic output in simulation where parallelization is turned compared to a simulation with parallelization turned off.
--Bob Yantosca (talk) 18:14, 7 December 2016 (UTC)