# Difference between revisions of "Floating point math issues"

Line 3: | Line 3: | ||

One important thing to remember when writing computer programs is that floating-point mathematics is not exact but is an approximation. In most programming languages, floating-point real numbers are composed of groups of 4 or 8 bytes. This means that floating-point numbers are not infinitely precise, but have a maximum precision. As a consequence, floating-point math operations (especially multiplication and division) can often lead to different results than anticipated. | One important thing to remember when writing computer programs is that floating-point mathematics is not exact but is an approximation. In most programming languages, floating-point real numbers are composed of groups of 4 or 8 bytes. This means that floating-point numbers are not infinitely precise, but have a maximum precision. As a consequence, floating-point math operations (especially multiplication and division) can often lead to different results than anticipated. | ||

− | + | Here are the common number types used in both IDL and Fortran: | |

+ | {| border=1 cellpadding=5 cellspacing=0 | ||

+ | |- bgcolor="#CCCCCC" | ||

+ | ! IDL Number Type | ||

+ | ! Fortran Equivalent | ||

+ | ! Number of bytes | ||

+ | ! Approx. range | ||

+ | |- | ||

+ | | byte | ||

+ | | BYTE | ||

+ | | 1 | ||

+ | | 0 to 255 | ||

+ | |- | ||

+ | | fix | ||

+ | | INTEGER*2 | ||

+ | | 2 | ||

+ | | 0 - 32767 | ||

+ | |- | ||

+ | | long | ||

+ | | INTEGER*4 | ||

+ | | 4 | ||

+ | | 0 - 2000000000 | ||

+ | |- | ||

+ | | float | ||

+ | | REAL*4 | ||

+ | | 4 | ||

+ | | -1e-38 to 1e+38 | ||

+ | |- | ||

+ | | double | ||

+ | | REAL*8 | ||

+ | | 8 | ||

+ | | -1e-312 to 1e+312 | ||

+ | |} | ||

+ | Note that you cannot represent numbers to -Infinity or +Infinity. Each number type is composed of a finite number of bytes, and can only represent numbers within a given range. | ||

Here are some excellent references about floating-point math: | Here are some excellent references about floating-point math: |

## Revision as of 14:53, 9 April 2008

## The pitfalls of floating-point mathematics

One important thing to remember when writing computer programs is that floating-point mathematics is not exact but is an approximation. In most programming languages, floating-point real numbers are composed of groups of 4 or 8 bytes. This means that floating-point numbers are not infinitely precise, but have a maximum precision. As a consequence, floating-point math operations (especially multiplication and division) can often lead to different results than anticipated.

Here are the common number types used in both IDL and Fortran:

IDL Number Type | Fortran Equivalent | Number of bytes | Approx. range |
---|---|---|---|

byte | BYTE | 1 | 0 to 255 |

fix | INTEGER*2 | 2 | 0 - 32767 |

long | INTEGER*4 | 4 | 0 - 2000000000 |

float | REAL*4 | 4 | -1e-38 to 1e+38 |

double | REAL*8 | 8 | -1e-312 to 1e+312 |

Note that you cannot represent numbers to -Infinity or +Infinity. Each number type is composed of a finite number of bytes, and can only represent numbers within a given range.

Here are some excellent references about floating-point math:

## Safe floating-point division

Due to the approximate nature of floating-point mathematics, you need to take special precautions when dividing by small numbers. Division by zero is of course not allowed. However, if both the numerator and denominator are very small, then the quotient not fall in the range of representable numbers. For example,

**From http://groups.google.com/group/comp.lang.fortran/browse_thread/thread/8b367f44c419fa1d/**

- Thank you to all that offered their suggestions for a "safe division" routine to prevent overflow. For those that are curious about the solution to the problem, I found useful to adopt a subroutine along the lines suggested by William Long:

if(exponent(a) - exponent (b) >= maxexponent(a) .OR. b==0)Then q=altv else q=a/b endif

- The = in the >= is to take into account the case when the fractional part of a is 1.111... and that of b is 1.

- It works very well.

- Thank you again,
- Julio.

We have implemented the above algorithm into GEOS-Chem. See routine SAFE_DIV in error_mod.f:

FUNCTION SAFE_DIV( N, D, ALTV ) RESULT( Q ) ! !****************************************************************************** ! Subroutine SAFE_DIV performs "safe division", that is to prevent overflow, ! underflow, NaN, or infinity errors. An alternate value is returned if the ! division cannot be performed. (bmy, 2/26/08) ! ! For more information, see the discussion on: http://groups.google.com/group/comp.lang.fortran/browse_thread/thread/8b367f44c419fa1d/ ! ! Arguments as Input: ! ============================================================================ ! (1 ) N : Numerator for the division ! (2 ) D : Divisor for the division ! (3 ) ALTV : Alternate value to be returned if the division can't be done ! ! NOTES: !****************************************************************************** ! ! Arguments REAL*8, INTENT(IN) :: N, D, ALTV ! Function value REAL*8 :: Q !================================================================== ! SAFE_DIV begins here! !================================================================== IF ( EXPONENT(N) - EXPONENT(D) >= MAXEXPONENT(N) .or. D==0 ) THEN Q = ALTV ELSE Q = N / D ENDIF ! Return to calling program END FUNCTION SAFE_DIV