# Difference between revisions of "Floating point math issues"

(!) |
(→Testing for exact value) |
||

Line 121: | Line 121: | ||

=== Testing for exact value === | === Testing for exact value === | ||

+ | |||

+ | ==== Integers ==== | ||

If you are working with integer-valued numbers, then you can use the equality operator | If you are working with integer-valued numbers, then you can use the equality operator | ||

Line 144: | Line 146: | ||

This is because integers are always exactly +/- 0, 1, 2, 3, ... | This is because integers are always exactly +/- 0, 1, 2, 3, ... | ||

+ | |||

+ | ==== Real numbers ==== | ||

However, if you are working with floating-point real numbers (i.e. REAL*4 or REAL*8), then you have to be a little bit more careful. Recall from the [[#Floating-point math is an approximation to the real number system|previous section]] that floating-point numbers are not represented with infinite precision. Therefore, if you try to do the following equality test: | However, if you are working with floating-point real numbers (i.e. REAL*4 or REAL*8), then you have to be a little bit more careful. Recall from the [[#Floating-point math is an approximation to the real number system|previous section]] that floating-point numbers are not represented with infinite precision. Therefore, if you try to do the following equality test: |

## Revision as of 14:36, 11 April 2008

## Contents

## Floating-point is an approximation to the real number system

One important thing to remember when writing computer programs is that floating-point mathematics is never exact but is only an approximation. In most programming languages, floating-point real numbers are composed of groups of 4 or 8 bytes. This means that floating-point numbers are not infinitely precise, but have a maximum precision. As a consequence, floating-point math operations (especially multiplication and division) can often lead to different results than one would normally anticipate.

Here are the common number types used in both IDL and Fortran:

IDL Number Type | Fortran Equivalent | Number of bytes | Approx. range |
---|---|---|---|

byte | BYTE | 1 | 0 to 255 |

fix | INTEGER*2 | 2 | 0 - 32767 |

long | INTEGER*4 | 4 | 0 - 2000000000 |

float | REAL*4 | 4 | -1e-38 to 1e+38 |

double | REAL*8 | 8 | -1e-312 to 1e+312 |

Note that you cannot represent numbers to -Infinity or +Infinity. Each number type is composed of a finite number of bytes, and can only represent numbers within a given range, as listed in the above table.

Here are some excellent references about floating-point math:

## Safe floating-point division

Due to the approximate nature of floating-point mathematics, you need to take special precautions when dividing by small numbers. Division by zero is of course not allowed. However, under certain circumstances, the quotient of the division might not fall in the range of representable numbers.

For example, if you divided a very big number by a very small number:

REAL*4 :: A, B A = 1e20 B = 1e-20 PRINT*, A/B

then this will result in +Infinity because the order of magnitude of the answer (1e40) is not representable with a REAL*4 variable (max value ~ 1e38).

Therefore it is adviseable to use "safe division" in certain critical computations where you might have very small denominators. The algorithm is described below:

**From http://groups.google.com/group/comp.lang.fortran/browse_thread/thread/8b367f44c419fa1d/**

- Thank you to all that offered their suggestions for a "safe division" routine to prevent overflow. For those that are curious about the solution to the problem, I found useful to adopt a subroutine along the lines suggested by William Long:

if(exponent(a) - exponent (b) >= maxexponent(a) .OR. b==0)Then q=altv else q=a/b endif

- The = in the >= is to take into account the case when the fractional part of a is 1.111... and that of b is 1.

- It works very well.

- Thank you again,
- Julio.

In other words, we test if the order of magnitude of the result is in the allowable range for the given number type. If it is not, then instead of doing the division, we return the alternate value as the result.

**NOTE: The alternate value that you substitute depends on the type of computation that you are doing...there is no "one-size-fits-all" substitute value.**

We have implemented the above algorithm into GEOS-Chem. See routine SAFE_DIV in error_mod.f:

FUNCTION SAFE_DIV( N, D, ALTV ) RESULT( Q ) ! !****************************************************************************** ! Subroutine SAFE_DIV performs "safe division", that is to prevent overflow, ! underflow, NaN, or infinity errors. An alternate value is returned if the ! division cannot be performed. (bmy, 2/26/08) ! ! For more information, see the discussion on: http://groups.google.com/group/comp.lang.fortran/browse_thread/thread/8b367f44c419fa1d/ ! ! Arguments as Input: ! ============================================================================ ! (1 ) N : Numerator for the division ! (2 ) D : Divisor for the division ! (3 ) ALTV : Alternate value to be returned if the division can't be done ! ! NOTES: !****************************************************************************** ! ! Arguments REAL*8, INTENT(IN) :: N, D, ALTV ! Function value REAL*8 :: Q !================================================================== ! SAFE_DIV begins here! !================================================================== IF ( EXPONENT(N) - EXPONENT(D) >= MAXEXPONENT(N) .or. D==0 ) THEN Q = ALTV ELSE Q = N / D ENDIF ! Return to calling program END FUNCTION SAFE_DIV

## Testing for equality

### Testing for exact value

#### Integers

If you are working with integer-valued numbers, then you can use the equality operator

- In Fortran:
`"==", ".eq."` - In IDL:
`"eq"`

directly to test if a variable is equal to a given value, e.g.

INTEGER :: A A = 1 IF ( A == 1 ) THEN PRINT*, 'A is 1' ELSE PRINT*, 'A is not 1' ENDIF

Then you would get the result

> A is 1

This is because integers are always exactly +/- 0, 1, 2, 3, ...

#### Real numbers

However, if you are working with floating-point real numbers (i.e. REAL*4 or REAL*8), then you have to be a little bit more careful. Recall from the previous section that floating-point numbers are not represented with infinite precision. Therefore, if you try to do the following equality test:

REAL*8 :: A A = 1d0 / 3d0 IF ( A == 0.333d0 ) THEN PRINT*, 'A is 1/3' ELSE PRINT*, 'A is not 1/3 but is ', A ENDIF

You will get the result:

> A is not 1/3 but is 0.333333333333333

A is a repeating decimal 0.33333...3 up to the limit of the precision of the machine (usually about 15 or 16 decimal places). However, we tested for 0.333 and not 0.33333...3. This is why the equality test failed.

**TIP: You should never use a direct equality test with floating-point real numbers!**

A better way to test for equality is to use the greater-than or less-than operators

- In Fortran:
`">", ".gt.", "<", ".lt."` - In IDL:
`"gt", "lt"`

in the following manner:

REAL*8 :: A A = 1d0/3d0 IF ( A > 1d0/3d0 .or. A < 1d0/3d0 ) THEN PRINT*, 'A is not 1d0/3d0 but is ', A ELSE PRINT*, 'A is exactly 1d0/3d0' ENDIF

This will reject all other values except exactly 1d0/3d0. Running the above example gives:

> A is exactly 1d0/3d0

Also, in many instances it is necessary to take appropriate action if a number is zero (for example, to avoid a division by zero error). A simple test is as follows:

REAL*8 :: A A = 0d0 IF ( ABS( A ) > 0d0 ) THEN PRINT*, 'A is not 0 but is ', A ELSE PRINT*, 'A is exactly 0' ENDIF

This example gives the result:

> A is exactly 0

Note that we have used the mathematical relation `ABS( x ) > a`, which is true if `x > a` or `x < -a`. This test will reject all numbers that are not exactly 0.

### Epsilon testing

The above algorithms are useful when you need to test if a variable is EXACTLY EQUAL to a given value. However, it is often better to test whether the variable is within some tolerance of a given value.

For example, let's say you would like to do the following division:

REAL*8 :: X, Y, Q

Q = X / Y

You could use the algorithm given in the preceding section to prevent Y from being exactly zero:

! Do the division if Y is not exactly zero IF ( ABS( Y ) > 0d0 ) THEN Q = X / Y ENDIF

However, if Y somehow ends up being an EXTREMELY small number (e.g. +/- 1e-150), then this may be sufficient to cause the division to result in a non-representable number (i.e. either Infinity or NaN). If the Q variable gets used in the denominator, logarithm, or square root of another expression further downstream, then this will cause your code to die with a floating-point error.

In this case it would be better to only do the division if the magnitude of Y is greater than some minimum value (called an "epsilon value"). A good choice for the epsilon is usually 1e-30 or 1e-32. We can rewrite the above test as follows:

! Variables REAL*8 :: X, Y, Q

! Epsilon value (force REAL*8 precision w/ the "d" exponent) REAL*8, PARAMETER :: EPSILON = 1d-30 ! Do not do the division if Y lies within +/- EPSILON of zero IF ( ABS( Y ) > EPSILON ) THEN Q = X / Y ENDIF

Once again, the mathematical expression ABS( Y ) > EPSILON is true only if Y > +EPSILON or Y < -EPSILON. This will prevent the division if Y gets too small.

--Bmy 10:35, 11 April 2008 (EDT)