GEOS-Chem Debugging Tips
When your run dies with an error, the most important thing is to isolate where the error occurs (i.e., does it die in chemistry, in transport, in dry deposition, etc.).
Here is a list of things that you can try if your run dies with an error:
Have you looked at the log file?
- Can you tell where the run stopped?
- If not, re-run with the ND70 diagnostic (debug output) turned on.
- ND70 prints debug output that should let you pinpoint the location of the error.
Did you run out of time in the queue?
- Did you submit your job to a queue that only has 1 hour or less of wall-clock time?
- PBS error #143 is usually the tell-tale sign of an "out-of-time" error
- If so, then submit to a queue with a longer time limit
Did you modify the standard code?
- If so, then focus on your most recent changes
- You should keep a **clean** (unmodified) version for comparison
Can you isolate the error to a particular operation?
- Can you tell if the error happens in transport, chemistry, dry deposition, etc.?
Does the error happen consistently?
- If the error happens at the same model date & time, it could indicate bad input data
- If it happened only once, it could be caused by a network problem or other such transient condition
Check for math errors
- Is there a division by zero, a logarithm of a negative number, etc.?
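
One way to track down such errors is to test the operands just before the suspect operation and print a message when they are invalid. The following is a minimal free-form Fortran sketch, not code from GEOS-Chem; the program name and the variables NUMER, DENOM, CONC, RATIO and LOGC are hypothetical and chosen only for illustration.

```fortran
PROGRAM CHECK_MATH

  ! Hypothetical example: none of these names come from GEOS-Chem
  IMPLICIT NONE

  DOUBLE PRECISION :: NUMER, DENOM, CONC, RATIO, LOGC

  ! Values chosen to trigger both checks below
  NUMER = 1.0d0
  DENOM = 0.0d0
  CONC  = -1.0d-3

  ! Only divide if the denominator is safely away from zero
  IF ( ABS( DENOM ) > TINY( DENOM ) ) THEN
     RATIO = NUMER / DENOM
  ELSE
     WRITE( 6, * ) 'Zero denominator! DENOM = ', DENOM
     RATIO = 0.0d0
  ENDIF

  ! Only take the logarithm of a strictly positive number
  IF ( CONC > 0.0d0 ) THEN
     LOGC = LOG( CONC )
  ELSE
     WRITE( 6, * ) 'Non-positive argument to LOG! CONC = ', CONC
     LOGC = 0.0d0
  ENDIF

  WRITE( 6, * ) 'RATIO = ', RATIO, ' LOGC = ', LOGC

END PROGRAM CHECK_MATH
```

The fallback value of zero is only a placeholder so that the sketch runs to completion. In a real debugging session you would more likely stop the run at the first bad value, so that the message points directly at the offending operation.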
Check for array-out-of-bounds errors
- Recompile with array-bounds checking turned on. The flag depends on your compiler:
- For the Intel Fortran Compiler: recompile with the -CB flag. Adding -traceback may also be helpful (it gives you expanded error messages).
- For Sun Studio, PGI, SGI-MIPS: recompile with the -C flag.
- Out-of-bounds errors can produce segmentation faults
When in doubt, print it out!
- Print out the values of variables in the area where you suspect the error lies
- Also use "call flush(6)" to flush the output buffer after writing
- Maybe you will see something wrong in the output
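
The print-and-flush pattern described above might look like the following free-form Fortran sketch. The program name, the TRACER array and its values are hypothetical and are not taken from GEOS-Chem. Note that "CALL FLUSH(6)" is a compiler extension; it is assumed here that your compiler supports it. Compilers that follow the Fortran 2003 standard also accept the statement form "FLUSH(6)".

```fortran
PROGRAM PRINT_DEBUG

  ! Hypothetical example: TRACER and its values are for illustration only
  IMPLICIT NONE

  INTEGER, PARAMETER :: N = 3
  DOUBLE PRECISION   :: TRACER(N)
  INTEGER            :: I

  TRACER = (/ 1.0d0, 2.0d0, 3.0d0 /)

  DO I = 1, N

     ! Print the loop index and the value being worked on.
     ! The '###' prefix makes debug lines easy to find in the log file.
     WRITE( 6, '(a,i3,a,es13.6)' ) '### I = ', I, &
                                   ' TRACER = ', TRACER(I)

     ! Flush unit 6 (stdout) so this line reaches the log
     ! even if the run dies right afterwards
     CALL FLUSH( 6 )

  ENDDO

END PROGRAM PRINT_DEBUG
```

Without the flush, the last few lines of output can still be sitting in the buffer when the run dies, which makes the error appear to occur earlier than it really does.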
When all else fails, USE THE BRUTE FORCE METHOD!
- Comment out code until you find where the failure occurs