Run-time crashes and abnormal exits


Previous | Next | Guide to GEOS-Chem error messages

  1. Understanding the different categories of errors
  2. Compile-time warnings and errors
  3. Run-time crashes and abnormal exits
  4. Segmentation faults
  5. Other less-common errors


Overview

We have been migrating bug reports to our GEOS-Chem issue tracker, which is located on our GitHub repository: https://github.com/geoschem/geos-chem/issues/. We recommend that you look through both the open and closed issues there, as your issue might already be listed.


In this section, we provide information about some commonly reported run-time errors that cause GEOS-Chem to halt.

Run-time errors originating in GEOS-Chem code

No output scheduled on last day of run

If you encounter this error at the start of your GEOS-Chem simulation:

==========================================================================
GEOS-CHEM ERROR: No output scheduled on last day of run!
STOP at IS_LAST_DAY_GOOD ("input_mod.f")
==========================================================================

This means that you have not told GEOS-Chem to save out diagnostic data on the day that your simulation ends. GEOS-Chem adds this error check in order to prevent you from running a long simulation only to have no diagnostics printed out at the end of the run.

For more information on how to schedule diagnostic output in GEOS-Chem, please see the OUTPUT MENU section of the input.geos file on the GEOS-Chem wiki.

--Bob Yantosca (talk) 15:51, 10 March 2017 (UTC)

List-directed I/O syntax error

[Screenshot: List directed io error.png]

The above error message indicates that the simulation crashed at line 871 in GeosCore/input_mod.F. This means there was a problem while reading the input.geos file (located in the run directory). For example, GEOS-Chem might have expected numeric input but read a character from input.geos instead, thus causing a read error. This type of error can occur when the input.geos file corresponds to a version of GEOS-Chem that is different from yours.

This error is not limited to input.geos; it can happen for any text file that is being read from disk (both in GEOS-Chem and in any other Fortran programs you may write).

--Bob Yantosca (talk) 15:51, 27 February 2017 (UTC)

Error reading the input.geos file

If you should encounter this type of error:

READ_INPUT_FILE: Reading input.geos
SPLIT_ONE_LINE: error at ___
Expected __ substrs but found __
STOP in SPLIT_ONE_LINE (input_mod.F)

then you are probably using an input.geos file that does not correspond to the same version as your GEOS-Chem source code directory. Please check the git history for both your code directory and unit tester directory to make sure they are for the same version (marked by tags). If necessary, update your unit tester directory and create a new run directory.
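
The tag comparison described above can be sketched with two small shell functions. This is only an illustration: the function names latest_tag and same_version are made up for this example, the two directory paths are supplied by you, and it assumes that both directories are git repositories whose versions are marked by tags with the same names.

```shell
# Print the most recent tag reachable from the checked-out commit
# of the git directory given as $1.
latest_tag() {
    git -C "$1" describe --tags --abbrev=0
}

# Compare the most recent tags of two git directories (hypothetical helper).
# Returns 0 if the tags match, 1 if they differ, 2 if a tag cannot be read.
same_version() {
    code_tag=$(latest_tag "$1") || return 2
    tester_tag=$(latest_tag "$2") || return 2
    echo "code: $code_tag, unit tester: $tester_tag"
    [ "$code_tag" = "$tester_tag" ]
}
```

For example, same_version /path/to/code /path/to/unit_tester (both paths are placeholders) prints the two tags and returns a nonzero status if they differ.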

--Bob Yantosca (talk) 16:00, 28 January 2019 (UTC)

Errors reading the GEOS-Chem restart file

Please see the following posts for more information about errors that may occur when reading GEOS-Chem restart files:

  1. Error reading restart file when using a fixed emissions year in HEMCO

--Bob Yantosca (talk) 20:28, 20 December 2018 (UTC)

NetCDF: HDF Error

If you should encounter this error message:

NetCDF: HDF error

then this usually means GEOS-Chem was trying to read an incomplete or corrupted netCDF file. The quickest solution is to re-download the netCDF file from the original source.
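
Before re-downloading everything, a quick first-pass check can single out files that are obviously not netCDF (for example, an empty file or a saved HTML error page). The sketch below only inspects the first four bytes of a file for the HDF5 signature (used by netCDF-4) or the classic netCDF signature. The function name has_netcdf_signature is made up for this example. A file that was truncated partway through the download will still pass this check, so it does not replace re-downloading or inspecting the file with ncdump -h.

```shell
# Return 0 if the file given as $1 starts with a netCDF-4/HDF5 or
# classic netCDF signature, 1 otherwise (hypothetical helper).
has_netcdf_signature() {
    # Read the first 4 bytes as hexadecimal, with no offsets or spaces
    nc_sig=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')
    case "$nc_sig" in
        89484446) return 0 ;;                    # \x89 H D F : HDF5 (netCDF-4)
        43444601|43444602|43444605) return 0 ;;  # C D F + version byte: classic formats
        *) return 1 ;;
    esac
}
```

A loop such as for f in *.nc4; do has_netcdf_signature "$f" || echo "suspect: $f"; done then lists the files that are worth re-downloading first.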

--Bob Yantosca (talk) 21:11, 2 January 2019 (UTC)

Permission denied error

If you receive this error:

v11-01.run: Permission denied. 

after having submitted a run script to a queue system (such as SLURM or Grid Engine), then double-check the Unix permissions of your v11-01.run script. If the script does not have the Unix "execute" permission, then the queue system will not be able to run it.

Use the Unix chmod command to make your script executable:

chmod 755 v11-01.run

and then re-submit the script to the queue system.
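
If you want to guard against this before every submission, the check can be wrapped in a small function. This is a minimal sketch; the function name ensure_executable is made up for this example, and it applies the same chmod 755 setting shown above.

```shell
# Give the script named in $1 the Unix "execute" permission
# if it does not already have it (hypothetical helper).
ensure_executable() {
    if [ ! -x "$1" ]; then
        chmod 755 "$1"
    fi
}
```

Calling ensure_executable v11-01.run just before submitting the job leaves an already-executable script untouched.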

--Bob Yantosca (talk) 22:17, 6 January 2017 (UTC)

UCX not defined at compile time

[Screenshot: Executable run pg4.png]

This error may occur if you are compiling GEOS-Chem (v11-01 and prior versions) directly from the source code directory instead of from a run directory. Whenever you compile GEOS-Chem in the source code directory, you must remember to use the following Makefile options:

CHEM=Standard UCX=y   # Standard simulation (used for benchmarking GC)

CHEM=UCX UCX=y        # UCX simulation (i.e. Standard simulation w/o SOA species)

To avoid these errors, we STRONGLY RECOMMEND that you always compile GEOS-Chem from a run directory. This will ensure that the Makefile switches relevant to your simulation will always be activated.

NOTE: In GEOS-Chem v11-02 and higher versions, the Makefile option UCX=y will automatically be set whenever you select CHEM=Standard or CHEM=UCX.

--Bob Yantosca (talk) 15:43, 10 April 2017 (UTC)

Floating invalid or floating-point exception error

You can check for several common floating-point math errors by compiling with the FPEX=y option. This will halt the simulation with an error message such as:

 forrtl: error (65): floating invalid    # Error message from Intel Fortran Compiler
 
 Floating point exception (core dumped)  # Error message from GNU Fortran Compiler

This error typically means that a division-by-zero occurred, or a NaN value was encountered in one of your variables.

A common way to prevent these types of errors is to ensure "safe" divisions (i.e. to make sure that the denominator is nonzero). You can do this manually with an IF statement, or use the routines SAFE_DIV or IS_SAFE_DIV in GeosCore/error_mod.F.

--Bob Yantosca (talk) 14:58, 11 October 2017 (UTC)

KPP "Step size too small" error

The following abnormal exit from the KPP chemical solver:

Forced exit from Rosenbrock due to the following error:
--> Step size too small: T + 10*H = T or H < Roundoff
T=   3044.21151383269      and H=  1.281206877135470E-012

...
       1
Forced exit from Rosenbrock due to the following error:
--> Step size too small: T + 10*H = T or H < Roundoff
T=   3044.21151383269      and H=  1.281206877135470E-012
failed twice !!!

indicates that the chemistry could not converge to a solution in the given grid box. Possible reasons for this could be:

  1. A particular tracer has numerically underflowed or overflowed. This can happen especially in the aerosol chemistry and equilibrium routines, where many exponentials and logarithms are used in the algorithms.
  2. The restart file is not appropriate for the given simulation. For example, if the restart file was created using the Synoz O3 flux boundary condition, but you have turned on the Linoz stratospheric O3 chemistry, then this mismatch can cause the solver not to converge. You can try switching to a restart file generated from a simulation with the same input options as the simulation that you wish to perform.

You may have to manually adjust the convergence criteria in the GEOS-Chem code to fix this condition.

--Bob Y. 11:30, 9 November 2010 (EST)

Mixed file access modes error

This error is particular to the Intel Fortran Compiler. You might encounter this in conjunction with the ND49 timeseries diagnostic.

According to the Intel website:

severe (31): Mixed file access modes

FOR$IOS_MIXFILACC. An attempt was made to use any of the following combinations:
* Formatted and unformatted operations on the same unit
* An invalid combination of access modes on a unit, such as direct and sequential
* An Intel® Fortran RTL I/O statement on a logical unit that was opened by a program coded in another language

Here are a few suggestions to try if you haven’t already:

  1. Make sure you don’t already have a ND49 file in the location that you’re writing. In other words, make sure GEOS-Chem isn’t trying to write to a file that already exists.
  2. Do “make realclean”, recompile, and run again to see if the error is persistent.
  3. Are you using an out-of-the-box version of the code or a modified version? If the latter, do you still get this error with a “clean” copy of the code?
  4. Is there a particular READ/WRITE format statement that is causing the problem? You could try compiling with BOUNDS=y TRACEBACK=y FPE=y DEBUG=y and running in totalview (do “module load totalview” first on Odyssey) to locate the problematic line.
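
For the first suggestion, one option is to move any existing output file out of the way before starting the run. This is a minimal sketch: the function name set_aside_existing is made up for this example, and the path of your timeseries file must be supplied by you.

```shell
# If the file given as $1 already exists, rename it to $1.old so that
# the next run writes to a fresh file (hypothetical helper).
# Note: a previous $1.old, if present, is overwritten.
set_aside_existing() {
    if [ -e "$1" ]; then
        mv "$1" "$1.old"
        echo "Moved existing $1 to $1.old"
    fi
}
```

Renaming rather than deleting keeps the earlier output available in case you still need it.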

--Bob Yantosca (talk) 19:07, 12 April 2017 (UTC)

Negative tracer found in WETDEP

If your simulation encounters negative (or NaN) tracer concentrations in the WETDEP routine, then this can be an indication of a problem further upstream, perhaps in the aerosol routines (highly probable if the tracer is SO4, SO4s, HNO3, SO2, or NH3). We have fixed some of these bugs by making the code more robust. If you are using a GEOS-Chem version prior to v8-01-01, then you should get these patches: ftp://ftp.as.harvard.edu/pub/geos-chem/patches/v8-01-01/. (These patches have been added to the standard GEOS-Chem code in versions higher than v8-01-01.)

If the fixes above do not solve your problem, you will need to debug. The first step is to add a few calls to CHECK_STT (from tracer_mod.f) to isolate the part of the code where negative tracers are created. This can be done quite quickly if the code dies early enough in the run.

--Bob Y. 12:50, 15 July 2011 (EDT)

Run-time errors originating in the HEMCO emissions component

HEMCO Error: Cannot find file for current simulation time

If you see an error such as this in your HEMCO.log file:

HEMCO ERROR: Cannot find file for current simulation time: ./GEOSChem.Restart.17120701_0000z.nc4 - Cannot get field SPC_NO. 
Please check file name and time (incl. time range flag) in the config. file

Then this can have a couple of causes:

  1. HEMCO cannot find the file because it is missing on disk.
    • HEMCO will try to look back in time starting with the current year and going all the way back to the year 1712 or 1713. So if you see 1712 or 1713 in the error message, that is a tip-off that the file is missing.
  2. HEMCO cannot find an expected variable name within a file.
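
To tell the two causes apart, you can pull the file path out of the error message and test whether that file is on disk. The sketch below assumes that the error line in your log has the same layout as the example above; the function names hemco_missing_files and report_missing_hemco_files are made up for this example.

```shell
# Print each distinct file path named in a "Cannot find file" error
# in the HEMCO log file given as $1 (hypothetical helper).
hemco_missing_files() {
    sed -n 's/.*Cannot find file for current simulation time: \(.*\) - Cannot get field.*/\1/p' "$1" | sort -u
}

# For each path reported in the log, say whether it exists on disk.
# Relative paths are resolved against the current directory.
report_missing_hemco_files() {
    hemco_missing_files "$1" | while IFS= read -r hemco_path; do
        if [ -e "$hemco_path" ]; then
            echo "$hemco_path: present on disk, so check the variable names inside it"
        else
            echo "$hemco_path: not found on disk"
        fi
    done
}
```

Run report_missing_hemco_files HEMCO.log from the run directory so that relative paths such as ./GEOSChem.Restart... resolve correctly.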

--Bob Yantosca (talk) 20:28, 20 December 2018 (UTC)

HEMCO Run Error

Error messages containing "HCO" originate in the HEMCO emissions component. For example:

 ===============================================================================
 GEOS-CHEM ERROR: HCO_RUN
 STOP at HCOI_GC_RUN (hcoi_gc_main_mod.F90)
 ===============================================================================

Additional helpful diagnostic information can be found in the HEMCO log file, which is usually named HEMCO.log.

--Chris Holmes (talk) 15:42, 24 June 2015 (UTC)

Updated error message for v11-01

In GEOS-Chem v11-01 and higher versions, additional text instructs the user to also check the HEMCO log file.

 ===============================================================================
 GEOS-CHEM ERROR: HCO_RUN 
 
 HEMCO ERROR: Please check the HEMCO log file for error messages!
 
 STOP at HCOI_GC_RUN (GeosCore/hcoi_gc_main_mod.F90)
 ===============================================================================

--Bob Yantosca (talk) 20:50, 6 January 2017 (UTC)

HEMCO time stamps may be wrong

HEMCO reads the files but gives zero emissions and shows the following time stamp warning:

HEMCO WARNING: ncdf reference year is prior to 1901 - time stamps may be wrong!
--> LOCATION: GET_TIMEIDX (hco_read_std_mod.F90)

Lizzie Lundgren wrote:

That HEMCO error occurs if the reference time for the netCDF file time dimension is prior to 1901. If you run ncdump -c filename, you will be able to see the metadata for the time dimension as well as the time variable values. The time units should include the reference date.
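
The reference year can also be checked automatically from the ncdump output. This is a minimal sketch that assumes the time variable is named time and that its units attribute has the usual "<units> since YYYY-MM-DD ..." form; the function names time_ref_year and ref_year_ok are made up for this example.

```shell
# Read ncdump header text on stdin and print the reference year found
# in the time:units attribute (hypothetical helper).
time_ref_year() {
    sed -n 's/.*time:units = ".* since \([0-9][0-9]*\)-.*/\1/p' | head -n 1
}

# Return 0 if the reference year read from stdin is 1901 or later,
# 1 if it is earlier or cannot be found.
ref_year_ok() {
    nc_ref_year=$(time_ref_year)
    [ -n "$nc_ref_year" ] && [ "$nc_ref_year" -ge 1901 ]
}
```

For example, ncdump -h filename | ref_year_ok || echo "reference time is prior to 1901" flags a file that would trigger the HEMCO warning.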

You can get around this issue by changing the reference time within the file. You can do this with CDO (Climate Data Operators) using the setreftime operator.

Here is a bash script example (by GCST member Melissa Sulprizio) that updates the calendar and reference time for all files ending in .nc within a directory. She developed it very recently for a user who ran into the same issue. In that case the first file was for Jan 1, 1950, so that was made the new reference time. I would recommend doing the same for your dataset so that the first time variable value is 0. This script also compresses each file, which we recommend doing.

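     #!/bin/bash
     
     # Update the calendar and reference time of every *.nc file in this
     # directory, then compress each file.
     for file in *.nc; do
        echo "Processing $file"
        # Convert to the standard calendar
        cdo setcalendar,standard "$file" tmp.nc
        mv tmp.nc "$file"
        # Reset the reference time to 1950-01-01 (the date of the first file)
        cdo setreftime,1950-01-01,0 "$file" tmp.nc
        mv tmp.nc "$file"
        # Compress (deflate level 1) and chunk along the time dimension
        nccopy -d1 -c "time/1" "$file" tmp.nc
        mv tmp.nc "$file"
     done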

After you update the file, you can run ncdump -c filename again to check the time dimension. For the case above, it looks like this after processing:

             double time(time) ;
                     time:standard_name = "time" ;
                     time:long_name = "time" ;
                     time:bounds = "time_bnds" ;
                     time:units = "days since 1950-01-01 00:00:00" ;
                     time:calendar = "standard" ;
             . . .
     time = 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365, 396, 424, 
          455, 485, 516, 546, 577, 608, 638, 669, 699, 730, 761, 790, 821, 851,
          882, 912, 943, 974, 1004, 1035, 1065, 1096, 1127, 1155, 1186, 1216, 1247, etc

--Bob Yantosca (talk) 21:33, 11 April 2018 (UTC)


