Difference between revisions of "Common GEOS-Chem error messages"

From Geos-chem
(Redirected page to Guide to GEOS-Chem error messages)
 
(229 intermediate revisions by 3 users not shown)
Here is a list of some commonly encountered GEOS-Chem error messages.

#REDIRECT [[Guide to GEOS-Chem error messages]]

== File I/O errors ==

=== A3 met fields not found ===

If you encounter a "file not found" error in A3_READ_MOD very near the beginning of a GEOS-Chem simulation, such as:

  $$ Finished Reading Linoz Data $$

 ===============================================================================
 GEOS-CHEM I/O ERROR    29 in file unit    73
 Encountered at routine:location open_a3_fields:1
 
 Error  29: File not found
 ===============================================================================
      - CLEANUP: deallocating arrays now...

then this is most likely caused by the MEGAN biogenic emissions.  MEGAN keeps a 10-day running average of temperature, and therefore requires that the met field files for the 10 days prior to the start of the GEOS-Chem simulation be present on disk.

You can solve this error in one of two ways:

# Make sure you have the previous 10 days (or better yet, the entire previous month!) of met field data prior to your GEOS-Chem simulation's starting date
# Start your GEOS-Chem simulation at a later date, for which the previous 10 days of met field data are available
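
The first option can be checked before you submit the run.  The bash sketch below is not part of GEOS-Chem: the function name is hypothetical, it assumes GNU <tt>date</tt>, and it assumes A3 file names of the form <tt>YYYYMMDD.a3.RES</tt> (by analogy with the A6 file name <tt>20060206.a6.2x25</tt> that appears on this page), all in one directory.  Adjust the directory layout and file name pattern to match your met field archive.

```shell
#!/usr/bin/env bash
# Sketch: report A3 met field files missing for the days before a start date.
# ASSUMPTIONS (hypothetical, not from GEOS-Chem): GNU date is available and
# A3 files are named YYYYMMDD.a3.RES (e.g. 20100630.a3.4x5) in one directory.

# Usage: check_prior_met_files START_YYYYMMDD MET_DIR RES [NDAYS]
# Prints each missing file and returns the number of missing files
# (0 = all present). Keep NDAYS below 255 so the count fits in a status.
check_prior_met_files() {
  local start=$1 met_dir=$2 res=$3 ndays=${4:-10}
  local iso="${start:0:4}-${start:4:2}-${start:6:2}"
  local missing=0 i day file
  for (( i = 1; i <= ndays; i++ )); do
    # Date i days before the simulation start (UTC avoids DST surprises)
    day=$(date -u -d "$iso $i days ago" +%Y%m%d) || return 255
    file="$met_dir/$day.a3.$res"
    if [[ ! -f $file ]]; then
      echo "missing: $file"
      missing=$(( missing + 1 ))
    fi
  done
  return "$missing"
}
```

For example, <tt>check_prior_met_files 20100701 /path/to/met/4x5 4x5</tt> (the path is a placeholder) lists any of the files for 2010/06/21 through 2010/06/30 that are not on disk.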
 
--[[User:Bmy|Bob Y.]] 13:06, 4 November 2010 (EDT)

=== I/O Error #29 ===

This error means that GEOS-Chem could not find a file that it tried to open; in the example above, this is the [[#A3 met fields not found|A3 met field file]].

'''''NOTE: Error #29 is specific to the [[Intel Fortran Compiler|IFORT compiler]].  If you are using a different compiler, then the I/O error number may differ.'''''

--[[User:Bmy|Bob Y.]] 15:30, 3 November 2010 (EDT)
 
=== Problem reading binary punch file ===

If you are having problems reading a binary punch file into GEOS-Chem, make sure that you have the [[Reading_binary_files_in_IDL#.22Big_Endian.22_vs._.22Little_Endian.22_byte_ordering|correct endian setting]] in your makefile.  The compiler flags are:

* Intel Fortran compiler (IFORT): <tt>-convert big_endian</tt>
* PGI compiler: <tt>-byteswapio</tt>
* Sun Studio compiler: <tt>-xfilebyteorder=big16:%all</tt>

Machines that use Intel or AMD chipsets are little-endian.  A few older architectures (e.g. Cray, SGI Origin) are big-endian.  Binary punch files are always big-endian (for historical reasons), so on a little-endian machine you must tell your compiler to byte-swap unformatted file I/O, using the flags listed above.
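
To see which byte order a file on disk actually has, you can inspect its first four bytes.  The bash sketch below is a hypothetical helper, not a GEOS-Chem or GAMAP tool.  It assumes the file begins with a Fortran unformatted sequential record containing the 40-character file type identifier (e.g. <tt>CTM bin 02</tt>, blank-padded), preceded by a 4-byte record-length marker as written by compilers such as IFORT and PGI.  A big-endian file then starts with the bytes <tt>00 00 00 28</tt> (hexadecimal 28 = 40).

```shell
#!/usr/bin/env bash
# Sketch: guess the byte order of a binary punch (bpch) file from its first
# Fortran record marker. ASSUMPTION: the first record is the 40-character
# FTI string ("CTM bin 02", blank-padded) preceded by a 4-byte length marker.

# Usage: bpch_byte_order FILE   -> prints big-endian, little-endian, or unknown
# (a missing or unreadable file also yields "unknown")
bpch_byte_order() {
  local hex
  # Read the first 4 bytes as hex pairs and strip whitespace, e.g. "00000028"
  hex=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
  case $hex in
    00000028) echo "big-endian" ;;    # length 40, most significant byte first
    28000000) echo "little-endian" ;; # length 40, least significant byte first
    *)        echo "unknown" ;;
  esac
}
```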
 
The symptoms of such an error can be as follows:

'''''Daewon Byun wrote:'''''

:In <tt>SUBROUTINE READ_BPCH2</tt>, it reads the <tt>FTI = CTM bin 02</tt> fine, but then fails to read anything after that.  I dumped the IOUNIT and IO error code; as you can see, TMP_TITLE is empty:

    IUNIT, IOS, TMP_TITLE =            98          -1

:Then the program stops at:

    IF ( IOS /= 0 ) THEN
      PRINT*, 'open_bpch2_for_read:2'
      STOP
    ENDIF

:If I force it to read further by removing the "STOP", then I get (again, I tried to dump the values):

    MODELNAME, LONRES, LATRES, HALFPOLAR, CENTER180 =  0.000000    0.000000      0        0

--[[User:Bmy|Bob Y.]] 09:57, 24 July 2008 (EDT)
 
=== Module file cannot be read ===

If you encounter this type of error:

 ifort -cpp -w -O2 -auto -noalign -convert big_endian -openmp -Dmultitask -c time_mod.f
 fortcom: Error: time_mod.f, line 259: This module file was generated for a different
 platform or by an incompatible compiler or compiler release. It cannot be read.  [JULDAY_MOD]
      USE JULDAY_MOD, ONLY : JULDAY, CALDATE

then you are trying to use previously created <tt>*.mod</tt> files that were generated by a different compiler (or compiler release).  Making clean and recompiling from scratch should solve this problem.
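
If stale files survive your makefile's clean target, you can remove them by hand.  The bash sketch below is a hypothetical helper, not part of the GEOS-Chem makefiles; it assumes that every <tt>*.o</tt>, <tt>*.mod</tt>, and <tt>*.a</tt> file under the given directory is a build product that is safe to delete.

```shell
#!/usr/bin/env bash
# Sketch: delete build products so that no *.mod file made by a different
# compiler survives into the next build. ASSUMPTION: every *.o, *.mod and
# *.a file under DIR was generated by the build and may be deleted.

# Usage: remove_build_products [DIR]   -> prints the number of files removed
remove_build_products() {
  local dir=${1:-.}
  local n
  # Count first, then delete the same set of files
  n=$(find "$dir" -type f \( -name '*.o' -o -name '*.mod' -o -name '*.a' \) | wc -l)
  find "$dir" -type f \( -name '*.o' -o -name '*.mod' -o -name '*.a' \) -delete
  echo $(( n ))   # arithmetic expansion drops any padding from wc
}
```

After running it, recompile with a single compiler so that all of the <tt>*.mod</tt> files are regenerated consistently.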
 
--[[User:Bmy|Bob Y.]] 13:39, 1 July 2008 (EDT)

=== Error when reading the "restart_gprod_aprod" file ===

'''''Eric Leibensperger wrote:'''''

:I am trying to run GEOS-Chem and have encountered an error. The log file gives me this:

  ===============================================================================
  GEOS-CHEM ERROR: No matches found for file restart_gprod_aprod.2001070100!
  STOP at READ_BPCH2 (bpch2_mod.f)!
  ===============================================================================

:I have the [[Secondary_organic_aerosols#The restart_gprod_aprod.YYYYMMDDhh file|aerosol restart file]] (with the same name) in my <tt>~/testrun/runs/run.v7-04-12/</tt> folder. Is it looking for it elsewhere? I get an additional message in the log.error file, but I think that it is possibly the result of not being able to find the file above:

  ******  FORTRAN RUN-TIME SYSTEM  ******
  Error 1183:  deallocating an unallocated allocatable array
  Location:  the DEALLOCATE statement at line 4933 of "carbon_mod.f"
  Abort

:Any thoughts would be appreciated. Sorry to bother you with this!
:Eric

'''''Philippe Le Sager replied:'''''

:You must rewrite your <tt>restart_gprod_aprod.YYYYMMDDhh</tt> file so that the date in the filename is the same as the one in the datablock header.

:I wrote a routine to do that: <tt>~phs/IDL/dvpt/various_rewrite/rewrite_agprod.pro</tt>
:-Philippe

NOTE: The file <tt>rewrite_agprod.pro</tt> will be released in the next GAMAP version.

--[[User:Bmy|Bmy]] 15:59, 9 May 2008 (EDT)
 
==== For GEOS-Chem v8-03-01 and higher ====

In GEOS-Chem v8-03-01 and higher, the <tt>restart_gprod_aprod.YYYYMMDDhh</tt> file has been renamed to <tt>soaprod.YYYYMMDDhh</tt>.  The [[#Error when reading the .22restart_gprod_aprod.22_file|above-described error]] can occur with the <tt>soaprod.YYYYMMDDhh</tt> file if the date in the file does not match the starting date of your simulation.
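
Because GEOS-Chem looks for this file by the simulation's starting date, a quick pre-run check is to confirm that a file with the expected name exists in the run directory.  The bash sketch below is a hypothetical helper, not a GEOS-Chem utility.  It assumes the file names given in this section (<tt>soaprod.YYYYMMDDhh</tt> for v8-03-01 and higher, <tt>restart_gprod_aprod.YYYYMMDDhh</tt> for older versions) and checks only the file name; the date stored in the datablock header still has to be checked separately (e.g. with GAMAP).

```shell
#!/usr/bin/env bash
# Sketch: find the SOA restart file that matches a simulation start time.
# ASSUMPTION: names follow soaprod.YYYYMMDDhh (v8-03-01+) or
# restart_gprod_aprod.YYYYMMDDhh (older versions). Only the file NAME is
# checked, not the date in the bpch datablock header.

# Usage: find_soa_restart RUN_DIR YYYYMMDDhh
# Prints the matching file name (status 0); returns 1 if no file matches
# and 2 if the time stamp is not 10 digits.
find_soa_restart() {
  local run_dir=$1 stamp=$2 name
  if [[ ! $stamp =~ ^[0-9]{10}$ ]]; then
    echo "error: expected a YYYYMMDDhh time stamp, got '$stamp'" >&2
    return 2
  fi
  for name in "soaprod.$stamp" "restart_gprod_aprod.$stamp"; do
    if [[ -f $run_dir/$name ]]; then
      echo "$name"
      return 0
    fi
  done
  echo "error: no soaprod.$stamp or restart_gprod_aprod.$stamp in $run_dir" >&2
  return 1
}
```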
 
--[[User:Bmy|Bob Y.]] 13:10, 4 November 2010 (EDT)

=== File ann_mean_trop.geos5.* not found ===

If you are running a GEOS-5 simulation and get an error saying that GEOS-Chem cannot locate the <tt>ann_mean_trop.geos5.2x25</tt> or <tt>ann_mean_trop.geos5.4x5</tt> file, then make sure that the following option is set in your <tt>input.geos</tt> file:

 Use variable tropopause?: T

Starting in [[GEOS-Chem versions under development#v7-04-12|GEOS-Chem v7-04-12]], GEOS-Chem can use a variable tropopause (i.e. chemistry is done up to the actual tropopause, as diagnosed from the met fields at each timestep).  '''''You cannot use the annual mean tropopause for GEOS-5.'''''
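
To confirm the setting without opening the file, you can search <tt>input.geos</tt> for the option.  This bash sketch is a hypothetical helper; it assumes the option sits on a single line in the form shown above, with <tt>T</tt> or <tt>F</tt> after the colon, and it tolerates extra spaces around the colon.

```shell
#!/usr/bin/env bash
# Sketch: report whether input.geos turns on the variable tropopause.
# ASSUMPTION: the option is on a single line of the form
#   Use variable tropopause?: T
# (leading spaces and extra spaces around the colon are allowed).

# Usage: uses_variable_tropopause [INPUT_FILE]   (default: input.geos)
# Returns 0 if the value is T, 1 otherwise (including when the line is absent).
uses_variable_tropopause() {
  local file=${1:-input.geos}
  grep -Eq '^[[:space:]]*Use variable tropopause\?[[:space:]]*:[[:space:]]*T([[:space:]]|$)' "$file"
}
```

For example: <tt>uses_variable_tropopause input.geos || echo "variable tropopause is off"</tt>.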
 
--[[User:Bmy|Bob Y.]] 15:18, 7 July 2008 (EDT)

=== Problem reading GEOS-4 TROPP files ===

Please see [[Dynamic tropopause#Problem reading GEOS-4 TROPP files|this wiki post]] for more information about a common problem that can occur if you are using GEOS-4 meteorology with the [[Dynamic tropopause|dynamic tropopause]].
 
== Crashes or abnormal exits ==

=== KPP "Step size too small" error ===

The following abnormal exit from the [[KPP solvers FAQ|KPP chemical solver]]:

    - PHYSPROC: Trop chemistry at 2006/07/01 00:00
 Forced exit from Rosenbrock due to the following error:
 --> Step size too small: T + 10*H = T or H < Roundoff
 T=  3044.21151383269      and H=  1.281206877135470E-012
 
 ...
 
 JLOOP, I, J, L        7231          31          51          1
 Forced exit from Rosenbrock due to the following error:
 --> Step size too small: T + 10*H = T or H < Roundoff
 T=  3044.21151383269      and H=  1.281206877135470E-012
 failed twice !!!

indicates that the chemical solver could not converge to a solution in the grid box whose indices are printed on the <tt>JLOOP, I, J, L</tt> line.  Possible reasons include:

# A particular tracer has numerically underflowed or overflowed.  This can happen especially in the aerosol chemistry and thermodynamic equilibrium routines, which make heavy use of exponentials and logarithms.
# The restart file is not appropriate for the given simulation.  For example, if the restart file was created using the Synoz O3 flux boundary condition, but you have turned on the Linoz stratospheric O3 chemistry, then this mismatch can cause the solver not to converge.  Using a different restart file should solve the problem.

To fix this condition, you may have to [[KPP_solvers_FAQ#How_do_I_choose_the_absolute_and_relative_tolerance.3F|manually adjust the convergence criteria]] in the GEOS-Chem code.
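
To see which grid boxes are failing, you can pull the indices out of the log file.  This bash/awk sketch is a hypothetical helper; it assumes the log contains lines of the form shown above, the labels <tt>JLOOP, I, J, L</tt> followed by four integers, and it prints each distinct I, J, L triplet once, in order of first appearance.

```shell
#!/usr/bin/env bash
# Sketch: list the grid boxes (I, J, L) where the KPP Rosenbrock solver failed.
# ASSUMPTION: the log has lines like
#   JLOOP, I, J, L        7231          31          51          1
# i.e. the four labels are followed by the JLOOP, I, J and L values.

# Usage: kpp_failed_boxes LOGFILE   -> one "I J L" line per distinct box
kpp_failed_boxes() {
  awk '/JLOOP, I, J, L/ {
         # Fields 1-4 are the labels "JLOOP," "I," "J," "L"; 5-8 are the values
         box = $6 " " $7 " " $8
         if (!(box in seen)) { seen[box] = 1; print box }
       }' "$1"
}
```

Knowing which boxes fail may help you tell a local problem (e.g. one box with an underflowed tracer) from a widespread one (e.g. a mismatched restart file).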
 
--[[User:Bmy|Bob Y.]] 11:30, 9 November 2010 (EST)

=== Permission denied error ===

If you receive this error:

 Thu Nov  4 11:03:57 EDT 2010
 run.geos: Permission denied.

after having submitted a [http://acmg.seas.harvard.edu/geos/doc/man/chapter_6.html#6.2.4 run script to a queue system] (such as SGE), then double-check the Unix permissions of your script.  If the script does not have the Unix "execute" permission, then the queue system will not be able to run it.

Use the Unix <tt>chmod</tt> command to make your script executable:

 chmod 755 run.geos

and then resubmit the script to the queue system.
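
If you submit jobs from a script, you can build this check in.  A minimal bash sketch with a hypothetical function name; it uses <tt>chmod u+x</tt>, which adds execute permission for the file's owner only (normally sufficient, since the queue system runs the script as the user who submitted it):

```shell
#!/usr/bin/env bash
# Sketch: make sure a run script is executable before submitting it.
# Usage: ensure_executable SCRIPT
# Adds owner execute permission if it is missing; returns 1 if SCRIPT is absent.
ensure_executable() {
  local script=$1
  if [[ ! -f $script ]]; then
    echo "error: $script not found" >&2
    return 1
  fi
  if [[ ! -x $script ]]; then
    echo "adding execute permission to $script"
    chmod u+x "$script"
  fi
}
```

For example: <tt>ensure_executable run.geos && qsub run.geos</tt>, where <tt>qsub</tt> stands for your queue system's submit command.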
 
--[[User:Bmy|Bob Y.]] 10:26, 9 November 2010 (EST)

=== Error computing F_OF_PBL ===

'''''NOTE: The same error described below can cause GEOS-Chem to die elsewhere, and not just in the F_OF_PBL computation.'''''

If you encounter this error (which occurs in routine <tt>COMPUTE_PBL_HEIGHT</tt> in <tt>pbl_mix_mod.f</tt>):

  bad sum at:            1          70  -1.00000000000000
  bad sum at:            1          81  -1.00000000000000
 ===============================================================================
 GEOS-CHEM ERROR: Error in computing F_OF_PBL!
 STOP at COMPUTE_PBL_HEIGHT ("pbl_mix_mod.f")
 ===============================================================================
 
 ===============================================================================
 GEOS-CHEM ERROR: Error in computing F_OF_PBL!
 STOP at COMPUTE_PBL_HEIGHT ("pbl_mix_mod.f")
 ===============================================================================
      - CLEANUP: deallocating arrays now...
  bad sum at:            1          48  -1.00000000000000
  bad sum at:            1          59  -1.00000000000000
  bad sum at:          73          46  -1.00000000000000

then check whether your code was recompiled cleanly.  This error is often the result of not running

 make realclean

before switching from 2&deg; x 2.5&deg; to 4&deg; x 5&deg; resolution (or vice versa) in the <tt>define.h</tt> header file.  You can often diagnose this problem from the GEOS-Chem log file: when it occurs, the resolution indicated at the top of the log file, e.g.:

 *************  S T A R T I N G  2 x 2.5  G E O S--C H E M  *************

will not match the grid box longitudes and latitudes reported a little further down in the log file.  In this example, the longitude centers are 5&deg; apart, i.e. the code was actually compiled for the 4&deg; x 5&deg; grid:

 Grid box longitude centers [degrees]:
 -180.000 -175.000 -170.000 -165.000 -160.000 -155.000 -150.000 -145.000
   ...
 
 Grid box latitude centers [degrees]:
   -89.000  -86.000  -82.000  -78.000  -74.000  -70.000  -66.000  -62.000
   ...
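
This comparison can be automated.  The bash/awk sketch below is a hypothetical helper, not a GEOS-Chem tool.  It assumes the log file contains the <tt>S T A R T I N G</tt> banner and the <tt>Grid box longitude centers</tt> header exactly as shown above; it reads the longitude spacing from the banner (the number after <tt>x</tt>) and from the first two longitude centers, and reports a mismatch if they differ.

```shell
#!/usr/bin/env bash
# Sketch: detect a log whose banner resolution does not match the grid that
# was actually compiled in. ASSUMPTION: the log contains a banner like
#   *************  S T A R T I N G  2 x 2.5  G E O S--C H E M  *************
# and, later, "Grid box longitude centers [degrees]:" followed by a line of values.

# Usage: check_log_resolution LOGFILE
# Returns 0 if the longitude spacing agrees, 1 on mismatch, 2 if parsing fails.
check_log_resolution() {
  local log=$1 banner_dlon grid_dlon
  # Longitude spacing from the banner: the field after the lone "x"
  banner_dlon=$(awk '/S T A R T I N G/ {
                       for (i = 1; i < NF; i++) if ($i == "x") { print $(i + 1); exit }
                     }' "$log")
  # Longitude spacing from the first two grid box centers on the next line
  grid_dlon=$(awk '/Grid box longitude centers/ {
                     if ((getline line) > 0) { split(line, v); print v[2] - v[1] }
                     exit
                   }' "$log")
  if [[ -z $banner_dlon || -z $grid_dlon ]]; then
    echo "could not parse $log" >&2
    return 2
  fi
  # Compare numerically inside awk (so that e.g. "5" and "5.0" are equal)
  if awk -v a="$banner_dlon" -v b="$grid_dlon" 'BEGIN { exit !(a + 0 == b + 0) }'; then
    echo "OK: longitude spacing is $grid_dlon degrees"
  else
    echo "MISMATCH: banner says $banner_dlon degrees, grid spacing is $grid_dlon degrees"
    return 1
  fi
}
```

Run against the log excerpt above, it reports <tt>MISMATCH: banner says 2.5 degrees, grid spacing is 5 degrees</tt>, which is the cue to run <tt>make realclean</tt> and recompile.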
 
--[[User:Bmy|Bob Y.]] 12:13, 25 October 2010 (EDT)

=== Segmentation fault encountered after TPCORE initialization ===

With the [[Machine issues & portability#IFORT|IFORT compiler]], there is a known issue with the <tt>glibc</tt> library that can make your code appear to have run out of memory even though plenty of memory is available.  The symptom is that your code dies with a segmentation fault right after the following text is printed:

 NASA-GSFC Tracer Transport Module successfully initialized

Users have reported that this error typically occurs in 2 x 2.5 simulations; it may not occur if you are running GEOS-Chem at 4 x 5 resolution.

The workaround is to set your stack size limit to a large finite number instead of "unlimited".  See [[Intel Fortran Compiler#Resetting stacksize for Linux|this discussion on our Intel Fortran Compiler wiki page]].
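
In a bash run script, this might look like the minimal sketch below.  The function name and the 500000 KB default are illustrative assumptions, not values taken from the linked page.  The value is capped at the hard limit, because an unprivileged user cannot raise the soft limit above it.

```shell
#!/usr/bin/env bash
# Sketch: replace an "unlimited" stack size with a large finite value.
# The default of 500000 KB is an illustrative assumption; pick a value that
# suits your machine.

# Usage: set_finite_stacksize [KB]
set_finite_stacksize() {
  local want_kb=${1:-500000} hard_kb
  hard_kb=$(ulimit -H -s)
  # A normal user cannot raise the soft limit above the hard limit
  if [[ $hard_kb != unlimited ]] && (( want_kb > hard_kb )); then
    want_kb=$hard_kb
  fi
  ulimit -s "$want_kb"
}
```

Call it near the top of your run script, before the line that starts the GEOS-Chem executable, since the limit applies only to the current shell and the processes it starts.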
 
--[[User:Bmy|Bob Y.]] 10:55, 27 October 2010 (EDT)

=== Negative tracer found in WETDEP ===

If your simulation encounters negative (or NaN) tracer concentrations in the WETDEP routine, this may indicate a problem further upstream, perhaps in the aerosol routines (highly probable if the tracer is SO4, SO4s, HNO3, SO2, or NH3).  We have fixed some of these bugs by making the code more robust.  Patches are at [ftp://ftp.as.harvard.edu/pub/geos-chem/patches/v8-01-01/ ftp://ftp.as.harvard.edu/pub/geos-chem/patches/v8-01-01/].  Please see the following links for more information:

*[[GEOS-5 issues#Small_negative_RH_value_in_20060206.a6.2x25_file|Negative tracer due to negative RH values in the met field data]]
*[[Wet deposition#Negative_tracer_in_routine_WETDEP|Negative tracer in routine WETDEP]]
*[[Aerosol_thermodynamical_equilibrium#Run_dies_in_RPMARES_unexpectedly|A bug in RPMARES]] also led to crashes in WETDEP.

If the fixes above do not solve your problem, you will need to debug the code.  A good first step is to insert a few calls to CHECK_STT (from <tt>tracer_mod.f</tt>) at several points in the code to isolate where the negative tracers are first created.  This can be done quickly if the code dies early in the run.
 
=== Fatal error in IFORT ===

The following error, which occurred on the Altix platform with the Intel "ifort" v9.1 compiler:

  ifort: error: /opt/intel/fc/9.0/bin/fpp: core dumped
  ifort: error: Fatal error in /opt/intel/fc/9.0/bin/fpp,
  terminated by unknown signal(139)
  make: *** [transport_mod.o] Error 1

was caused by a missing closing quotation mark (") in an <tt>#include</tt> statement, i.e.

  #    include "CMN_DIAG

Adding the closing quotation mark fixed the problem.
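
Because the preprocessor crash does not point to the offending line, it can help to scan the source files for unterminated include lines.  The bash sketch below is a hypothetical helper built on GNU <tt>grep</tt>; it assumes include lines look like the example above (a <tt>#</tt>, optional spaces, <tt>include</tt>, then a double-quoted file name) and flags any that open a quote without closing it.  The list of file extensions searched is also an assumption.

```shell
#!/usr/bin/env bash
# Sketch: find #include lines that open a double quote but never close it.
# ASSUMPTION: includes look like   #    include "CMN_DIAG"   (any spacing),
# and the directives live in files with the extensions listed below.

# Usage: find_unterminated_includes [DIR]   -> prints file:line:text for each hit
find_unterminated_includes() {
  grep -rnE \
    --include='*.f' --include='*.F' --include='*.f90' --include='*.F90' \
    --include='*.h' \
    -e '^[[:space:]]*#[[:space:]]*include[[:space:]]+"[^"]*$' "${1:-.}"
}
```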
 
=== Too many levels in photolysis code ===

Please see [[Photolysis mechanism#Too many levels in photolysis code|this discussion about the "Too many levels in photolysis code" error]] that can sometimes happen in the FAST-J photolysis code.

--[[User:Bmy|Bob Y.]] 11:09, 12 January 2010 (EST)

== General types of errors ==
 
=== Exit Status ===

If you submit a job to a PBS queue, the exit status will tell you a little about what happened.  The exit status code appears in the email that the PBS server sends back to you:

  PBS Job Id: 6015.altix
  Job Name:  test.sh
  Execution terminated
  Exit_status=1
  resources_used.cpupercent=0
  resources_used.cput=00:00:00
  resources_used.mem=2304kb
  resources_used.ncpus=1
  resources_used.vmem=4544kb
  resources_used.walltime=00:00:00

Jobs that finish normally should have an exit status of 0.  In the above example, we have an exit status of 1, which means that the job encountered some kind of error (either a compilation error or a run-time error).

A common exit status is 143.  This means that your job has exceeded the wall-clock time limit of the [[wiki:geos-chem:queues|queue]].  The solution is to resubmit your job to a queue with a longer time limit.
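
Exit statuses above 128 follow the usual Unix shell convention of 128 plus a signal number, so 143 corresponds to signal 15 (SIGTERM), the signal a batch system typically sends when it terminates a job.  A minimal bash sketch for decoding such values; the function name is hypothetical, and some batch systems may report signal kills with a different convention, so check your queue system's documentation:

```shell
#!/usr/bin/env bash
# Sketch: translate a job exit status using the shell convention that
# statuses from 129 to 255 mean "killed by signal (status - 128)".
# Usage: decode_exit_status STATUS
decode_exit_status() {
  local code=$1
  if (( code == 0 )); then
    echo "normal exit"
  elif (( code > 128 && code < 256 )); then
    local sig=$(( code - 128 ))
    # kill -l N prints the signal name without the SIG prefix (e.g. TERM)
    echo "terminated by signal $sig (SIG$(kill -l "$sig"))"
  else
    echo "error exit with status $code"
  fi
}
```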
 
=== Segmentation Fault ===

A segmentation fault may mean that:

* You have gone outside the declared bounds of an array
* You are trying to access an ALLOCATABLE array that hasn't been allocated yet
* You are trying to read data from a file into an array of the wrong dimensions

==== Example 1 ====

Here is a segmentation fault caused by the code trying to read data from a file into an array that is too small to contain the data.  This error was detected on the Altix platform.  The output comes from the IDB debugger.

  Thread received signal SEGV
  stopped at [<opaque> for_read_seq_xmit(...) 0x40000000006b6500]
 
  Information:  An <opaque> type was presented during execution of
  the previous command.  For complete type information on this symbol,
  recompilation of the program will be necessary.  Consult the compiler
  man pages for details on producing full symbol table information using 
  the '-g' (and '-gall' for cxx) flags.
 
==== Example 2 ====

The same error message as in Example 1 has also been known to occur on Altix when a variable that has not been initialized is passed to a subroutine or function:

  CALL MYSUB( I )  ! I is uninitialized!
  DO I = 1, IIPAR
    ...
  ENDDO

The solution is to call MYSUB only after I has been assigned a value, i.e. inside the loop:

  DO I = 1, IIPAR
    CALL MYSUB( I )  ! Put the MYSUB call in the I-loop!
    ...
  ENDDO
 
=== Bus Error ===

A bus error usually means that you are trying to call a subroutine with the wrong number of arguments.

On SGI, you will usually get a bus error only if you call a subroutine with too many (rather than too few) arguments.  Other behaviors are platform- and compiler-dependent.

--[[User:Bmy|Bob Y.]] 13:01, 4 November 2010 (EDT)

Latest revision as of 20:30, 14 June 2019