Difference between revisions of "Common GEOS-Chem error messages"

(Obsolete error messages)
(Redirected page to Guide to GEOS-Chem error messages)
 
(8 intermediate revisions by the same user not shown)
----
+
#REDIRECT [[Guide to GEOS-Chem error messages]]
----
+
<big><strong>GEOS-Chem v11-02-final</strong> '''will also carry the designation''' <strong>GEOS-Chem 12.0.0</strong>'''.'''  We are migrating to a purely numeric versioning system in order to adhere more closely to software development best practices. For a complete description of the new versioning system, please see [[GEOS-Chem version numbering system|our ''GEOS-Chem version numbering system'' wiki page]].</big>
+
----
+
----
+
 
+
 
+
Here is a list of some commonly-encountered GEOS-Chem error messages.  Also be sure to visit our [[Machine issues & portability|Machine issues and portability]] wiki page for a list of compiler-specific issues.
+
 
+
== Overview ==
+
 
+
== Compile-time warnings and errors ==
+
 
+
In this section we discuss some compilation warnings and errors that you may encounter.
+
 
+
*'''Warnings''' are not generally fatal&mdash;GEOS-Chem will usually continue to compile while an informational message is displayed.
+
*'''Errors''', on the other hand, will halt the GEOS-Chem compilation process.
+
 
+
=== Error in opening the compiled module file ===
+
 
+
RnPbBe_mod.F(118): error #7002: Error in opening the compiled module file.  Check INCLUDE paths.  [CHEMGRID_MOD]
+
      USE CHEMGRID_MOD,      ONLY : ITS_IN_THE_STRATMESO
+
----------^
+
RnPbBe_mod.F(118): error #6580: Name in only-list does not exist.  [ITS_IN_THE_STRATMESO]
+
      USE CHEMGRID_MOD,      ONLY : ITS_IN_THE_STRATMESO
+
-------------------------------------^
+
compilation aborted for RnPbBe_mod.F (code 1)
+
 
+
This error message usually indicates a typo in the Makefile dependencies section.  For example, the above error was caused by this Makefile code:
+
 
+
<span style="color:red">rnpbbe_mod.o</span>              : <span style="color:blue">RnPbBe_mod.F</span>                                  \
+
                              chemgrid_mod.o          diag_mod.o            \
+
                              hco_interface_mod.o
+
 
+
because the capitalization of the <span style="color:red">object file name</span> did not exactly correspond to the <span style="color:blue">source code file name</span>. The compilation process died with the above error because it was expecting to find a module file named <tt>RnPbBe_mod.o</tt> but could not.
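
The capitalization check described above can be automated. The following is only a minimal sketch: the function name <tt>check_makefile_targets</tt> is hypothetical (it is not part of GEOS-Chem), and it assumes that each dependency rule starts at the beginning of a line with the object file name followed by a colon, as in the example above.

```shell
#!/bin/bash
# Hypothetical helper (not part of GEOS-Chem): report source files for which
# the Makefile has no target with exactly the same capitalization.
# Usage: check_makefile_targets <Makefile> <source-dir>
check_makefile_targets() {
  local makefile="$1" srcdir="$2" src base status=0
  for src in "$srcdir"/*.F "$srcdir"/*.F90; do
    [ -e "$src" ] || continue            # skip glob patterns that matched nothing
    base="$(basename "$src")"
    base="${base%.*}"                    # strip the file extension
    # Look for "<base>.o", optional blanks, then a colon (case-sensitive).
    if ! grep -Eq "^${base}\.o[[:space:]]*:" "$makefile"; then
      echo "No exact-case target for ${base}.o"
      status=1
    fi
  done
  return "$status"
}
```

For the Makefile shown above, calling <tt>check_makefile_targets Makefile .</tt> would flag <tt>RnPbBe_mod.o</tt>, because the only matching target is spelled <tt>rnpbbe_mod.o</tt>.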
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 16:19, 27 February 2017 (UTC)
+
 
+
=== Module file cannot be read ===
+
 
+
If you should encounter this type of error:
+
 
+
ifort -cpp -w -O2 -auto -noalign -convert big_endian -openmp -Dmultitask -c time_mod.f
+
fortcom: Error: time_mod.f, line 259: This module file was generated for a different
+
platform or by an incompatible compiler or compiler release. It cannot be read.  [JULDAY_MOD]
+
      USE JULDAY_MOD, ONLY : JULDAY, CALDATE
+
 
+
Then this means that the compiler is trying to read previously-created <tt>*.mod</tt> files that were generated by a different compiler (or by an incompatible release of the same compiler).  Removing the old files with <tt>make realclean</tt> and re-compiling from scratch should solve this problem.
+
 
+
--[[User:Bmy|Bob Y.]] 13:39, 1 July 2008 (EDT)
+
 
+
=== Intel Fortran 17 compilation error (also can happen with GNU Fortran) ===
+
 
+
[[Image:Ifort 17 error.png]]
+
 
+
If you are using [[GEOS-Chem v10-01]] with the [[Intel Fortran Compiler]] version 17 (or with any recent version of the [[GNU Fortran compiler]]) then you might encounter the above error.  These newer compilers are much stricter in their interpretations of the Fortran standard, and thus do not like the way that a couple of routines in the <tt>NcdfUtil</tt> and <tt>HEMCO/</tt> folders have been written.
+
 
+
The best way to solve this error is to upgrade to the [[GEOS-Chem v11-01]] public release.  In this version, the offending code has been rewritten in such a way that it is compatible with the newest versions of the Intel Fortran Compiler and GNU Fortran Compiler.  For complete instructions on how to download GEOS-Chem v11-01, please visit the [http://manual.geos-chem.org GEOS-Chem Online User's Guide].
+
 
+
For more information about this bug, please see [[Intel_Fortran_Compiler#Cannot_compile_GEOS-Chem_v10-01_with_Intel_Fortran_Compiler_v17|this post on our ''Intel Fortran Compiler'' wiki page]].
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 16:17, 28 February 2017 (UTC)
+
 
+
=== f77: command not found ===
+
 
+
/bin/bash: f77: command not found
+
...
+
make[1]: Leaving directory 'merra2_05x0625_CH4_na'
+
../Makefile_header.mk:321: *** "Select a compiler: COMPILER=ifort, COMPILER=pgi, COMPILER=gfortran".  Stop.
+
make[1]: *** [all] Error 2
+
make: *** [mp] Error 2
+
 
+
The above error happens when GEOS-Chem cannot determine the name of your Fortran compiler.  To fix this issue, make sure to specify the proper Unix environment variables in your <tt>.bashrc</tt> or <tt>.cshrc</tt> startup script.  For complete instructions, please see our [[Setting Unix environment variables for GEOS-Chem#Environment_variables_that_specify_compiler_names|''Environment variables that specify compiler names'' wiki post]].
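
Before compiling, it can help to confirm that the compiler you have selected is actually installed and visible to your shell. The sketch below makes some assumptions: <tt>compiler_on_path</tt> is a hypothetical helper, and <tt>COMPILER=gfortran</tt> is simply one of the choices named in the error message above. Please consult the wiki post linked above for the exact environment variables that GEOS-Chem reads.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the named compiler can be found on the PATH.
compiler_on_path() {
  command -v "$1" >/dev/null 2>&1
}

# Example lines for ~/.bashrc (commented out here; the compiler name is only
# an illustration taken from the error message):
#   export COMPILER=gfortran
#   compiler_on_path "$COMPILER" || echo "Cannot find $COMPILER on the PATH"
```

If the check fails, the compiler is either not installed or its module has not been loaded in the current session.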
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 17:23, 10 March 2017 (UTC)
+
 
+
=== Internal threshold was exceeded ===
+
 
+
This warning is specific to the [[Intel Fortran Compiler]].  It usually happens when you try to optimize a complex module or subroutine.  Please see [http://software.intel.com/en-us/articles/internal-threshold-was-exceeded this post on the software.intel.com site] for a full explanation.
+
 
+
--[[User:Bmy|Bob Y.]] 15:32, 22 August 2012 (EDT)
+
 
+
=== GNU Fortran internal compiler error ===
+
 
+
f951: internal compiler error: in read_module, at fortran/module.c:5090
+
0x6251ee read_module
+
        ../.././gcc/fortran/module.c:5090
+
0x6251ee gfc_use_module
+
        ../.././gcc/fortran/module.c:6980
+
0x626896 gfc_use_modules()
+
        ../.././gcc/fortran/module.c:7104
+
 
+
  ... etc ...
+
+
Please submit a full bug report,
+
  with preprocessed source if appropriate.
+
Please include the complete backtrace with any bug report.
+
See <http://gcc.gnu.org/bugs.html> for instructions.
+
make[6]: *** [flexchem_mod.o] Error 1
+
make[6]: *** Waiting for unfinished jobs....
+
 
+
If you should encounter an error similar to the one shown above when using the [[GNU Fortran compiler]], the easiest solution is to rebuild GEOS-Chem from scratch. 
+
 
+
Type:
+
 
+
make realclean
+
 
+
which will remove all previously compiled modules and object files.  If you are compiling from a GEOS-Chem run directory, you can also type:
+
 
+
make superclean
+
 
+
which will also remove any output files generated by GEOS-Chem.  (Use this option with caution!)
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 22:00, 20 November 2017 (UTC)
+
 
+
== Compile-time errors caused by compiler bugs ==
+
 
+
A few GEOS-Chem errors have been traced to bugs in the compiler that was used to build the GEOS-Chem executable.  For your convenience, we have collated a list of these issues.  Please see [[Known issues caused by compiler bugs|our ''Known issues caused by compiler bugs'' wiki page]] for more information.
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 19:13, 13 April 2016 (UTC)
+
 
+
=== Failed in XMAP_R4R4 error ===
+
 
+
If you are using the Intel Fortran Compiler 15, then you may encounter an error such as this:
+
 
+
forrtl: severe (408): fort: (2): Subscript #1 of the array LON2 has value 1 which is greater than the upper bound of -1
+
+
Image              PC                Routine            Line        Source           
+
libifcoremt.so.5  00002B9EFA2188D3  Unknown              Unknown  Unknown
+
geos.mp            00000000011FCE35  regrid_a2a_mod_mp        1914  regrid_a2a_mod.F90
+
libiomp5.so        00002B9EFB70A8A3  Unknown              Unknown  Unknown
+
 
+
'''''Cause:''''' [https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/585242 A compiler bug in Intel Fortran Compiler version 15].
+
 
+
'''''Solution:''''' If you are using array-out-of-bounds checking, make sure to compile GEOS-Chem with these flags: <tt>BOUNDS=y DEBUG=y</tt>.  For more information, see [[HEMCO#IFORT_15_error_when_using_array-out-of-bounds_error_checking|this post on our ''HEMCO'' wiki page]].
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 17:18, 25 January 2016 (UTC)
+
 
+
=== Seg fault in HEMCO with Intel Fortran 12, 13, or 14 ===
+
 
+
In GEOS-Chem versions from [[GEOS-Chem v10-01|v10-01]] up to [[GEOS-Chem v11-01#v11-01j|v11-01j]], this error can occur with versions 12, 13, and 14 of the [[Intel Fortran Compiler]].
+
 
+
  forrtl: severe (174): SIGSEGV, segmentation fault occurred
+
  Image              PC                Routine            Line        Source           
+
  ...
+
  geos              0000000000ADDCF4  hcoio_dataread_mo        2810  hcoio_dataread_mod.F90
+
  geos              0000000000AD84DE  hcoio_dataread_mo        2499  hcoio_dataread_mod.F90
+
  geos              0000000000AB8971  hco_readlist_mod_        438  hco_readlist_mod.F90
+
  geos              0000000000AB8398  hco_readlist_mod_        267  hco_readlist_mod.F90
+
  geos              0000000000AA99A6  hco_driver_mod_mp        138  hco_driver_mod.F90
+
  geos              0000000000844C07  hcoi_gc_main_mod_        546  hcoi_gc_main_mod.F90
+
  geos              0000000000801A2E  emissions_mod_mp_        172  emissions_mod.F90
+
  geos              00000000006720B7  MAIN__                    834  main.F
+
 
+
For a workaround, please see [[HEMCO#Update:_Preventing_seg_fault_in_HEMCO_v2.0_caused_by_compiler_bug|this post on our ''HEMCO'' wiki page]].
+
 
+
NOTE: [[HEMCO#Update:_Preventing_seg_fault_in_HEMCO_v2.0_caused_by_compiler_bug|A similar workaround]] has been introduced into the [[GEOS-Chem v11-01#v11-01 public release|v11-01 public release]], which uses a newer version of HEMCO.
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 21:50, 6 January 2017 (UTC)
+
 
+
== Run-time crashes and abnormal exits ==
+
+
In this section, we provide information about some commonly-reported run-time errors that cause GEOS-Chem to halt executing.
+
 
+
=== List-directed I/O syntax error ===
+
 
+
[[Image:List directed io error.png]]
+
 
+
The above error message indicates that the simulation crashed at line 871 in <tt>GeosCore/input_mod.F</tt>.  This means there was an issue while reading [[GEOS-Chem_Input_Files#The_input.geos_file|the <tt>input.geos</tt> file]] (located in the run directory).  For example, GEOS-Chem might have expected numeric input, but instead a character was read from <tt>input.geos</tt>, thus causing a read error.  This type of error can occur if the <tt>input.geos</tt> file corresponds to a version of GEOS-Chem that is different from yours.
+
 
+
This error is not limited to <tt>input.geos</tt>; it can happen for any text file that is being read from disk (both in GEOS-Chem and in any other Fortran programs you may write).
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 15:51, 27 February 2017 (UTC)
+
 
+
=== UCX not defined at compile time ===
+
 
+
[[Image:Executable run pg4.png]]
+
 
+
This error may occur if you compile GEOS-Chem ([[GEOS-Chem v11-01|v11-01]] and prior versions) directly from the source code directory rather than from a run directory.  Whenever you compile GEOS-Chem in the source code directory, you must remember to use the following Makefile options:
+
 
+
CHEM=Standard UCX=y  # Standard simulation (used for benchmarking GC)
+
+
CHEM=UCX UCX=y        # UCX simulation (i.e. Standard simulation w/o SOA species)
+
 
+
To avoid these errors, <span style="color:red">'''we STRONGLY RECOMMEND that you always compile GEOS-Chem from a run directory'''</span>.  This will ensure that the Makefile switches relevant to your simulation will always be activated.  For more information, please see these wiki posts:
+
 
+
*[[GEOS-Chem_Makefile_Structure#Compiling_in_a_run_directory|Compiling GEOS-Chem in a run directory]]
+
*[[Creating GEOS-Chem run directories]]
+
 
+
NOTE: In [[GEOS-Chem v11-02]] and higher versions, the Makefile option <tt>UCX=y</tt> will automatically be set whenever you select <tt>CHEM=Standard</tt> or <tt>CHEM=UCX</tt>.
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 15:43, 10 April 2017 (UTC)
+
 
+
=== No output scheduled on last day of run ===
+
 
+
If you encounter this error at the start of your GEOS-Chem simulation:
+
 
+
==========================================================================
+
GEOS-CHEM ERROR: No output scheduled on last day of run!
+
STOP at IS_LAST_DAY_GOOD ("input_mod.f")
+
==========================================================================
+
 
+
This means that you have not told GEOS-Chem to save out diagnostic data on the day that your simulation ends.  GEOS-Chem adds this error check in order to prevent you from running a long simulation only to have no diagnostics printed out at the end of the run.
+
 
+
For more information on how to schedule diagnostic output in GEOS-Chem, please see [[GEOS-Chem_Input_Files#Output_Menu|the OUTPUT MENU section of the <tt>input.geos</tt> file]] on the GEOS-Chem wiki.
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 15:51, 10 March 2017 (UTC)
+
 
+
=== HEMCO Run Error ===
+
 
+
Error messages containing "HCO" originate in [[HEMCO|the HEMCO emissions component]]. For example:
+
 
+
  ===============================================================================
+
  GEOS-CHEM ERROR: HCO_RUN
+
  STOP at HCOI_GC_RUN (hcoi_gc_main_mod.F90)
+
  ===============================================================================
+
 
+
Additional helpful diagnostic information can be found in the HEMCO log file, which is usually named <tt>HEMCO.log</tt>.
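
A quick way to scan the HEMCO log is to search it for error and warning lines. This is only a sketch: <tt>hemco_log_problems</tt> is a hypothetical helper, and it assumes that the log marks problems with the strings <tt>HEMCO ERROR</tt> and <tt>HEMCO WARNING</tt>, as in the messages quoted on this page.

```shell
#!/bin/bash
# Hypothetical helper: print every line of a HEMCO log that contains
# "HEMCO ERROR" or "HEMCO WARNING", prefixed with its line number.
# Usage: hemco_log_problems [logfile]   (defaults to HEMCO.log)
hemco_log_problems() {
  local logfile="${1:-HEMCO.log}"
  grep -n -E 'HEMCO (ERROR|WARNING)' "$logfile"
}
```

Run it from the run directory, e.g. <tt>hemco_log_problems HEMCO.log</tt>, and then inspect the lines around each reported line number for context.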
+
 
+
--[[User:Chris Holmes|Chris Holmes]] ([[User talk:Chris Holmes|talk]]) 15:42, 24 June 2015 (UTC)
+
 
+
==== Updated error message for v11-01 ====
+
 
+
In [[GEOS-Chem v11-01]] and higher versions, [[GEOS-Chem_v11-01#Error_message_output_now_advises_users_to_check_the_HEMCO_log_file|additional text instructs the user to also check the HEMCO log file]].
+
 
+
  ===============================================================================
+
  GEOS-CHEM ERROR: HCO_RUN
+
 
+
  <span style="color:green">HEMCO ERROR: Please check the HEMCO log file for error messages!</span>
+
 
+
  STOP at HCOI_GC_RUN (GeosCore/hcoi_gc_main_mod.F90)
+
  ===============================================================================
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 20:50, 6 January 2017 (UTC)
+
 
+
=== HEMCO time stamps may be wrong ===
+
 
+
HEMCO reads the files but gives zero emissions and prints the following time stamp warning:
+
HEMCO WARNING: ncdf reference year is prior to 1901 - time stamps may be wrong!
+
--> LOCATION: GET_TIMEIDX (hco_read_std_mod.F90)
+
 
+
'''''[[User:Lizzie Lundgren|Lizzie Lundgren]] wrote:'''''
+
 
+
<blockquote>That HEMCO error occurs if the reference time for the netCDF file time dimension is prior to 1901.  If you do <tt>ncdump -c filename</tt> you will be able to see the metadata for the time dimension as well as the time variable values. The time units should include the reference date.
+
 
+
You can get around this issue by changing the reference time within the file. You can do this with CDO (climate data operators) using the <tt>setreftime</tt> command.
+
 
+
Here is a bash script example (by GCST member [[User:Melissa Payer|Melissa Sulprizio]]) that updates the calendar and reference time for all files ending in <tt>*.nc</tt> within a directory. It was developed very recently for a user who ran into the same issue. In that case the first file was for Jan 1, 1950, so that was made the new reference time. I would recommend doing the same for your dataset so that the first time variable value would be 0. This script also compresses the file, which we recommend doing.
+
</blockquote>
+
 
+
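      #!/bin/bash
      # Reset the calendar and reference time of every netCDF file in
      # the current directory, then compress the result.
      for file in *.nc; do
        echo "Processing $file"
        # Use the standard calendar
        cdo setcalendar,standard "$file" tmp.nc
        mv tmp.nc "$file"
        # Make 1950-01-01 the reference date of the time axis
        cdo setreftime,1950-01-01,0 "$file" tmp.nc
        mv tmp.nc "$file"
        # Compress (deflate level 1) and chunk along the time dimension
        nccopy -d1 -c "time/1" "$file" tmp.nc
        mv tmp.nc "$file"
      done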
+
 
+
<blockquote>After you update the file you can then again do <tt>ncdump -c filename</tt> to check the time dimension. For the case above it looks like this after processing:</blockquote>
+
 
+
              double time(time) ;
+
                      time:standard_name = "time" ;
+
                      time:long_name = "time" ;
+
                      time:bounds = "time_bnds" ;
+
                      time:units = "days since 1950-01-01 00:00:00" ;
+
                      time:calendar = "standard" ;
+
              . . .
+
      time = 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365, 396, 424,
+
          455, 485, 516, 546, 577, 608, 638, 669, 699, 730, 761, 790, 821, 851,
+
          882, 912, 943, 974, 1004, 1035, 1065, 1096, 1127, 1155, 1186, 1216, 1247, etc
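
The reference year can also be checked without reading the header by eye. The following sketch assumes that the header contains a <tt>time:units</tt> attribute of the form shown above ("... since YYYY-MM-DD ..."); the function names <tt>time_ref_year</tt> and <tt>ref_year_ok</tt> are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read "ncdump -c" header text on stdin and print the
# reference year found in the first "time:units" attribute.
time_ref_year() {
  sed -n 's/.*time:units = ".* since \([0-9][0-9]*\)-.*/\1/p' | head -n 1
}

# Hypothetical helper: succeed only if a reference year was found and it is
# 1901 or later (the threshold named in the HEMCO warning).
ref_year_ok() {
  local year
  year="$(time_ref_year)"
  [ -n "$year" ] && [ "$year" -ge 1901 ]
}

# Example (commented out; requires the netCDF utilities):
#   ncdump -c filename | ref_year_ok || echo "reference year is prior to 1901"
```

For the header shown above, <tt>time_ref_year</tt> prints <tt>1950</tt>, so the check succeeds.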
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 21:33, 11 April 2018 (UTC)
+
 
+
=== Allocation error ===
+
 
+
If your GEOS-Chem simulation dies with this error output:
+
 
+
===============================================================================
+
GEOS-CHEM ERROR: Allocation error in array: MY_ARRAY
+
STOP at alloc_err.f
+
===============================================================================
+
 
+
then this means that you do not have enough memory to run your simulation.  This type of error can frequently occur if you are running a full-chemistry simulation at the 2&deg; x 2.5&deg; global grid, or one of the 0.5&deg; x 0.666&deg; nested grids.  You may need to tell your queuing system to make more memory available for your GEOS-Chem simulation.  Ask your IT staff for details.
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 22:15, 6 January 2017 (UTC)
+
 
+
=== Permission denied error ===
+
 
+
If you receive this error:
+
 
+
v11-01.run: Permission denied.
+
 
+
after having submitted a [http://acmg.seas.harvard.edu/geos/doc/man/chapter_6.html#6.2.4 run script to a queue system] (such as SLURM or Grid Engine), then double-check the Unix permissions of your <tt>v11-01.run</tt> script.  If the script does not have the Unix "execute" permission then the queue system will not be able to run it.
+
 
+
Use the Unix <tt>chmod</tt> command to make your script executable:
+
 
+
chmod 755 v11-01.run
+
 
+
and then re-submit the script to the queue system.
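
If you submit run scripts often, the permission check can be done automatically before each submission. This is a minimal sketch; <tt>ensure_executable</tt> is a hypothetical helper name, and the script name in the usage line is the one from the example above.

```shell
#!/bin/bash
# Hypothetical helper: add the execute permission to a run script if it is
# missing, and say so.  Prints nothing if the script is already executable.
# Usage: ensure_executable <script>
ensure_executable() {
  local script="$1"
  if [ ! -x "$script" ]; then
    chmod 755 "$script" || return 1
    echo "Added execute permission to $script"
  fi
}

# Example (commented out):
#   ensure_executable v11-01.run
```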
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 22:17, 6 January 2017 (UTC)
+
 
+
=== Floating invalid or floating-point exception error ===
+
 
+
You can check for several common floating-point math errors by compiling with the <tt>FPEX=y</tt> option.  This will halt the simulation with an error message such as:
+
 
+
  forrtl: error (65): floating invalid    # Error message from Intel Fortran Compiler
+
 
+
  Floating point exception (core dumped)  # Error message from GNU Fortran Compiler
+
 
+
This error typically means that a division-by-zero occurred, or a NaN value was encountered in one of your variables. 
+
 
+
A common way to prevent these types of errors is to ensure "safe" divisions (i.e. to make sure that the denominator is nonzero).  You can do this manually with an <tt>IF</tt> statement, or use the routines <tt>SAFE_DIV</tt> or <tt>IS_SAFE_DIV</tt> in <tt>GeosCore/error_mod.F</tt>.
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 14:58, 11 October 2017 (UTC)
+
 
+
=== N_/L2_/L2-cutoff JXTRA error ===
+
 
+
<span style="color:green">'''''NOTE: This issue was resolved in [[GEOS-Chem v11-01#v11-01d|v11-01d]].'''''</span>
+
 
+
---> DATE: 2013/11/13  GMT: 19:00  X-HRS:    307.000
+
      - Found all 44 GEOS-FP A1    met fields for 2013/11/13 19:30
+
N_/L2_/L2-cutoff JXTRA:  601  96    0.00
+
N_/L2_/L2-cutoff JXTRA:  601  96    0.00
+
... run dies later with NaN's in chemistry ...
+
 
+
If you are using a version of GEOS-Chem prior to [[GEOS-Chem v11-01#v11-01d|v11-01d]], then you might have experienced this error, where a simulation using [[GEOS-FP]] or [[MERRA]] met fields dies in the chemistry solver after many such "JXTRA" warnings are printed to the log file.
+
 
+
This error has been attributed to a bug in the GEOS-FP and MERRA convection module, and was resolved in v11-01d.  Upgrading to the [[GEOS-Chem v11-01]] public release code (or [[Cloud_convection#Resolve_very_high_tracer_concentrations_in_MERRA_and_GEOS-FP_convective_scavenging|patching your GEOS-Chem v10-01 code with this fix]]) will solve the issue.
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 23:12, 27 February 2017 (UTC)
+
 
+
=== KPP "Step size too small" error ===
+
 
+
The following abnormal exit from the [[KPP solvers FAQ|KPP chemical solver]]:
+
 
+
Forced exit from Rosenbrock due to the following error:
+
--> Step size too small: T + 10*H = T or H < Roundoff
+
T=  3044.21151383269      and H=  1.281206877135470E-012
+
+
...
+
        1
+
Forced exit from Rosenbrock due to the following error:
+
--> Step size too small: T + 10*H = T or H < Roundoff
+
T=  3044.21151383269      and H=  1.281206877135470E-012
+
failed twice !!!
+
+
indicates that the chemistry could not converge to a solution in the given grid box.  Possible reasons for this could be:
+
 
+
# A particular tracer has numerically underflowed or overflowed.  This can happen especially in the aerosol chemistry and equilibrium routines, where many exponentials and logarithms are used in the algorithms.
+
# The restart file is not appropriate for the given simulation.  For example, if the restart file was created using the Synoz O3 flux boundary condition, but you have turned on the Linoz stratospheric O3 chemistry, then this mismatch can cause the solver not to converge.  You can try switching to a restart file generated from a simulation with the same input options as the simulation that you wish to perform.
+
 
+
You may have to [[KPP_solvers_FAQ#How_do_I_choose_the_absolute_and_relative_tolerance.3F|manually adjust the convergence criteria]] in the GEOS-Chem code to fix this condition.
+
 
+
--[[User:Bmy|Bob Y.]] 11:30, 9 November 2010 (EST)
+
 
+
=== Mixed file access modes error ===
+
 
+
This error is particular to the [[Intel Fortran Compiler]].  You might encounter this in conjunction with the ND49 timeseries diagnostic.
+
 
+
According to the Intel website:
+
 
+
severe (31): Mixed file access modes
+
+
FOR$IOS_MIXFILACC. An attempt was made to use any of the following combinations:
+
* Formatted and unformatted operations on the same unit
+
* An invalid combination of access modes on a unit, such as direct and sequential
+
* An Intel® Fortran RTL I/O statement on a logical unit that was opened by a program coded in another language
+
 
+
Here are a few suggestions to try if you haven’t already:
+
 
+
#Make sure you don’t already have a ND49 file in the location that you’re writing. In other words, make sure GEOS-Chem isn’t trying to write to a file that already exists.
+
#Do “make realclean”, recompile, and run again to see if the error is persistent.
+
#Are you using an out-of-the-box version of the code or a modified version? If the latter, do you still get this error with a “clean” copy of the code?
+
#Is there a particular READ/WRITE format statement that is causing the problem? You could try compiling with BOUNDS=y TRACEBACK=y FPE=y DEBUG=y and running in totalview (do “module load totalview” first on Odyssey) to locate the problematic line.
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 19:07, 12 April 2017 (UTC)
+
 
+
=== Negative tracer found in WETDEP ===
+
 
+
If your simulation encounters negative (or NaN) tracer concentrations in the WETDEP routine, then this can be an indication of a problem further upstream, perhaps in the aerosol routines (highly probable if the tracer is SO4, SO4s, HNO3, SO2, or NH3). We have fixed some of these bugs by making the code more robust.  If you are using a GEOS-Chem version prior to v8-01-01, then you should get [ftp://ftp.as.harvard.edu/pub/geos-chem/patches/v8-01-01/ these patches].  (These patches have been added to the standard GEOS-Chem code in versions higher than v8-01-01.) Please see the following links for more information:
+
 
+
*[[Wet_deposition#Values_of_F_PRIME_greater_than_1_in_WETDEP|Values of <tt>F_PRIME</tt> > 1 in routine <tt>WETDEP</tt>]]
+
*[[GEOS-5 issues#Small_negative_RH_value_in_20060206.a6.2x25_file|Negative tracer due to negative RH values in the met field data]]
+
*[[Wet deposition#Negative_tracer_in_routine_WETDEP|Negative tracer in routine WETDEP]]
+
*[[Wet deposition#Negative tracer in routine WETDEP .232|Negative tracer in routine WETDEP #2]]
+
*[[Aerosol_thermodynamical_equilibrium#Run_dies_in_RPMARES_unexpectedly|A bug in RPMARES]] was also leading to a crash in WETDEP.
+
 
+
If the fixes above do not solve your problem, you will need to debug. The first step is to use a few calls to CHECK_STT (from <tt>tracer_mod.f</tt>) to isolate the part of the code where negative tracers are created. This can be done quite fast if the code dies early enough in the run.
+
 
+
--[[User:Bmy|Bob Y.]] 12:50, 15 July 2011 (EDT)
+
 
+
== Segmentation faults ==
+
 
+
If your simulation dies with a '''segmentation fault''' error, this means that GEOS-Chem tried to access an [http://stackoverflow.com/questions/2346806/what-is-segmentation-fault invalid memory location].  We list several instances of segmentation faults below.
+
 
+
=== Severe(174) SIGSEGV error ===
+
 
+
<span style="color:darkorange">'''''NOTE: In this section, we shall use the Intel Fortran Compiler error messages.  You may get a slightly different error message if you are using a different compiler (such as GNU Fortran).'''''</span>
+
 
+
If you compiled GEOS-Chem with the [[Intel Fortran Compiler|IFORT compiler]], you may encounter the following error message:
+
 
+
forrtl: severe (174): SIGSEGV, segmentation fault occurred
+
 
+
This means that a segmentation fault (i.e. memory error) has occurred during your GEOS-Chem simulation.  The subsections below describe how to locate the error and list its most common causes.
+
 
+
==== Traceback error stack ====
+
 
+
<span style="color:darkorange">'''''NOTE: TRACEBACK=yes is turned on by default in [[GEOS-Chem v11-01|v11-01]] and higher versions.'''''</span>
+
 
+
When GEOS-Chem is compiled with the <tt>TRACEBACK=yes</tt> option, it will print out an error stack, which includes the list of routines that were called when the error occurred and the line at which the error occurred.
+
 
+
An error stack is included below:
+
 
+
forrtl: severe (174): SIGSEGV, segmentation fault occurred
+
Image              PC                Routine            Line        Source
+
libintlc.so.5      00002ACA46B91961  Unknown              Unknown  Unknown
+
libintlc.so.5      00002ACA46B900B7  Unknown              Unknown  Unknown
+
libnetcdff.so.5    00002ACA4473D682  Unknown              Unknown  Unknown
+
libnetcdff.so.5    00002ACA4473D4D6  Unknown              Unknown  Unknown
+
libnetcdff.so.5    00002ACA4471DD4C  Unknown              Unknown  Unknown
+
libnetcdff.so.5    00002ACA44721DB8  Unknown              Unknown  Unknown
+
libpthread.so.0    00000031A0A0F710  Unknown              Unknown  Unknown
+
<span style="color:red">'''geos.mp            000000000175FF79  hco_interface_mod        341  hco_interface_mod.F90'''</span>
+
geos.mp            00000000005F1F47  carbon_mod_mp_emi        5490  carbon_mod.F
+
geos.mp            00000000016EAF33  emissions_mod_mp_        206  emissions_mod.F90
+
geos.mp            00000000010BB119  MAIN__                  1383  main.F
+
geos.mp            000000000040370E  Unknown              Unknown  Unknown
+
libc.so.6          00000031A061ED5D  Unknown              Unknown  Unknown
+
geos.mp            0000000000403619  Unknown              Unknown  Unknown
+
 
+
The top line with a valid routine and line number printed is the location of the error. In this case, there is an issue in <tt>hco_interface_mod.F90</tt> at line 341. You may also choose to step back through the routines to determine what went wrong. Again, in this case, the problematic routine in <tt>hco_interface_mod.F90</tt> was called from <tt>carbon_mod.F</tt> (line 5490), etc. It may be useful to recompile and rerun GEOS-Chem with additional debug options turned on (e.g. <tt>BOUNDS=yes</tt>, <tt>FPE=yes</tt>) to determine the cause of the error. For more information, see our [[GEOS-Chem_coding_and_debugging#Recompile_GEOS-Chem_with_debug_options_turned_on|''GEOS-Chem coding and debugging'' wiki page]].
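
For long log files, the first frame with a known line number can be picked out automatically. This sketch assumes the five-column traceback layout shown above (Image, PC, Routine, Line, Source); <tt>first_known_frame</tt> is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read an Intel Fortran traceback on stdin and print
# "source:line" for the first frame whose line number is known.
first_known_frame() {
  # A usable frame has at least five columns, a numeric "Line" column
  # (second to last), and a "Source" column (last) other than "Unknown".
  awk 'NF >= 5 && $(NF-1) ~ /^[0-9]+$/ && $NF != "Unknown" {
         print $NF ":" $(NF-1)
         exit
       }'
}

# Example (commented out; the log file name is an assumption):
#   first_known_frame < GC.log
```

Applied to the error stack above, this prints <tt>hco_interface_mod.F90:341</tt>.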
+
 
+
==== Array-out-of-bounds error ====
+
 
+
Most often, a segmentation fault indicates an array out-of-bounds condition.  To find out more information about where this error is occurring, recompile GEOS-Chem with the following Makefile options:
+
 
+
make realclean
+
make BOUNDS=yes TRACEBACK=yes
+
 
+
The <tt>BOUNDS=yes</tt> option will turn on '''Array Out-of-Bounds''' error checking.  The <tt>TRACEBACK=yes</tt> option will print out the '''Error Stack''', as [[#Traceback error stack|described above]].  These options will provide more detailed error output.
+
 
+
After recompiling, you should receive an error message such as:
+
 
+
forrtl: severe (408): fort: (3): Subscript #1 of the array PBL_THICK has value -1000000 which is less than the lower bound of 1
+
 
+
This tells you that there is a problem with a certain array.  Use the Unix <tt>grep</tt> command to search for all instances of this array in the GEOS-Chem source code:
+
 
+
grep -i PBL_THICK *.f*
+
 
+
and search for the problem. 
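
The array name can be pulled out of the run-time output and passed straight to <tt>grep</tt>. This is only a sketch, assuming the message wording shown above ("Subscript #N of the array NAME has value ..."); <tt>oob_array_names</tt> is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read run-time output on stdin and print the name of
# each array mentioned in an out-of-bounds message (one name per line,
# duplicates removed).
oob_array_names() {
  sed -n 's/.*Subscript #[0-9]* of the array \([A-Za-z0-9_]*\) has value.*/\1/p' | sort -u
}

# Example (commented out; the log file name is an assumption):
#   for name in $(oob_array_names < GC.log); do
#     grep -n -i "$name" *.f*
#   done
```

The <tt>-n</tt> option makes <tt>grep</tt> print the line number of each match, which is convenient when comparing against the traceback.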
+
 
+
NOTE: In the above example, we manually forced an out-of-bounds error with this line of code:
+
 
+
        !### FORCE OOB error for testing
+
        PBL_THICK(-1000000,J)  = BLTHIK
+
 
+
Removing this line will fix the error.
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 22:19, 6 January 2017 (UTC)
+
 
+
==== Invalid memory access ====
+
 
+
A segmentation fault can also happen if GEOS-Chem makes a reference to a memory location that is invalid.  You may see an error message such as this:
+
 
+
severe (174): SIGSEGV, segmentation fault occurred
+
This message indicates that the program attempted an invalid memory reference.
+
Check the program for possible errors.
+
 
+
This can happen if you are trying to read data from a file into an array, but the array is too small to hold all of the data.  You can use a debugger (such as Totalview or IDB) to try to diagnose the situation.  You may receive an error message from the debugger similar to this one: 
+
 
+
  Thread received signal SEGV
+
  stopped at [<opaque> for_read_seq_xmit(...) 0x40000000006b6500]
+
 
+
  Information:  An <opaque> type was presented during execution of
+
  the previous command.  For complete type information on this symbol,
+
  recompilation of the program will be necessary.  Consult the compiler
+
  man pages for details on producing full symbol table information using 
+
  the '-g' (and '-gall' for cxx) flags.
+
 
+
Usually, increasing the size of the array (i.e. until it is large enough to contain all of the data) will fix this problem.
+
 
+
--[[User:Bmy|Bob Y.]] 15:57, 22 June 2012 (EDT)
+
 
+
==== Stack overflow ====
+
 
+
Finally, a segmentation fault can happen if GEOS-Chem uses up all of the available [http://en.wikipedia.org/wiki/Stack-based_memory_allocation stack memory] on your system.  The stack memory is a special part of the memory where short-term variables get stored. 
+
 
+
The compiler will typically place into the stack memory all local temporary variables, such as:
+
 
+
* variables that are local to a given subroutine
+
* variables that are NOT located within a <tt>COMMON</tt> block
+
* variables that are NOT declared with the <tt>SAVE</tt> attribute
+
* variables that are NOT declared as an <tt>ALLOCATABLE</tt> array
+
* variables that are NOT declared as a <tt>POINTER</tt> variable or array
+
 
+
Therefore, it is important to make sure that your computational environment is set up to use the maximum amount of stack memory.  You can do this by placing the following line in your <tt>.cshrc</tt> file:
+
 
+
limit stacksize unlimited
+
 
+
or <tt>.bashrc</tt> file:
+
 
+
  ulimit -s unlimited
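
Some systems cap the stack size with a hard limit, in which case a request for <tt>unlimited</tt> is rejected. The following is a minimal sketch, assuming a POSIX-style shell whose <tt>ulimit</tt> builtin supports the <tt>-s</tt>, <tt>-S</tt>, and <tt>-H</tt> options (as bash does). The function name <tt>raise_stack_limit</tt> is hypothetical and is not part of GEOS-Chem. It asks for an unlimited soft limit, falls back to the hard limit if that is refused, and prints the limit that is actually in effect:

```shell
# Hypothetical helper (not part of GEOS-Chem): raise the soft stack limit
# as far as the system allows, then print the limit now in effect.
raise_stack_limit() {
    # Try "unlimited" first.  If the hard limit forbids it, fall back to
    # the hard limit itself ("ulimit -H -s" prints it).
    ulimit -S -s unlimited 2>/dev/null \
        || ulimit -S -s "$(ulimit -H -s)" 2>/dev/null \
        || echo "warning: could not change the stack size limit" >&2

    # Print the soft limit (a number, or the word "unlimited").
    ulimit -S -s
}

raise_stack_limit
```

If the printed value is a number rather than <tt>unlimited</tt>, the hard limit on your system is lower than GEOS-Chem may need, and you may have to ask your system administrator to raise it.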
+
 
+
If you encounter a <tt>SIGSEGV(174)</tt> message due to a stacksize memory error, you may see the following error text:
+
 
+
severe (174): SIGSEGV, possible program stack overflow occurred
+
Program requirements exceed current stacksize resource limit.
+
 
+
--[[User:Bmy|Bob Y.]] 15:57, 22 June 2012 (EDT)
+
 
+
==== forrtl: error (76): IOT trap signal ====
+
 
+
'''''[mailto:xun@gps.caltech.edu Xun Jiang] wrote:'''''
+
 
+
:We met the following error message
+
 
+
    forrtl: severe (174): SIGSEGV, segmentation fault occurred
+
+
    Stack trace terminated abnormally.
+
    forrtl: error (76): IOT trap signal
+
+
    Note: The error appears after
+
    - RDSOIL: Reading
+
    Data/GEOS_2x2.5/soil_NOx_200203/climatprep2x25.dat
+
    ### MAIN: a DAILY DATA
+
 
+
:I have the following lines in <tt>.cshrc</tt>
+
 
+
    setenv KMP_STACKSIZE 329033024
+
    limit cputime    unlimited
+
    limit datasize    unlimited
+
    limit stacksize  unlimited
+
    limit filesize    unlimited
+
    limit memoryuse  unlimited
+
    limit descriptors unlimited
+
 
+
:However, it still doesn't work. Any suggestion is really appreciated.
+
 
+
'''''[mailto:yantosca@seas.harvard.edu Bob Yantosca] replied:'''''
+
 
+
:I found [http://xtechnotes.blogspot.com/2006/01/1001-most-idiotic-error-messages.html this internet post] which has an explanation:
+
 
+
    Cause:
+
    The stack size for child threads are overflowing.  The main stack size for the program
+
    is changed by the ulimit command (in Bash shell) or limit command (in C shell).
+
    However this environment variable does not set the size for the child thread stack size.
+
    Thus the child thread stack overflow.
+
+
    Solution:
+
    Set the environment variables to increase the child thread stack size.
+
+
    #for intel, using bash shell
+
    export KMP_STACKSIZE=500000000
+
+
    # for intel, using csh or tcsh shell
+
    setenv KMP_STACKSIZE 500000000
+
 
+
:For more information, please see our wiki post on [[Intel Fortran Compiler#Resetting stacksize for Linux|Resetting the stack size for Linux]].
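
As a minimal sketch of the solution quoted above, the per-thread stack size can be set from a single value given in megabytes. The function name <tt>set_thread_stack_mb</tt> is hypothetical. <tt>KMP_STACKSIZE</tt> is read by the Intel OpenMP runtime (a value without a suffix is interpreted as bytes), while <tt>OMP_STACKSIZE</tt> is the variable defined by the OpenMP standard. Which of the two your executable honors depends on the compiler and runtime library, so treat this as an assumption to check on your own system:

```shell
# Hypothetical helper: set the OpenMP per-thread stack size from a value in MB.
set_thread_stack_mb() {
    mb=$1

    # Intel OpenMP runtime: a plain number is taken as a size in bytes.
    export KMP_STACKSIZE=$((mb * 1024 * 1024))

    # OpenMP-standard variable: the "M" suffix means megabytes.
    export OMP_STACKSIZE="${mb}M"
}

set_thread_stack_mb 500
echo "KMP_STACKSIZE=$KMP_STACKSIZE OMP_STACKSIZE=$OMP_STACKSIZE"
```

Users of csh or tcsh would use <tt>setenv</tt> instead of <tt>export</tt>, as in the quoted post. Note that these variables control only the stacks of the child threads; the main stack size is still controlled by <tt>ulimit</tt> (bash) or <tt>limit</tt> (csh).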
+
 
+
--[[User:Bmy|Bob Y.]] 11:20, 26 June 2012 (EDT)
+
 
+
=== Segmentation fault encountered after TPCORE initialization ===
+
 
+
You may encounter a segmentation fault right after the following text is printed.
+
 
+
NASA-GSFC Tracer Transport Module successfully initialized
+
 
+
This error usually occurs when:
+
 
+
# You are running GEOS-Chem at sufficiently fine resolution, such as 2&deg; x 2.5&deg; or finer.  (Many users have reported that this error does not occur at 4&deg; x 5&deg; resolution.)
+
# You are using a large number of advected tracers.
+
# Both #1 and #2
+
 
+
If you are using the [[Intel Fortran Compiler]], the cause of this error can likely be traced to a known issue with the <tt>glibc</tt> library.  This will cause GEOS-Chem to think that it has used up all of the available memory, when in fact there is plenty of memory still available.  However, you may also encounter this same error even if you have compiled GEOS-Chem with a different compiler.
+
 
+
You can usually correct this error by manually telling your system to use the maximum amount of stack memory when running GEOS-Chem.  For detailed instructions, please see the following links:
+
 
+
#[[Intel Fortran Compiler#Resetting stacksize for Linux|Setting stacksize for the Intel Fortran Compiler (aka "IFORT")]]
+
#[[Machine_issues_%26_portability#Setting_the_stacksize|Setting stacksize for the PGI Compiler]]
+
#[[Machine_issues_%26_portability#.22Not_enough_space.22_error_in_TPCORE|Setting stacksize for the Sun Studio compiler]]
+
 
+
--[[User:Bmy|Bob Y.]] 16:07, 14 December 2010 (EST)
+
 
+
=== Bad GEOS-4 A6 met data causing segmentation fault ===
+
 
+
Please see [[GMAO_GEOS-4#Bad GEOS-4 A6 met data causing segmentation fault|this post about bad GEOS-4 A6 met data causing a segmentation fault]] in GEOS-Chem simulations.
+
 
+
--[[User:Bmy|Bob Y.]] 15:19, 16 February 2010 (EST)
+
 
+
== Other memory-related errors ==
+
 
+
The errors listed below, which occur infrequently, are related to invalid memory operations.  These can especially occur with  <code>POINTER</code>-based variables.
+
 
+
=== Bus Error ===
+
 
+
A bus error means that you are trying to reference memory that cannot possibly be there.  The website StackOverflow.com has a [http://stackoverflow.com/questions/212466/what-is-a-bus-error nice definition of bus error and how it differs from a segmentation fault].
+
 
+
One possible cause of a bus error is calling a subroutine with the wrong number of arguments (usually too many).
+
 
+
--[[User:Bmy|Bob Y.]] 12:27, 19 October 2012 (EDT)
+
 
+
=== Dwarf subprogram entry error ===
+
 
+
The error message:
+
 
+
  Dwarf subprogram entry L_ROUTINE-NAME__LINE-NUMBER__par_loop2_2_576 has high_pc < low_pc.
+
  This warning will not be repeated for other occurrences.
+
 
+
can occur when you try to use a pointer variable that is unassociated (i.e. that is not currently pointing to any other variable) from within an OpenMP parallel loop, where:
+
 
+
#<tt>ROUTINE-NAME</tt> is the name of the routine where the error occurred, and
+
#<tt>LINE-NUMBER</tt> is the line where the error occurred.
+
 
+
We recently discovered that this error can be caused if you have a pointer declaration such as this:
+
 
+
  TYPE(Species), POINTER :: ThisSpc => NULL()
+
 
+
where the pointer <code>ThisSpc</code> is later used to point to another variable from within an OpenMP parallel loop.  In Fortran, initializing a variable in its declaration statement implicitly gives it the <tt>SAVE</tt> attribute, so the above declaration inadvertently makes <code>ThisSpc</code> a saved pointer.  This can cause a segmentation fault, because all pointers used within an OpenMP parallel region must be created and destroyed on the same thread.
+
 
+
This type of problem can usually be fixed by removing the nullification from the declaration statement.  In other words, you can rewrite the above line of code as:
+
 
+
  TYPE(Species), POINTER :: ThisSpc
+
  . . .
+
  ThisSpc => NULL()
+
 
+
For more information, [http://www.cs.rpi.edu/~szymansk/OOF90/bugs.html#4 please see this article].
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 19:27, 29 April 2016 (UTC)
+
 
+
=== IFORT error: Relocation truncated to fit ===
+
 
+
Please see [[Intel Fortran Compiler#Relocation truncated to fit error|this wiki post on our Intel Fortran Compiler page]] which describes how to work around a <tt>Relocation truncated to fit</tt> error message.
+
 
+
--[[User:Bmy|Bob Y.]] 10:46, 24 February 2012 (EST)
+
 
+
=== IFORT error: Out of memory asking for NNNNN ===
+
 
+
This is not a common error message, but it may occur if you are compiling a version of GEOS-Chem for a high-resolution horizontal grid, or with one of the available microphysics packages (i.e. [[APM aerosol microphysics|APM]] or [[TOMAS aerosol microphysics|TOMAS]]).  Please see [[Intel_Fortran_Compiler#Out_of_memory_asking_for_NNNNN|this wiki post on our Intel Fortran Compiler page]] which describes this error in detail.
+
 
+
--[[User:Bmy|Bob Y.]] 10:42, 26 July 2013 (EDT)
+
 
+
=== Memory error: "munmap_chunk: invalid pointer" ===
+
 
+
The following error is not common but can happen:
+
 
+
{| border=1 cellspacing=0 cellpadding=5
+
|-valign="top"
+
!width="150px" bgcolor="#CCCCCC"|Error
+
|width="850px"|<tt>*** glibc detected *** ./geos: munmap_chunk(): invalid pointer: 0x00000000059aac30 ***</tt>
+
 
+
|-valign="top"
+
!bgcolor="#CCCCCC"|Reference
+
|http://stackoverflow.com/questions/6199729/how-to-solve-munmap-chunk-invalid-pointer-error-in-c
+
 
+
|-valign="top"
+
!bgcolor="#CCCCCC"|Explanation
+
|This happens when the pointer passed to the C library routine <tt>free()</tt> (which is called when Fortran releases memory, for example through a <tt>DEALLOCATE</tt> statement) is not valid or has been modified somehow. The bottom line is that the pointer passed to <tt>free()</tt> must be the same one that was returned by the C library routines <tt>malloc()</tt>, <tt>realloc()</tt>, and their friends.
+
 
+
The free() function frees the memory space pointed  to  by  ptr,  which
+
must  have  been  returned  by a previous call to malloc(), calloc() or
+
realloc().  Otherwise, or if free(ptr) has already been called  before,
+
undefined behavior occurs.  If ptr is NULL, no operation is performed.
+
GNU                              2012-05-10                        MALLOC(3)
+
 
+
|-valign="top"
+
!bgcolor="#CCCCCC"|Simpler explanation
+
|This can happen if you are trying to deallocate or nullify a pointer variable that has already been deallocated or modified.
+
 
+
|}
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 21:32, 6 January 2017 (UTC)
+
 
+
=== Memory error: "free: invalid size" ===
+
 
+
The following error is not common but can happen:
+
 
+
{| border=1 cellspacing=0 cellpadding=5
+
|-valign="top"
+
!width="150px" bgcolor="#CCCCCC"|Error
+
|width="850px"|<tt>*** Error in `./geos': free(): invalid size: 0x000000000662e090 ***</tt>
+
 
+
|-valign="top"
+
!bgcolor="#CCCCCC"|Reference
+
|http://stackoverflow.com/questions/4729395/error-free-invalid-next-size-fast
+
 
+
|-valign="top"
+
!bgcolor="#CCCCCC"|Explanation
+
|It means that you have a memory error. You may be trying to free a pointer that wasn't allocated (or delete an object that wasn't created) or you may be trying to nullify/delete such an object more than once. You may be overflowing a buffer or otherwise writing to memory to which you shouldn't be writing, causing heap corruption. 
+
 
+
Any number of programming errors can cause this problem. You need to use a debugger, get a backtrace, and see what your program is doing when the error occurs. If that fails and you determine you have corrupted the heap at some previous point in time, you may be in for some painful debugging (it may not be too painful if the project is small enough that you can tackle it piece by piece).
+
 
+
|}
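
The "use a debugger, get a backtrace" advice in the table above can be sketched as follows, assuming that the GNU debugger <tt>gdb</tt> is installed and that the executable was compiled with debugging symbols (the <tt>-g</tt> flag). The function name <tt>run_with_backtrace</tt> is hypothetical, and <tt>./geos</tt> is simply the executable name shown in the error message:

```shell
# Hypothetical helper: run a program under gdb in batch mode and print a
# backtrace if it stops on an error.
run_with_backtrace() {
    # Bail out with the conventional "command not found" status if gdb is missing.
    if ! command -v gdb >/dev/null 2>&1; then
        echo "gdb not found; install it or use another debugger" >&2
        return 127
    fi

    # -batch : exit after the commands have been processed
    # -ex    : execute a gdb command ("run" the program, then "bt" for a backtrace)
    # --args : treat everything that follows as the program and its arguments
    gdb -batch -ex run -ex bt --args "$@"
}

# Example usage (left commented out so that sourcing this file runs nothing):
# run_with_backtrace ./geos
```

The backtrace lists the chain of routine calls that was active when the program stopped, which is usually the quickest way to find the statement that freed or overwrote the memory in question.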
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 21:32, 6 January 2017 (UTC)
+
 
+
=== Memory error: "double free or corruption" ===
+
 
+
The following error is not common, but can occur under some circumstances:
+
 
+
{| border=1 cellspacing=0 cellpadding=5
+
|-valign="top"
+
!width="150px" bgcolor="#CCCCCC"|Error
+
|width="850px"|<tt> *** glibc detected *** ./geos: double free or corruption (out):</tt>
+
 
+
|-valign="top"
+
!bgcolor="#CCCCCC"|Reference
+
|http://stackoverflow.com/questions/2902064/how-to-track-down-a-double-free-or-corruption-error-in-c-with-gdb
+
 
+
|-valign="top"
+
!bgcolor="#CCCCCC"|Explanation
+
|There are at least two possible situations:
+
#You are deleting the same entity twice
+
#You are deleting something that wasn't allocated
+
 
+
For the first one I strongly suggest NULL-ing all deleted pointers.
+
 
+
You have [some] options:
+
#Overload new and delete and track the allocations
+
#Use a debugger -- then you'll get a backtrace from your crash, and that'll probably be very helpful
+
 
+
Three basic rules:
+
#Set pointer to NULL after free
+
#Check for NULL before freeing.
+
#Initialize pointer to NULL in the start.
+
 
+
Combination of these three works quite well.
+
 
+
|}
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 21:32, 6 January 2017 (UTC)
+

Latest revision as of 20:30, 14 June 2019