Difference between revisions of "Common GEOS-Chem error messages"

Revision as of 19:47, 7 July 2008

Here is a list of some commonly-encountered GEOS-Chem error messages.

Specific errors

File ann_mean_trop.geos5.* not found

If you are running a GEOS-5 simulation and get an error that says that GEOS-Chem cannot locate the ann_mean_trop.geos5.2x25 or ann_mean_trop.geos5.4x5 file, then make sure that the following option is set in your input.geos file.

Use variable tropopause?: T

Starting with version v7-04-12, GEOS-Chem can use a variable tropopause (i.e. chemistry is done up to the location of the actual tropopause as diagnosed from the met fields at any given timestep). You cannot use the annual mean tropopause for GEOS-5.

--Bob Y. 15:18, 7 July 2008 (EDT)

A3 met fields not found

If you encounter a "file not found" error in A3_READ_MOD very near the beginning of a GEOS-Chem simulation, then this may be caused by the MEGAN biogenic emissions. MEGAN keeps a 15-day running average of temperature, and therefore requires that the met field files for the 15 days prior to the start of the GEOS-Chem simulation be present on disk.

The solution to this error is to either copy over the missing met data for the previous 15 days to your disk space, or to start your GEOS-Chem simulation at a later date.
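Before starting a run, it can help to list which days of met data MEGAN will ask for. The following is a minimal sketch, assuming GNU date is available; prior_days is a hypothetical helper name, and the script only prints dates because the met field file names depend on how your data directory is laid out.

```shell
#!/bin/sh
# List the N calendar days before a start date (YYYY-MM-DD), newest first.
# Assumes GNU date (the -d option); prior_days is a hypothetical helper name.
prior_days() {
    start=$1
    ndays=$2
    i=1
    while [ "$i" -le "$ndays" ]; do
        date -u -d "$start $i days ago" +%Y%m%d
        i=$((i + 1))
    done
}

# Example: the 15 days of met data needed for a run starting 1 July 2008.
prior_days 2008-07-01 15
```

Each printed date can then be compared against the A3 files present in your met field directory.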

--Bob Y. 15:18, 7 July 2008 (EDT)

Segmentation fault encountered after TPCORE initialization

With the IFORT compiler, there is a known issue with the glibc library that can make your code appear to have run out of memory even though there is plenty of memory available. The symptom is that your code will die with a segmentation fault right after the following text is printed:

NASA-GSFC Tracer Transport Module successfully initialized

This error may not occur if you are running GEOS-Chem at 4 x 5 resolution. Users have reported that this error typically occurs when running a 2 x 2.5 simulation.

The workaround is to reset your stacksize limit to a large positive number instead of "unlimited". See the discussion of this issue on the Machine issues & portability wiki page.
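In a Bourne-type shell, the workaround might look like the sketch below. The value of STACK_KB is an assumption (the text above only calls for "a large positive number"), and stack_is_unlimited is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical stack size in kilobytes (2 GB); the text only calls for
# "a large positive number", so adjust this for your system.
STACK_KB=2097152

# True when the reported soft limit is the problematic "unlimited" setting.
stack_is_unlimited() {
    [ "$1" = "unlimited" ]
}

if stack_is_unlimited "$(ulimit -s)"; then
    # -S changes only the soft limit, so the hard limit is left untouched.
    ulimit -S -s "$STACK_KB" || echo "could not reset the stack size limit" >&2
fi
echo "stack size limit is now: $(ulimit -s)"
```

The limit applies to the current shell and the processes it starts, so these lines would need to run in the same shell or job script that launches GEOS-Chem.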

--Bob Y. 13:39, 1 July 2008 (EDT)

Module file cannot be read

If you should encounter this type of error:

ifort -cpp -w -O2 -auto -noalign -convert big_endian -openmp -Dmultitask -c time_mod.f
fortcom: Error: time_mod.f, line 259: This module file was generated for a different 
platform or by an incompatible compiler or compiler release. It cannot be read.   [JULDAY_MOD]
      USE JULDAY_MOD, ONLY : JULDAY, CALDATE 

Then this means that you are trying to link to previously-created *.mod files that were generated by a different compiler. Making clean and re-compiling from scratch should solve this problem.
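If the makefile's clean target is unavailable or leaves files behind, the stale files can also be removed by hand. This is a minimal sketch; remove_stale_objects is a hypothetical helper, not part of GEOS-Chem, and it assumes the compiled *.mod and *.o files sit directly in the directory you pass to it.

```shell
#!/bin/sh
# Remove stale compiled module and object files from one directory so that
# the next build regenerates them with the current compiler.
# remove_stale_objects is a hypothetical helper, not part of GEOS-Chem.
remove_stale_objects() {
    dir=$1
    find "$dir" -maxdepth 1 -type f \( -name '*.mod' -o -name '*.o' \) \
        -exec rm -f {} +
}

# Usage (run from your code directory, then rebuild from scratch):
#   remove_stale_objects . && make
```

Source files such as time_mod.f are not touched, because only names ending in .mod or .o are matched.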

--Bob Y. 13:39, 1 July 2008 (EDT)

Error when reading the "restart_gprod_aprod" file

Eric Leibensperger (eleibens@fas.harvard.edu) wrote:

I am trying to run GEOS-Chem and have encountered an error. The log file gives me this:
  ===============================================================================
  GEOS-CHEM ERROR: No matches found for file restart_gprod_aprod.2001070100!
  STOP at READ_BPCH2 (bpch2_mod.f)!
  ===============================================================================
I have the aerosol restart file (with the same name) in my ~/testrun/runs/run.v7-04-12/ folder. Is it looking for it elsewhere? I get an additional message in the log.error file, but I think that it is possibly the result of not being able to find the file above:
  ******  FORTRAN RUN-TIME SYSTEM  ******
  Error 1183:  deallocating an unallocated allocatable array
  Location:  the DEALLOCATE statement at line 4933 of "carbon_mod.f"
  Abort
Any thoughts would be appreciated. Sorry to bother you with this!
Eric

Philippe Le Sager (plesager@seas.harvard.edu) replied:

You must rewrite your restart_gprod_aprod.YYYYMMDDHH so that the date in the filename is the same as the one in the datablock header.
I wrote a routine to do that: ~phs/IDL/dvpt/various_rewrite/rewrite_agprod.pro
-Philippe

NOTE: The file rewrite_agprod.pro will be released in the next GAMAP version.

--Bmy 15:59, 9 May 2008 (EDT)

Fatal error in IFORT

The following error, which occurred on the Altix platform with the Intel "ifort" v9.1 compiler:

 ifort: error: /opt/intel/fc/9.0/bin/fpp: core dumped
 ifort: error: Fatal error in /opt/intel/fc/9.0/bin/fpp, 
 terminated by unknown signal(139)
 make: *** [transport_mod.o] Error 1

was caused by an omitted " in an #include declaration, i.e.

 #     include "CMN_DIAG

Adding the closing " fixed the problem.
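Because the compiler message does not point at the offending line, a quick search of the source file can help. The sketch below uses grep to print every #include line that opens a double quote without closing it; find_unclosed_includes is a hypothetical helper name.

```shell
#!/bin/sh
# Print (with line numbers) every #include line that opens a double quote
# but never closes it. find_unclosed_includes is a hypothetical helper.
find_unclosed_includes() {
    grep -nE '^#[[:space:]]*include[[:space:]]+"[^"]*$' "$@"
}

# Usage:
#   find_unclosed_includes transport_mod.f
```

grep returns a non-zero status when nothing matches, so the function can also be used as a check in a build script.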

General Errors

Exit Status

If you submit a job to the PBS Queue, the exit status will tell you a little bit about what happened. The exit status code is in the email that you will get back from the PBS server:

 PBS Job Id: 6015.altix
 Job Name:   test.sh
 Execution terminated
 Exit_status=1
 resources_used.cpupercent=0
 resources_used.cput=00:00:00
 resources_used.mem=2304kb
 resources_used.ncpus=1
 resources_used.vmem=4544kb
 resources_used.walltime=00:00:00

Jobs that finish normally should have an exit status of 0. In the above example, we have an exit status of 1, which means that the job encountered some kind of error (either a compilation error or a run-time error).

A common exit status number is 143. This means that your job has exceeded the wall-clock time limit of the queue. The solution is to restart your job in a queue with a longer time limit.
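The value 143 fits the common shell convention in which a status above 128 means the process was terminated by a signal whose number is the status minus 128. Here that gives signal 15, which is SIGTERM on Linux. The following sketch applies that convention, assuming your batch system reports statuses the same way; describe_exit_status is a hypothetical helper name.

```shell
#!/bin/sh
# Interpret a job exit status using the common shell convention that a
# status above 128 means "terminated by signal (status - 128)".
# describe_exit_status is a hypothetical helper name.
describe_exit_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "finished normally"
    elif [ "$status" -gt 128 ]; then
        echo "terminated by signal $((status - 128))"
    else
        echo "exited with error status $status"
    fi
}

# Example: the walltime case described above.
describe_exit_status 143
```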

Segmentation Fault

A segmentation fault may mean:

  • You have gone outside the declared bounds of an array
  • You are trying to access an ALLOCATABLE array that hasn't been allocated yet
  • You are trying to read data from a file into an array of the wrong dimensions

Example 1

Here is a segmentation fault caused by the code trying to read data from a file into an array that is too small to contain the data. This error was detected on the Altix platform. The output comes from the IDB debugger.

 Thread received signal SEGV
 stopped at [<opaque> for_read_seq_xmit(...) 0x40000000006b6500] 
 
 Information:  An <opaque> type was presented during execution of 
 the previous command.  For complete type information on this symbol,
 recompilation of the program will be necessary.  Consult the compiler
 man pages for details on producing full symbol table information using   
 the '-g' (and '-gall' for cxx) flags.

Example 2

The same error message as in Example 1 has also been known to occur on Altix when a variable that has not been initialized is passed to a subroutine or function:

 CALL MYSUB( I )   ! I is uninitialized!
 DO I = 1, IIPAR 
    ... 
 ENDDO

The solution to the problem is as follows:

 DO I = 1, IIPAR 
    CALL MYSUB( I )  ! Put the MYSUB call in the I-loop!
    ... 
 ENDDO

Bus Error

A bus error usually means that you are trying to call a subroutine with the wrong number of arguments.

On SGI, you will usually get a bus error only if you call a subroutine with too many, rather than too few, arguments. Other behaviors are platform and compiler dependent.