Difference between revisions of "Segmentation faults"

 
(23 intermediate revisions by the same user not shown)
Line 1: Line 1:
'''''[[Run-time crashes and abnormal exits|Previous]] | [[Other less-common errors|Next]] | [[Guide to GEOS-Chem error messages]] | [[Getting Started with GEOS-Chem]] | [[Main page|GEOS-Chem Main Page]]'''''
+
----
 +
<span style="color:red"><strong><big>We have migrated bug reports and support requests to our [https://github.com/geoschem/geos-chem/issues/ Github issue tracker].</big></strong></span>
 +
----
  
 +
 +
'''''[[Run-time crashes and abnormal exits|Previous]] | [[Other less-common errors|Next]] | [[Guide to GEOS-Chem error messages]]'''''
 +
 +
#[[Understanding the different categories of errors]]
 
#[[Compile-time warnings and errors]]
 
#[[Compile-time warnings and errors]]
 
#[[Run-time crashes and abnormal exits]]
 
#[[Run-time crashes and abnormal exits]]
#[[Segmentation faults]]
+
#<span style="color:blue">'''Segmentation faults'''</span>
 
#[[Other less-common errors]]
 
#[[Other less-common errors]]
  
Line 9: Line 15:
 
== Overview ==
 
== Overview ==
  
If your simulation dies with a '''segmentation fault''' error, this means that GEOS-Chem tried to access an [http://stackoverflow.com/questions/2346806/what-is-segmentation-fault invalid memory location].  We list several instances of segmentation faults below.
+
If your simulation dies with a '''segmentation fault''' error, this means that GEOS-Chem tried to access an [http://stackoverflow.com/questions/2346806/what-is-segmentation-fault invalid memory location].  A segmentation fault error message looks similar to this:
 
+
=== Severe(174) SIGSEGV error ===
+
 
+
<span style="color:darkorange">'''''NOTE: In this section, we shall use the Intel Fortran Compiler error messages.  You may get a slightly different error message if you are using a different compiler (such as GNU Fortran).'''''</span>
+
 
+
If you compiled GEOS-Chem with the [[Intel Fortran Compiler|IFORT compiler]], you may encounter the following error message:
+
 
+
forrtl: severe (174): SIGSEGV, segmentation fault occurred
+
 
+
This means that a segmentation fault (i.e. memory error) has occurred during your GEOS-Chem simulation.  This can be caused by:
+
 
+
==== Traceback error stack ====
+
 
+
<span style="color:darkorange">'''''NOTE: TRACEBACK=yes is turned on by default in [[GEOS-Chem v11-01|v11-01]] and higher versions.'''''</span>
+
 
+
When GEOS-Chem is compiled with the <tt>TRACEBACK=yes</tt> option, it will print out an error stack, which includes the list of routines that were called when the error occurred and the line at which the error occurred.
+
 
+
An error stack is included below:
+
  
 
  forrtl: severe (174): SIGSEGV, segmentation fault occurred
 
  forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
 
libintlc.so.5      00002ACA46B91961  Unknown              Unknown  Unknown
 
libintlc.so.5      00002ACA46B900B7  Unknown              Unknown  Unknown
 
libnetcdff.so.5    00002ACA4473D682  Unknown              Unknown  Unknown
 
libnetcdff.so.5    00002ACA4473D4D6  Unknown              Unknown  Unknown
 
libnetcdff.so.5    00002ACA4471DD4C  Unknown              Unknown  Unknown
 
libnetcdff.so.5    00002ACA44721DB8  Unknown              Unknown  Unknown
 
libpthread.so.0    00000031A0A0F710  Unknown              Unknown  Unknown
 
<span style="color:red">'''geos.mp            000000000175FF79  hco_interface_mod        341  hco_interface_mod.F90'''</span>
 
geos.mp            00000000005F1F47  carbon_mod_mp_emi        5490  carbon_mod.F
 
geos.mp            00000000016EAF33  emissions_mod_mp_        206  emissions_mod.F90
 
geos.mp            00000000010BB119  MAIN__                  1383  main.F
 
geos.mp            000000000040370E  Unknown              Unknown  Unknown
 
libc.so.6          00000031A061ED5D  Unknown              Unknown  Unknown
 
geos.mp            0000000000403619  Unknown              Unknown  Unknown
 
  
The top line with a valid routine and line number printed is the location of the error. In this case, there is an issue in <tt>hco_interface_mod.F90</tt> at line 341. You may also choose to step back through the routines to determine what went wrong. Again, in this case, the problematic routine in <tt>hco_interface_mod.F90</tt> was called from <tt>carbon_mod.F</tt> (line 5490), etc. It may be useful to recompile and rerun GEOS-Chem with additional debug options turned on (e.g. <tt>BOUNDS=yes</tt>, <tt>FPE=yes</tt>) to determine the cause of the error. For more information, see our [[GEOS-Chem_coding_and_debugging#Recompile_GEOS-Chem_with_debug_options_turned_on|''GEOS-Chem coding and debugging'' wiki page]].
+
but may differ depending on the compiler version you are using. Segmentation faults can be due to several causes, as shown in the following sections.
  
==== Array-out-of-bounds error ====
+
=== Array-out-of-bounds error ===
  
 
Most often, a segmentation fault indicates an array out-of-bounds condition.  To find out more information about where this error is occurring, recompile GEOS-Chem with the following Makefile options:
 
Most often, a segmentation fault indicates an array out-of-bounds condition.  To find out more information about where this error is occurring, recompile GEOS-Chem with the following Makefile options:
Line 76: Line 49:
 
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 22:19, 6 January 2017 (UTC)
 
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 22:19, 6 January 2017 (UTC)
  
==== Invalid memory access ====
+
=== Invalid memory access ===
  
 
A segmentation fault can also happen if GEOS-Chem makes a reference to a memory location that is invalid.  You may see an error message such as this:
 
A segmentation fault can also happen if GEOS-Chem makes a reference to a memory location that is invalid.  You may see an error message such as this:
Line 99: Line 72:
 
--[[User:Bmy|Bob Y.]] 15:57, 22 June 2012 (EDT)
 
--[[User:Bmy|Bob Y.]] 15:57, 22 June 2012 (EDT)
  
==== Stack overflow ====
+
=== Stack overflow ===
  
 
Finally, a segmentation fault can happen if GEOS-Chem uses up all of the available [http://en.wikipedia.org/wiki/Stack-based_memory_allocation stack memory] on your system.  The stack memory is a special part of the memory where short-term variables get stored.   
 
Finally, a segmentation fault can happen if GEOS-Chem uses up all of the available [http://en.wikipedia.org/wiki/Stack-based_memory_allocation stack memory] on your system.  The stack memory is a special part of the memory where short-term variables get stored.   
Line 126: Line 99:
 
--[[User:Bmy|Bob Y.]] 15:57, 22 June 2012 (EDT)
 
--[[User:Bmy|Bob Y.]] 15:57, 22 June 2012 (EDT)
  
==== forrtl: error (76): IOT trap signal ====
+
=== forrtl: error (76): IOT trap signal ===
  
 
'''''[mailto:xun@gps.caltech.edu Xun Jiang] wrote:'''''
 
'''''[mailto:xun@gps.caltech.edu Xun Jiang] wrote:'''''
Line 191: Line 164:
 
If you are using the [[Intel Fortran Compiler]], the cause of this error can likely be traced to a known issue with the <tt>glibc</tt> library.  This will cause GEOS-Chem to think that it has used up all of the available memory, when in fact there is plenty of memory still available.  However, you may also encounter this same error even if you have compiled GEOS-Chem with a different compiler.
 
If you are using the [[Intel Fortran Compiler]], the cause of this error can likely be traced to a known issue with the <tt>glibc</tt> library.  This will cause GEOS-Chem to think that it has used up all of the available memory, when in fact there is plenty of memory still available.  However, you may also encounter this same error even if you have compiled GEOS-Chem with a different compiler.
  
You can usually correct this error by manually telling your system to use the maximum amount of stack memory when running GEOS-Chem.  For detailed instructions, please see the following links:
+
You can usually correct this error by manually telling your system to use the maximum amount of stack memory when running GEOS-Chem.  For detailed instructions, [https://geos-chem.readthedocs.io/en/latest/gcc-guide/01-startup/login-env-parallel.html please follow this link].
  
#[[Intel Fortran Compiler#Resetting stacksize for Linux|Setting stacksize for the Intel Fortran Compiler (aka "IFORT")]]
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 14:29, 20 September 2022 (UTC)
#[[Machine_issues_%26_portability#Setting_the_stacksize|Setting stacksize for the PGI Compiler]]
+
#[[Machine_issues_%26_portability#.22Not_enough_space.22_error_in_TPCORE|Setting stacksize for the Sun Studio compiler]]
+
  
--[[User:Bmy|Bob Y.]] 16:07, 14 December 2010 (EST)
 
  
 
----
 
----
'''''[[Run-time crashes and abnormal exits|Previous]] | [[Other less-common errors|Next]] | [[Guide to GEOS-Chem error messages]] | [[Getting Started with GEOS-Chem]] | [[Main page|GEOS-Chem Main Page]]'''''
+
'''''[[Run-time crashes and abnormal exits|Previous]] | [[Other less-common errors|Next]] | [[Guide to GEOS-Chem error messages]]'''''

Revision as of 18:26, 20 September 2022


We have migrated bug reports and support requests to our GitHub issue tracker.



Previous | Next | Guide to GEOS-Chem error messages

  1. Understanding the different categories of errors
  2. Compile-time warnings and errors
  3. Run-time crashes and abnormal exits
  4. Segmentation faults
  5. Other less-common errors


Overview

If your simulation dies with a segmentation fault error, this means that GEOS-Chem tried to access an invalid memory location. A segmentation fault error message looks similar to this:

forrtl: severe (174): SIGSEGV, segmentation fault occurred

but may differ depending on the compiler version you are using. Segmentation faults can be due to several causes, as shown in the following sections.

Array-out-of-bounds error

Most often, a segmentation fault indicates an array out-of-bounds condition. To find out more information about where this error is occurring, recompile GEOS-Chem with the following Makefile options:

make realclean
make BOUNDS=yes TRACEBACK=yes

The BOUNDS=yes option turns on array out-of-bounds error checking. The TRACEBACK=yes option prints the error stack: the list of routines that were called when the error occurred, together with the line at which it occurred. Together, these options provide more detailed error output.

After recompiling, you should receive an error message such as:

forrtl: severe (408): fort: (3): Subscript #1 of the array PBL_THICK has value -1000000 which is less than the lower bound of 1

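This tells you which array was indexed out of bounds (here, PBL_THICK), which subscript was invalid, and what value it had. Use the Unix grep command to search for all instances of this array in the GEOS-Chem source code:

grep -i PBL_THICK *.f*

and inspect each place where the array is indexed to find the offending subscript.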

NOTE: In the above example, we manually forced an out-of-bounds error with this line of code:

        !### FORCE OOB error for testing
        PBL_THICK(-1000000,J)   = BLTHIK

Removing this line will fix the error.

--Bob Yantosca (talk) 22:19, 6 January 2017 (UTC)
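
The grep command shown in this section only searches the current directory. If the source files are spread over several subdirectories, a recursive search that also prints line numbers can be more convenient. The following is a minimal sketch, assuming a grep that supports the -r and --include options (for example GNU grep); the helper name find_array_refs is hypothetical and is not part of GEOS-Chem.

```shell
# Hypothetical helper (not part of GEOS-Chem): list every reference to an
# array name in the Fortran source files under a directory.
# Usage: find_array_refs ARRAY_NAME [SOURCE_DIR]
find_array_refs() {
    # -r : recurse into subdirectories
    # -i : ignore case (Fortran is case-insensitive)
    # -n : print the line number of each match
    # --include : restrict the search to Fortran source files
    grep -rin \
        --include='*.F' --include='*.F90' \
        --include='*.f' --include='*.f90' \
        -- "$1" "${2:-.}"
}
```

For example, find_array_refs PBL_THICK . lists each matching file, line number, and line of code under the current directory.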

Invalid memory access

A segmentation fault can also happen if GEOS-Chem makes a reference to a memory location that is invalid. You may see an error message such as this:

severe (174): SIGSEGV, segmentation fault occurred
This message indicates that the program attempted an invalid memory reference.
Check the program for possible errors.

This can happen if you are trying to read data from a file into an array, but the array is too small to hold all of the data. You can use a debugger (such as Totalview or IDB) to try to diagnose the situation. You may receive an error message from the debugger similar to this one:

 Thread received signal SEGV
 stopped at [<opaque> for_read_seq_xmit(...) 0x40000000006b6500] 
 
 Information:  An <opaque> type was presented during execution of 
 the previous command.  For complete type information on this symbol,
 recompilation of the program will be necessary.  Consult the compiler
 man pages for details on producing full symbol table information using   
 the '-g' (and '-gall' for cxx) flags.

Usually, increasing the size of the array (i.e. until it is large enough to contain all of the data) will fix this problem.

--Bob Y. 15:57, 22 June 2012 (EDT)

Stack overflow

Finally, a segmentation fault can happen if GEOS-Chem uses up all of the available stack memory on your system. The stack memory is a special part of the memory where short-term variables get stored.

The compiler will typically place into the stack memory all local temporary variables, such as:

  • variables that are local to a given subroutine
  • variables that are NOT located within a COMMON block
  • variables that are NOT declared with the SAVE attribute
  • variables that are NOT declared as an ALLOCATABLE array
  • variables that are NOT declared as a POINTER variable or array

Therefore, it is important to make sure that your computational environment is set up to use the maximum amount of stack memory. You can do this by placing the following line in your .cshrc file:

limit stacksize unlimited

or .bashrc file:

 ulimit -s unlimited

If a severe (174) SIGSEGV error is caused by running out of stack memory, you may see the following error text:

severe (174): SIGSEGV, possible program stack overflow occurred
Program requirements exceed current stacksize resource limit.

--Bob Y. 15:57, 22 June 2012 (EDT)
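
Before changing any startup files, it can help to check which stack limit is actually in effect in the shell that launches GEOS-Chem. The following is a minimal sketch for bash-like shells; the variable name current is only illustrative, and raising the limit may fail if your system administrator has set a lower hard limit.

```shell
# Report the current (soft) stack size limit for this shell.
current=$(ulimit -s)
echo "Current stack size limit: $current"

if [ "$current" != "unlimited" ]; then
    # Try to raise the limit. This fails if the hard limit is lower.
    if ulimit -s unlimited 2>/dev/null; then
        echo "Stack size limit is now: $(ulimit -s)"
    else
        echo "Could not raise the limit; hard limit is: $(ulimit -H -s)"
    fi
fi
```

The limit applies only to the current shell and the programs started from it, which is why the setting is normally placed in a startup file or at the top of the run script.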

forrtl: error (76): IOT trap signal

Xun Jiang wrote:

We met the following error message
   forrtl: severe (174): SIGSEGV, segmentation fault occurred

   Stack trace terminated abnormally.
   forrtl: error (76): IOT trap signal

   Note: The error appears after
   - RDSOIL: Reading
   Data/GEOS_2x2.5/soil_NOx_200203/climatprep2x25.dat
   ### MAIN: a DAILY DATA
I have the following lines in .cshrc
   setenv KMP_STACKSIZE 329033024
   limit cputime     unlimited
   limit datasize    unlimited
   limit stacksize   unlimited
   limit filesize    unlimited
   limit memoryuse   unlimited
   limit descriptors unlimited
However, it still doesn't work. Any suggestion is really appreciated.

Bob Yantosca replied:

I found this internet post which has an explanation:
   Cause: 
   The stack size for child threads are overflowing.  The main stack size for the program 
   is changed by the ulimit command (in Bash shell) or limit command (in C shell). 
   However this environment variable does not set the size for the child thread stack size. 
   Thus the child thread stack overflow.

   Solution:
   Set the environment variables to increase the child thread stack size.

   #for intel, using bash shell
   export KMP_STACKSIZE=500000000

   # for intel, using csh or tcsh shell
   setenv KMP_STACKSIZE 500000000
For more information, please see our wiki post on Resetting the stack size for Linux.

--Bob Y. 11:20, 26 June 2012 (EDT)
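
The two settings discussed in this section (the main-program stack limit and the child-thread stack size) can be combined at the top of a run script. The following is a minimal sketch for the bash shell. The KMP_STACKSIZE value is the one quoted above. OMP_STACKSIZE is the standard OpenMP environment variable for the thread stack size; it does not appear in the discussion above and is included here only as an assumption for compilers other than Intel.

```shell
# Raise the stack limit for the main program. Warn rather than abort if the
# system's hard limit prevents this.
ulimit -s unlimited 2>/dev/null || echo "Warning: could not raise stack limit" >&2

# Stack size for child threads, Intel OpenMP runtime (value in bytes).
export KMP_STACKSIZE=500000000

# Assumption: standard OpenMP equivalent for other compilers.
# The M suffix means megabytes.
export OMP_STACKSIZE=500M

# Start GEOS-Chem after this point, from the same shell, so that the
# executable inherits these settings.
```

Both settings must be made in the shell that starts the executable; setting them in a different terminal session has no effect on the run.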

Segmentation fault encountered after TPCORE initialization

You may encounter a segmentation fault right after the following text is printed.

NASA-GSFC Tracer Transport Module successfully initialized

This error usually occurs when:

  1. You are running GEOS-Chem at sufficiently fine resolution, such as 2° x 2.5° or finer. (Many users have reported that this error does not occur at 4° x 5° resolution.)
  2. You are using a large number of advected tracers.
  3. Both #1 and #2

If you are using the Intel Fortran Compiler, the cause of this error can likely be traced to a known issue with the glibc library. This will cause GEOS-Chem to think that it has used up all of the available memory, when in fact there is plenty of memory still available. However, you may also encounter this same error even if you have compiled GEOS-Chem with a different compiler.

You can usually correct this error by manually telling your system to use the maximum amount of stack memory when running GEOS-Chem. For detailed instructions, please see https://geos-chem.readthedocs.io/en/latest/gcc-guide/01-startup/login-env-parallel.html.

--Bob Yantosca (talk) 14:29, 20 September 2022 (UTC)


