Segmentation faults

From Geos-chem
'''''[[Run-time crashes and abnormal exits|Previous]] | [[Other less-common errors|Next]] | [[Guide to GEOS-Chem error messages]]'''''

This content has been moved to our [https://geos-chem.readthedocs.io/en/latest/geos-chem-shared-docs/supplemental-guides/error-guide.html#segmentation-faults-and-similar-errors '''''Understand what error messages mean'''''] supplemental guide at [https://geos-chem.readthedocs.io '''<tt>geos-chem.readthedocs.io</tt>''']

#[[Understanding the different categories of errors]]
#[[Compile-time warnings and errors]]
#[[Run-time crashes and abnormal exits]]
#<span style="color:blue">'''Segmentation faults'''</span>
#[[Other less-common errors]]

== Overview ==

<span style="color:darkorange">'''''We have migrated bug reports to our GEOS-Chem issue tracker, which is located on our GitHub repository: https://github.com/geoschem/geos-chem/issues/. We recommend that you also look through both the open and closed issues on this page, as your issue might already be listed there.'''''</span>
----
<span style="color:darkorange">'''''Also note: this page shows error messages from the Intel Fortran Compiler.  If you are using a different compiler (such as GNU Fortran), the wording of the error message may differ slightly.'''''</span>
----

If your simulation dies with a '''segmentation fault''' error, this means that GEOS-Chem tried to access an [http://stackoverflow.com/questions/2346806/what-is-segmentation-fault invalid memory location].  A segmentation fault error message looks similar to this:

 forrtl: severe (174): SIGSEGV, segmentation fault occurred

but the exact text may differ depending on the compiler and compiler version you are using.  Segmentation faults can have several causes, which are described in the following sections.

=== Array-out-of-bounds error ===

Most often, a segmentation fault indicates an array-out-of-bounds condition.  To find out where this error is occurring, recompile GEOS-Chem with the following Makefile options:

 make realclean
 make BOUNDS=yes TRACEBACK=yes

The <tt>BOUNDS=yes</tt> option turns on '''Array Out-of-Bounds''' error checking.  The <tt>TRACEBACK=yes</tt> option prints out the '''Error Stack''', as [[#Traceback error stack|described above]].  Together, these options provide more detailed error output.

After recompiling, you should receive an error message such as:

 forrtl: severe (408): fort: (3): Subscript #1 of the array PBL_THICK has value -1000000 which is less than the lower bound of 1

This tells you which array was accessed out of bounds (<tt>PBL_THICK</tt>), which subscript was invalid (the first one), and the offending index value (-1000000).  Use the Unix <tt>grep</tt> command to search for all instances of this array in the GEOS-Chem source code:

 grep -i PBL_THICK *.f*

and inspect each place where the array is indexed to find the problem.

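If you are tracking down several such errors, the two steps above (reading the array name off the error message, then searching the source for it) can be scripted.  The following is a minimal bash sketch, assuming the Intel Fortran wording shown above; <tt>oob_array_name</tt> and <tt>find_array_refs</tt> are hypothetical helper names, not part of GEOS-Chem, and the file extensions searched are an assumption about how your source tree is laid out.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for locating an out-of-bounds array (not part of GEOS-Chem).

# Print the array name from an Intel Fortran bounds-check message such as:
#   forrtl: severe (408): fort: (3): Subscript #1 of the array PBL_THICK
#   has value -1000000 which is less than the lower bound of 1
# Prints nothing if the message does not use that wording.
oob_array_name() {
  printf '%s\n' "$1" | sed -n 's/.*of the array \([A-Za-z0-9_]*\) has value.*/\1/p'
}

# Case-insensitively list (with file names and line numbers) every line in
# the Fortran sources under a directory (default: the current one) that
# mentions the named array.  Unlike "grep -i NAME *.f*", this also
# searches subdirectories.  --include is a GNU/BSD grep option.
find_array_refs() {
  local name="$1" dir="${2:-.}"
  grep -rin --include='*.f' --include='*.F' --include='*.f90' --include='*.F90' \
    -- "$name" "$dir"
}
```

For example, <tt>find_array_refs PBL_THICK .</tt> lists every source line that mentions <tt>PBL_THICK</tt>, with its file name and line number.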
 
NOTE: In the above example, we manually forced an out-of-bounds error with this line of code:

        !### FORCE OOB error for testing
        PBL_THICK(-1000000,J)  = BLTHIK

Removing this line will fix the error.

--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 22:19, 6 January 2017 (UTC)

=== Invalid memory access ===

A segmentation fault can also happen if GEOS-Chem makes a reference to a memory location that is invalid.  You may see an error message such as this:

 severe (174): SIGSEGV, segmentation fault occurred
 This message indicates that the program attempted an invalid memory reference.
 Check the program for possible errors.

This can happen if you are trying to read data from a file into an array, but the array is too small to hold all of the data.  You can use a debugger (such as Totalview or IDB) to try to diagnose the situation.  You may receive an error message from the debugger similar to this one:

  Thread received signal SEGV
  stopped at [<opaque> for_read_seq_xmit(...) 0x40000000006b6500]
 
  Information:  An <opaque> type was presented during execution of
  the previous command.  For complete type information on this symbol,
  recompilation of the program will be necessary.  Consult the compiler
  man pages for details on producing full symbol table information using
  the '-g' (and '-gall' for cxx) flags.

Increasing the size of the array until it is large enough to hold all of the data will usually fix this problem.

--[[User:Bmy|Bob Y.]] 15:57, 22 June 2012 (EDT)

=== Stack overflow ===

A segmentation fault can also happen if GEOS-Chem uses up all of the available [http://en.wikipedia.org/wiki/Stack-based_memory_allocation stack memory] on your system.  The stack is a special region of memory where short-lived variables are stored.

The compiler will typically place a variable into stack memory if it meets all of the following conditions:

* it is local to a given subroutine
* it is NOT located within a <tt>COMMON</tt> block
* it is NOT declared with the <tt>SAVE</tt> attribute
* it is NOT declared as an <tt>ALLOCATABLE</tt> array
* it is NOT declared as a <tt>POINTER</tt> variable or array

Therefore, it is important to make sure that your computational environment is set up to use the maximum amount of stack memory.  You can do this by placing the following line in your <tt>.cshrc</tt> file:

 limit stacksize unlimited

or in your <tt>.bashrc</tt> file:

 ulimit -s unlimited

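Resource limits set this way apply to the shell and to the programs it launches, so it can be worth confirming the limit inside the script that actually starts the run.  A minimal bash sketch follows; <tt>raise_stack_limit</tt> is a hypothetical name, and the fallback assumes the usual rule that an unprivileged user may raise the soft limit only up to the hard limit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: raise the soft stack limit of the current shell
# (and therefore of the programs it launches) as far as it will go.
raise_stack_limit() {
  # Try "unlimited" first; this fails if the hard limit is finite.
  if ! ulimit -S -s unlimited 2>/dev/null; then
    # Fall back to the largest value allowed: the hard limit.
    ulimit -S -s "$(ulimit -H -s)"
  fi
  # Report the resulting soft limit (in KB, or "unlimited").
  ulimit -S -s
}
```

Calling <tt>raise_stack_limit</tt> near the top of your run script, before the GEOS-Chem executable is started, prints the limit the run will actually see.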
 
If the SIGSEGV (error 174) is caused by running out of stack memory, you may see the following, more specific error text:

 severe (174): SIGSEGV, possible program stack overflow occurred
 Program requirements exceed current stacksize resource limit.

--[[User:Bmy|Bob Y.]] 15:57, 22 June 2012 (EDT)

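Because the generic segmentation fault and the stack-overflow variant share error code 174, it can help to check which message actually appears in your log before choosing a fix.  The following bash sketch assumes the Intel Fortran messages quoted on this page; the function name <tt>classify_segfault</tt> and its category labels are hypothetical, and other compilers will need different patterns.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a run log by the Intel Fortran runtime
# message it contains.  Patterns are copied from the messages quoted on
# this page; -F makes grep match them as fixed strings.
classify_segfault() {
  local log="$1"
  if grep -qF 'possible program stack overflow' "$log"; then
    echo "stack-overflow"        # raise the stack limit
  elif grep -qF 'forrtl: severe (408)' "$log"; then
    echo "array-out-of-bounds"   # find the array named in the message
  elif grep -qF 'SIGSEGV' "$log"; then
    echo "segfault"              # generic invalid memory access
  else
    echo "unknown"
  fi
}
```

The checks run from most to least specific, so a log that contains the stack-overflow text is not reported as a generic segmentation fault.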
=== forrtl: error (76): IOT trap signal ===

'''''[mailto:xun@gps.caltech.edu Xun Jiang] wrote:'''''

:We encountered the following error message:

    forrtl: severe (174): SIGSEGV, segmentation fault occurred

    Stack trace terminated abnormally.
    forrtl: error (76): IOT trap signal

    Note: The error appears after
    - RDSOIL: Reading
    Data/GEOS_2x2.5/soil_NOx_200203/climatprep2x25.dat
    ### MAIN: a DAILY DATA

:I have the following lines in my <tt>.cshrc</tt> file:

    setenv KMP_STACKSIZE 329033024
    limit cputime    unlimited
    limit datasize    unlimited
    limit stacksize  unlimited
    limit filesize    unlimited
    limit memoryuse  unlimited
    limit descriptors unlimited

:However, it still doesn't work.  Any suggestions would be greatly appreciated.

'''''[mailto:yantosca@seas.harvard.edu Bob Yantosca] replied:'''''

:I found [http://xtechnotes.blogspot.com/2006/01/1001-most-idiotic-error-messages.html this internet post], which has an explanation:

    Cause:
    The stack size for child threads are overflowing. The main stack size for the program
    is changed by the ulimit command (in Bash shell) or limit command (in C shell).
    However this environment variable does not set the size for the child thread stack size.
    Thus the child thread stack overflow.

    Solution:
    Set the environment variables to increase the child thread stack size.

    #for intel, using bash shell
    export KMP_STACKSIZE=500000000

    # for intel, using csh or tcsh shell
    setenv KMP_STACKSIZE 500000000

:For more information, please see our wiki post on [[Intel Fortran Compiler#Resetting stacksize for Linux|Resetting the stack size for Linux]].

--[[User:Bmy|Bob Y.]] 11:20, 26 June 2012 (EDT)

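The fix suggested in this reply, raising the stack size for the child (OpenMP) threads separately from the main stack, can be kept in the script that launches GEOS-Chem.  A minimal bash sketch; <tt>set_thread_stacksize</tt> is a hypothetical name, the <tt>500M</tt> default is only illustrative, and setting the standard OpenMP variable <tt>OMP_STACKSIZE</tt> alongside the Intel-specific <tt>KMP_STACKSIZE</tt> is an addition beyond the reply above (other OpenMP runtimes, such as the one used by GNU Fortran, read <tt>OMP_STACKSIZE</tt>).

```shell
#!/usr/bin/env bash
# Hypothetical helper: set the stack size for OpenMP worker threads.
# The main-thread stack is controlled separately by ulimit/limit.
set_thread_stacksize() {
  # Use an explicit unit suffix (K, M, G): the OpenMP standard reads a
  # bare OMP_STACKSIZE number as kilobytes, which is easy to get wrong.
  local size="${1:-500M}"
  export KMP_STACKSIZE="$size"   # read by the Intel OpenMP runtime
  export OMP_STACKSIZE="$size"   # standard OpenMP variable (OpenMP 3.0+)
}
```

Call it in the same script as <tt>ulimit -s unlimited</tt>, since that command only raises the limit for the main thread.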
=== Segmentation fault encountered after TPCORE initialization ===

You may encounter a segmentation fault right after the following text is printed:

 NASA-GSFC Tracer Transport Module successfully initialized

This error usually occurs when:

# You are running GEOS-Chem at a sufficiently fine resolution, such as 2&deg; x 2.5&deg; or finer.  (Many users have reported that this error does not occur at 4&deg; x 5&deg; resolution.)
# You are using a large number of advected tracers.
# Both #1 and #2 apply.

If you are using the [[Intel Fortran Compiler]], the cause of this error can likely be traced to a known issue with the <tt>glibc</tt> library.  This issue causes GEOS-Chem to think that it has used up all of the available memory, when in fact there is plenty of memory still available.  However, you may also encounter this same error if you have compiled GEOS-Chem with a different compiler.

You can usually correct this error by manually telling your system to use the maximum amount of stack memory when running GEOS-Chem.  For detailed instructions, please see the following links:

#[[Intel Fortran Compiler#Resetting stacksize for Linux|Setting stacksize for the Intel Fortran Compiler (aka "IFORT")]]
#[[Machine_issues_%26_portability#Setting_the_stacksize|Setting stacksize for the PGI Compiler]]
#[[Machine_issues_%26_portability#.22Not_enough_space.22_error_in_TPCORE|Setting stacksize for the Sun Studio compiler]]

--[[User:Bmy|Bob Y.]] 16:07, 14 December 2010 (EST)
 
Latest revision as of 15:22, 13 July 2023
