Difference between revisions of "Machine issues & portability"

From Geos-chem
On this page we list the compiler-dependent and platform-dependent issues that we have recently encountered.
+
#REDIRECT [[GEOS-Chem_basics#Fortran resources]]
 
+
== IFORT Compiler ==
+
 
+
The Intel Fortran Compiler (aka IFORT) is our recommended compiler for GEOS-Chem.  We have created a [[Intel Fortran Compiler|separate wiki page for Intel Fortran Compiler]] topics. 
+
 
+
Please see the [[Intel Fortran Compiler#Known issues|list of known issues with the Intel Fortran Compiler]] on that page.
+
 
+
--[[User:Bmy|Bob Y.]] 16:12, 29 February 2012 (EST)
+
 
+
== PGI Compiler ==
+
 
+
=== Compatibility issues added in v9-01-02 ===
+
 
+
Please see [[GEOS-Chem v9-01-02#Bug fixes for compatibility with the PGI compiler|this list of minor bugs that were corrected]] in order to get the GEOS-Chem code to compile with the PGI compiler.
+
 
+
--[[User:Bmy|Bob Y.]] 10:04, 23 February 2012 (EST)
+
 
+
=== Error with ADJUSTL and ADJUSTR ===
+
 
+
'''''[mailto:win@cmu.edu Win Trivitayanurak] wrote:'''''
+
 
+
<blockquote>
+
In short, TRIM and ADJUSTL or ADJUSTR do not work together properly when compiled with Portland Group Fortran.  I propose removing TRIM inside the subroutine StrSqueeze.  This is not urgent and relevant to only the few PGI users.
+
</blockquote>
+
 
+
So if you are using the PGI compiler, you will have to modify the code in routine STRSQUEEZE in "charpak_mod.f" such that the statements
+
 
+
STR = ADJUSTR( TRIM( STR ) )
+
STR = ADJUSTL( TRIM( STR ) )
+
 
+
are now replaced with
+
 
+
STR = ADJUSTR( STR )
+
STR = ADJUSTL( STR )
+
 
+
and this will solve the problem.  We will incorporate this into a future release of GEOS-Chem.
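The replacement described above can be sketched as a small shell filter. This is only an illustration: the helper name <tt>fix_strsqueeze</tt> is hypothetical, and the pattern assumes the two statements appear in the source with exactly the spacing quoted above.

```shell
#!/bin/sh
# Sketch: strip the inner TRIM from the two STRSQUEEZE statements.
# fix_strsqueeze is a hypothetical helper; it reads Fortran source on
# stdin and writes the patched source to stdout.

fix_strsqueeze() {
    # \([LR]\) captures whether ADJUSTL or ADJUSTR matched, so one
    # expression handles both statements.
    sed 's/ADJUST\([LR]\)( TRIM( STR ) )/ADJUST\1( STR )/'
}

# Example on the two statements quoted above
printf '%s\n' 'STR = ADJUSTR( TRIM( STR ) )' \
              'STR = ADJUSTL( TRIM( STR ) )' | fix_strsqueeze
```

To try it on the real file, write the result to a copy (for example <tt>fix_strsqueeze < charpak_mod.f > charpak_mod.f.new</tt>) and compare the two files before replacing the original.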
+
 
+
=== Setting the stacksize ===
+
 
+
You may encounter an "out-of-memory" error (which may manifest itself as a segmentation fault) if your simulation uses a fine-resolution grid and/or a large number of advected tracers.  This error can occur when the stack size limit set in your shell prevents the simulation from using all of the available stack memory.  Recall that the OpenMP parallelization utilizes the stack memory for storage of PRIVATE and THREADPRIVATE variables.
+
 
+
You can usually solve this problem by telling your simulation to use all of the available stack memory on the system.  Here is an example script, intended for use with the PGI compiler:
+
 
+
'''''[mailto:yfq@asrc.cestm.albany.edu Fangqun Yu] wrote:'''''
+
 
+
:Attached is a csh script (also copied below) we use to run the [[APM aerosol microphysics]] within GEOS-Chem. It works on our 8-core Linux machine with pgi compiler for all GEOS5 4x5, 2x2.5, and nested grid simulations.
+
 
+
    #!/bin/csh
+
    setenv NCPUS 8
+
    setenv MPSTKZ 1024M
+
    limit stacksize unlimited
+
    geos
+
 
+
NOTES:
+
#The <tt>limit stacksize unlimited</tt> command raises the Unix stack size limit to the maximum value that the system allows.
+
#The PGI compiler environment variable <tt>MPSTKZ</tt> increases the stack size for threads executing in parallel regions.  Please see [http://www.cs.utk.edu/~lucio/pet/compilerguides/pgi-compiler-guide.htm#openmp_related the Online PGI compiler manual] for more information.
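
For users who run under sh or bash rather than csh, the same settings can be sketched as follows. This is not part of the script quoted above: the function name <tt>setup_geos_env</tt> is hypothetical, the values of <tt>NCPUS</tt> and <tt>MPSTKZ</tt> are simply copied from the csh example, and the sketch assumes the <tt>geos</tt> executable sits in the current directory.

```shell
#!/bin/sh
# Sketch of an sh/bash counterpart of the csh run script above.
# setup_geos_env is a hypothetical helper name, not part of GEOS-Chem.

setup_geos_env() {
    # Number of CPUs and per-thread stack size for PGI parallel regions
    # (values copied from the csh example)
    NCPUS=8
    MPSTKZ=1024M
    export NCPUS MPSTKZ

    # sh/bash counterpart of csh "limit stacksize unlimited".
    # This can fail if the hard limit is lower, so report it and continue.
    if ! ulimit -s unlimited 2>/dev/null; then
        echo "warning: could not raise stack limit; current: $(ulimit -s)" >&2
    fi
}

setup_geos_env

# Assumption: the executable is in the current directory.
# Start the model only if it is actually there.
if [ -x ./geos ]; then
    ./geos
fi
```

The <tt>ulimit</tt> call is made inside the function rather than in a subshell, so the new limit applies to the model when it is started from the same script.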
+
 
+
--[[User:Bmy|Bob Y.]] 15:49, 14 December 2010 (EST)
+
 
+
== Sun Studio Compiler ==
+
 
+
'''''NOTE: The Sun Studio compilers are now considered deprecated and are not officially supported.  The code and Makefiles needed to build GEOS-Chem with Sun Studio are still present.  However, recent versions of GEOS-Chem may not compile without errors on Sun Studio.'''''
+
 
+
=== Use "sunf90" and "sunf95"  ===
+
 
+
'''''[mailto:jhy@as.harvard.edu Jack Yatteau] wrote:'''''
+
 
+
:In order to do the (Baselibs) installation, I had to make the gfortran (gnu) version of f90 the default.  This conflicts with the name f90 in the SunStudio12 path.  I also discovered that there was already a name conflict between the gnu version of f95 and SunStudio12 version.
+
 
+
:Users can avoid this by using the names sunf90 and sunf95 (e.g. in their makefile).  Sun must have placed the names sunf90 and sunf95 in the Linux installation of SunStudio12 to cover just this situation.
+
 
+
=== Apparent "out of memory" error ===
+
 
+
'''''[mailto:heald@atmos.colostate.edu Colette Heald] wrote:'''''
+
 
+
:I started a 2x2.5 v8-01-01 run with 54 tracers (including SOA) this morning.  It gets through initialization and then fails when chemistry starts (I've attached the log so you can see how far it gets).  The error message that gets dumped out by my system is as follows:
+
 
+
    ******  FORTRAN RUN-TIME SYSTEM  ******
+
    Error 12:  Not enough space
+
    Location:  the AUTOMATIC variable declaration at line 586 of "tpcore_fvdas_mod.f90"
+
 
+
:I didn't run into this with an almost identical v7-04-13 run and I double-checked that all my directories are not close to quota.  I repeated the v8-01-01 run at 4x5 and it also ran no problem.  Have either of you tested v8-01-01 at 2x2.5?  Have you seen any tpcore related problems?
+
 
+
'''''[mailto:plesager@seas.harvard.edu Philippe Le Sager] replied:'''''
+
 
+
:You are using GEOS-5 met field and your simulation has 47 levels instead of 30 with GEOS3 and 4. This is more than 50% increase, which affect the memory requirement. Using GEOS5 with secondary aerosols at 2x2.5 requires a lot of memory, and it seems you break the bank. You should try to turn off some diagnostic, particularly those (like met fields) you can get with another quick run without chemistry/transport.
+
 
+
'''''[mailto:heald@atmos.colostate.edu Colette Heald] replied:'''''
+
 
+
:Thanks for your email.  That certainly makes sense and I have managed to get it to run (slowly!) by removing diagnostics.  What memory is the issue here?  I've got 8 Gb of RAM on the Sun 4200's (2 dual-core = 4 proc) I'm running with which seems comparable/superior to what other folks are using with GEOS-Chem.  Should I be looking for more RAM for future system purchases?  Or is there something else that would improve the model performance?
+
 
+
'''''[mailto:yantosca@seas.harvard.edu Bob Yantosca] replied:'''''
+
 
+
:I think the relevant memory limit is the stacksize.  This is the part of the memory where all automatic variables (i.e. variables in a subroutine or function that aren't in a common block, SAVE, or at the top of a module) are created on most compilers.
+
 
+
:Also, one of the issues is that when you enter parallel loops, then your memory use will increase.  For example, if you are using 4 processors then you need to make 4 copies of every PRIVATE variable in the DO-loop.  Most compilers (at least I know IFORT does) will use the stack memory for this.
+
 
+
:The -stackvar option in the [http://docs.sun.com/app/docs/doc/819-5263 SunStudio compiler] puts automatic arrays on the stack whenever possible. 
+
+
:Maybe you can play w/ the stacksize limit on your machine (set it to "unlimited" in your .cshrc) and that might help.
+
 
+
'''''[mailto:heald@atmos.colostate.edu Colette Heald] replied:'''''
+
 
+
:Setting the stacksize to unlimited in my run script did the trick - I now have a GEOS5 run with a full suite of diagnostics running on my SUN machines no problem.
+
 
+
--[[User:Bmy|Bob Y.]] 09:21, 2 May 2008 (EDT)
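
Before changing the stacksize it can help to see what the current limits are. The following is a minimal sketch for sh or bash (the thread above uses csh, where <tt>limit stacksize</tt> plays the same role); the helper name <tt>show_stack_limits</tt> is hypothetical.

```shell
#!/bin/sh
# Sketch: report the soft and hard stack size limits of the current shell.
# show_stack_limits is a hypothetical helper name.

show_stack_limits() {
    # Soft limit: the value in effect for programs started from this shell
    echo "soft stack limit: $(ulimit -S -s)"
    # Hard limit: the ceiling up to which the soft limit may be raised
    echo "hard stack limit: $(ulimit -H -s)"
}

show_stack_limits
```

If the soft limit is a finite number while the hard limit is <tt>unlimited</tt>, the soft limit can be raised from the run script without administrator rights.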
+
 
+
=== "Not enough space" error in TPCORE ===
+
 
+
'''''[mailto:heald@atmos.colostate.edu Colette Heald] wrote:'''''
+
 
+
:I just replaced the <tt>tpcore_fvdas_mod.f90</tt> with the patch for v8-01-04 to speed up the code.  However with this update my code is crashing during the run.  I'm running with Solaris on Sun machines.  The last entry in the log is:
+
 
+
    ===============================================================================
+
    TPCORE_FVDAS (based on GMI) Tracer Transport Module successfully initialized
+
    ===============================================================================
+
 
+
:The error file I get out from the run indicates:
+
 
+
    ******  FORTRAN RUN-TIME SYSTEM  ******
+
    Error 12:  Not enough space
+
    Location:  the AUTOMATIC variable declaration at line 565 of "tpcore_fvdas_mod.f90"
+
 
+
:Have you folks seen anything like this?  Any suggestions for how to proceed?
+
 
+
'''''[mailto:yantosca@seas.harvard.edu Bob Yantosca] wrote:'''''
+
 
+
:This is traced to:
+
 
+
    ! fx, fy, fz and qtemp are now 4D arrays for parallelization purposes.
+
    ! (ccc, 4/1/09)
+
    REAL*8            :: fx (im, jm, km, nq)
+
    REAL*8            :: fy (im, jm+1, km, nq)          ! one more for edges
+
    REAL*8            :: fz  (im, jm, km, nq)
+
    REAL*8            :: qtemp (im, jm, km, nq)
+
    <<< it dies here>>>     
+
    REAL*8            :: DTC(IM,JM,KM)              ! up/down flux temp array
+
 
+
:so what that means is that you don't have enough memory to make this array.  That may be expected since we now have a bunch more 4-D arrays than before.  You've maxed out your stacksize setting, right?  Otherwise it may be a system or compiler dependent error.
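
To get a feel for the sizes involved, the four 4-D arrays can be totalled with simple shell arithmetic. The dimensions used here are assumptions chosen for illustration (a 2 x 2.5 grid of 144 x 91 boxes, 47 levels and 54 tracers); they are not taken from this thread, so substitute the values for your own run. The helper name <tt>array_bytes</tt> is hypothetical.

```shell
#!/bin/sh
# Sketch: estimate the memory needed by fx, fy, fz and qtemp (REAL*8).
# The dimensions below are illustrative assumptions.
IM=144   # longitudes (2 x 2.5 grid)
JM=91    # latitudes  (2 x 2.5 grid)
KM=47    # vertical levels
NQ=54    # advected tracers

# array_bytes: bytes in a REAL*8 array with the four given extents
array_bytes() {
    echo $(( $1 * $2 * $3 * $4 * 8 ))
}

FX=$(array_bytes "$IM" "$JM" "$KM" "$NQ")
FY=$(array_bytes "$IM" $((JM + 1)) "$KM" "$NQ")   # one extra row for edges
TOTAL=$(( 3 * FX + FY ))                          # fx, fz, qtemp share a shape

echo "one array : $FX bytes"
echo "all four  : $TOTAL bytes ($(( TOTAL / 1048576 )) MiB)"
```

With these assumed dimensions the four arrays alone come to roughly 1 GiB, all of it requested as automatic variables on entry to the routine.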
+
 
+
'''''[mailto:heald@atmos.colostate.edu Colette Heald] replied:'''''
+
 
+
:Just an fyi, it turns out that my memory issue arose because my code was compiled as 32-bit instead of 64-bit (apparently with 64-bit I can access more swap space).  I added the <tt>-m64</tt> switch to the compile options and that did the trick.
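
One way to check whether an existing executable was built as 32-bit or 64-bit is to look at its file header. The sketch below assumes the executable is in ELF format (as on Solaris and Linux), where the fifth header byte is 1 for 32-bit and 2 for 64-bit objects. The helper name <tt>elf_class</tt> is hypothetical, and the sketch assumes the executable is called <tt>geos</tt> and sits in the current directory.

```shell
#!/bin/sh
# Sketch: report whether an ELF executable is 32-bit or 64-bit.
# elf_class is a hypothetical helper name.

elf_class() {
    # Bytes 1-4 of an ELF file are the magic number 0x7f 'E' 'L' 'F'
    magic=$(od -A n -t x1 -N 4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not ELF"
        return 1
    fi
    # Byte 5 (EI_CLASS): 1 = 32-bit, 2 = 64-bit
    class=$(od -A n -t u1 -j 4 -N 1 "$1" | tr -d ' \n')
    case "$class" in
        1) echo "32-bit" ;;
        2) echo "64-bit" ;;
        *) echo "unknown" ;;
    esac
}

# Assumption: the model executable is ./geos; inspect it if present
if [ -f ./geos ]; then
    elf_class ./geos
fi
```

If the result is <tt>32-bit</tt>, rebuilding with the <tt>-m64</tt> switch mentioned above is the first thing to try.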
+
 
+
'''''[mailto:yantosca@seas.harvard.edu Bob Yantosca] replied:'''''
+
 
+
:Another thing you can try...in the Makefile.ifort we have:
+
 
+
    # Generic flags for all machines
+
    FFLAGS = -fpp -fast -stackvar -xfilebyteorder=big16:%all
+
+
    # Flags specific to TERRA
+
    FARCH  = -xtarget=opteron -xchip=opteron -xarch=generic64
+
+
    # Flags specific to TETHYS or CERES
+
    #FARCH  = -xtarget=opteron -xchip=opteron -xarch=sse3a
+
+
    # Flags specific to the older SPARC machines
+
    #FARCH  = -fpp -O4 -xarch=v9
+
+
    # Compile command -- multiprocessor
+
    F90    = f90 $(FFLAGS) $(FARCH) -openmp=parallel -Dmultitask
+
+
    # Compile command -- single processor
+
    #F90  = f90 $(FFLAGS) $(FARCH)
+
 
+
:The <tt>$(FARCH)</tt> variable sets stuff up for the specific chipset that you have.  This varies from machine to machine, we had a couple of options for our machines at Harvard.
+
 
+
:However, you can figure out the proper option settings for your system with the following command:
+
 
+
    > sunf90 -native -dryrun
+
    ###    command line files and options (expanded):
+
    ### -xarch=ssse3 -xcache=32/64/8:4096/64/16 -xchip=core2 -dryrun
+
 
+
:so then you can use those values for the -xarch, -xchip, -xcache, etc. keywords.  That may obviate the need for having to use <tt>-m64</tt>, and you'll get precisely the options that will optimize well for your chipset.
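
Picking the chipset flags out of the expanded command line can be sketched with a small filter. This assumes the compiler output looks like the sample above; the helper name <tt>extract_farch</tt> is hypothetical, and the result is only a starting point for the <tt>FARCH</tt> line in the Makefile.

```shell
#!/bin/sh
# Sketch: pull the -xarch, -xchip and -xcache flags out of the
# "-native -dryrun" output so they can be copied into FARCH.
# extract_farch is a hypothetical helper; it reads the output on stdin.

extract_farch() {
    # Put one token per line, keep only the three flag families,
    # then join the survivors back into a single line.
    tr ' \t' '\n\n' \
        | grep -E '^-x(arch|chip|cache)=' \
        | tr '\n' ' ' \
        | sed 's/ $//'
}

# Example using the sample output quoted above
printf '%s\n' \
    '###    command line files and options (expanded):' \
    '### -xarch=ssse3 -xcache=32/64/8:4096/64/16 -xchip=core2 -dryrun' \
    | extract_farch
```

On a machine with Sun Studio installed, the output of the <tt>sunf90 -native -dryrun</tt> command can be piped through <tt>extract_farch</tt> in the same way.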
+
 
+
--[[User:Bmy|Bob Y.]] 11:22, 24 April 2009 (EDT)
+

Latest revision as of 21:54, 10 January 2019