Machine issues & portability
Revision as of 16:36, 3 July 2008
On this page we list the compiler-dependent and platform-dependent issues that we have recently encountered.
IFORT Compiler
Comparison between IFORT 9.1 and IFORT 10.1
The table shows the wall clock time and mean OH for several GEOS-Chem simulations that were done in order to compare Intel Fortran Compiler (IFORT) v9.1 vs. v10.1.013. The simulations had all these things in common:
- GEOS-Chem v8-01-01
- 4x5 GEOS-5 met fields
- 1 week of simulation (0 GMT 2008/01/01 to 0 GMT 2008/01/08)
- Base compiler options: -cpp -w -auto -noalign -convert big_endian
- Runs were done on the Harvard "Ceres" cluster (OS type "linux-rhel5-x86_64")
Run | IFORT version | # CPUs | Optimization options | Wall clock (mm:ss) | Speedup from IFORT 9.1 to IFORT 10.1 | Speedup from 4 to 8 CPUs w/ the same compiler | Mean OH (1e5 molec/cm3) |
---|---|---|---|---|---|---|---|
1 | 9.1 | 4 | -O2 | 36:16 | | | 11.2913755849576 |
2 | 10.1 | 4 | -O2 | 33:55 | 6.48% | | 11.2913755842197 |
3 | 9.1 | 4 | -O3 | 37:26 | | | 11.2913755849576 |
4 | 10.1 | 4 | -O3 | 33:36 | 10.24% | | 11.2913755838124 |
5 | 9.1 | 8 | -O2 | 24:15 | | 33.13% | 11.2913755849576 |
6 | 10.1 | 8 | -O2 | 22:46 | 6.12% | 32.88% | 11.2913755842197 |
7 | 9.1 | 8 | -O3 | 23:36 | | 36.95% | 11.2913755849576 |
8 | 10.1 | 8 | -O3 | 22:31 | 4.59% | 32.99% | 11.2913755838124 |
9 | 9.1 | 8 | -O3 -ipo -no-prec-div -static | 23:03 | | | 11.2913764967223 |
10 | 10.1 | 8 | -O3 -ipo -no-prec-div -static | 21:56 | 4.84% | | 11.0809209646817 |
NOTES about the table:
- The column Speedup from IFORT 9.1 to IFORT 10.1 compares the wall clock time of equivalent runs done with IFORT 9.1 and IFORT 10.1. For example, the 6.48% speedup listed for Run #2 is comparing Run #2 to Run #1. Similarly Run #4 is compared against Run #3, etc.
- The column Speedup from 4 to 8 CPUs w/ the same compiler compares the wall clock time between runs with 4 CPUs and 8 CPUs for the same compiler (i.e. 4 CPUs on IFORT 9 vs 8 CPUs on IFORT 9, and ditto for IFORT 10). For example, the 33.13% speedup listed for Run #5 is comparing Run #5 to Run #1. Similarly, Run #6 is compared against Run #2, etc.
- The compiler options -O3 -ipo -no-prec-div -static correspond to IFORT's -fast optimization option. Using this option gives a mean OH concentration that differs from the one obtained with the simpler optimization options -O2 and -O3. This is because the -fast option sacrifices numerical accuracy for speed.
- With IFORT 9.1, switching from -O2 to -O3 does not change the mean OH concentration. In fact, the bpch files from these runs were binary identical to each other.
- With IFORT 10.1, switching from -O2 to -O3 changes the mean OH concentration slightly, which implies slight differences in the chemistry. However, all runs done with -O2 have the same mean OH regardless of the number of CPUs, and the same is true for all runs done with -O3.
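The numbers suggest that both speedup columns are the fractional reduction in wall clock time, (t_old - t_new) / t_old. This formula is inferred from the table and is not stated on this page. A minimal Python sketch that recomputes the entries under that assumption:

```python
def to_seconds(mmss: str) -> int:
    """Convert a wall clock time written as 'mm:ss' into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup_percent(t_old: str, t_new: str) -> float:
    """Percent reduction in wall clock time when going from t_old to t_new."""
    old, new = to_seconds(t_old), to_seconds(t_new)
    return 100.0 * (old - new) / old


# Wall clock times from the table, keyed by run number
wall = {1: "36:16", 2: "33:55", 3: "37:26", 4: "33:36", 5: "24:15",
        6: "22:46", 7: "23:36", 8: "22:31", 9: "23:03", 10: "21:56"}

# IFORT 9.1 -> IFORT 10.1 (same # of CPUs and same options)
for old, new in [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]:
    print(f"Run #{new} vs Run #{old}: {speedup_percent(wall[old], wall[new]):.2f}%")

# 4 CPUs -> 8 CPUs (same compiler and same options)
for old, new in [(1, 5), (2, 6), (3, 7), (4, 8)]:
    print(f"Run #{new} vs Run #{old}: {speedup_percent(wall[old], wall[new]):.2f}%")
```

All recomputed values match the table to two decimals except Run #6 vs Run #2. That one comes out as 32.87% rather than 32.88%, which looks like a rounding difference.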
PLOTS:
- Run #1 vs Run #2 (i.e. IFORT 9.1 vs IFORT 10.1 w/ -O2 on 4 CPUs)
- Run #3 vs Run #4 (i.e. IFORT 9.1 vs IFORT 10.1 w/ -O3 on 4 CPUs)
- Run #5 vs Run #9 (i.e. -O2 vs -fast with IFORT 9.1)
- Run #6 vs Run #10 (i.e. -O2 vs -fast with IFORT 10.1)
TAKE-HOME MESSAGE:
- IFORT 10.1 was faster than the equivalent IFORT 9.1 run in every case we tested.
- IFORT 10.1 does indeed seem to optimize code better on machines with multi-core chipsets.
- For example: Run #6 (w/ IFORT 10) is 89 seconds faster per week than Run #5 (w/ IFORT 9) on 8 CPUs. This implies that a 52-week simulation with IFORT 10 on 8 CPUs would finish roughly 1 hr 17 min earlier (52 x 89 s = 4628 s) than the equivalent IFORT 9 run.
- Switching from 4 to 8 CPUs results in a 33-37% speedup for both IFORT 9.1 and IFORT 10.1.
- In general, switching from -O2 to -O3 (while using the same # of CPUs) does not result in a significant speedup. This is true for both IFORT 9.1 and IFORT 10.1.
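Here is a quick check of the 52-week estimate. It assumes that the per-week saving of Run #6 over Run #5 scales linearly to a full year, ignoring any differences in initialization or I/O:

```python
def annual_savings_seconds(saved_per_week_s: int, weeks: int = 52) -> int:
    """Total wall clock time saved over a multi-week run (linear scaling assumed)."""
    return saved_per_week_s * weeks


# Run #5 (IFORT 9.1, 8 CPUs, -O2) took 24:15; Run #6 (IFORT 10.1) took 22:46
saved_per_week = (24 * 60 + 15) - (22 * 60 + 46)   # 1455 s - 1366 s
total = annual_savings_seconds(saved_per_week)
hours, remainder = divmod(total, 3600)
minutes = remainder // 60
print(f"{saved_per_week} s/week -> {total} s/year = {hours} h {minutes} min")
```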
OUR RECOMMENDATIONS:
- If possible, use IFORT 10.1 instead of IFORT 9.1.
- Use the following compiler options (see Makefile.ifort):
- FFLAGS = -cpp -w -O2 -auto -noalign -convert big_endian
--Bob Y. 16:46, 16 April 2008 (EDT)
April 17, 2008
Upgrading to IFORT 10.1 does not seem to fix the stacksize problem listed below. You still need to manually reset the stacksize limit to a large positive number for both Linux and Altix platforms.
--Bob Y. 12:41, 25 April 2008 (EDT)
March 26, 2008
Resetting stacksize for Linux
If you are using IFORT on a Linux machine, you will have to make a fix to your .cshrc file similar to the one described below for the Altix/Itanium platform.
- Harvard users: add these lines of code into your .cshrc file in the "Linux login machines and clusters" section:
- Non-Harvard users: add these lines of code into your .cshrc. You can edit this accordingly for your particular system. You can omit the test for the machine name if it is not relevant to your system.
#--------------------------------------------------------------------------
# Due to a limitation in the glibc library that is used by the Intel IFORT
# v9.x and v10.x compilers, you must do the following in order to avoid
# potential memory problems with OpenMP:
#
# (1) Explicitly set the "stacksize" limit to a large positive number
#     instead of to "unlimited".
#
# (2) Explicitly set the "KMP_STACKSIZE" environment variable to a large
#     positive number (but not so large that you get an error msg.)
#
# For more information see the Intel IFORT release notes:
# http://archimede.mat.ulaval.ca/intel/fc/9.1.036/doc/Release_Notes.htm
#
# The symptom will be that GEOS-Chem will appear to be out of memory and
# will die with a segmentation fault.
#
# Only reset the stacksize on Ceres & Tethys, since these are the only
# 2 machines on which we will be running GEOS-Chem.
#
# (bmy, 3/31/08)
#--------------------------------------------------------------------------

# Test if this is Ceres or Tethys (regardless of .as.harvard.edu etc.)
set CeresOrTethys = `perl -e '$a=qx(uname -n); if ($a=~"ceres" or $a=~"tethys") {print 1;} else {print 0;}'`

# Only reset stacksize limits on Ceres or Tethys
if ( $CeresOrTethys == 1 ) then
  limit stacksize 10000000000
  setenv KMP_STACKSIZE 100000000
endif

# Undefine
unset CeresOrTethys
Here, we use an embedded Perl command to test whether the machine name (as returned by uname -n) contains "ceres" or "tethys". The Perl command is executed inside tcsh backticks (`...`), and its output (1 or 0) is stored in the CeresOrTethys variable. If CeresOrTethys is 1, then the stacksize limit and KMP_STACKSIZE are reset for the IFORT compiler.
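If you want to check the machine-name logic outside of csh, here is a small Python equivalent of the embedded Perl test. It is only an illustration. Limits set in a child process do not propagate back to your login shell, so the csh limit and setenv commands are still what belongs in .cshrc. The function name is hypothetical.

```python
import os


def is_ceres_or_tethys(hostname: str) -> bool:
    """Return True if the host name contains "ceres" or "tethys".

    This mirrors the Perl test ($a=~"ceres" or $a=~"tethys"): a
    case-sensitive substring match, so "ceres.as.harvard.edu" matches too.
    """
    return "ceres" in hostname or "tethys" in hostname


if __name__ == "__main__":
    # os.uname().nodename is the same field that `uname -n` prints (Unix only)
    print(1 if is_ceres_or_tethys(os.uname().nodename) else 0)
```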
--Bmy 10:48, 25 April 2008 (EDT)
January 7, 2008
Resetting stacksize for Altix
(1) The IFORT compiler has an error that can make GEOS-Chem appear to run out of memory when it actually hasn't. The symptom we have noticed is that the run chokes right when TPCORE is called. This seems to happen more often with IFORT v9 or v10 on Linux boxes, but it can also happen on Altix/Itanium systems.
If GEOS-Chem still crashes with this error, then you may need to set the stacksize limit to a large positive number instead of to "unlimited". This is a known issue with the glibc library that is used by IFORT.
Try adding this code to the "Altix" section of your .cshrc file as well:
#--------------------------------------------------------------------------
# Due to a limitation in the glibc library that is used by the Intel IFORT
# v9.x compilers, you must do the following in order to avoid potential
# memory problems with OpenMP:
#
# (1) Explicitly set the "stacksize" limit to 2097152 kbytes (which is
#     the max allowable value) instead of to "unlimited".
#
# (2) Explicitly set the "KMP_STACKSIZE" environment variable to a
#     large positive number (e.g. 209715200).
#
# For more information see the Intel IFORT release notes:
# http://archimede.mat.ulaval.ca/intel/fc/9.1.036/doc/Release_Notes.htm
#
# The symptom will be that GEOS-Chem will appear to be out of memory and
# will die with a segmentation fault. This may happen especially if you
# are running GEOS-Chem with GEOS-5 met on Altix or Titan.
#
# (bmy, 8/16/07, 3/13/08)
#--------------------------------------------------------------------------
limit stacksize 2097152 kbytes
setenv KMP_STACKSIZE 209715200
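The value of 2097152 kbytes is the maximum allowable stacksize on the Harvard Altix/Itanium system, and it may be different on your system. You can find out the maximum stacksize on your machine by typing "limit" at the Unix prompt. In tcsh, "limit -h stacksize" shows the hard (i.e. maximum) limit. Then cut and paste that number in place of "2097152 kbytes" in the text above, and put the result into your .cshrc.

The limit and setenv commands use csh/tcsh syntax. Bash users would put the equivalent commands into their .bashrc instead: "ulimit -s", which takes kbytes, and "export KMP_STACKSIZE=209715200".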
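As an alternative to reading the number off the "limit" output, the standard Python resource module (Unix only) can report it. The call resource.getrlimit(resource.RLIMIT_STACK) returns the (soft, hard) stack limits in bytes, and the hard limit is the most the soft limit can be raised to. The minimal sketch below converts these values to kbytes and prints a candidate .cshrc line; the helper names are hypothetical. If the hard limit is itself unlimited, you still need to pick an explicit positive number, as advised above.

```python
import resource


def stack_limit_kbytes(limit_bytes: int):
    """Convert an RLIMIT_STACK value in bytes to kbytes (None if unlimited)."""
    if limit_bytes == resource.RLIM_INFINITY:
        return None
    return limit_bytes // 1024


def csh_limit_line(limit_bytes: int) -> str:
    """Build a .cshrc 'limit stacksize' line from a finite limit in bytes."""
    kbytes = stack_limit_kbytes(limit_bytes)
    if kbytes is None:
        # The IFORT advice on this page is to use an explicit number, not "unlimited"
        raise ValueError("limit is unlimited; pick an explicit positive value")
    return f"limit stacksize {kbytes} kbytes"


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    for label, value in (("current (soft)", soft), ("maximum (hard)", hard)):
        kbytes = stack_limit_kbytes(value)
        text = "unlimited" if kbytes is None else f"{kbytes} kbytes"
        print(f"{label}: {text}")
    if hard != resource.RLIM_INFINITY:
        print("Possible .cshrc line:", csh_limit_line(hard))
```

A hard limit of 2147483648 bytes corresponds to the 2097152 kbytes used on the Harvard Altix.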
(2) If you are using the IFORT 10.x compilers, then you might also need to tell the compiler to put automatic arrays into heap memory instead of on the stack.
Mike Seymour (seymour@atmosp.physics.utoronto.ca) wrote:
I found this Intel page regarding stack sizes and ifort >=8.0:
It suggests for ifort 10.0 to use the heap for temporary storage with -heap-arrays <size>, where arrays known at compile-time to be larger than <size> are allocated on the heap instead of the stack.
However, setting <size> to 1000 did not change anything. I don't know whether smaller values will have an effect, or whether there will be performance issues.
PGI Compiler
July 2, 2007
Win Trivitayanurak (win@cmu.edu) wrote:
In short, TRIM and ADJUSTL or ADJUSTR do not work together properly when compiled with Portland Group Fortran. I propose removing TRIM inside the subroutine StrSqueeze. This is not urgent, and it is relevant only to the few PGI users.
So if you are using the PGI compiler, then you will have to modify routine STRSQUEEZE (in "charpak_mod.f") so that the statements
      STR = ADJUSTR( TRIM( STR ) )
      STR = ADJUSTL( TRIM( STR ) )
are now replaced with
      STR = ADJUSTR( STR )
      STR = ADJUSTL( STR )
and this will solve the problem. We will incorporate this into a future release of GEOS-Chem.
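Under standard Fortran semantics, both versions leave STR left-justified with its declared length unchanged. The Python sketch below checks this by emulating TRIM, ADJUSTL, ADJUSTR and assignment to a fixed-length CHARACTER variable. It illustrates why the workaround should not change the result on standard-conforming compilers. It does not reproduce the PGI behavior itself, and the helper names are hypothetical.

```python
def trim(s: str) -> str:
    """Fortran TRIM: drop trailing blanks."""
    return s.rstrip(" ")


def adjustl(s: str) -> str:
    """Fortran ADJUSTL: move leading blanks to the end, keeping the length."""
    body = s.lstrip(" ")
    return body + " " * (len(s) - len(body))


def adjustr(s: str) -> str:
    """Fortran ADJUSTR: move trailing blanks to the front, keeping the length."""
    body = s.rstrip(" ")
    return " " * (len(s) - len(body)) + body


def assign(value: str, length: int) -> str:
    """Assignment to CHARACTER(LEN=length): pad with blanks or truncate."""
    return value[:length].ljust(length)


def squeeze_original(s: str) -> str:
    n = len(s)
    s = assign(adjustr(trim(s)), n)   # STR = ADJUSTR( TRIM( STR ) )
    s = assign(adjustl(trim(s)), n)   # STR = ADJUSTL( TRIM( STR ) )
    return s


def squeeze_pgi_fix(s: str) -> str:
    n = len(s)
    s = assign(adjustr(s), n)         # STR = ADJUSTR( STR )
    s = assign(adjustl(s), n)         # STR = ADJUSTL( STR )
    return s


for sample in ["  abc  ", "abc", "   ", "a b  "]:
    assert squeeze_original(sample) == squeeze_pgi_fix(sample)
    print(repr(squeeze_pgi_fix(sample)))
```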
SGI-MIPS Compiler
Sun Studio Compiler
May 16, 2008
Use "sunf90" and "sunf95"
Jack Yatteau (jhy@as.harvard.edu) wrote:
- In order to do the (Baselibs) installation, I had to make the gfortran (gnu) version of f90 the default. This conflicts with the name f90 in the SunStudio12 path. I also discovered that there was already a name conflict between the gnu version of f95 and the SunStudio12 version.
- Users can avoid this by using the names sunf90 and sunf95 (e.g. in their makefile). Sun must have placed the names sunf90 and sunf95 in the Linux installation of SunStudio12 to cover just this situation.
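When several compilers share a name, the shell runs the first matching executable in PATH order. The standard-library function shutil.which searches PATH the same way, so it can show which f90/f95 you will actually get. Here is a small sketch; the list of names covers only the compilers mentioned here, and the function name is hypothetical.

```python
import shutil


def resolve_compilers(names=("f90", "f95", "sunf90", "sunf95"), path=None):
    """Map each command name to the executable found first on PATH (or None).

    shutil.which searches the directories in PATH order, just like the
    shell, so a gfortran "f90" placed earlier on PATH shadows Sun Studio's.
    """
    return {name: shutil.which(name, path=path) for name in names}


if __name__ == "__main__":
    for name, location in resolve_compilers().items():
        print(f"{name:7s} -> {location or 'not found'}")
```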
April 29, 2008
Apparent "out of memory" error
Colette Heald (heald@atmos.colostate.edu) wrote:
- I started a 2x2.5 v8-01-01 run with 54 tracers (including SOA) this morning. It gets through initialization and then fails when chemistry starts (I've attached the log so you can see how far it gets). The error message that gets dumped out by my system is as follows:
****** FORTRAN RUN-TIME SYSTEM ******
Error 12: Not enough space
Location: the AUTOMATIC variable declaration at line 586 of "tpcore_fvdas_mod.f90"
- I didn't run into this with an almost identical v7-04-13 run and I double-checked that all my directories are not close to quota. I repeated the v8-01-01 run at 4x5 and it also ran no problem. Have either of you tested v8-01-01 at 2x2.5? Have you seen any tpcore related problems?
Philippe Le Sager (plesager@seas.harvard.edu) replied:
- You are using GEOS-5 met fields, so your simulation has 47 levels instead of the 30 levels used with GEOS-3 and GEOS-4. This is more than a 50% increase, which affects the memory requirement. Using GEOS-5 with secondary aerosols at 2x2.5 requires a lot of memory, and it seems you have broken the bank. You should try to turn off some diagnostics, particularly those (like met fields) that you can get from another quick run without chemistry/transport.
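The scaling can be made concrete with a rough size estimate for one REAL*8 3-D field. The horizontal grid sizes below (72 x 46 boxes at 4x5, 144 x 91 boxes at 2x2.5) are assumptions for illustration. The 47 vs 30 levels come from the reply above.

```python
def grid_cells(nlon: int, nlat: int, nlev: int) -> int:
    """Number of grid boxes in one 3-D field."""
    return nlon * nlat * nlev


def field_megabytes(nlon: int, nlat: int, nlev: int, bytes_per_value: int = 8) -> float:
    """Size in MiB of one 3-D field stored as REAL*8 (8 bytes per value)."""
    return grid_cells(nlon, nlat, nlev) * bytes_per_value / 2**20


# Vertical levels: 47 for GEOS-5 vs 30 for GEOS-3 / GEOS-4
levels_increase = 47 / 30 - 1
print(f"More levels: +{levels_increase:.0%} per 3-D field")

# ASSUMED horizontal grid sizes: 4x5 = 72 x 46 boxes, 2x2.5 = 144 x 91 boxes
for label, (nlon, nlat) in {"4x5": (72, 46), "2x2.5": (144, 91)}.items():
    one_field = field_megabytes(nlon, nlat, 47)
    print(f"{label:6s} one field: {one_field:6.1f} MiB, "
          f"54 tracers: {54 * one_field:6.0f} MiB")
```

Under these assumptions, a 2x2.5 field is roughly four times larger than a 4x5 field with the same levels. This is consistent with the same run succeeding at 4x5.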
Colette Heald (heald@atmos.colostate.edu) replied:
- Thanks for your email. That certainly makes sense, and I have managed to get it to run (slowly!) by removing diagnostics. Which memory is the issue here? I've got 8 GB of RAM on the Sun 4200s (2 dual-core = 4 proc) I'm running with, which seems comparable/superior to what other folks are using with GEOS-Chem. Should I be looking for more RAM for future system purchases? Or is there something else that would improve the model performance?
Bob Yantosca (yantosca@seas.harvard.edu) replied:
- I think the relevant memory limit is the stacksize. This is the part of the memory where all automatic variables (i.e. variables in a subroutine or function that aren't in a common block, SAVE, or at the top of a module) are created on most compilers.
- Also, one of the issues is that when you enter parallel loops, then your memory use will increase. For example, if you are using 4 processors then you need to make 4 copies of every PRIVATE variable in the DO-loop. Most compilers (at least I know IFORT does) will use the stack memory for this.
- The -stackvar option in the SunStudio compiler puts automatic arrays on the stack whenever possible.
- Maybe you can play w/ the stacksize limit on your machine (set it to "unlimited" in your .cshrc) and that might help.
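Bob's point about PRIVATE variables can be put into numbers. Each thread gets its own copy of every PRIVATE array, so one thread's copies must fit in that thread's stack, and the total grows linearly with the number of threads. The sketch below uses hypothetical array shapes: a 47-level column and a 47-level x 54-tracer column. As an example per-thread limit, it uses the KMP_STACKSIZE value of 100000000 bytes from the Linux .cshrc snippet earlier on this page. Real code also keeps other automatic variables on the stack, so treat the result as a lower bound.

```python
def private_bytes_per_thread(private_shapes, bytes_per_value=8):
    """Bytes one thread needs for its own copies of the PRIVATE arrays."""
    total = 0
    for shape in private_shapes:
        count = 1
        for dim in shape:
            count *= dim
        total += count * bytes_per_value
    return total


def fits_thread_stack(private_shapes, nthreads, stack_bytes_per_thread):
    """Return (per-thread bytes, all-thread bytes, fits?) for a parallel loop."""
    per_thread = private_bytes_per_thread(private_shapes)
    return per_thread, per_thread * nthreads, per_thread <= stack_bytes_per_thread


# HYPOTHETICAL PRIVATE arrays: a 47-level column and a 47-level x 54-tracer column
shapes = [(47,), (47, 54)]
per_thread, total, ok = fits_thread_stack(shapes, nthreads=4,
                                          stack_bytes_per_thread=100_000_000)
print(f"{per_thread} bytes per thread, {total} bytes for 4 threads, fits: {ok}")
```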
Colette Heald (heald@atmos.colostate.edu) replied:
- Setting the stacksize to unlimited in my run script did the trick - I now have a GEOS5 run with a full suite of diagnostics running on my SUN machines no problem.
--Bmy 09:21, 2 May 2008 (EDT)