Machine issues & portability

From Geos-chem

Revision as of 17:40, 16 April 2008

On this page we list the compiler-dependent and platform-dependent issues that we have recently encountered.

IFORT Compiler

Comparison between IFORT 9.1 and IFORT 10.1

The table below shows the wall-clock time and mean OH concentration for several GEOS-Chem simulations that were run to compare Intel Fortran Compiler (IFORT) v9.1 with v10.1. All simulations shared the following setup:

  • GEOS-Chem v8-01-01
  • 4x5 GEOS-5 met fields
  • 1-week simulation (0 GMT 2008/01/01 to 0 GMT 2008/01/08)
  • Base compiler options: -cpp -w -auto -noalign -convert big_endian


# | IFORT | CPUs | Optimization options | Wall clock (mm:ss) | Speedup from 9.1 to 10.1 | Speedup from 4 to 8 CPUs (same version) | Mean OH (1e5 molec/cm3)
1 | 9.1 | 4 | -O2 | 36:16 | | | 11.2913755849576
2 | 10.1 | 4 | -O2 | 33:55 | 6.48% | | 11.2913755842197
3 | 9.1 | 4 | -O3 | 37:26 | | | 11.2913755849576
4 | 10.1 | 4 | -O3 | 33:36 | 10.24% | | 11.2913755838124
5 | 9.1 | 8 | -O2 | 24:15 | | 33.13% | 11.2913755849576
6 | 10.1 | 8 | -O2 | 22:46 | 6.12% | 32.88% | 11.2913755842197
7 | 9.1 | 8 | -O3 | 23:36 | | | 11.2913755849576
8 | 10.1 | 8 | -O3 | 22:31 | 4.59% | | 11.2913755838124
9 | 9.1 | 8 | -O3 -ipo -no-prec-div -static | 23:03 | | | 11.2913764967223
10 | 10.1 | 8 | -O3 -ipo -no-prec-div -static | 21:56 | 4.84% | | 11.0809209646817
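The two speedup columns appear to be the fractional reduction in wall-clock time, (t_old - t_new) / t_old, where t_old is the slower baseline run. The Python sketch below is an illustration of that reading, not the script used to build the table; recomputing from the mm:ss values reproduces the tabulated percentages to within about 0.01 percentage point.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'mm:ss' wall-clock string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup_percent(t_old: str, t_new: str) -> float:
    """Percent reduction in wall-clock time going from run t_old to run t_new."""
    old, new = to_seconds(t_old), to_seconds(t_new)
    return 100.0 * (old - new) / old


# Run 1 (IFORT 9.1, 4 CPUs, -O2) vs. run 2 (IFORT 10.1, 4 CPUs, -O2)
print(f"{speedup_percent('36:16', '33:55'):.2f}%")  # 6.48%

# Run 1 (4 CPUs) vs. run 5 (8 CPUs), both IFORT 9.1 with -O2
print(f"{speedup_percent('36:16', '24:15'):.2f}%")  # 33.13%
```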


NOTES:

  1. The compiler options -O3 -ipo -no-prec-div -static correspond to the -fast optimization option.
  2. With IFORT 9.1, switching from -O2 to -O3 does not change the mean OH concentration. The bpch files of these runs were binary-identical to each other.
  3. In every case, IFORT 10.1 ran faster than the equivalent IFORT 9.1 run (by 4.59% to 10.24%). IFORT 10.1 appears to optimize code better for multi-core chipsets.
  4. In general, on the same number of CPUs, switching from -O2 to -O3 does not yield a significant speedup. This holds for both IFORT 9.1 and IFORT 10.1.
  5. With IFORT 10.1,
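To put the mean OH values in perspective, the Python sketch below computes each run's relative departure from the IFORT 9.1 -O2 baseline (run 1), using numbers copied from the table. It is simple arithmetic on the tabulated values: runs 2 and 4 differ from run 1 by roughly 1e-10 in relative terms, run 9 by roughly 1e-7, while run 10 (IFORT 10.1 with -O3 -ipo -no-prec-div -static) is about 1.9% lower.

```python
# Mean OH (1e5 molec/cm3) copied from the table above, keyed by run number
mean_oh = {
    1: 11.2913755849576,   # IFORT 9.1,  4 CPUs, -O2 (baseline)
    2: 11.2913755842197,   # IFORT 10.1, 4 CPUs, -O2
    4: 11.2913755838124,   # IFORT 10.1, 4 CPUs, -O3
    9: 11.2913764967223,   # IFORT 9.1,  8 CPUs, -O3 -ipo -no-prec-div -static
    10: 11.0809209646817,  # IFORT 10.1, 8 CPUs, -O3 -ipo -no-prec-div -static
}


def relative_difference(value: float, reference: float) -> float:
    """Signed relative difference of value with respect to reference."""
    return (value - reference) / reference


baseline = mean_oh[1]
for run, oh in mean_oh.items():
    print(f"run {run:2d}: {relative_difference(oh, baseline):+.2e}")
```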

26-Mar-2008

Harvard users also need to apply a fix similar to the one described in the 07-Jan-2008 entry below. Copy this code into the "Linux" section of your .cshrc file:

   #--------------------------------------------------------------------------
   # Due to a limitation in the glibc library that is used by the Intel IFORT 
   # v9.x compilers, you must do the following in order to avoid potential 
   # memory problems with OpenMP:
   #
   # (1) Explicitly set the "stacksize" limit to a large positive number
   #      instead of to "unlimited".
   #
   # (2) Explicitly set the "KMP_STACKSIZE" environment variable to a large
   #      positive number (but not so large that you get an error msg.)
   #
   # For more information see the Intel IFORT release notes:
   #  http://archimede.mat.ulaval.ca/intel/fc/9.1.036/doc/Release_Notes.htm
   #
   # The symptom will be that GEOS-Chem will appear to be out of memory and 
   # will die with a segmentation fault.
   #
   # Only reset the stacksize on Ceres & Tethys, since these are the only
   # 2 machines on which we will be running GEOS-Chem.
   #
   # (bmy, 3/31/08)
   #--------------------------------------------------------------------------

   # Test if this is Ceres or Tethys (regardless of .as.harvard.edu etc.)
   set resetstack = `perl -e '$a=qx(uname -n); if ($a=~"ceres" or $a=~"tethys") {print 1;} else {print 0;}'`

   # Only reset stacksize limits on Ceres or Tethys
   if ( $resetstack == 1 ) then
      limit  stacksize     10000000000
      setenv KMP_STACKSIZE 100000000
   endif

   # Undefine the temporary variable
   unset resetstack
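For readers less familiar with Perl, the host test in the block above only checks whether the output of "uname -n" contains "ceres" or "tethys". The Python sketch below expresses the same logic; the function name should_reset_stack is hypothetical and used only for illustration. platform.node() returns the same network name that "uname -n" prints.

```python
import platform


def should_reset_stack(hostname: str) -> bool:
    """Mirror the Perl test: True if the host name contains 'ceres' or 'tethys'.

    Like the Perl =~ match, this is a case-sensitive substring test, so fully
    qualified names such as 'ceres.as.harvard.edu' also match.
    """
    return "ceres" in hostname or "tethys" in hostname


if __name__ == "__main__":
    host = platform.node()  # network name, as printed by `uname -n`
    # Print 1 or 0, like the Perl one-liner that sets $resetstack
    print(1 if should_reset_stack(host) else 0)
```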

07-Jan-2008

(1) An issue with the IFORT compiler can make GEOS-Chem appear to run out of memory when it actually has not. The symptom we have noticed is that the run dies right when TPCORE is called. This may happen more often with IFORT v9 or v10 on Linux boxes, but it can also happen on Altix/Itanium systems.

If GEOS-Chem crashes with this error, you may need to set the stacksize limit to a large positive number instead of "unlimited". This is a known issue with the glibc library used by IFORT.

Try also adding this code to the "Altix" section of your .cshrc file:

   #--------------------------------------------------------------------------
   # Due to a limitation in the glibc library that is used by the Intel IFORT 
   # v9.x compilers, you must do the following in order to avoid potential 
   # memory problems with OpenMP:
   #
   # (1) Explicitly set the "stacksize" limit to 2097152 kbytes (which is
   #      the max allowable value) instead of to "unlimited".
   #
   # (2) Explicitly set the "KMP_STACKSIZE" environment variable to a 
   #      large positive number (e.g. 209715200).
   # 
   # For more information see the Intel IFORT release notes:
   #  http://archimede.mat.ulaval.ca/intel/fc/9.1.036/doc/Release_Notes.htm
   #
   # The symptom will be that GEOS-Chem will appear to be out of memory and 
   # will die with a segmentation fault.  This may happen especially if you
   # are running GEOS-Chem with GEOS-5 met on Altix or Titan.
   #
   # (bmy, 8/16/07, 3/13/08)
   #--------------------------------------------------------------------------
   limit  stacksize     2097152 kbytes
   setenv KMP_STACKSIZE 209715200

The value 2097152 kbytes is the maximum allowable stacksize on the Harvard Altix/Itanium system; it may be different on your machine. To find the maximum on your system, type "limit -h stacksize" at the csh/tcsh prompt ("limit" alone shows the current limits). Then replace "2097152 kbytes" in the code above with that value and put the code into your .cshrc. Note that "limit" and "setenv" are csh commands; if you use bash, put "ulimit -s <value in kbytes>" and "export KMP_STACKSIZE=209715200" in your .bashrc instead.
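If you prefer to check these settings from a program rather than the shell, Python's standard resource module (Unix only) reports the current (soft) and maximum (hard) stack limits in bytes; the sketch below prints them in kbytes, the unit csh's "limit" uses. Because child processes inherit the shell's limits and environment, running it from the shell that launches GEOS-Chem is one way to confirm that your .cshrc settings took effect. The helper names are illustrative only. The two assertions simply confirm the units of the values above: 2097152 kbytes is exactly 2 GiB, and a KMP_STACKSIZE of 209715200 bytes is 200 MiB.

```python
import os
import resource


def kbytes(limit: int) -> str:
    """Format an rlimit value (given in bytes) in kbytes, like csh's `limit`."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit // 1024} kbytes"


def report_stack_settings() -> None:
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    print("stacksize (current):", kbytes(soft))
    print("stacksize (maximum):", kbytes(hard))
    # KMP_STACKSIZE is read by the Intel OpenMP runtime; if unset, its default applies
    print("KMP_STACKSIZE:", os.environ.get("KMP_STACKSIZE", "<not set>"))


# Unit checks for the values used in the .cshrc block above
assert 2097152 * 1024 == 2 * 1024**3   # 2097152 kbytes = 2 GiB
assert 209715200 == 200 * 1024**2      # 209715200 bytes = 200 MiB

report_stack_settings()
```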

(2) If you are using the IFORT 10.x compilers, then you might also need to tell the compiler to put automatic arrays into heap memory instead of on the stack.

Mike Seymour (seymour@atmosp.physics.utoronto.ca) wrote:

I found this Intel page regarding stack sizes and ifort >=8.0:

http://www.intel.com/support/performancetools/fortran/sb/cs-007790.htm

For ifort 10.0, it suggests using the heap for temporary storage with -heap-arrays <size>, where arrays known at compile time to be larger than <size> are allocated on the heap instead of the stack.

However, setting <size> to 1000 does not change anything. I don't know if smaller values will have an effect, or if there will be performance issues.
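To see which arrays a given -heap-arrays <size> could affect, it helps to work out their sizes. Intel's documentation gives <size> in kilobytes, and per the description above only arrays known at compile time to exceed it are moved to the heap. The Python sketch below does that arithmetic for a hypothetical REAL*8 array dimensioned 72 x 46 x 47, chosen only for illustration and not taken from the GEOS-Chem source. It is plain size arithmetic and does not explain why the setting of 1000 had no effect.

```python
from math import prod  # Python 3.8+


def array_kbytes(shape: tuple, bytes_per_element: int) -> float:
    """Size of a Fortran array in kilobytes (1 KB = 1024 bytes)."""
    return prod(shape) * bytes_per_element / 1024


def exceeds_threshold(shape: tuple, bytes_per_element: int, heap_arrays_kb: int) -> bool:
    """True if an array of compile-time-known size exceeds the -heap-arrays threshold."""
    return array_kbytes(shape, bytes_per_element) > heap_arrays_kb


# Hypothetical REAL*8 (8-byte) array, 72 x 46 x 47 (illustration only)
shape = (72, 46, 47)
print(array_kbytes(shape, 8))             # 1216.125
print(exceeds_threshold(shape, 8, 1000))  # True
```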

PGI Compiler

02-Jul-2007

Win Trivitayanurak (win@cmu.edu) wrote:

In short, TRIM and ADJUSTL or ADJUSTR do not work together properly when compiled with Portland Group Fortran. I propose removing TRIM inside the subroutine StrSqueeze. This is not urgent and is relevant to only the few PGI users.

So if you are using the PGI compiler, you will have to modify routine STRSQUEEZE (in "charpak_mod.f") so that the statements

   STR = ADJUSTR( TRIM( STR ) )
   STR = ADJUSTL( TRIM( STR ) )

are now replaced with

   STR = ADJUSTR( STR )
   STR = ADJUSTL( STR )

and this will solve the problem. We will incorporate this into a future release of GEOS-Chem.
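Under standard Fortran semantics, removing TRIM should not change what STRSQUEEZE produces for these two statements: TRIM shortens the string, ADJUSTR and ADJUSTL return a string of the same length as their argument, and assigning the result back to STR truncates or blank-pads it on the right. The Python sketch below models those rules with hypothetical helper functions that only imitate the Fortran intrinsics (it does not reproduce the PGI behavior, and it covers only the two statements shown, not the rest of STRSQUEEZE). Both versions leave STR left-justified with the same contents.

```python
def trim(s: str) -> str:
    """Model of Fortran TRIM: drop trailing blanks."""
    return s.rstrip(" ")


def adjustl(s: str) -> str:
    """Model of Fortran ADJUSTL: move leading blanks to the end, keeping the length."""
    body = s.lstrip(" ")
    return body + " " * (len(s) - len(body))


def adjustr(s: str) -> str:
    """Model of Fortran ADJUSTR: move trailing blanks to the front, keeping the length."""
    body = s.rstrip(" ")
    return " " * (len(s) - len(body)) + body


def assign(length: int, value: str) -> str:
    """Model of Fortran character assignment: truncate or blank-pad to the variable length."""
    return value[:length].ljust(length)


def squeeze_with_trim(s: str) -> str:
    s = assign(len(s), adjustr(trim(s)))     # STR = ADJUSTR( TRIM( STR ) )
    return assign(len(s), adjustl(trim(s)))  # STR = ADJUSTL( TRIM( STR ) )


def squeeze_without_trim(s: str) -> str:
    s = assign(len(s), adjustr(s))           # STR = ADJUSTR( STR )
    return assign(len(s), adjustl(s))        # STR = ADJUSTL( STR )


STR = "   NO2   "  # a 9-character CHARACTER variable padded on both sides
print(repr(squeeze_with_trim(STR)), repr(squeeze_without_trim(STR)))
```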

SGI-MIPS Compiler

Sun Studio Compiler