Intel Fortran Compiler

From Geos-chem
Revision as of 20:08, 25 March 2010 by Bmy (Talk | contribs) (Timing results: IFORT 11 vs. IFORT 10)

This page contains information about the Intel Fortran Compiler (aka "IFORT" compiler).

IFORT 11

Timing results: IFORT 11 vs. IFORT 10

The table shows the wallclock time and mean OH for several GEOS-Chem simulations that were done in order to compare IFORT 10.1.013 vs. IFORT 11.1.069. The simulations had all these things in common:

  1. GEOS-Chem v8-02-04
  2. 4x5 GEOS-5 met fields for month of 2008/07
  3. 1-month of simulation (0 GMT 2005/07/01 to 0 GMT 2005/08/01)
  4. Base compiler options: -cpp -w -O2 -auto -noalign -convert big_endian
  5. KPP compiler turned on
  6. Linoz stratospheric chemistry turned on

   Run  IFORT     # CPUs  Wall clock  Parallel %  Mean OH
        version           (hh:mm:ss)              (1e5 molec/cm3)
   1    10.1.013  4       02:10:51    384.7%      12.5205894678448
   2    11.1.069  4       02:09:14    382.7%      12.5217430768752
   3    10.1.013  8       01:17:13    757.1%      12.5205894678448
   4    11.1.069  8       01:18:44    753.4%      12.5213489686705

Here are some plots of surface ozone from the

NOTES:

  1. The wall times and parallel % are nearly identical when moving from IFORT 10 to IFORT 11: the wall times differ by less than 2% for both the 4-CPU and the 8-CPU runs.
  2. The ideal parallelization percentages are 400% (on 4p) and 800% (on 8p).
  3. The differences in surface ozone between IFORT 10 and IFORT 11 are due to numerical differences in the libraries and optimization. The absolute magnitude of the differences is approximately 1 ppt of ozone.
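
The figures in the table can be cross-checked with a few lines of Python. This is only a sketch: the helper names (to_seconds, efficiency) are made up for illustration, and it assumes that the wall clock entries are hh:mm:ss and that parallel efficiency is simply the reported parallel % divided by the ideal value (100% per CPU).

```python
# Illustrative cross-check of the IFORT 10 vs. IFORT 11 timing table.
# Helper names are hypothetical; the values are copied from the table.

def to_seconds(clock):
    """Convert an 'hh:mm:ss' wall-clock string to seconds."""
    hours, minutes, seconds = (int(x) for x in clock.split(":"))
    return 3600 * hours + 60 * minutes + seconds

def efficiency(parallel_pct, n_cpus):
    """Fraction of the ideal parallel % (100% per CPU) that was achieved."""
    return parallel_pct / (100.0 * n_cpus)

# (IFORT version, # CPUs, wall clock, parallel %)
runs = [
    ("10.1.013", 4, "02:10:51", 384.7),
    ("11.1.069", 4, "02:09:14", 382.7),
    ("10.1.013", 8, "01:17:13", 757.1),
    ("11.1.069", 8, "01:18:44", 753.4),
]

for version, n_cpus, clock, pct in runs:
    print(f"IFORT {version} on {n_cpus} CPUs: {to_seconds(clock)} s, "
          f"{100 * efficiency(pct, n_cpus):.1f}% of ideal")

# Relative wall-time change when moving from IFORT 10 to IFORT 11
for old, new in [(runs[0], runs[1]), (runs[2], runs[3])]:
    t_old, t_new = to_seconds(old[2]), to_seconds(new[2])
    print(f"{old[1]} CPUs: {100 * (t_new - t_old) / t_old:+.2f}% wall time")
```

A negative percentage in the last loop means that the IFORT 11 run was faster than the IFORT 10 run on the same number of CPUs.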

--Bob Y. 16:08, 25 March 2010 (EDT)

Problems with IFORT 11.0.xxx

You should use GEOS-Chem with IFORT 11.1.058 or higher versions. Please see the discussion below about problems in the earlier versions of IFORT 11.0.xxx:

Tzung-May Fu wrote:

I tested the Intel Fortran v11.0.074 compiler, but found that it is incompatible with the GC code. This is related to the partition.f bug that I reported earlier. (Actually, I'm not sure there is a bug in partition.f any more, unless you have also run into it with IFORT v10).
I ran a 1-day simulation, using Bob's v8-01-03 standard run release, with no change at all. Using Intel Fortran v10.1.015, I was able to replicate Bob's standard run. However, when I switched to Intel Fortran v11.0.074, I ran into the error in partition.f, due to the CONCNOX-SUM1 < 0d0 check. Here's the error message in log:
   ===============================
   GEOS-CHEM ERROR: STOP 30000
   STOP at partition.f
   ===============================
I then tried Bob's fix to partition.f. This time the run finishes, warning the user about the CONCNOX-SUM1 < 0d0 issue. But the output result is completely wacky!!! Below you can compare the surface Ox concentrations, using
The (B) spatial pattern is completely off. NOx is also affected and shows the similar weird pattern.
I'm pretty sure the problem is in the chemistry part. I've tried turning off the optimization but the problem persists. Perhaps there is some problem with the way IFORTv11 treats floating points? Also, I am not sure if IFORTv11 caused the weird model result, or if IFORTv11 caused some issues in chemistry, and the partition.f 'fix' subsequently lead to the weird result.
Long story short, it seems like IFORTv11 is not a good choice for now, and that the 'fix' to partition.f should not be implemented.

Philippe Le Sager wrote:

Thanks for testing Ifort11. We did run into the partition bug with Ifort10 after fixing tpcore. So I doubt that the weird result is related to that partition fix, and it is probably just a problem with IFORT 11.

Bob Yantosca wrote:

You might have to go thru the IFORT 11 manuals to see if any default behavior has changed (i.e. optimization, compiler options, etc). It may not just be the concnox thing but something else in the numerics that is particular to IFORT 11.
There is usually a "What's new" document w/ every Intel compiler release. Maybe that has some more information, you could look at it.

Bob Yantosca wrote:

I've also heard from some folks @ NASA that IFORT 11.0 was problematic. They claim that IFORT 11.1 is much better. You may want to look into this in the meantime.

--Bob Y. 16:50, 7 October 2009 (EDT)

Eric Sofen wrote:

Both Becky Alexander and I have run into problems with IFORT 11.1. When either of us run offline aerosol simulations compiled on IFORT 11.1, the simulation compiles and runs without errors, but the sulfur budgets are way off. The problems seem to be occurring in the deposition code, as Becky's simulations end up with very little deposition, but at the same time, the S burdens are too low. In my case, the deposition ends up being an order of magnitude too high. Changing back to IFORT 10 fixed both of these problems.

--Eric Sofen 13:32, 22 October 2009

Yuxuan Wang wrote:

From our interaction with the Intel people, ifort 11.1.056 should work for GEOS-Chem. The GC version we tested at Tsinghua is v8-02-01 (nested-grid China with GEOS-5 meteorology). The platform we tested is Nehalem from Intel, with the following compilation options:
 -cpp -w -static -fno-alias -O2 -safe_cray_ptr -no-prec-sqrt -no-prec-div -auto -noalign -convert big_endian
Not sure whether these options will work for Mac OSX. From the testing, we found that codes compiled with ifort 11.1.056 ran at 2% faster than ifort 10.1.008.

--Bob Y. 14:59, 4 November 2009 (EST)

Documentation

You can find more information about the Intel Fortran Compiler v11.0 here:

--Bob Y. 15:25, 25 March 2010 (EDT)

IFORT 10

Comparison between IFORT 9.1 and IFORT 10.1

The table shows the wallclock time and mean OH for several GEOS-Chem simulations that were done in order to compare Intel Fortran Compiler (IFORT) v9.1 vs. v10.1.013. The simulations had all these things in common:

  • GEOS-Chem v8-01-01
  • 4x5 GEOS-5 met fields
  • 1-week of simulation (0 GMT 2008/01/01 to 0 GMT 2008/01/08)
  • Base compiler options: -cpp -w -auto -noalign -convert big_endian
  • Runs were done on the Harvard "Ceres" cluster (OS type "linux-rhel5-x86_64")

   Run  IFORT    # CPUs  Optimization options           Wall clock  Speedup from  Speedup from       Mean OH
        version                                         (mm:ss)     IFORT 9.1 to  4 to 8 CPUs w/     (1e5 molec/cm3)
                                                                    IFORT 10.1    the same compiler
   1    9.1      4       -O2                            36:16       -             -                  11.2913755849576
   2    10.1     4       -O2                            33:55       6.48%         -                  11.2913755842197
   3    9.1      4       -O3                            37:26       -             -                  11.2913755849576
   4    10.1     4       -O3                            33:36       10.24%        -                  11.2913755838124
   5    9.1      8       -O2                            24:15       -             33.13%             11.2913755849576
   6    10.1     8       -O2                            22:46       6.12%         32.88%             11.2913755842197
   7    9.1      8       -O3                            23:36       -             36.95%             11.2913755849576
   8    10.1     8       -O3                            22:31       4.59%         32.99%             11.2913755838124
   9    9.1      8       -O3 -ipo -no-prec-div -static  23:03       -             -                  11.2913764967223
   10   10.1     8       -O3 -ipo -no-prec-div -static  21:56       4.84%         -                  11.0809209646817

NOTES about the table:

  1. The column Speedup from IFORT 9.1 to IFORT 10.1 compares the wall clock time of equivalent runs done with IFORT 9.1 and IFORT 10.1. For example, the 6.48% speedup listed for Run #2 is comparing Run #2 to Run #1. Similarly Run #4 is compared against Run #3, etc.
  2. The column Speedup from 4 to 8 CPUs w/ the same compiler compares the wall clock time between runs with 4 CPUs and 8 CPUs for the same compiler (i.e. 4 CPUs on IFORT 9 vs 8 CPUs on IFORT 9, and ditto for IFORT 10). For example, the 33.13% speedup listed for Run #5 is comparing Run #5 to Run #1. Similarly, Run #6 is compared against Run #2, etc.
  3. The compiler options -O3 -ipo -no-prec-div -static correspond to IFORT's -fast optimization option. Using this option results in a mean OH concentration that is different than with the simpler optimization options of -O2 and -O3. This is because the -fast option sacrifices numerical accuracy for speed.
  4. With IFORT 9.1, switching from -O2 to -O3 does not change the mean OH concentration. Thus the bpch files of the runs were binary identical to each other.
  5. With IFORT 10.1, switching from -O2 to -O3 changes the mean OH concentration slightly. This implies that there are slight differences in the chemistry. However all runs done with -O2 have the same mean OH, as do all runs done with -O3.
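
The two speedup columns appear to follow speedup = (t_ref - t_new) / t_ref, where t_ref is the wall clock time of the reference run. The Python sketch below recomputes a few table entries under that assumption; the function names are illustrative and not part of GEOS-Chem.

```python
# Illustrative recomputation of the speedup columns.
# Assumes speedup = (t_ref - t_new) / t_ref, with times read as mm:ss.

def mmss_to_seconds(clock):
    """Convert an 'mm:ss' wall-clock string to seconds."""
    minutes, seconds = (int(x) for x in clock.split(":"))
    return 60 * minutes + seconds

def speedup_pct(ref_clock, new_clock):
    """Percent reduction in wall time of the new run relative to the reference run."""
    t_ref = mmss_to_seconds(ref_clock)
    t_new = mmss_to_seconds(new_clock)
    return 100.0 * (t_ref - t_new) / t_ref

# Wall clock times copied from the table, keyed by run number
wall = {1: "36:16", 2: "33:55", 3: "37:26", 4: "33:36", 5: "24:15", 6: "22:46"}

# IFORT 9.1 -> IFORT 10.1 (same options, same # of CPUs)
print(f"Run #2 vs Run #1: {speedup_pct(wall[1], wall[2]):.2f}%")
print(f"Run #4 vs Run #3: {speedup_pct(wall[3], wall[4]):.2f}%")
print(f"Run #6 vs Run #5: {speedup_pct(wall[5], wall[6]):.2f}%")

# 4 CPUs -> 8 CPUs (same compiler, same options)
print(f"Run #5 vs Run #1: {speedup_pct(wall[1], wall[5]):.2f}%")
```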

PLOTS:

  1. Run #1 vs Run #2 (i.e. IFORT 9.1 vs IFORT 10.1 w/ -O2 on 4 CPUs)
  2. Run #3 vs Run #4 (i.e. IFORT 9.1 vs IFORT 10.1 w/ -O3 on 4 CPUs)
  3. Run #5 vs Run #9 (i.e. -O2 vs -fast with IFORT 9.1)
  4. Run #6 vs Run #10 (i.e. -O2 vs -fast with IFORT 10.1)

TAKE-HOME MESSAGE:

  1. A run with IFORT 10.1 is always faster than the equivalent run with IFORT 9.1.
    • IFORT 10.1 does indeed seem to optimize code better on machines with multi-core chipsets.
    • For example: Run #6 (w/ IFORT 10) is 89 seconds faster per simulated week than Run #5 (w/ IFORT 9) on 8 CPUs. This implies that a 52-week simulation with IFORT 10 on 8 CPUs would finish ~1hr 15m earlier than the equivalent IFORT 9 run.
  2. Switching from 4 to 8 CPUs results in a ~33% speedup for both IFORT 9.1 and IFORT 10.1.
  3. In general, switching from -O2 to -O3 (while using the same # of CPUs) does not result in a significant speedup. This is true for both IFORT 9.1 and IFORT 10.1.
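
The 52-week estimate in item 1 is a simple linear extrapolation of the 1-week timings. A minimal Python sketch of that arithmetic follows; it assumes the time saved per simulated week stays constant over the whole year, and the function names are hypothetical.

```python
# Illustrative only: assumes the per-week saving scales linearly with run length.

def mmss_to_seconds(clock):
    """Convert an 'mm:ss' wall-clock string to seconds."""
    minutes, seconds = (int(x) for x in clock.split(":"))
    return 60 * minutes + seconds

def projected_saving(ref_clock, new_clock, n_weeks):
    """Seconds saved over n_weeks, given the 1-week wall clocks of two runs."""
    per_week = mmss_to_seconds(ref_clock) - mmss_to_seconds(new_clock)
    return per_week * n_weeks

# Run #5 (IFORT 9.1, 8 CPUs, -O2) vs. Run #6 (IFORT 10.1, 8 CPUs, -O2)
total = projected_saving("24:15", "22:46", 52)
hours, rem = divmod(total, 3600)
minutes, seconds = divmod(rem, 60)
print(f"{total} s saved = {hours}h {minutes}m {seconds}s")
```

This gives 4628 seconds, or about 1h 17m, which is consistent with the rounded ~1hr 15m figure quoted in the list.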

OUR RECOMMENDATIONS:

  1. If possible, use IFORT 10.1 instead of IFORT 9.1
  2. Use the following compiler options (see Makefile.ifort):
    • FFLAGS = -cpp -w -O2 -auto -noalign -convert big_endian

--Bob Y. 16:46, 16 April 2008 (EDT)

Upgrading to IFORT 10.1 does not seem to fix the stacksize problem listed below. You still need to manually reset the stacksize limit to a large positive number for both Linux and Altix platforms.

--Bob Y. 12:41, 25 April 2008 (EDT)

IFORT 9

KPP not compatible with IFORT 9.1

Please see this wiki post about problems compiling the KPP solver with IFORT 9.1.

Other issues

Optimization options for faster runs

Yuxuan Wang told us about the optimization options -ipo and -static and said these options would speed up the simulations. I've tested these options on our system at Harvard. The run with the new options shows very small differences (much less than 1% over 1 month) compared to a run with the old options only. For a full-chemistry run (43 tracers) at 4x5 resolution on 4 processors, the run time is about 10% shorter than previously.

These options are especially effective at speeding up the transport. So in simulations with faster chemistry (like tagged-tracer simulations), we expect to see a larger reduction in run time. For example, the time for a methane run is shortened by about 30%.

To use these options, in Makefile.ifort, change:

 FFLAGS = -cpp -w -O2 -auto -noalign -convert big_endian

to

 FFLAGS = -cpp -w -O2 -auto -noalign -convert big_endian -ipo -static

--Ccarouge 15:54, 8 September 2009 (EDT)

Speedup With Hyperthreading on Nehalem chips

Hyperthreading lets each physical CPU core run two hardware threads, so a job can use more threads than there are physical cores. I've noticed that using 16 threads ($OMP_NUM_THREADS = 16) on an 8-core system (2 x quad-core Intel Nehalem X5570s) leads to a 15% speedup over using 8 threads. These tests were with GEOS-Chem v8-02-03, full chemistry, 2x2.5, ifort 10.1.021, and

 FFLAGS = -cpp -w -O3 -auto -noalign -convert big_endian -g -traceback -CB -vec-report0

This does not have a positive impact when using earlier generations of Intel chips (Harpertown or Clovertown).

--Daven Henze 1:42, 16 December 2009 (MDT)

Resetting stacksize for Linux

If you are using IFORT on a Linux machine, you will have to make a similar fix to your .cshrc file as is described below for the Altix/Itanium platform.

  • Harvard users: you do not have to do anything anymore. The default software configuration is set up to set the stacksize automatically on all nodes of Ceres, Tethys, and Terra so that you don't have to do this manually.
  • Non-Harvard users: add these lines of code into your .cshrc.
   #--------------------------------------------------------------------------
   # Due to a limitation in the glibc library that is used by the Intel IFORT
   # v9.x and v10.x compilers, you must do the following in order to avoid 
   # potential memory problems with OpenMP:
   #
   # (1) Explicitly set the "KMP_STACKSIZE" environment variable to a large
   #      positive number (but not so large that you get an error msg.)
   #
   # For more information see the Intel IFORT release notes:
   #  http://archimede.mat.ulaval.ca/intel/fc/9.1.036/doc/Release_Notes.htm
   #
   # The symptom will be that GEOS-Chem will appear to be out of memory and 
   # will die with a segmentation fault.
   #--------------------------------------------------------------------------
   setenv KMP_STACKSIZE 100000000

--Bob Y. 15:39, 24 October 2008 (EDT)

Resetting stacksize for Altix

NOTE: The Altix platform is now mostly obsolete.

(1) As described above, the IFORT compiler has an error that can cause GEOS-Chem to appear to be running out of memory when it actually isn't. The symptom that we have noticed is that it seems to choke right when TPCORE is called. This may tend to happen more often with IFORT v9 or v10 on Linux boxes, but it can also happen on Altix/Itanium systems.

If GEOS-Chem still crashes with this error, then you may need to set the stacksize variable to a large positive number instead of unlimited. This is a known issue with the POSIX glibc library that is used by IFORT.

Try adding this code to your .cshrc file as well under the "Altix" section:

   #--------------------------------------------------------------------------
   # Due to a limitation in the glibc library that is used by the Intel IFORT 
   # v9.x compilers, you must do the following in order to avoid potential 
   # memory problems with OpenMP:
   #
   # (1) Explicitly set the "KMP_STACKSIZE" environment variable to a 
   #      large positive number (e.g. 209715200).
   # 
   # For more information see the Intel IFORT release notes:
   #  http://archimede.mat.ulaval.ca/intel/fc/9.1.036/doc/Release_Notes.htm
   #
   # The symptom will be that GEOS-Chem will appear to be out of memory and 
   # will die with a segmentation fault.  This may happen especially if you
   # are running GEOS-Chem with GEOS-5 met on Altix or Titan.
   #
   # (bmy, 8/16/07, 9/9/08)
   #--------------------------------------------------------------------------
   setenv KMP_STACKSIZE 209715200

The 2097152 kbytes is the maximum allowable stacksize on the Harvard Altix/Itanium system; the number may be different on your system. You can find out the maximum stacksize on your machine by typing "limit" at the Unix prompt. Then cut and paste that number in place of the value in the text above and put it into your .cshrc or .bashrc.
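
As an alternative to typing "limit", the stack limit can also be queried from a script. The sketch below uses Python's standard resource module (Unix only); it is not part of GEOS-Chem, the function name is made up for illustration, and it reports the limits of the Python process itself, which are normally inherited from the shell that started it.

```python
# Illustrative only: query the stack limit of the current process (Unix only).
import resource

def stack_limit_kbytes():
    """Return (soft, hard) stack limits in kbytes; None means unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)  # values in bytes

    def to_kbytes(value):
        # RLIM_INFINITY is the marker the resource module uses for "unlimited"
        return None if value == resource.RLIM_INFINITY else value // 1024

    return to_kbytes(soft), to_kbytes(hard)

soft_kb, hard_kb = stack_limit_kbytes()
print("soft stack limit (kbytes):", "unlimited" if soft_kb is None else soft_kb)
print("hard stack limit (kbytes):", "unlimited" if hard_kb is None else hard_kb)
```

The hard limit is the largest value that the soft limit can be raised to without extra privileges, so it is the number to compare against when choosing a stacksize.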

(2) If you are using the IFORT 10.x compilers, then you might also need to tell the compiler to put automatic arrays into heap memory instead of on the stack.

Mike Seymour wrote:

I found this Intel page regarding stack sizes and ifort >=8.0:

http://www.intel.com/support/performancetools/fortran/sb/cs-007790.htm.

It suggests for ifort 10.0 to use the heap for temporary storage with -heap-arrays <size>, where arrays known at compile-time to be larger than <size> are allocated on the heap instead of the stack.

However, setting <size> to be 1000 does not change things. I don't know if smaller values will have an effect, or if there will be performance issues.

--Bob Y. 16:13, 9 September 2008 (EDT)

Resetting stacksize on other platforms

Win Trivitayanurak wrote:

I'm running a 4x5 resolution with 310 tracers. Recent development was few subroutines and an additional allocation of 30 elements in an array -- not in STT array, so it's not like 30x(72x46x30) more memory, but still probably enough to reach the stacksize limit. Now the "ulimit" solves the problem.
I found that I can set environment in my shell-runscript (e.g. .cshrc file) to have the large enough stacksize. I found good suggestions from this website and for different platforms, the lines are:
   Compaq
      limit stacksize unlimited
      setenv MP_STACK_SIZE 17000000

   IBM
      limit stacksize unlimited
      setenv XLSMPOPTS "stack=40000000"

   SGI origin
      limit stacksize unlimited
      setenv MP_SLAVE_STACKSIZE 40000000

   SUN/Solaris
      limit stacksize unlimited

   PC/Linux
      limit stacksize unlimited
      setenv MPSTKZ 40000000

--Bob Y. 10:41, 17 October 2008 (EDT)