Difference between revisions of "Intel Fortran Compiler"

Revision as of 22:56, 29 February 2012

This page contains information about the Intel Fortran Compiler (aka "IFORT" compiler).

Documentation

NOTE: The current Intel compiler version is now called Intel Fortran Composer XE (or something similar). Many GEOS-Chem users still use older versions (e.g. IFORT 11, IFORT 10).

You can find more information about the Intel Fortran Compiler here:

  1. Intel Fortran Compiler (v11.1) User and Reference Guide
  2. Determining the cause of SIGSEGV or SIGBUS errors
  3. Dr. Fortran's Blog

Also, when you install the Intel Fortran compiler, you will normally install the Intel C and C++ compilers as well. These compilers are not needed for GEOS-Chem, but they will be needed if you install libraries (e.g. netCDF or HDF5) on your system.

--Bob Y. 15:29, 22 February 2012 (EST)

Performance

Timing results: IFORT 11 vs. IFORT 10

Description

The table below shows the wallclock time and mean OH for several GEOS-Chem simulations that were done in order to compare IFORT 10.1.013 vs. IFORT 11.1.069. The simulations had all these things in common:

  1. GEOS-Chem v8-02-04
  2. 4x5 GEOS-5 met fields for month of 2008/07
  3. 1-month of simulation (0 GMT 2005/07/01 to 0 GMT 2005/08/01)
  4. Base compiler options: -cpp -w -O2 -auto -noalign -convert big_endian
  5. KPP compiler turned on
  6. Linoz stratospheric chemistry turned on
  7. All simulations ran on virtual machines (kvm guests) with the following characteristics:
     Machine Type              x86_64
     Operating System          Linux
     Operating System Release  2.6.18-128.1.6.el5_lustre.1.8.0
     CPU Count                 8 CPUs
     CPU Speed                 2659 MHz
     Memory Total              12008076.000 KB
     Swap Space Total          18415600.000 KB

Results

Run  IFORT version  # CPUs  Wall clock (hh:mm:ss)  Parallel %  Mean OH (1e5 molec/cm3)
1    10.1.013       4       02:10:51               384.7%      12.5205894678448
2    11.1.069       4       02:09:14               382.7%      12.5217430768752
3    10.1.013       8       01:17:13               757.1%      12.5205894678448
4    11.1.069       8       01:18:44               753.4%      12.5213489686705

Here are some plots of surface ozone from the benchmark simulations:

NOTES:

  1. The wall times and parallel % are more or less identical when moving from IFORT 10 to IFORT 11.
  2. The ideal parallelization percentages are 400% (on 4p) and 800% (on 8p).
  3. The differences in surface ozone between IFORT 10 and IFORT 11 are due to numerical differences in the libraries and optimization. The absolute magnitude of the differences is approximately 1 ppt of ozone.
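
The "Parallel %" values can be compared against the ideal figures in note 2 by dividing by 100% per CPU. The following is a minimal Python sketch using the numbers from the results table; the function name parallel_efficiency is ours and is only for illustration.

```python
def parallel_efficiency(parallel_percent, n_cpus):
    """Fraction of the ideal parallelization (100% per CPU) that was achieved."""
    return parallel_percent / (100.0 * n_cpus)

# (run number, # CPUs, Parallel %) copied from the results table
runs = [(1, 4, 384.7), (2, 4, 382.7), (3, 8, 757.1), (4, 8, 753.4)]

for run, n_cpus, pct in runs:
    eff = parallel_efficiency(pct, n_cpus)
    print(f"Run {run}: {eff:.1%} of ideal on {n_cpus} CPUs")
```

All four runs come out at roughly 94% to 96% of ideal, which is consistent with note 1: the two compiler versions parallelize about equally well.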

--Bob Y. 16:08, 25 March 2010 (EDT)

Comparison between IFORT 9.1 and IFORT 10.1

The table shows the wallclock time and mean OH for several GEOS-Chem simulations that were done in order to compare Intel Fortran Compiler (IFORT) v9.1 vs. v10.1.013. The simulations had all these things in common:

  • GEOS-Chem v8-01-01
  • 4x5 GEOS-5 met fields
  • 1-week of simulation (0 GMT 2008/01/01 to 0 GMT 2008/01/08)
  • Base compiler options: -cpp -w -auto -noalign -convert big_endian
  • Runs were done on the Harvard "Ceres" cluster (OS type "linux-rhel5-x86_64")

Run  IFORT    # CPUs  Optimization options           Wall clock  Speedup from  Speedup from       Mean OH
     version                                         (mm:ss)     IFORT 9.1 to  4 to 8 CPUs w/     (1e5 molec/cm3)
                                                                 IFORT 10.1    the same compiler
1    9.1      4       -O2                            36:16       n/a           n/a                11.2913755849576
2    10.1     4       -O2                            33:55       6.48%         n/a                11.2913755842197
3    9.1      4       -O3                            37:26       n/a           n/a                11.2913755849576
4    10.1     4       -O3                            33:36       10.24%        n/a                11.2913755838124
5    9.1      8       -O2                            24:15       n/a           33.13%             11.2913755849576
6    10.1     8       -O2                            22:46       6.12%         32.88%             11.2913755842197
7    9.1      8       -O3                            23:36       n/a           36.95%             11.2913755849576
8    10.1     8       -O3                            22:31       4.59%         32.99%             11.2913755838124
9    9.1      8       -O3 -ipo -no-prec-div -static  23:03       n/a           n/a                11.2913764967223
10   10.1     8       -O3 -ipo -no-prec-div -static  21:56       4.84%         n/a                11.0809209646817

NOTES about the table:

  1. The column Speedup from IFORT 9.1 to IFORT 10.1 compares the wall clock time of equivalent runs done with IFORT 9.1 and IFORT 10.1. For example, the 6.48% speedup listed for Run #2 is comparing Run #2 to Run #1. Similarly Run #4 is compared against Run #3, etc.
  2. The column Speedup from 4 to 8 CPUs w/ the same compiler compares the wall clock time between runs with 4 CPUs and 8 CPUs for the same compiler (i.e. 4 CPUs on IFORT 9 vs 8 CPUs on IFORT 9, and ditto for IFORT 10). For example, the 33.13% speedup listed for Run #5 is comparing Run #5 to Run #1. Similarly, Run #6 is compared against Run #2, etc.
  3. The compiler options -O3 -ipo -no-prec-div -static correspond to IFORT's -fast optimization option. Using this option results in a mean OH concentration that differs from the one obtained with the simpler optimization options -O2 and -O3. This is because the -fast option sacrifices numerical accuracy for speed.
  4. With IFORT 9.1, switching from -O2 to -O3 does not change the mean OH concentration. Thus the bpch files of the runs were binary identical to each other.
  5. With IFORT 10.1, switching from -O2 to -O3 changes the mean OH concentration slightly. This implies that there are slight differences in the chemistry. However all runs done with -O2 have the same mean OH, as do all runs done with -O3.
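
The speedup percentages in the table are consistent with the relative reduction in wall clock time, 100 * (t_reference - t_new) / t_reference. This formula is inferred from the table values rather than stated in the notes, and the helper names below are ours. A short Python sketch:

```python
def to_seconds(mmss):
    """Convert a 'mm:ss' wall clock string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

def speedup_percent(reference, faster):
    """Percent reduction in wall clock time relative to the reference run."""
    t_ref = to_seconds(reference)
    t_new = to_seconds(faster)
    return 100.0 * (t_ref - t_new) / t_ref

# Run #2 vs Run #1 (IFORT 10.1 vs IFORT 9.1, -O2, 4 CPUs)
print(f"{speedup_percent('36:16', '33:55'):.2f}%")   # 6.48%

# Run #5 vs Run #1 (8 CPUs vs 4 CPUs, IFORT 9.1, -O2)
print(f"{speedup_percent('36:16', '24:15'):.2f}%")   # 33.13%
```

Note that by this definition a perfect doubling of speed from 4 to 8 CPUs would show up as a 50% speedup, not 100%.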

PLOTS:

  1. Run #1 vs Run #2 (i.e. IFORT 9.1 vs IFORT 10.1 w/ -O2 on 4 CPUs)
  2. Run #3 vs Run #4 (i.e. IFORT 9.1 vs IFORT 10.1 w/ -O3 on 4 CPUs)
  3. Run #5 vs Run #9 (i.e. -O2 vs -fast with IFORT 9.1)
  4. Run #6 vs Run #10 (i.e. -O2 vs -fast with IFORT 10.1)

TAKE-HOME MESSAGE:

  1. A run with IFORT 10.1 is always faster than the equivalent run with IFORT 9.1.
    • IFORT 10.1 does indeed seem to optimize code better on machines with multi-core chipsets.
    • For example: Run #6 (w/ IFORT 10) is 89 seconds faster per week than Run #5 (w/ IFORT 9) on 8 CPUs. This implies that a 52-week simulation with IFORT 10 on 8 CPUs would finish ~1hr 15m earlier than the equivalent IFORT 9 run.
  2. Switching from 4 to 8 CPUs results in a ~33% speedup for both IFORT 9.1 and IFORT 10.1.
  3. In general, switching from -O2 to -O3 (while using the same # of CPUs) does not result in a significant speedup. This is true for both IFORT 9.1 and IFORT 10.1.
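
The 52-week estimate in item 1 can be checked with a few lines of Python. This sketch assumes that the one-week time saving scales linearly to a full year, which the one-week tests do not themselves demonstrate.

```python
run5 = 24 * 60 + 15    # Run #5 wall clock (24:15) in seconds, IFORT 9.1
run6 = 22 * 60 + 46    # Run #6 wall clock (22:46) in seconds, IFORT 10.1

saving_per_week = run5 - run6            # seconds saved per simulated week
saving_per_year = 52 * saving_per_week   # assumes linear scaling to 52 weeks

hours, remainder = divmod(saving_per_year, 3600)
minutes, seconds = divmod(remainder, 60)
print(f"{saving_per_week} s per week -> {hours} h {minutes} min {seconds} s over 52 weeks")
```

This gives 89 seconds per week and a little over 1 hr 15 min per year of simulation, in line with the estimate quoted above.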

OUR RECOMMENDATIONS:

  1. If possible, use IFORT 10.1 instead of IFORT 9.1
  2. Use the following compiler options (see Makefile.ifort):
    • FFLAGS = -cpp -w -O2 -auto -noalign -convert big_endian

--Bob Y. 16:46, 16 April 2008 (EDT)

Upgrading to IFORT 10.1 does not seem to fix the stacksize problem listed below. You still need to manually reset the stacksize limit to a large positive number for both Linux and Altix platforms.

--Bob Y. 12:41, 25 April 2008 (EDT)

Optimization

In this section we present information about the various optimization options available in the Intel Fortran Compiler.

Optimization options

Here is a quick reference table of IFORT's optimization options (taken from the online Intel Fortran Compiler User and Reference Guides).

Option Description
-O0 Turns off all optimizations. Math expressions will be evaluated in the same order in which they are written, which is necessary for debugging. If you are using a debugger such as Totalview, compile with -g -O0.
-O1 Enables optimizations for speed and disables some optimizations that increase code size and affect speed. The -O1 option may improve performance for applications with very large code size, many branches, and execution time not dominated by code within loops.

Setting -O1 automatically sets the following options:

  1. -funroll-loops0,
  2. -nofltconsistency (same as -mno-ieee-fp),
  3. -fomit-frame-pointer,
  4. -ftz
-O2 (aka -O) Enables optimizations for speed. This is the generally recommended optimization level.

This option also enables:

  1. Inlining of intrinsics
  2. Intra-file interprocedural optimizations, which include:
    • inlining
    • constant propagation
    • forward substitution
    • routine attribute propagation
    • variable address-taken analysis
    • dead static function elimination
    • removal of unreferenced variables
  3. The following capabilities for performance gain:
    • constant propagation
    • copy propagation
    • dead-code elimination
    • global register allocation
    • global instruction scheduling and control speculation
    • loop unrolling
    • optimized code selection
    • partial redundancy elimination
    • strength reduction/induction variable simplification
    • variable renaming
    • exception handling optimizations
    • tail recursions
    • peephole optimizations
    • structure assignment lowering and optimizations
    • dead store elimination

On Linux and Mac OS X systems, if -g is specified, -O2 is turned off and -O0 is the default unless -O2 (or -O1 or -O3) is explicitly specified in the command line together with -g.

-O3 Enables -O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations.

Enables optimizations for maximum speed, such as:

  1. Loop unrolling, including instruction scheduling
  2. Code replication to eliminate branches
  3. Padding the size of certain power-of-two arrays to allow more efficient cache use.

On Linux and Mac OS X systems, the -O3 option sets option -fomit-frame-pointer.

The -O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to -O2 optimizations. The -O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.

--Bob Y. 16:28, 29 February 2012 (EST)

Recommended compilation and optimization options for GEOS-Chem

In this section, we present information about the compilation and optimization options that are invoked when you compile a GEOS-Chem simulation.

List of commonly-used optimization options

Here are the IFORT compilation options currently used by GEOS-Chem:

Option Description
Normal compiler settings
-cpp Turns on the C-preprocessor, to evaluate #if and #define statements in the source code.
-w Suppresses all compiler warnings. This is mainly a convenience to prevent excessive output to the screen or log file.

NOTE: Most compiler warnings are harmless. Execution does not stop when a warning is displayed, unlike an error message, which causes program execution to halt at the point where the error occurred.

-O2 Optimizes the source code for speed, without taking too many liberties with numerical precision. For more information, please see the optimization options section above.
-auto This option places local variables (scalars and arrays of all types), except those declared as SAVE, on the run-time stack. It is as if the variables were declared with the AUTOMATIC attribute. It does not affect variables that have the SAVE attribute or ALLOCATABLE attribute, or variables that appear in an EQUIVALENCE statement or in a common block.
-noalign Prevents the compiler from padding bytes anywhere in common blocks and structures. Padding can affect numerical precision.
-convert big_endian Specifies that the format will be big endian for integer data and big endian IEEE floating-point for real and complex data. This only affects file I/O to/from binary files (such as binary punch files) but not ASCII, netCDF, or other file formats.
-vec-report0 Tells the compiler to suppress printing LOOP HAS BEEN VECTORIZED messages. This reduces the amount of output that is sent to the screen and/or GEOS-Chem log file.
-fp-model source Rounds intermediate results to source-defined precision and enables value-safe optimizations. Basically, this tells the compiler not to take too many liberties with how numerical expressions are evaluated. For more information about this option, please see our precision-safe optimization section below. This option can be disabled by compiling GEOS-Chem with the PRECISE=no Makefile option.
Special compiler settings
-r8 This option tells the compiler to treat variables that are declared as REAL as REAL*8.

NOTE: This option is not used globally, but is only applied to certain individual files (mostly from third-party codes like ISORROPIA).

-mcmodel=medium This option is used to tell IFORT to use more than 2GB of static memory. This avoids a specific type of memory error that can occur if you compile GEOS-Chem for use with an extremely high-resolution grid (e.g. 0.25° x 0.3125° nested grid).

In GEOS-Chem v9-01-03 and higher versions, -mcmodel=medium is set by default when you compile GEOS-Chem with the NETCDF=yes or HDF=yes Makefile options.

-i-dynamic This option needs to be used in conjunction with -mcmodel=medium. It causes Intel-provided libraries to be linked in dynamically instead of statically (which is the default).

In GEOS-Chem v9-01-03 and higher versions, -i-dynamic is set by default when you compile GEOS-Chem with the NETCDF=yes or HDF=yes Makefile options.

-ipo This option enables interprocedural optimization between files. This is also called multifile interprocedural optimization (multifile IPO) or Whole Program Optimization (WPO). When you specify this option, the compiler performs inline function expansion for calls to functions defined in separate files.

NOTE: Yuxuan Wang found that this option was useful for certain nested-grid simulations. See this wiki post below for more information.

-static This option prevents linking with shared libraries. It causes the executable to link all libraries statically.

NOTE: Yuxuan Wang found that this option was useful for certain nested-grid simulations. See this wiki post below for more information.

Settings only used for debugging
-g Tells the compiler to generate full debugging information in the object file. This will cause a debugger (like Totalview) to display the actual lines of source code, instead of hexadecimal addresses (which is gibberish to anyone except hardware engineers).
-O0 Turns off all optimization. Source code instructions (e.g. DO loops, IF blocks) and numerical expressions are evaluated in precisely the order in which they are listed, without being internally rewritten by the optimizer. This is necessary for using a debugger (like Totalview).
-CB Check for array-out-of-bounds errors. This is invoked when you compile GEOS-Chem with the BOUNDS=yes Makefile option. NOTE: Only use -CB for debugging, as this option will cause GEOS-Chem to execute more slowly!
-traceback This option tells the compiler to generate extra information in the object file to provide source file traceback information when a severe error occurs at run time. When the severe error occurs, source file, routine name, and line number correlation information is displayed along with call stack hexadecimal addresses (program counter trace). This option increases the size of the executable program, but has no impact on run-time execution speeds. It functions independently of the debug option.

--Bob Y. 17:34, 29 February 2012 (EST)

Typical settings for a GEOS-Chem simulation

The normal GEOS-Chem build uses the following IFORT compiler flags:

-cpp -w -O2 -auto -noalign -convert big_endian -vec-report0 -fp-model source -openmp

whereas a debugging run (meant to execute in a debugger such as TotalView) will typically use these flags:

-cpp -w -O0 -auto -noalign -convert big_endian -g -CB -traceback

NOTE: In order to avoid running out of memory if you are compiling GEOS-Chem at extremely high resolution (e.g. the 0.25° x 0.3125° nested grids), we recommend adding the following flags:

-mcmodel=medium -i-dynamic

These are automatically set when you compile with the NETCDF=yes or HDF=yes Makefile options (in GEOS-Chem v9-01-03 and higher).
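
To summarize how the three flag sets listed in this section combine, here is a small Python sketch. It is purely illustrative: build_fflags is a hypothetical helper, not part of GEOS-Chem, and it does not reproduce the actual logic of the GEOS-Chem Makefiles. The flag strings themselves are copied from the text above.

```python
# Flag sets as listed in this section
NORMAL_FLAGS = ("-cpp -w -O2 -auto -noalign -convert big_endian "
                "-vec-report0 -fp-model source -openmp")
DEBUG_FLAGS = "-cpp -w -O0 -auto -noalign -convert big_endian -g -CB -traceback"
LARGE_MEMORY_FLAGS = "-mcmodel=medium -i-dynamic"

def build_fflags(debug=False, large_memory=False):
    """Return an FFLAGS string (hypothetical helper, for illustration only)."""
    flags = DEBUG_FLAGS if debug else NORMAL_FLAGS
    if large_memory:
        # Needed for builds with more than 2GB of static data
        flags += " " + LARGE_MEMORY_FLAGS
    return flags

print("FFLAGS =", build_fflags(large_memory=True))
```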

--Bob Y. 17:34, 29 February 2012 (EST)

Precision-safe optimization

You can use the following Intel Fortran Compiler options to select how aggressively you would like to optimize floating-point operations.

Default behavior

-fp-model fast

Example source code:

REAL T0, T1, T2
...
T0 = 4.0E0 + 0.1E0 + T1 + T2

When this option is specified, the compiler applies the following semantics:

  1. Additions may be performed in any order
  2. Intermediate expressions may use single, double, or extended precision
  3. The constant addition may be pre-computed, assuming the default rounding mode

Using these semantics, the following shows some possible ways the compiler may interpret the original code:

REAL T0, T1, T2
...
T0 = (T1 + T2) + 4.1E0

or

REAL T0, T1, T2
...
T0 = (T1 + 4.1E0) + T2

Preferred alternative

-fp-model source (aka -fp-model precise)

Example source code:

REAL T0, T1, T2
...
T0 = 4.0E0 + 0.1E0 + T1 + T2

When this option is specified, the compiler applies the following semantics:

  1. Additions are performed in program order, taking into account any parentheses
  2. Intermediate expressions use the precision specified in the source code
  3. The constant addition may be pre-computed, assuming the default rounding mode

Using these semantics, the following shows a possible way the compiler may interpret the original code:

REAL T0, T1, T2
...
T0 = ((4.1E0 + T1) + T2)

Summary

If you do not select any -fp-model option, the Intel Fortran Compiler will default to -fp-model fast. As you can see from the examples above, this may not optimize the code in the same way each time. This can lead to minor numerical noise in the output, as was seen in ISORROPIA II.
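
The reason the order of additions matters is that floating-point addition is not associative. The Python sketch below emulates single-precision (REAL) arithmetic by rounding each intermediate result to 32 bits, and evaluates two of the orderings shown above. The values chosen for T1 and T2 are our own and are deliberately extreme so that the effect is easy to see; in a real simulation the differences normally appear only in the last few digits.

```python
import struct

def f32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def add32(a, b):
    """Emulate a single-precision addition: add, then round to single precision."""
    return f32(a + b)

t1 = f32(1.0e8)          # hypothetical value for T1
t2 = f32(-1.0e8)         # hypothetical value for T2
const = f32(4.0 + 0.1)   # the pre-computed constant 4.1

reordered = add32(add32(t1, t2), const)      # (T1 + T2) + 4.1
program_order = add32(add32(const, t1), t2)  # (4.1 + T1) + T2

print("(T1 + T2) + 4.1 =", reordered)        # approximately 4.1
print("(4.1 + T1) + T2 =", program_order)    # 8.0
```

In the second ordering, 4.1 is added to a number near 1e8, where neighbouring single-precision values are 8 apart, so most of the 4.1 is lost to rounding before T2 is added. With -fp-model source the compiler keeps one fixed ordering, so the result is reproducible even if it is not exact.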

To avoid this situation, we recommend compiling all source code files with -fp-model source. This will be the new default in GEOS-Chem v9-01-02.

Reference: Intel® Fortran Floating-point Operations; Document Number: 315892-003US

--Bob Y. 17:01, 25 August 2011 (EDT)

Optimization options for faster runs

Yuxuan Wang told us about the optimization options -ipo and -static and said these options would speed up the simulations. I've tested these options on our system at Harvard. The run with the new options shows very small differences (much less than 1% over 1 month) compared to a run with the old options only. For a full-chemistry run (43 tracers) at 4x5 resolution on 4 processors, the run time is about 10% shorter than previously.

These options are especially effective at speeding up the transport. So in simulations with faster chemistry (like tagged-tracer simulations), we expect to see a larger reduction in run time. For example, the run time for a methane simulation is shortened by about 30%.

To use these options, compile GEOS-Chem with the IPO=yes Makefile option, e.g.

make -j4 IPO=yes

--Ccarouge 15:54, 8 September 2009 (EDT)
--Bob Y. 17:50, 29 February 2012 (EST)

Optimization level for debugging

If you would like to run your code in a debugger, such as Totalview, you must use the following compiler switches:

-g -O0

Using -O0 will ensure that the source code gets executed in the same order in which it is written (i.e. this disables all compiler optimizations). The -g switch will tell the debugger to display lines of source code instead of hexadecimal memory addresses (which are more or less gibberish unless you are a hardware engineer).

GEOS-Chem will add these switches automatically for you if you compile with the DEBUG=yes option.

--Bob Y. 15:28, 22 February 2012 (EST)

Known issues

Relocation truncated to fit error

If your code uses many large arrays, or if you are compiling an ultra-fine resolution version of GEOS-Chem (e.g. a 0.25° x 0.3125° GEOS-5.7.2 nested grid), then you may see this type of error:

Relocation truncated to fit: R_X86_64_32S against `.bss' Error

The wording you get may differ slightly from the example shown above.

Long story short: IFORT is telling you that your program is trying to use more than 2GB of statically-allocated data (i.e. data space that is not declared with an ALLOCATABLE statement) at compile time. The default setting in IFORT is to expect to use less than 2GB of memory, so you are hitting the upper limit.
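
To get a feel for how quickly static arrays approach the 2GB limit, you can estimate the size of an array from its dimensions. The Python sketch below uses the 72 x 46 x 30 grid with 310 tracers mentioned in the stacksize discussion on this page; the 8-byte (REAL*8) element size and the helper name static_array_bytes are our assumptions.

```python
LIMIT_BYTES = 2 * 1024**3   # the 2GB limit on statically-allocated data

def static_array_bytes(shape, bytes_per_element=8):
    """Size in bytes of an array with the given dimensions (REAL*8 assumed)."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * bytes_per_element

# One array dimensioned (72, 46, 30, 310): 4x5 grid, 30 levels, 310 tracers
size = static_array_bytes((72, 46, 30, 310))
print(f"one array    : {size} bytes ({size / 1024**2:.0f} MB)")
print(f"arrays to 2GB: {LIMIT_BYTES // size}")
```

At 4x5 resolution only a handful of such arrays fit under the limit, and at 0.25° x 0.3125° each array has far more grid boxes, which is why the nested-grid builds need -mcmodel=medium.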

The solution is simple: recompile your code with the following compiler flags:

-mcmodel=medium -i-dynamic

The -mcmodel=medium flag will tell IFORT that you expect to use more than 2GB of statically-allocated memory in your program. However, this also requires that you link the Intel-provided libraries dynamically instead of statically (which is the default). Using the -i-dynamic flag will turn on the dynamic library linking.

IMPORTANT NOTE! If your code links to any libraries such as HDF or netCDF, then you MUST rebuild each library, making sure that the Fortran and C compilers use the -mcmodel=medium option. Please see our Installing HDF5 and netCDF4 page for examples.

GEOS-Chem v9-01-03 and higher will automatically set these flags for you if you compile with the HDF5=yes or NETCDF=yes Makefile options.

For more information, please see the following links:

  1. Typist vs. Programmer blog
  2. MITGCM support blog
  3. Software.intel.com blog

--Bob Y. 15:49, 22 February 2012 (EST)

Problems with IFORT 11.0.xxx

You should use GEOS-Chem with IFORT 11.1.058 or higher versions. Please see the discussion below about problems in the earlier versions of IFORT 11.0.xxx:

Tzung-May Fu wrote:

I tested the Intel Fortran v11.0.074 compiler, but found that it is incompatible with the GC code. This is related to the partition.f bug that I reported earlier. (Actually, I'm not sure there is a bug in partition.f any more, unless you have also run into it with IFORT v10).
I ran a 1-day simulation, using Bob's v8-01-03 standard run release, with no change at all. Using Intel Fortran v10.1.015, I was able to replicate Bob's standard run. However, when I switched to Intel Fortran v11.0.074, I ran into the error in partition.f, due to the CONCNOX-SUM1 < 0d0 check. Here's the error message in log:
   ===============================
   GEOS-CHEM ERROR: STOP 30000
   STOP at partition.f
   ===============================
I then tried Bob's fix to partition.f. This time the run finishes, warning the user about the CONCNOX-SUM1 < 0d0 issue. But the output result is completely wacky!!! Below you can compare the surface Ox concentrations, using
The (B) spatial pattern is completely off. NOx is also affected and shows the similar weird pattern.
I'm pretty sure the problem is in the chemistry part. I've tried turning off the optimization but the problem persists. Perhaps there is some problem with the way IFORTv11 treats floating points? Also, I am not sure if IFORTv11 caused the weird model result, or if IFORTv11 caused some issues in chemistry, and the partition.f 'fix' subsequently lead to the weird result.
Long story short, it seems like IFORTv11 is not a good choice for now, and that the 'fix' to partition.f should not be implemented.

Philippe Le Sager wrote:

Thanks for testing Ifort11. We did run into the partition bug with Ifort10 after fixing tpcore. So I doubt that the weird result is related to that partition fix, and it is probably just a problem with IFORT 11.

Bob Yantosca wrote:

You might have to go thru the IFORT 11 manuals to see if any default behavior has changed (i.e. optimization, compiler options, etc). It may not just be the concnox thing but something else in the numerics that is particular to IFORT 11.
There is usually a "What's new" document w/ every Intel compiler release. Maybe that has some more information, you could look at it.

Bob Yantosca wrote:

I've also heard from some folks @ NASA that IFORT 11.0 was problematic. They claim that IFORT 11.1 is much better. You may want to look into this in the meantime.

--Bob Y. 16:50, 7 October 2009 (EDT)

Eric Sofen wrote:

Both Becky Alexander and I have run into problems with IFORT 11.1. When either of us run offline aerosol simulations compiled on IFORT 11.1, the simulation compiles and runs without errors, but the sulfur budgets are way off. The problems seem to be occurring in the deposition code, as Becky's simulations end up with very little deposition, but at the same time, the S burdens are too low. In my case, the deposition ends up being an order of magnitude too high. Changing back to IFORT 10 fixed both of these problems.

--Eric Sofen 13:32, 22 October 2009

Yuxuan Wang wrote:

From our interaction with the Intel people, ifort 11.1.056 should work for GEOS-Chem. The GC version we tested at Tsinghua is v8-02-01 (nested-grid China with GEOS-5 meteorology). The platform we tested is Nehalem from Intel, with the following compilation options:
 -cpp -w -static -fno-alias -O2 -safe_cray_ptr -no-prec-sqrt -no-prec-div -auto -noalign -convert big_endian
Not sure whether these options will work for Mac OSX. From the testing, we found that codes compiled with ifort 11.1.056 ran at 2% faster than ifort 10.1.008.

--Bob Y. 14:59, 4 November 2009 (EST)

Problem with IFORT 11 and GEOS-Chem adjoint

Nicolas Bousserez wrote:

We have been struggling for some time with the following problem when running GC adjoint (v8-02-01):
  "OMP abort: Initializing libguide.so, but found libguide.so already initialized".
After some investigations it seems like it is a linker error generated when different parts of the program try to link both static and dynamic versions of the OpenMP runtime. There is an option in ifort 11 to have openmp linked statically, which theoretically should fix this problem.
But using ifort 11 for GC seems to cause other problems and this compilation option doesn't exist with ifort 10. The fact is that Daven Henze, who is using ifort 10 and a linux platform similar to ours never got the above problem. Has anyone got this error before? My platform configuration is the following:
   Linux node9 2.6.9-89.0.23.ELsmp #1 SMP Wed Mar 17 06:49:21 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
If anyone running GC adjoint has a similar configuration please let me know what is your Makefile (using netcdf libraries) configuration and which version of ifort you're using so that I can do some testing.

--Bob Y. 09:43, 8 April 2011 (EDT)

Incompatibility between IFORT 11 and OS version

If you are using the Intel Fortran Compiler version 11, you may encounter some incompatibilities with your operating system, which might require an OS upgrade.

Nicolas Bousserez wrote:

We have been struggling for some time with the following problem when running GC adjoint (v8-02-01). We get this error:
   "OMP abort: Initializing libguide.so, but found libguide.so already initialized".
After some investigations it seems like it is a linker error generated when different parts of the program try to link both static and dynamic versions of the OpenMP runtime. There is an option in ifort 11 to have openmp linked statically, which theoretically should fix this problem. But using ifort 11 for GC seems to cause other problems and this compilation option doesn't exist with ifort 10. The fact is that Daven Henze, who is using ifort 10 and a linux platform similar to ours never got the above problem. Has anyone got this error before? My platform configuration is the following:
   Linux node9 2.6.9-89.0.23.ELsmp #1 SMP Wed Mar 17 06:49:21 EDT 2010
   x86_64 x86_64 x86_64 GNU/Linux

Nicolas Bousserez wrote:

For what it's worth, this is the oldest OS we're using:
   Linux terra-01.vpn.as.harvard.edu 2.6.18-194.3.1.el5_lustre.1.8.4 #1 SMP 
   Fri Jul 9 21:55:24 MDT 2010 x86_64 x86_64 x86_64 GNU/Linux
and this is the newest:
   Linux kvm-12.s.as.harvard.edu 2.6.18-194.32.1.el5 #1 SMP 
   Wed Jan 5 17:52:25 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
What is relevant is the 2.6.9 and the 2.6.18, not the compilation dates. It means you're running something equivalent to a RHEL4/CentOS-4 kernel instead of RHEL5/CentOS-5, which has implications for your libraries, compatibility, bugs, security, etc. I would guess that you've been updating RHEL4 (first released 2005) or equivalent for several years, and ifort11 was released during the era of RHEL5 (first released 2007), so it wouldn't be too surprising if there were a library incompatibility. I don't know whether that is the cause of your symptom, but it might be. (RHEL6 was released late last year and v12 of the Intel compilers have also been released. CentOS-6 will be out soon.)
You can continue to use the older compiler with the older OS, but I'd recommend upgrading the OS, which is worth doing anyway.

--Bob Y. 10:25, 13 April 2011 (EDT)

Error in partition.f when compiling with IFORT 10

Prasad Kasibhatla reported an error in routine partition.f which caused a GEOS-Chem v9-01-01 execution to halt. Upon further investigation, we found that this error only occurs when compiling GEOS-Chem with the Intel Fortran Compiler version 10 (aka IFORT 10) when selecting -O2 optimization and -openmp parallelization.

We recommend that all GEOS-Chem users upgrade to Intel Fortran Compiler version 11 (aka IFORT 11). If you must use IFORT 10, then we recommend that you compile the entire code with the -O1 optimization option. For more information, please see this wiki post.

--Bob Y. 09:17, 29 June 2011 (EDT)

KPP not compatible with IFORT 9.1

Please see this wiki post about problems compiling the KPP solver with IFORT 9.1.

Speedup With Hyperthreading on Nehalem chips

Hyperthreading allows each physical CPU core to run two hardware threads, so a job can use more threads than there are physical cores. I've noticed that using 16 threads ($OMP_NUM_THREADS = 16) on an 8-core system (2 x quad core Intel Nehalem X5570's) leads to a 15% speedup over using 8 threads. These tests were with GEOS-Chem v8-02-03, full chemistry, 2x2.5, ifort 10.1.021, and

 FFLAGS = -cpp -w -O3 -auto -noalign -convert big_endian -g -traceback -CB -vec-report0.   

This does not have a positive impact when using earlier generations of Intel chips (Harpertown or Clovertown).

--Daven Henze 1:42, 16 December 2009 (MDT)

Resetting stacksize for Linux

If you are using IFORT on a Linux machine, you will have to make a similar fix to your .cshrc file as is described below for the Altix/Itanium platform.

  • Harvard users: you do not have to do anything anymore. The default software configuration is set up to set the stacksize automatically on all nodes of Ceres, Tethys, and Terra so that you don't have to do this manually.
  • Non-Harvard users: add these lines of code into your .cshrc.
   #--------------------------------------------------------------------------
   # Due to a limitation in the glibc library that is used by the Intel IFORT
   # v9.x and v10.x compilers, you must do the following in order to avoid 
   # potential memory problems with OpenMP:
   #
   # (1) Explicitly set the "KMP_STACKSIZE" environment variable to a large
   #      positive number (but not so large that you get an error msg.)
   #
   # For more information see the Intel IFORT release notes:
   #  http://archimede.mat.ulaval.ca/intel/fc/9.1.036/doc/Release_Notes.htm
   #
   # The symptom will be that GEOS-Chem will appear to be out of memory and 
   # will die with a segmentation fault.
   #--------------------------------------------------------------------------
   setenv KMP_STACKSIZE 100000000

--Bob Y. 15:39, 24 October 2008 (EDT)

NOTE: If you are using another shell such as sh, bash, or ksh, then you should use the following commands:

   # change the stack size
   ulimit -s 100000000
   export KMP_STACKSIZE=100000000
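
After changing these settings, it can be useful to confirm what a newly started process actually inherits. The following Python sketch (our own addition, for Unix-like systems only) prints the stack size limits and the KMP_STACKSIZE variable as seen by a child process of your shell. It uses the standard resource module; the helper name describe_limit is ours.

```python
import os
import resource

def describe_limit(value):
    """Format a limit given in bytes as KB, or 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} KB"

# RLIMIT_STACK is reported in bytes as a (soft, hard) pair
soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
print("stack soft limit:", describe_limit(soft))
print("stack hard limit:", describe_limit(hard))
print("KMP_STACKSIZE   :", os.environ.get("KMP_STACKSIZE", "(not set)"))
```

If the soft limit is still small, or KMP_STACKSIZE is not set, the shell startup file was probably not re-read; log in again or source the file before running GEOS-Chem.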

--Bob Y. 16:25, 19 April 2010 (EDT)

Resetting stacksize on other platforms

Win Trivitayanurak wrote:

I'm running a 4x5 resolution with 310 tracers. Recent development was few subroutines and an additional allocation of 30 elements in an array -- not in STT array, so it's not like 30x(72x46x30) more memory, but still probably enough to reach the stacksize limit. Now the "ulimit" solves the problem.
I found that I can set environment in my shell-runscript (e.g. .cshrc file) to have the large enough stacksize. I found good suggestions from this website and for different platforms, the lines are:
   Compaq
      limit stacksize unlimited
      setenv MP_STACK_SIZE 17000000

   IBM
      limit stacksize unlimited
      setenv XLSMPOPTS "stack=40000000"

   SGI origin
      limit stacksize unlimited
      setenv MP_SLAVE_STACKSIZE 40000000

   SUN/Solaris
      limit stacksize unlimited

   PC/Linux
      limit stacksize unlimited
      setenv MPSTKZ 40000000

--Bob Y. 10:41, 17 October 2008 (EDT)