Intel Fortran Compiler: Difference between revisions

(New page: == IFORT 11 == === Timing results === === Problems with IFORT 11.0.xxx === '''''[mailto:May.Fu@polyu.edu.hk Tzung-May Fu] wrote:''''' :I tested the Intel Fortran v11.0.074 compiler, b...)
 
 
(201 intermediate revisions by 3 users not shown)
Line 1: Line 1:
== IFORT 11 ==
'''''[[GNU Fortran compiler|Previous]] | [[Fortran language resources|Next]] | [[Guide to compilers for GEOS-Chem]]'''''


=== Timing results ===
#[[Supported compilers for GEOS-Chem]]
#[[GNU Fortran compiler|The GNU Fortran compiler (gfortran)]]
#<span style="color:blue">'''The Intel Fortran compiler (ifort, ifx)'''</span>
#[[Fortran language resources]]


=== Problems with IFORT 11.0.xxx ===


'''''[mailto:May.Fu@polyu.edu.hk Tzung-May Fu] wrote:'''''
This page contains information about the Intel Fortran compiler, which has recently been renamed from <tt>ifort</tt> to <tt>ifx</tt>.
 
'''''The Intel Fortran compiler is our recommended proprietary compiler for GEOS-Chem.'''''
 
== Overview ==
 
=== Documentation ===
 
You can find more information about the Intel Fortran Compiler here:
 
#[https://software.intel.com/en-us/fortran-compiler-developer-guide-and-reference Intel Fortran 19 documentation]
#[http://acmg.seas.harvard.edu/geos/wiki_docs/compilers/PDF_Fortran_Compiler_UG_17_0.pdf Intel Fortran 17]
 
Also, when you install the Intel Fortran compilers, you will normally install the C and C++ compilers as well.
 
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 19:44, 10 January 2019 (UTC)
 
=== Intel Fortran Compiler versions that have been tested with GEOS-Chem ===
 
The [https://geoschem.github.io/support-team GEOS-Chem Support Team] has tested GEOS-Chem with the compiler versions listed below.  But you should be able to use other Intel Fortran Compiler versions as well.
 
{| border=1 cellspacing=0 cellpadding=5
|- bgcolor="#CCCCCC"
!width="100px"|Platform
!width="200px"|Compiler
!width="450px"|Status
 
|-valign="top"
|Linux
|ifort 23.0.0
|Supported
 
|-valign="top"
|Linux
|ifort 19.0.5.281
|Supported
 
|-valign="top"
|Linux
|ifort 18.0.5
|Supported
 
|-valign="top"
|Linux
|ifort 17.0.4
|Supported
 
|-valign="top"
|Linux
|ifort 15.0.0 and similar builds
|Supported
*NOTE: IFORT 15 has a compiler bug that causes errors when turning on array-out-of-bounds checking and optimization.
 
|-valign="top"
|Linux
|ifort 13.0.079 and similar builds
|Supported
 
|-valign="top"
|Linux
|ifort 12
|Supported
*NOTE: A compiler bug in ifort 12 and higher versions has forced us to add a workaround to HEMCO in v11-01.
 
|-valign="top"
|Linux
|ifort 11.1.069 and similar builds
|Supported
 
|}
 
== Environment settings for Intel Fortran ==
 
Here is some information about how you can customize your Linux environment to use the Intel Fortran compiler.  This information was recently migrated to our [https://geos-chem.readthedocs.io geos-chem.readthedocs.io] manual.
 
* [https://geos-chem.readthedocs.io/en/latest/getting-started/login-env-files-intel.html Create an environment file for Intel compilers]
* [https://geos-chem.readthedocs.io/en/latest/getting-started/login-env-compilers.html Set environment variables for compilers]
* [https://geos-chem.readthedocs.io/en/latest/getting-started/login-env-parallel.html Set environment variables for parallelization]
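
As a rough illustration of what such an environment file might contain, the bash sketch below sets the compiler and OpenMP variables.  The specific values (the <tt>icc</tt>, <tt>icpc</tt> and <tt>ifort</tt> executable names, 8 threads, and a 500 MB OpenMP stack) are assumptions chosen for illustration and are not taken from this page; please consult the manual pages linked above for the settings that apply to your system.

```shell
# Hypothetical environment file for the Intel compilers (bash syntax).
# The executable names are assumptions; newer Intel releases may
# provide icx/icpx/ifx instead of icc/icpc/ifort.
export CC=icc
export CXX=icpc
export FC=ifort

# OpenMP settings: number of threads and per-thread stack size.
# Both values are placeholders to be adjusted for your machine.
export OMP_NUM_THREADS=8
export OMP_STACKSIZE=500m
```

A file like this would typically be loaded with <tt>source</tt> before compiling or running the model.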
 
== Optimization ==
 
In this section we present information about the various optimization options available in the Intel Fortran Compiler.
 
=== Optimization options ===
 
Here is a quick reference table of optimization options (taken from the online [http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/fortran/lin/compiler_f/index.htm Intel Fortran Compiler User and Reference Guides]).
 
{| border=1 cellspacing=0 cellpadding=5
|- bgcolor="#CCCCCC"
!width="200px"|Option
!width="650px"|Description
!width="150px"|How invoked in GEOS-Chem?
|-valign="top"
|<tt>-O0</tt>
|Turns off all optimizations.  Math expressions will be evaluated in the same order in which they are written, which is necessary for debugging.  [[#Optimization level for debugging|If you are using a debugger (such as Totalview)]], compile with <tt>-g -O0</tt>.
|<tt>DEBUG=yes</tt> or<br><tt>OPT=-O0</tt>
 
|-valign="top"
|<tt>-O1</tt>
|Enables optimizations for speed and disables some optimizations that increase code size and affect speed.  The <tt>-O1</tt> option may improve performance for applications with very large code size, many branches, and execution time not dominated by code within loops.
 
Setting <tt>-O1</tt> automatically sets the following options:
#<tt>-funroll-loops0</tt>,
#<tt>-nofltconsistency</tt> (same as <tt>-mno-ieee-fp</tt>),
#<tt>-fomit-frame-pointer</tt>,
#<tt>-ftz</tt>
 
|<tt>OPT=-O1</tt>
 
|-valign="top"
|<tt>-O2</tt> (aka <tt>-O</tt>)
|Enables optimizations for speed.  This is the generally recommended optimization level.
 
This option also enables: 
#Inlining of intrinsics
# Intra-file interprocedural optimizations, which include: 
#*inlining
#*constant propagation
#* forward substitution
#*routine attribute propagation
#*variable address-taken analysis
#*dead static function elimination
#*removal of unreferenced variables
#The following capabilities for performance gain:
#*constant propagation
#*copy propagation
#*dead-code elimination
#*global register allocation
#*global instruction scheduling and control speculation
#*loop unrolling
#*optimized code selection
#*partial redundancy elimination
#*strength reduction/induction variable simplification
#*variable renaming
#*exception handling optimizations
#*tail recursions
#*peephole optimizations
#*structure assignment lowering and optimizations
#*dead store elimination
 
On Linux and Mac OS X systems, if <tt>-g</tt> is specified, <tt>-O2</tt> is turned off and <tt>-O0</tt> is the default unless <tt>-O2</tt> (or <tt>-O1</tt> or <tt>-O3</tt>) is explicitly specified in the command line together with <tt>-g</tt>. 
|Default setting
 
|-valign="top"
|<tt>-O3</tt>
|Enables <tt>-O2</tt> optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations. 
 
Enables optimizations for maximum speed, such as: 
# Loop unrolling, including instruction scheduling
# Code replication to eliminate branches
# Padding the size of certain power-of-two arrays to allow more efficient cache use.
 
On Linux and Mac OS X systems, the <tt>-O3</tt> option sets option <tt>-fomit-frame-pointer</tt>.
   
   
:I tested the Intel Fortran v11.0.074 compiler, but found that it is incompatible with the GC code.  This is related to the [[Bugs and fixes#Error message in partition.f|<tt>partition.f</tt> bug that I reported earlier]].  (Actually, I'm not sure there is a bug in partition.f any more, unless you have also run into it with IFORT v10).
The <tt>-O3</tt> optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to <tt>-O2</tt> optimizations. The <tt>-O3</tt> option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.
   
|<tt>OPT=-O3</tt>
:I ran a 1-day simulation, using Bob's v8-01-03 standard run release, with no change at all.  Using Intel Fortran v10.1.015, I was able to replicate Bob's standard run. However, when I switched to Intel Fortran v11.0.074, I ran into the error in partition.f, due to the CONCNOX-SUM1 < 0d0 check.  Here's the error message in log:
|}
 
    ===============================
--[[User:Bmy|Bob Y.]] 11:14, 3 October 2013 (EDT)
    GEOS-CHEM ERROR: STOP 30000
 
    STOP at partition.f
=== Recommended compilation and optimization options for GEOS-Chem ===
    ===============================
 
In this section, we present information about the compilation and optimization options that are invoked when you compile a GEOS-Chem simulation.
 
==== List of commonly-used compilation options ====
 
Here are the IFORT compilation options currently used by GEOS-Chem:


:I then tried [[Bugs and fixes#Error message in partition.f|Bob's fix to partition.f]].  This time the run finishes, warning the user about the CONCNOX-SUM1 < 0d0 issue.  But the output result is completely wacky!!! Below you can compare the surface Ox concentrations, using
{| border=1 cellspacing=0 cellpadding=5
|- bgcolor="#CCCCCC"
!width="200px"|Option
!width="650px"|Description
!width="150px"|How invoked in GEOS-Chem?


:* (A) [http://www.as.harvard.edu/ctm/geos/wiki_docs/machines/A_Ox_sfc_20050701_ifort10.gif IFORT v10]
|- bgcolor="#CCFFFF"
:* (B) [http://www.as.harvard.edu/ctm/geos/wiki_docs/machines/B_Ox_sfc_20050701_ifort11_partition_fix.gif IFORT v11 and the <tt>partition.f</tt> fix]
!width="200px"|
!width="650px"|Normal compiler settings
!width="150px"|


:The (B) spatial pattern is completely off.  NOx is also affected and shows the similar weird pattern.
|-valign="top"
|<tt>-cpp</tt>
:I'm pretty sure the problem is in the chemistry part.  I've tried turning off the optimization but the problem persists.  Perhaps there is some problem with the way IFORTv11 treats floating points?  Also, I am not sure if IFORTv11 caused the weird model result, or if IFORTv11 caused some issues in chemistry, and the <tt>partition.f</tt> 'fix' subsequently led to the weird result.
|Turns on the C-preprocessor, to evaluate <code>#if</code> and <code>#define</code> statements in the source code.
|Default setting
:Long story short, it seems like IFORTv11 is not a good choice for now, and that the 'fix' to partition.f should not be implemented.


'''''[mailto:plesager@seas.harvard.eud Philippe Le Sager] wrote:'''''
|-valign="top"
|<tt>-w</tt>
|Suppresses all compiler warnings. This is mainly a convenience to prevent excessive output to the screen or log file.


:Thanks for testing Ifort11. We did run into the partition bug with Ifort10 after fixing tpcore. So I doubt that the weird result is related to that partition fix, and it is probably just a problem with IFORT 11.
''NOTE: Most compiler warnings are harmless. Execution does not stop when a warning is displayed, unlike an error message, which causes program execution to halt at the point where the error occurred.''
|Default setting


'''''[mailto:yantosca@seas.harvard.edu Bob Yantosca] wrote:'''''
|-valign="top"
|<tt>-O2</tt>
|Optimizes the source code for speed, without taking too many liberties with numerical precision.  For more information, please see the [[#Optimization options|optimization options section above]].
|Default setting


:You might have to go thru the IFORT 11 manuals to see if any default behavior has changed (i.e. optimization, compiler options, etc).  It may not just be the concnox thing but something else in the numerics that is particular to IFORT 11.
|-valign="top"
|<tt>-auto</tt>
|This option places local variables (scalars and arrays of all types), except those declared as <code>SAVE</code>, on the run-time stack.  It is as if the variables were declared with the <code>AUTOMATIC</code> attribute.  It does not affect variables that have the <code>SAVE</code> attribute or <code>ALLOCATABLE</code> attribute, or variables that appear in an <code>EQUIVALENCE</code> statement or in a common block.
|Default setting


:There is usually a "What's new" document w/ every Intel compiler release.  Maybe that has some more information; you could look at it.
|-valign="top"
|<tt>-noalign</tt>
|Prevents the compiler from padding bytes anywhere in common blocks and structures.  Padding can affect numerical precision.
|Default setting


'''''[mailto:yantosca@seas.harvard.edu Bob Yantosca] wrote:'''''
|-valign="top"
|<tt>-convert big_endian</tt>
|Specifies that the format will be big endian for integer data and big endian IEEE floating-point for real and complex data. This only affects file I/O to/from binary files (such as binary punch files) but not ASCII, netCDF, or other file formats.
|Default setting


:I've also heard from some folks @ NASA that IFORT 11.0 was problematic.  They claim that IFORT 11.1 is much better.  You may want to look into this in the meantime.
|-valign="top"
|<tt>-vec-report0</tt>
|Tells the compiler to suppress printing <tt>"LOOP HAS BEEN VECTORIZED"</tt> messages.  This reduces the amount of output that is sent to the screen and/or GEOS-Chem log file.
|Default setting


--[[User:Bmy|Bob Y.]] 16:50, 7 October 2009 (EDT)
|-valign="top"
|<tt>-fp-model source</tt>
|Rounds intermediate results to source-defined precision and enables value-safe optimizations.  Basically, this tells the compiler not to take too many liberties with how numerical expressions are evaluated.  For more information about this option, please see our [[#Precision-safe optimization|precision-safe optimization section]] below.  This option can be disabled by compiling GEOS-Chem with the <tt>PRECISE=no</tt> Makefile option.
|Default setting


'''''[mailto:esofen@atmos.washington.edu Eric Sofen] wrote:'''''
|-valign="top"
|<tt>-traceback</tt>
|This option tells the compiler to generate extra information in the object file to provide source file traceback information when a severe error occurs at run time.  When the severe error occurs, source file, routine name, and line number correlation information is displayed along with call stack hexadecimal addresses (program counter trace).  This option increases the size of the executable program, but has no impact on run-time execution speeds. It functions independently of the debug option.
|
*Default setting<br>([[GEOS-Chem v11-01|v11-01]] and higher)
*<tt>TRACEBACK=yes</tt><br>(prior versions)


:Both Becky Alexander and I have run into problems with IFORT 11.1.  When either of us run offline aerosol simulations compiled on IFORT 11.1, the simulation compiles and runs without errors, but the sulfur budgets are way off.  The problems seem to be occurring in the deposition code, as Becky's simulations end up with very little deposition, but at the same time, the S burdens are too low.  In my case, the deposition ends up being an order of magnitude too high.  Changing back to IFORT 10 fixed both of these problems.
|- bgcolor="#CCFFFF"
!width="200px"|
!width="650px"|Special compiler settings
!width="150px"|


--[[User:Esofen|Eric Sofen]] 13:32, 22 October 2009
|-valign="top"
|<tt>-r8</tt>
|This option tells the compiler to treat variables that are declared as <code>REAL</code> as <code>REAL*8</code> (as opposed to <code>REAL*4</code>).


'''''[mailto:yxw@mail.tsinghua.edu.cn Yuxuan Wang] wrote:'''''
''NOTE: This option is not used globally, but is only applied to certain individual files (mostly from third-party codes like ISORROPIA). Current GEOS-Chem programming practice is to use either <code>REAL*4</code> or <code>REAL*8</code> instead of <code>REAL</code>, which avoids confusion.''
|Used as needed


:From our interaction with the Intel people, <tt>ifort 11.1.056</tt> should work for GEOS-Chem. The GC version we tested at Tsinghua is v8-02-01 (nested-grid China with GEOS-5 meteorology). The platform we tested is Nehalem from Intel, with the following compilation options:
|-valign="top"
|<tt>-mcmodel=medium</tt>
|This option is used to tell IFORT to use more than 2GB of static memory.  This avoids a [[#Relocation truncated to fit error|specific type of memory error]] that can occur if you compile GEOS-Chem for use with an extremely high-resolution grid (e.g. 0.25&deg; x 0.3125&deg; nested grid).
|Default setting


  -cpp -w -static -fno-alias -O2 -safe_cray_ptr -no-prec-sqrt -no-prec-div -auto -noalign -convert big_endian
|-valign="top"
|<tt>-shared-intel</tt><br>(formerly <tt>-i-dynamic</tt>)
|This option needs to be used in conjunction with <tt>-mcmodel=medium</tt>.  It causes Intel-provided libraries to be linked in dynamically instead of statically (which is the default).
|Default setting


:Not sure whether these options will work for Mac OSX. From the testing, we found that codes compiled with <tt>ifort 11.1.056</tt> ran 2% faster than with <tt>ifort 10.1.008</tt>.
|-valign="top"
|<tt>-ipo</tt>
|This option enables interprocedural optimization between files. This is also called multifile interprocedural optimization (multifile IPO) or Whole Program Optimization (WPO). When you specify this option, the compiler performs inline function expansion for calls to functions defined in separate files.


--[[User:Bmy|Bob Y.]] 14:59, 4 November 2009 (EST)
''NOTE: Yuxuan Wang found that this option was useful for certain nested-grid simulations.  See [[#Optimization options for faster runs|this wiki post]] below for more information.''
|<tt>IPO=yes</tt>


== IFORT 10 ==
|-valign="top"
|<tt>-static</tt>
|This option prevents linking with shared libraries.  It causes the executable to link all libraries statically.


=== Comparison between IFORT 9.1 and IFORT 10.1 ===
''NOTE: Yuxuan Wang found that this option was useful for certain nested-grid simulations. See [[#Optimization options for faster runs|this wiki post]] below for more information.''
|<tt>IPO=yes</tt>


The table shows the wallclock time and mean OH for several GEOS-Chem simulations that were done in order to compare Intel Fortran Compiler (IFORT) v9.1 vs. v10.1.013.  The simulations had all these things in common:
|- bgcolor="#CCFFFF"
!width="200px"|
!width="650px"|Settings only used for debugging
!width="150px"|


* GEOS-Chem v8-01-01
|-valign="top"
* 4x5 GEOS-5 met fields
|<tt>-debug all</tt>
* 1-week of simulation (0 GMT 2008/01/01 to 0 GMT 2008/01/08)
|Tells the compiler to turn on all debug error output.
* Base compiler options: <tt>-cpp -w -auto -noalign -convert big_endian</tt>
|<tt>DEBUG=yes</tt>
* Runs were done on the Harvard "Ceres" cluster (OS type <tt>"linux-rhel5-x86_64"</tt>)


{| border=1 cellpadding=5 cellspacing=0
|-valign="top"
|- bgcolor="#CCCCCC" align="center"
|<tt>-g</tt>
! Run
|Tells the compiler to generate full debugging information in the object file. This will cause a debugger (like Totalview) to display the actual lines of source code, instead of hexadecimal addresses (which is gibberish to anyone except hardware engineers).
! IFORT<br>version
|<tt>DEBUG=yes</tt>
! # CPUs
! Optimization options
! Wall clock<br>(mm:ss)
! Speedup from<br>IFORT 9.1 to<br>IFORT 10.1
! Speedup from<br>4 to 8 CPUs w/<br>the same compiler
! Mean OH<br>(1e5 molec/cm3)
|- align="center"
| 1
| 9.1       
| 4             
| -O2   
| 36:16
|&nbsp;
|&nbsp;
| 11.2913755849576
|- align="center" bgcolor="#CCFFFF"
| 2
| 10.1
| 4
| -O2
| 33:55
| 6.48%
|&nbsp;
| 11.2913755842197
|- align="center"
| 3
| 9.1
| 4
| -O3
| 37:26
|&nbsp;
|&nbsp;
| 11.2913755849576
|- align="center" bgcolor="#CCFFFF"
| 4
| 10.1
| 4
| -O3
| 33:36
| 10.24%
|&nbsp;
| 11.2913755838124
|- align="center"
| 5
| 9.1
| 8
| -O2
| 24:15
|&nbsp;
| 33.13%
| 11.2913755849576
|- align="center" bgcolor="#CCFFFF"
| 6
| 10.1
| 8
| -O2
| 22:46
| 6.12%
| 32.88%
| 11.2913755842197
|- align="center"
| 7
| 9.1
| 8
| -O3
| 23:36
|&nbsp;
| 36.95%
| 11.2913755849576
|- align="center" bgcolor="#CCFFFF"
| 8
| 10.1
| 8
| -O3
| 22:31
| 4.59%
| 32.99%
| 11.2913755838124
|- align="center"
| 9
| 9.1
| 8
| -O3 -ipo -no-prec-div -static
| 23:03
|&nbsp;
|&nbsp;
| 11.2913764967223
|- align="center" bgcolor="#CCFFFF"
| 10
| 10.1
| 8
| -O3 -ipo -no-prec-div -static
| 21:56
| 4.84%
|&nbsp;
| 11.0809209646817
|}


NOTES about the table:
|-valign="top"
# The column '''Speedup from IFORT 9.1 to IFORT 10.1''' compares the wall clock time of equivalent runs done with IFORT 9.1 and IFORT 10.1.  For example, the 6.48% speedup listed for Run #2 is comparing Run #2 to Run #1.  Similarly Run #4 is compared against Run #3, etc.
|<tt>-O0</tt>
# The column '''Speedup from 4 to 8 CPUs w/ the same compiler''' compares the wall clock time between runs with 4 CPUs and 8 CPUs for the same compiler (i.e. 4 CPUs on IFORT 9 vs 8 CPUs on IFORT 9, and ditto for IFORT 10).  For example, the 33.13% speedup listed for Run #5 is comparing Run #5 to Run #1.  Similarly, Run #6 is compared against Run #2, etc.
|Turns off all optimization.  Source code instructions (e.g. DO loops, IF blocks) and numerical expressions are evaluated in precisely the order in which they are listed, without being internally rewritten by the optimizer.  This is necessary for using a debugger (like Totalview).
# The compiler options <tt>-O3 -ipo -no-prec-div -static</tt> correspond to IFORT's <tt>-fast</tt> optimization option.  Using this option results in a mean OH concentration that is different than with the simpler optimization options of -O2 and -O3.  This is because the <tt>-fast</tt> option sacrifices numerical accuracy for speed.
|<tt>DEBUG=yes</tt>
# With IFORT 9.1, switching from -O2 to -O3 does not change the mean OH concentration.  Thus the bpch files of the runs were binary identical to each other.
# With IFORT 10.1, switching from -O2 to -O3 changes the mean OH concentration slightly.  This implies that there are slight differences in the chemistry.  However all runs done with -O2 have the same mean OH, as do all runs done with -O3. 


PLOTS:
|-valign="top"
# [http://www.as.harvard.edu/ctm/geos/wiki_docs/machines/4p_O2_v9_v10.gif Run #2 vs Run #1 (i.e. IFORT 9.1 vs IFORT 10.1 w/ -O2 on 4 CPUs)]
|<tt>-check bounds</tt> (aka <tt>-CB</tt>)
# [http://www.as.harvard.edu/ctm/geos/wiki_docs/machines/4p_O3_v9_v10.gif Run #3 vs Run #4 (i.e. IFORT 9.1 vs IFORT 10.1 w/ -O3 on 4 CPUs)]
|Check for [[Common GEOS-Chem error messages#Array-out-of-bounds_error|array-out-of-bounds errors]]. This is invoked when you compile GEOS-Chem with the <tt>BOUNDS=yes</tt> Makefile option. ''NOTE: Only use <tt>-CB</tt> for debugging, as this option will cause GEOS-Chem to execute more slowly!''
# [http://www.as.harvard.edu/ctm/geos/wiki_docs/machines/v9_8p_O2_fast.gif Run #5 vs Run #9 (i.e. -O2 vs -fast with IFORT 9.1)]
|<tt>DEBUG=yes</tt>
# [http://www.as.harvard.edu/ctm/geos/wiki_docs/machines/v10_8p_O2_fast.gif Run #6 vs Run #10 (i.e. -O2 vs -fast with IFORT 10.1)]


TAKE-HOME MESSAGE:
|-valign="top"
# IFORT 10.1 is '''always''' faster than the equivalent run with IFORT 9.1. 
|<tt>-check arg_temp_created</tt>
#* IFORT 10.1 does indeed seem to optimize code better on machines with multi-core chipsets.
|Checks to see if any array temporaries are created.  Depending on how you write your subroutine and function calls, the compiler may need to create a temporary array to hold the values in the array before it passes them to the subroutine.  For detailed information, please see our [[Passing array arguments efficiently in GEOS-Chem]] wiki page.
#* For example: Run #6 (w/ IFORT 10) is 89 seconds faster per week than Run #5 (w/ IFORT 9) on 8 CPUs.  This implies that a 52-week simulation with IFORT 10 on 8 CPUs would finish ~1hr 15m earlier than the equivalent IFORT 9 run.
|<tt>DEBUG=yes</tt>
# Switching from 4 to 8 CPU's results in a ~33% speedup for both IFORT 9.1 and IFORT 10.1.
# In general, switching from -O2 to -O3 (while using the same # of CPU's) does not result in a significant speedup.  This is true for both IFORT 9.1 and IFORT 10.1.


OUR RECOMMENDATIONS:
|-valign="top"
# If possible, use IFORT 10.1 instead of IFORT 9.1
|<tt>-fpe0</tt>
# Use the following compiler options (see Makefile.ifort):
|This option will cause GEOS-Chem to halt if any type of floating-point error is encountered.  This can happen if an equation results in a denormal value, e.g. <tt>NaN</tt>, or <tt>+/-Infinity</tt>. Common causes of floating-point errors are divisions where the denominator becomes zero.<br>''NOTE: The default compiler setting is <tt>-fpe3</tt>, which will convert many of these denormal values to zeros and then continue execution.''
#*<tt>FFLAGS = -cpp -w -O2 -auto -noalign -convert big_endian</tt>
|<tt>FPE=yes</tt>


--[[User:Bmy|Bob Y.]] 16:46, 16 April 2008 (EDT)
|-valign="top"
|<tt>-ftrapuv</tt>
|This option will assign a large numeric value to all local automatic variables.  This makes it easier to identify numerical errors caused by improper initialization.
|<tt>FPE=yes</tt>


Upgrading to IFORT 10.1 does not seem to fix the stacksize problem listed below.  You still need to manually reset the stacksize limit to a large positive number for both [[#Resetting stacksize for Linux|Linux]] and [[#Resetting stacksize for Altix|Altix]] platforms.
|}


--[[User:Bmy|Bob Y.]] 12:41, 25 April 2008 (EDT)
--[[User:Bmy|Bob Y.]] 11:21, 3 October 2013 (EDT)
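
The percentages in the speedup columns of the IFORT 9.1 vs. IFORT 10.1 timing table can be reproduced from the wall clock times.  The sketch below assumes the speedup is defined as the reduction in wall clock time relative to the slower reference run; the helper name <tt>speedup</tt> is hypothetical and is not part of GEOS-Chem.

```shell
# speedup OLD NEW: percent reduction in wall clock time going from the
# reference run (OLD) to the comparison run (NEW).  Both arguments are
# wall clock times in mm:ss format.
speedup() {
    awk -v old="$1" -v new="$2" 'BEGIN {
        split(old, o, ":")
        split(new, n, ":")
        t_old = o[1] * 60 + o[2]      # reference time in seconds
        t_new = n[1] * 60 + n[2]      # comparison time in seconds
        printf "%.2f\n", 100 * (t_old - t_new) / t_old
    }'
}

speedup 36:16 33:55   # Run #1 vs. Run #2 (IFORT 9.1 vs. 10.1, 4 CPUs, -O2)
speedup 36:16 24:15   # Run #1 vs. Run #5 (4 vs. 8 CPUs, IFORT 9.1, -O2)
```

The two calls print 6.48 and 33.13, matching the values listed in the table for Run #2 and Run #5.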


=== Optimization options for faster runs ===
==== Typical settings for a GEOS-Chem simulation ====
[mailto:yxw@mail.tsinghua.edu.cn Yuxuan Wang] told us about the optimization options: -ipo and -static and said these options would speed up the simulations. I've tested these options on our system at Harvard. The run with the new options shows very tiny differences (much less than 1% over 1 month) compared to a run with the old options only. For a full-chemistry run (43 tracers) on 4x5 resolution and 4 processors, the run time is about 10% shorter than previously.


These options are especially effective at speeding up the transport. So in simulations with a faster chemistry (like tagged tracers simulations), we expect to see a higher gain in time. For example, the time for a methane run is shortened by about 30%.
The normal GEOS-Chem build uses the following IFORT compiler flags:


To use these options, in Makefile.ifort, change:
-cpp -w -O2 -auto -noalign -convert big_endian -vec-report0 -fp-model source -openmp
  FFLAGS = -cpp -w -O2 -auto -noalign -convert big_endian


to
whereas a debugging run (meant to execute in a debugger such as TotalView) will typically use these flags:
  FFLAGS = -cpp -w -O2 -auto -noalign -convert big_endian -ipo -static


--[[User:Ccarouge|Ccarouge]] 15:54, 8 September 2009 (EDT)
-cpp -w -O0 -auto -noalign -convert big_endian -g -DDEBUG -check arg_temp_created -debug all -fp-model source -fpe0 -ftrapuv -check bounds


NOTE: In order to [[#Relocation truncated to fit error|avoid running out of memory]] if you are compiling GEOS-Chem at extremely high resolution (e.g. the 0.25&deg; x 0.3125&deg; nested grids), we recommend adding the following flags:


-mcmodel=medium -shared-intel


== IFORT 9 ==
These are automatically set when you compile with the <tt>NETCDF=yes</tt> or <tt>HDF=yes</tt> compiler options (in [[GEOS-Chem v9-01-03]] and higher).


--[[User:Bmy|Bob Y.]] 17:34, 29 February 2012 (EST)
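
To make the difference between the two flag sets easier to see, here is a minimal shell sketch that assembles them from a common base.  This is not how the GEOS-Chem Makefiles are written: <tt>BASE_FLAGS</tt> and <tt>build_fflags</tt> are hypothetical names, the flag order differs from the lists above, and the flags themselves are simply copied from this page (newer compiler releases may spell some of them differently).

```shell
# Flags shared by the normal and debugging builds listed on this page.
BASE_FLAGS="-cpp -w -auto -noalign -convert big_endian -fp-model source"

# build_fflags is a hypothetical helper: pass "yes" for a debugging
# build, anything else for a normal optimized build.
build_fflags() {
    if [ "$1" = "yes" ]; then
        echo "$BASE_FLAGS -O0 -g -DDEBUG -check arg_temp_created -debug all -fpe0 -ftrapuv -check bounds"
    else
        echo "$BASE_FLAGS -O2 -vec-report0 -openmp"
    fi
}

# Normal build, plus the extra flags recommended above for very
# high-resolution grids.
FFLAGS="$(build_fflags no) -mcmodel=medium -shared-intel"
echo "$FFLAGS"
```

Calling <tt>build_fflags yes</tt> instead returns the debugging flag set, which swaps <tt>-O2</tt> for <tt>-O0</tt> and adds the run-time checks.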


=== Precision-safe optimization ===


You can use the following Intel Fortran Compiler options to select how aggressively you would like to optimize floating-point operations. 


==== Default behavior ====


'''-fp-model fast'''


=== Resetting stacksize for Linux ===
Example source code:


If you are using IFORT on a Linux machine, you will have to make a similar fix to your <tt>.cshrc</tt> file [[#07-Jan-2008|as was described below for the Altix/Itanium platform]].
REAL T0, T1, T2;
...
  T0 = 4.0E + 0.1E + T1 + T2;


* Harvard users: you do not have to do anything anymore.  The default software configuration is set up to set the stacksize automatically on all nodes of Ceres, Tethys, and Terra so that you don't have to do this manually.
When this option is specified, the compiler applies the following semantics:


* Non-Harvard users: add these lines of code into your <tt>.cshrc</tt>.
#Additions may be performed in any order
#Intermediate expressions may use single, double, or extended precision
#The constant addition may be pre-computed, assuming the default rounding mode


    #--------------------------------------------------------------------------
Using these semantics, the following shows some possible ways the compiler may interpret the original code:  
    # Due to a limitation in the glibc library that is used by the Intel IFORT
    # v9.x and v10.x compilers, you must do the following in order to avoid
    # potential memory problems with OpenMP:
    #
    # (1) Explicitly set the "KMP_STACKSIZE" environment variable to a large
    #      positive number (but not so large that you get an error msg.)
    #
    # For more information see the Intel IFORT release notes:
    #  http://archimede.mat.ulaval.ca/intel/fc/9.1.036/doc/Release_Notes.htm
    #
    # The symptom will be that GEOS-Chem will appear to be out of memory and
    # will die with a segmentation fault.
    #--------------------------------------------------------------------------
    setenv KMP_STACKSIZE 100000000


--[[User:Bmy|Bob Y.]] 15:39, 24 October 2008 (EDT)
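
For users whose login shell is bash rather than csh, an equivalent setting might look like the sketch below.  Placing it in <tt>.bashrc</tt> is an assumption; the value is the one used in the <tt>.cshrc</tt> lines above and may need to be adjusted for your system.

```shell
# bash equivalent of "setenv KMP_STACKSIZE 100000000".
# The value is copied from the .cshrc example; choose a large positive
# number that your system accepts.
export KMP_STACKSIZE=100000000
```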
REAL T0, T1, T2;
...
T0 = (T1 + T2) + 4.1E;


=== Resetting stacksize for Altix ===
or 


'''''NOTE: The Altix platform is now mostly obsolete.'''''
REAL T0, T1, T2;
...
T0 = (T1 + 4.1E) + T2;


(1) As described above, the IFORT compiler has an error that can make GEOS-Chem appear to be running out of memory when it actually isn't.  The symptom that we have noticed is that it seems to choke right when TPCORE is called.  This may tend to happen more often with IFORT v9 or v10 on Linux boxes, but it can also happen on Altix/Itanium systems.
==== Preferred alternative ====


If GEOS-Chem still crashes with this error, then you may need to set the stacksize variable to a large positive # instead of unlimited.  This is a known issue with the POSIX glibc library that is used by IFORT.
'''-fp-model source''' (aka '''-fp-model precise''')


Try adding this code to your .cshrc file as well under the "Altix" section:
Example source code:  


    #--------------------------------------------------------------------------
REAL T0, T1, T2;
    # Due to a limitation in the glibc library that is used by the Intel IFORT
...
    # v9.x compilers, you must do the following in order to avoid potential
  T0 = 4.0E + 0.1E + T1 + T2;
    # memory problems with OpenMP:
    #
    # (1) Explicitly set the "KMP_STACKSIZE" environment variable to a
    #      large positive number (e.g. 209715200).
    #
    # For more information see the Intel IFORT release notes:
    # http://archimede.mat.ulaval.ca/intel/fc/9.1.036/doc/Release_Notes.htm
    #
    # The symptom will be that GEOS-Chem will appear to be out of memory and
    # will die with a segmentation fault.  This may happen especially if you
    # are running GEOS-Chem with GEOS-5 met on Altix or Titan.
    #
    # (bmy, 8/16/07, 9/9/08)
    #--------------------------------------------------------------------------
    setenv KMP_STACKSIZE 209715200


The value 2097152 is the maximum allowable stacksize on the Harvard Altix/Itanium system; it may be different on your system.  You can find out the maximum stacksize on your machine by typing "limit" at the Unix prompt.  Then just cut-n-paste this number to replace the "2097152 kbytes" in the text above and put that into your .cshrc or .bashrc.
When this option is specified, the compiler applies the following semantics:


(2) If you are using the IFORT 10.x compilers, then you might also need to tell the compiler to put automatic arrays into heap memory instead of on the stack. 
#Additions are performed in program order, taking into account any parentheses 
#Intermediate expressions use the precision specified in the source code
#The constant addition may be pre-computed, assuming the default rounding mode


'''''[mailto:seymour@atmosp.physics.utoronto.ca Mike Seymour] wrote:'''''
Using these semantics, the following shows a possible way the compiler may interpret the original code:  


<blockquote>
REAL T0, T1, T2;
I found this Intel page regarding stack sizes and ifort >=8.0:
...
T0 = ((4.1E + T1) + T2);


:http://www.intel.com/support/performancetools/fortran/sb/cs-007790.htm.
==== Summary ====


It suggests for ifort 10.0 to use the heap for temporary storage with -heap-arrays <size>, where arrays known at compile-time to be larger than <size> are allocated on the heap instead of the stack.
If you do not select any <tt>-fp-model</tt> option, the Intel Fortran Compiler will default to <tt>-fp-model fast</tt>. As you can see from the examples above, this may not optimize the code in the same way each time.  This can lead to minor numerical noise in the output, as was [[ISORROPIA II#Optimization and.2For parallelization issues in ISORROPIA_II|seen in ISORROPIA II]].


However, setting <size> to be 1000 does not change things. I don't know if smaller values will have an effect, or if there will be performance issues.
To avoid this situation, we recommend compiling all source code files with <tt>-fp-model source</tt>. This will be the new default in [[GEOS-Chem v9-01-02]].
</blockquote>


--[[User:Bmy|Bob Y.]] 16:13, 9 September 2008 (EDT)
Reference: ''Intel® Fortran Floating-point Operations''; Document Number: 315892-003US


=== Resetting stacksize on other platforms ===
--[[User:Bmy|Bob Y.]] 17:01, 25 August 2011 (EDT)


'''''[mailto:win@cmu.edu Win Trivitayanurak] wrote:'''''
=== Optimization options for faster runs ===


:I'm running a 4x5 resolution with 310 tracers. Recent development was few subroutines and an additional allocation of 30 elements in an array -- not in STT array, so it's not like 30x(72x46x30) more memory, but still probably enough to reach the stacksize limit.  Now the "ulimit" solves the problem.
[mailto:yxw@mail.tsinghua.edu.cn Yuxuan Wang] told us about the optimization options <tt>-ipo</tt> and <tt>-static</tt> and said these options would speed up the simulations. I've tested these options on our system at Harvard. The run with the new options shows very small differences (much less than 1% over 1 month) compared to a run with the old options only. For a full-chemistry run (43 tracers) at 4x5 resolution on 4 processors, the run time is about 10% shorter than previously.


:I found that I can set environment in my shell-runscript (e.g. .cshrc file) to have the large enough stacksize. I found good suggestions from [http://www.ccsm.ucar.edu/models/atm-cam/docs/usersguide/node18.html this website] and for different platforms, the lines are:
These options are especially effective at speeding up transport. So in simulations with faster chemistry (like tagged tracer simulations), we expect to see a larger reduction in run time. For example, the time for a methane run is shortened by about 30%.
    Compaq
      limit stacksize unlimited
      setenv MP_STACK_SIZE 17000000
    IBM
      limit stack size unlimited
      setenv XLSMPOPTS "stack=40000000"
    SGI origin
      limit stack size unlimited
      setenv MP_SLAVE_STACKSIZE 40000000
    SUN/Solaris
      limit stacksize unlimited
    PC/Linux
      limit stacksize unlimited
      setenv MPSTKZ 40000000
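
For bash users, the csh commands above have close equivalents. The following is a minimal sketch, assuming a Linux system with the Intel OpenMP runtime (which reads <tt>KMP_STACKSIZE</tt>). The value 209715200 is simply the one used in the <tt>setenv</tt> example earlier on this page, and the appropriate number for your machine may differ.

```shell
#!/bin/bash
# Show the current soft stack limit (in kbytes, or "unlimited").
echo "Current stack limit: $(ulimit -s)"

# Try to raise the limit. This can fail if the hard limit is lower,
# so report the failure instead of aborting.
ulimit -s unlimited 2>/dev/null || echo "Could not raise stack limit; keeping $(ulimit -s)"

# Per-thread stack size (in bytes) for OpenMP threads under the Intel runtime.
# 209715200 is the value from the setenv example above; adjust for your machine.
export KMP_STACKSIZE=209715200
```

Placing these lines in your <tt>.bashrc</tt> should have the same effect as the <tt>limit</tt> and <tt>setenv</tt> lines in a <tt>.cshrc</tt>.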


--[[User:Bmy|Bob Y.]] 10:41, 17 October 2008 (EDT)
To use these options, compile GEOS-Chem with the <tt>IPO=yes</tt> Makefile option, e.g.


make -j4 IPO=yes


=== Speedup With Hyperthreading on Nehalem chips ===
--[[User:Ccarouge|Ccarouge]] 15:54, 8 September 2009 (EDT)<br>
--[[User:Bmy|Bob Y.]] 17:50, 29 February 2012 (EST)


Hyperthreading is when a job uses more threads than there are actual CPU cores.  I've noticed that using 16 threads ($OMP_NUM_THREADS = 16) on an 8-core system (2 x quad core Intel Nehalem X5570's) leads to a 15% speedup over using 8 threads.  These tests were with GEOS-Chem v8-02-03, full chemistry, 2x2.5, ifort 10.1.021, and
=== Optimization level for debugging ===


  FFLAGS = -cpp -w -O3 -auto -noalign -convert big_endian -g -traceback -CB -vec-report0. 
If you would like to run your code in a debugger, such as Totalview, you must use the following compiler switches:


This does not have a positive impact when using earlier generations of Intel chips (Harpertown or Clovertown).
-g -O0


--[[User:daven|Daven Henze]] 1:42, 16 December 2009 (MDT)
Using <tt>-O0</tt> will ensure that the source code gets executed in the same order in which it is written (i.e. this disables all compiler optimizations).  The <tt>-g</tt> switch will tell the debugger to display lines of source code instead of hexadecimal memory addresses (which are more or less gibberish unless you are a hardware engineer).


== PGI Compiler ==
GEOS-Chem will add these switches automatically for you if you compile with the <tt>DEBUG=yes</tt> option.


=== Error with ADJUSTL and ADJUSTR ===
--[[User:Bmy|Bob Y.]] 15:28, 22 February 2012 (EST)


'''''[mailto:win@cmu.edu Win Trivitayanurak] wrote:'''''
=== Caveat about optimizing for specific chipsets ===


<blockquote>
The standard GEOS-Chem build sequence does not include any optimization flags that are specific to a certain type of CPU. If you are interested, you can certainly experiment for yourself. But be aware that this may invoke certain chip-level optimizations that could potentially change the simulation output.
In short, TRIM and ADJUSTL or ADJUSTR do not work together properly when compiled with Portland Group Fortran. I propose removing TRIM inside the subroutine StrSqueeze. This is not urgent and relevant to only the few PGI users.
</blockquote>


So if you are using the PGI compiler, then you will have to modify the code in routine STRSQUEEZE "charpak_mod.f" such that the statements
'''''[[User:Jaf|Jenny Fisher]] wrote:'''''


STR = ADJUSTR( TRIM( STR ) )
<blockquote>I have tested the new chips & compiler option. I found that there are small differences [in difference test output]...if I use exactly the same compile commands and number of processors between our old cores and our new Broadwell cores (E5-2690 v4). The differences are very small and I think nothing to worry about.
STR = ADJUSTL( TRIM( STR ) )


are now replaced with
However, adding the preferred compiler flag <tt>-xCORE-AVX2</tt> led to much bigger differences (e.g., up to 5% difference or 10 ppb in ozone…). I haven’t investigated the differences in detail. I did run a one month benchmark comparison, and see that the differences can be consequential after a month (i.e. not just differences in regions where values are low).


STR = ADJUSTR( STR )
I have no idea what is causing these differences. So I guess for the moment, I would recommend *not* using the specific optimisation for Broadwell/Haswell cores. However, I think it probably is ok to use the Broadwell cores without this flag. I am not sure what impact this choice will have on performance.</blockquote>
STR = ADJUSTL( STR )


and this will solve the problem.  We will incorporate this into a future release of GEOS-Chem.
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 14:50, 28 March 2017 (UTC)

Latest revision as of 19:28, 21 May 2024


This page contains information about the Intel Fortran compiler, which has recently been renamed from ifort to ifx.

The Intel Fortran compiler is our recommended proprietary compiler for GEOS-Chem.

Overview

Documentation

You can find more information about the Intel Fortran Compiler here:

  1. Intel Fortran 19 documentation
  2. Intel Fortran 17

Also, when you install the Intel Fortran compiler, you will normally install the C and C++ compilers as well.

--Bob Yantosca (talk) 19:44, 10 January 2019 (UTC)

Intel Fortran Compiler versions that have been tested with GEOS-Chem

The GEOS-Chem Support Team has tested GEOS-Chem with the compiler versions listed below. But you should be able to use other Intel Fortran Compiler versions as well.

Platform Compiler Status
Linux ifort 23.0.0 Supported
Linux ifort 19.0.5.281 Supported
Linux ifort 18.0.5 Supported
Linux ifort 17.0.4 Supported
Linux ifort 15.0.0 and similar builds Supported
  • NOTE: IFORT 15 has a compiler bug that causes errors when turning on array-out-of-bounds checking and optimization.
Linux ifort 13.0.079 and similar builds Supported
Linux ifort 12 Supported
  • NOTE: A compiler bug in ifort 12 and higher versions has forced us to add a workaround to HEMCO in v11-01.
Linux ifort 11.1.069 and similar builds Supported

Environment settings for Intel Fortran

Information about how you can customize your Linux environment to use the Intel Fortran compiler was recently migrated to our geos-chem.readthedocs.io manual.

Optimization

In this section we present information about the various optimization options available in the Intel Fortran Compiler.

Optimization options

Here is a quick reference table of optimization options (taken from the online Intel Fortran Compiler User and Reference Guides).

Option Description How invoked in GEOS-Chem?
-O0 Turns off all optimizations. Math expressions will be evaluated in the same order in which they are written, which is necessary for debugging. If you are using a debugger (such as Totalview), compile with -g -O0. DEBUG=yes or
OPT=-O0
-O1 Enables optimizations for speed and disables some optimizations that increase code size and affect speed. The -O1 option may improve performance for applications with very large code size, many branches, and execution time not dominated by code within loops.

Setting -O1 automatically sets the following options:

  1. -funroll-loops0,
  2. -nofltconsistency (same as -mno-ieee-fp),
  3. -fomit-frame-pointer,
  4. -ftz
OPT=-O1
-O2 (aka -O) Enables optimizations for speed. This is the generally recommended optimization level.

This option also enables:

  1. Inlining of intrinsics
  2. Intra-file interprocedural optimizations, which include:
    • inlining
    • constant propagation
    • forward substitution
    • routine attribute propagation
    • variable address-taken analysis
    • dead static function elimination
    • removal of unreferenced variables
  3. The following capabilities for performance gain:
    • constant propagation
    • copy propagation
    • dead-code elimination
    • global register allocation
    • global instruction scheduling and control speculation
    • loop unrolling
    • optimized code selection
    • partial redundancy elimination
    • strength reduction/induction variable simplification
    • variable renaming
    • exception handling optimizations
    • tail recursions
    • peephole optimizations
    • structure assignment lowering and optimizations
    • dead store elimination

On Linux and Mac OS X systems, if -g is specified, -O2 is turned off and -O0 is the default unless -O2 (or -O1 or -O3) is explicitly specified in the command line together with -g.

Default setting
-O3 Enables -O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations.

Enables optimizations for maximum speed, such as:

  1. Loop unrolling, including instruction scheduling
  2. Code replication to eliminate branches
  3. Padding the size of certain power-of-two arrays to allow more efficient cache use.

On Linux and Mac OS X systems, the -O3 option sets option -fomit-frame-pointer.

The -O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to -O2 optimizations. The -O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.

OPT=-O3

--Bob Y. 11:14, 3 October 2013 (EDT)

Recommended compilation and optimization options for GEOS-Chem

In this section, we present information about the compilation and optimization options that are invoked when you compile a GEOS-Chem simulation.

List of commonly-used compilation options

Here are the IFORT compilation options currently used by GEOS-Chem:

Option Description How invoked in GEOS-Chem?
Normal compiler settings
-cpp Turns on the C-preprocessor, to evaluate #if and #define statements in the source code. Default setting
-w Suppresses all compiler warnings. This is mainly a convenience to prevent excessive output to the screen or log file.

NOTE: Most compiler warnings are harmless. Execution does not stop when a warning is displayed, unlike an error message, which causes program execution to halt at the point where the error occurred.

Default setting
-O2 Optimizes the source code for speed, without taking too many liberties with numerical precision. For more information, please see the optimization options section above. Default setting
-auto This option places local variables (scalars and arrays of all types), except those declared as SAVE, on the run-time stack. It is as if the variables were declared with the AUTOMATIC attribute. It does not affect variables that have the SAVE attribute or ALLOCATABLE attribute, or variables that appear in an EQUIVALENCE statement or in a common block. Default setting
-noalign Prevents the compiler from padding bytes anywhere in common blocks and structures. Padding can affect numerical precision. Default setting
-convert big_endian Specifies that the format will be big endian for integer data and big endian IEEE floating-point for real and complex data. This only affects file I/O to/from binary files (such as binary punch files) but not ASCII, netCDF, or other file formats. Default setting
-vec-report0 Tells the compiler to suppress printing "LOOP HAS BEEN VECTORIZED" messages. This reduces the amount of output that is sent to the screen and/or GEOS-Chem log file. Default setting
-fp-model source Rounds intermediate results to source-defined precision and enables value-safe optimizations. Basically, this tells the compiler not to take too many liberties with how numerical expressions are evaluated. For more information about this option, please see our precision-safe optimization section below. This option can be disabled by compiling GEOS-Chem with the PRECISE=no Makefile option. Default setting
-traceback This option tells the compiler to generate extra information in the object file to provide source file traceback information when a severe error occurs at run time. When the severe error occurs, source file, routine name, and line number correlation information is displayed along with call stack hexadecimal addresses (program counter trace). This option increases the size of the executable program, but has no impact on run-time execution speeds. It functions independently of the debug option.
  • Default setting
    (v11-01 and higher)
  • TRACEBACK=yes
    (prior versions)
Special compiler settings
-r8 This option tells the compiler to treat variables that are declared as REAL as REAL*8 (as opposed to REAL*4).

NOTE: This option is not used globally, but is only applied to certain individual files (mostly from third-party codes like ISORROPIA). Current GEOS-Chem programming practice is to use either REAL*4 or REAL*8 instead of REAL, which avoids confusion.

Used as needed
-mcmodel=medium This option is used to tell IFORT to use more than 2GB of static memory. This avoids a specific type of memory error that can occur if you compile GEOS-Chem for use with an extremely high-resolution grid (e.g. 0.25° x 0.3125° nested grid). Default setting
-shared-intel
(formerly -i-dynamic)
This option needs to be used in conjunction with -mcmodel=medium. It causes Intel-provided libraries to be linked in dynamically instead of statically (which is the default). Default setting
-ipo This option enables interprocedural optimization between files. This is also called multifile interprocedural optimization (multifile IPO) or Whole Program Optimization (WPO). When you specify this option, the compiler performs inline function expansion for calls to functions defined in separate files.

NOTE: Yuxuan Wang found that this option was useful for certain nested-grid simulations. See this wiki post below for more information.

IPO=yes
-static This option prevents linking with shared libraries. It causes the executable to link all libraries statically.

NOTE: Yuxuan Wang found that this option was useful for certain nested-grid simulations. See this wiki post below for more information.

IPO=yes
Settings only used for debugging
-debug all Tells the compiler to turn on all debug error output. DEBUG=yes
-g Tells the compiler to generate full debugging information in the object file. This will cause a debugger (like Totalview) to display the actual lines of source code, instead of hexadecimal addresses (which is gibberish to anyone except hardware engineers). DEBUG=yes
-O0 Turns off all optimization. Source code instructions (e.g. DO loops, IF blocks) and numerical expressions are evaluated in precisely the order in which they are listed, without being internally rewritten by the optimizer. This is necessary for using a debugger (like Totalview). DEBUG=yes
-check bounds (aka -CB) Check for array-out-of-bounds errors. This is invoked when you compile GEOS-Chem with the BOUNDS=yes Makefile option. NOTE: Only use -CB for debugging, as this option will cause GEOS-Chem to execute more slowly! DEBUG=yes
-check arg_temp_created Checks to see if any array temporaries are created. Depending on how you write your subroutine and function calls, the compiler may need to create a temporary array to hold the values in the array before it passes them to the subroutine. For detailed information, please see our Passing array arguments efficiently in GEOS-Chem wiki page. DEBUG=yes
-fpe0 This option will cause GEOS-Chem to halt if a floating-point exception (invalid operation, division by zero, or overflow) is encountered, i.e. when an equation would result in NaN or +/-Infinity. Common causes of floating-point errors are divisions where the denominator becomes zero.
NOTE: The default compiler setting is -fpe3, which disables floating-point exception trapping, so that NaN and +/-Infinity values propagate through the calculation and execution continues.
FPE=yes
-ftrapuv This option will assign a large numeric value to all local automatic variables. This makes it easier to identify numerical errors caused by improper initialization. FPE=yes

--Bob Y. 11:21, 3 October 2013 (EDT)

Typical settings for a GEOS-Chem simulation

The normal GEOS-Chem build uses the following IFORT compiler flags:

-cpp -w -O2 -auto -noalign -convert big_endian -vec-report0 -fp-model source -openmp

whereas a debugging run (meant to execute in a debugger such as TotalView) will typically use these flags:

-cpp -w -O0 -auto -noalign -convert big_endian -g -DDEBUG -check arg_temp_created -debug all -fp-model source -fpe0 -ftrapuv -check bounds

NOTE: In order to avoid running out of memory if you are compiling GEOS-Chem at extremely high resolution (e.g. the 0.25° x 0.3125° nested grids), we recommend adding the following flags:

-mcmodel=medium -shared-intel

These are automatically set when you compile with the NETCDF=yes or HDF=yes compiler options (in GEOS-Chem v9-01-03 and higher).
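
To make the relationship between the three flag sets above concrete, here is a minimal shell sketch. It is not how the GEOS-Chem Makefile is implemented; the helper name select_fflags and the file name example_mod.F90 are hypothetical, and only the flag strings themselves are copied from the text above.

```shell
#!/bin/bash
# Flag sets copied verbatim from the text above.
NORMAL_FLAGS="-cpp -w -O2 -auto -noalign -convert big_endian -vec-report0 -fp-model source -openmp"
DEBUG_FLAGS="-cpp -w -O0 -auto -noalign -convert big_endian -g -DDEBUG -check arg_temp_created -debug all -fp-model source -fpe0 -ftrapuv -check bounds"
HIGHRES_FLAGS="-mcmodel=medium -shared-intel"

# select_fflags DEBUG HIGHRES  (hypothetical helper)
#   DEBUG=yes   -> use the debugging flag set instead of the normal one
#   HIGHRES=yes -> append the large-memory flags
select_fflags() {
    if [ "$1" = "yes" ]; then
        flags="$DEBUG_FLAGS"
    else
        flags="$NORMAL_FLAGS"
    fi
    if [ "$2" = "yes" ]; then
        flags="$flags $HIGHRES_FLAGS"
    fi
    echo "$flags"
}

# Example: print (but do not run) the compile command for a debug build
# of a hypothetical source file.
echo "ifort $(select_fflags yes no) -c example_mod.F90"
```

Printing the command rather than executing it lets you inspect the flags on a machine where the Intel compiler is not installed.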

--Bob Y. 17:34, 29 February 2012 (EST)

Precision-safe optimization

You can use the following Intel Fortran Compiler options to select how aggressively you would like to optimize floating-point operations.

Default behavior

-fp-model fast

Example source code:

REAL T0, T1, T2;
...
T0 = 4.0E + 0.1E + T1 + T2; 

When this option is specified, the compiler applies the following semantics:

  1. Additions may be performed in any order
  2. Intermediate expressions may use single, double, or extended precision
  3. The constant addition may be pre-computed, assuming the default rounding mode

Using these semantics, the following shows some possible ways the compiler may interpret the original code:

REAL T0, T1, T2; 
...
T0 = (T1 + T2) + 4.1E; 

or

REAL T0, T1, T2; 
...
T0 = (T1 + 4.1E) + T2;

Preferred alternative

-fp-model source (aka -fp-model precise)

Example source code:

REAL T0, T1, T2;
...
T0 = 4.0E + 0.1E + T1 + T2; 

When this option is specified, the compiler applies the following semantics:

  1. Additions are performed in program order, taking into account any parentheses
  2. Intermediate expressions use the precision specified in the source code
  3. The constant addition may be pre-computed, assuming the default rounding mode

Using these semantics, the following shows a possible way the compiler may interpret the original code:

REAL T0, T1, T2;
...
T0 = ((4.1E + T1) + T2);

Summary

If you do not select any -fp-model option, the Intel Fortran Compiler will default to -fp-model fast. As you can see from the examples above, this may not optimize the code in the same way each time. This can lead to minor numerical noise in the output, as was seen in ISORROPIA II.

To avoid this situation, we recommend compiling all source code files with -fp-model source. This will be the new default in GEOS-Chem v9-01-02.
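
The reason evaluation order matters is that floating-point addition is not associative. The short sketch below illustrates this from the shell, using awk's double-precision arithmetic rather than the Fortran compiler itself; the constants 0.1, 0.2 and 0.3 are chosen only for illustration and do not come from GEOS-Chem.

```shell
#!/bin/bash
# Sum the same three constants with two different groupings.
left=$(awk 'BEGIN { printf "%.17g", (0.1 + 0.2) + 0.3 }')
right=$(awk 'BEGIN { printf "%.17g", 0.1 + (0.2 + 0.3) }')

echo "(0.1 + 0.2) + 0.3 = $left"
echo "0.1 + (0.2 + 0.3) = $right"

# On a typical IEEE-754 double-precision system the two results
# differ in the last digits.
if [ "$left" != "$right" ]; then
    echo "Grouping changed the result"
fi
```

A compiler that is free to regroup additions (as with -fp-model fast) can therefore produce slightly different results from the same source code, whereas -fp-model source keeps the grouping written in the program.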

Reference: Intel® Fortran Floating-point Operations; Document Number: 315892-003US

--Bob Y. 17:01, 25 August 2011 (EDT)

Optimization options for faster runs

Yuxuan Wang told us about the optimization options -ipo and -static and said these options would speed up the simulations. I've tested these options on our system at Harvard. The run with the new options shows very small differences (much less than 1% over 1 month) compared to a run with the old options only. For a full-chemistry run (43 tracers) at 4x5 resolution on 4 processors, the run time is about 10% shorter than previously.

These options are especially effective at speeding up transport. So in simulations with faster chemistry (like tagged tracer simulations), we expect to see a larger reduction in run time. For example, the time for a methane run is shortened by about 30%.

To use these options, compile GEOS-Chem with the IPO=yes Makefile option, e.g.

make -j4 IPO=yes

--Ccarouge 15:54, 8 September 2009 (EDT)
--Bob Y. 17:50, 29 February 2012 (EST)

Optimization level for debugging

If you would like to run your code in a debugger, such as Totalview, you must use the following compiler switches:

-g -O0

Using -O0 will ensure that the source code gets executed in the same order in which it is written (i.e. this disables all compiler optimizations). The -g switch will tell the debugger to display lines of source code instead of hexadecimal memory addresses (which are more or less gibberish unless you are a hardware engineer).

GEOS-Chem will add these switches automatically for you if you compile with the DEBUG=yes option.

--Bob Y. 15:28, 22 February 2012 (EST)

Caveat about optimizing for specific chipsets

The standard GEOS-Chem build sequence does not include any optimization flags that are specific to a certain type of CPU. If you are interested, you can certainly experiment for yourself. But be aware that this may invoke certain chip-level optimizations that could potentially change the simulation output.

Jenny Fisher wrote:

I have tested the new chips & compiler option. I found that there are small differences [in difference test output]...if I use exactly the same compile commands and number of processors between our old cores and our new Broadwell cores (E5-2690 v4). The differences are very small and I think nothing to worry about.

However, adding the preferred compiler flag -xCORE-AVX2 led to much bigger differences (e.g., up to 5% difference or 10 ppb in ozone…). I haven’t investigated the differences in detail. I did run a one month benchmark comparison, and see that the differences can be consequential after a month (i.e. not just differences in regions where values are low).

I have no idea what is causing these differences. So I guess for the moment, I would recommend *not* using the specific optimisation for Broadwell/Haswell cores. However, I think it probably is ok to use the Broadwell cores without this flag. I am not sure what impact this choice will have on performance.

--Bob Yantosca (talk) 14:50, 28 March 2017 (UTC)
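
If you do experiment with chip-specific flags, a quick first check is whether the output of the new build is still bit-for-bit identical to the output of the standard build. The sketch below assumes you have run the same simulation twice and saved one output file from each run; the helper name outputs_match and the file paths are hypothetical.

```shell
#!/bin/bash
# outputs_match FILE_A FILE_B  (hypothetical helper)
# Returns exit status 0 only if both files exist and are byte-identical.
# cmp -s compares silently and reports the result through its exit status.
outputs_match() {
    cmp -s "$1" "$2" 2>/dev/null
}

# Hypothetical paths: the same output file from the two builds.
if outputs_match run_default/output.bin run_avx2/output.bin; then
    echo "Outputs are bit-for-bit identical"
else
    echo "Outputs differ (or a file is missing); inspect the differences before trusting the new flags"
fi
```

A byte-level comparison only tells you whether the outputs differ, not by how much, so a difference test like the one Jenny Fisher describes is still needed to judge whether the differences are scientifically significant.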