Timing tests with GEOS-Chem v11-01

Revision as of 19:31, 25 June 2019 by Bmy

On this page we show the results of timing tests done with GEOS-Chem v11-01.

Overview

The GEOS-Chem Support Team has created a timing test package that you can use to determine the performance of GEOS-Chem on your system. The timing test runs the GEOS-Chem v11-01 code for 7 model days with the "Standard" chemistry mechanism. Our experience has shown that a 7-day simulation gives a more accurate timing result than a 1-day simulation, because much of the file I/O (e.g. HEMCO reading annual or monthly-mean emissions fields) occurs on the first day of a run.

Installation

If you haven't already, download the GEOS-Chem v11-01 source code using:

git clone https://bitbucket.org/gcst/geos-chem Code.v11-01

Create your run directory using the GEOS-Chem UnitTest by following these instructions. To copy the gc_timing run directory, make sure you have the following lines in your CopyRunDirs.input file:

#--------|-----------|------|------------|------------|------------|---------|
# MET    | GRID      | NEST | SIMULATION | START DATE | END DATE   | EXTRA?  |
#--------|-----------|------|------------|------------|------------|---------|
 geosfp   4x5         -      gc_timing    2013070100   2013070800   -

Compilation

To build the code, follow these steps:

 cd geosfp_4x5_gc_timing
 make realclean
 make -j4 mpbuild 2>&1 | tee log.build

Information about the options used for the compilation (as well as the compiler version) will be printed to the file lastbuild.mp.

Performing the timing test

To run the code, follow the instructions in the

 geosfp_4x5_gc_timing/README 

file. We have provided sample run scripts that you can use to submit jobs:

 geosfp_4x5_gc_timing/doTimeTest          # Submit job directly
 geosfp_4x5_gc_timing/doTimeTest.slurm    # Submit job using the SLURM scheduler  

The regular GEOS-Chem output as well as timing information will be sent to a log file named:

 doTimeTest.log.ID

where ID is either the SLURM job ID # or the process ID.
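For example, on a cluster that uses SLURM, the steps might look like this (a sketch; the exact submission command depends on your scheduler setup):

```shell
cd geosfp_4x5_gc_timing

# Option 1: run the timing test directly on the current node
./doTimeTest

# Option 2: submit the timing test through the SLURM scheduler
sbatch doTimeTest.slurm

# When the job finishes, the log file name ends in the job or process ID
ls doTimeTest.log.*
```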

Displaying the test results

You can print out the timing results with the printTime script:

 cd geosfp_4x5_gc_timing
 ./printTime doTimeTest.log.ID

which will display results similar to this:

GEOS-Chem 7-Model-Day Time Test Results
===============================================================================

Machine information
-------------------------------------------------------------------------------
Machine or node name  : holyjacob03.rc.fas.harvard.edu
CPU vendor            : GenuineIntel
CPU model name        : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
CPU speed [MHz]       : 2494.097 

GEOS-Chem information
-------------------------------------------------------------------------------
GEOS-Chem Version     : v11-01
Last commit           : Now use correct species DB FullName for TOMAS DUST* species 
Commit date           : Tue Dec 13 18:05:28 2016 -0500 
Compiler version      : ifort 11.1
Compilation options   : geosfp 4x5 standard no_reduced UCX timers

Simulation information
-------------------------------------------------------------------------------
Simulation start date : 20130701 000000
Simulation end date   : 20130708 000000
Number of CPUs used   : 4
Total CPU time  [s]   : 42926.897
Wall clock time [s]   : 11810.496
CPU / Wall ratio      : 3.6346
% of ideal performance: 90.87

You can then use these results to fill in the table below.
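The two derived quantities in that report follow directly from the raw numbers: the CPU/Wall ratio is total CPU time divided by wall clock time, and the percent of ideal performance is that ratio divided by the number of CPUs. A quick check with awk, using the sample values above:

```shell
# Recompute the derived metrics from the sample printTime report above
awk -v cpu=42926.897 -v wall=11810.496 -v ncpus=4 'BEGIN {
  ratio = cpu / wall             # how many CPUs were kept busy on average
  ideal = 100 * ratio / ncpus    # percent of perfect (ncpus-fold) scaling
  printf "CPU / Wall ratio      : %.4f\n", ratio
  printf "%% of ideal performance: %.2f\n", ideal
}'
```

This reproduces the 3.6346 ratio and the 90.87% of ideal performance shown in the sample output.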

--Bob Yantosca (talk) 18:12, 19 December 2016 (UTC)

Table of 7-model-day run times

The following timing test results were obtained with the "out-of-the-box" GEOS-Chem v11-01 provisional release code configuration.

  • All jobs used GEOS-FP meteorology at 4° x 5° resolution.
  • Jobs started on model date 2013/07/01 00:00 GMT and finished on 2013/07/08 00:00 GMT.
  • The code was compiled from the run directory (geosfp_4x5_gc_timing) with the standard option make -j4 mpbuild. This sets the following compilation variables:
    • MET=geosfp GRID=4x5 CHEM=benchmark UCX=y NO_REDUCED=n TRACEBACK=n BOUNDS=n FPE=n DEBUG=n NO_ISO=n NEST=n
  • Wall clock times are listed from fastest to slowest, for the same number of CPUs.
  • It's OK to round CPU and wall clock times to the nearest second, for clarity.
Submitter | Machine or node; compiler | CPU vendor | CPU model | Speed [MHz] | # CPUs | CPU time | Wall time | CPU/Wall ratio | % of ideal
------------------------------------------------------------------------------------------------------------------------------------------

Results from 7-model-day time tests using 36 CPUs
Chi Li (Dalhousie) | newnode28, stetson cluster; ifort 17.0.5 (without -xCORE-AVX2) | GenuineIntel | Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz | 2992.967 | 36 | 89196.76 s (24:46:37) | 2827.36 s (00:47:08) | 31.5477 | 87.63

Results from 7-model-day time tests using 32 CPUs
William Porter (UCR) | hood, aldo cluster; ifort 18.0.1 (without -xCORE-AVX2) | GenuineIntel | Intel(R) Xeon(R) Gold 6130F CPU @ 2.10GHz | 1300.000 | 32 | 78541.66 s (21:49:02) | 2786.12 s (00:46:26) | 28.1903 | 88.09
Daniel Rothenberg (MIT) | c094, svante cluster; ifort 17.0.1 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz | 2599.84 | 32 | 95693.52 s (26:34:53) | 3999.24 s (01:06:39) | 23.9279 | 74.77

Results from 7-model-day time tests using 28 CPUs
Jenny Fisher (NCI) | r4122; ifort 16.0.3 (without -xCORE-AVX2) | GenuineIntel | Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz | 2601.000 | 28 | 86045.92 s (23:54:06) | 3687.29 s (01:01:27) | 23.3358 | 83.34

Results from 7-model-day time tests using 24 CPUs
Daniel Rothenberg (MIT) | c094, svante cluster; ifort 17.0.1 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz | 2599.84 | 24 | 79975.84 s (22:12:55) | 4170.86 s (01:09:30) | 19.1749 | 79.9

Results from 7-model-day time tests using 16 CPUs
Daniel Rothenberg (MIT) | c094, svante cluster; ifort 17.0.1 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz | 2599.84 | 16 | 61617.87 s (17:06:57) | 4678.14 s (01:17:58) | 13.1714 | 82.32
Jenny Fisher (NCI) | r1808; ifort 12.1.3 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz | 2601.000 | 16 | 73955.44 s (20:32:35) | 6129.04 s (01:42:09) | 12.0664 | 75.41

Results from 7-model-day time tests using 14 CPUs
Jenny Fisher (NCI) | r3709; ifort 17.0.1 (with -xCORE-AVX2) | GenuineIntel | Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz | 2601.000 | 14 | 49676.31 s (13:47:56) | 4386.56 s (01:13:07) | 11.3247 | 80.89
Jenny Fisher (NCI) | r3713; ifort 16.0.3 (without -xCORE-AVX2) | GenuineIntel | Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz | 2601.000 | 14 | 55131.59 s (15:18:52) | 4767.78 s (01:19:28) | 11.5634 | 82.6

Results from 7-model-day time tests using 12 CPUs
Andy Jacobson & Ken Schuldt (NOAA ESRL) | t0442 (NOAA "theia" platform); ifort 15.0.1 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz | 2601.000 | 12 | 56505.83 s (15:41:45) | 4932.59 s (01:22:12) | 11.4556 | 95.46
Daniel Rothenberg (MIT) | c094, svante cluster; ifort 17.0.1 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz | 2599.84 | 12 | 51007.05 s (14:10:07) | 4984.12 s (01:23:04) | 10.2339 | 85.28
Jiawei Zhuang (Harvard) | r305i1n13 (Pleiades Haswell node); ifort 16.0.2 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 2501.000 | 12 | 99915.79 s (27:45:16) | 4998.65 s (01:23:18) | 19.9886 | 83.29
Jiawei Zhuang (Harvard) | r629i6n1 (Pleiades Broadwell node); ifort 16.0.2 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 2401.000 | 12 | 54498.33 s (15:08:18) | 5203.77 s (01:26:44) | 10.4729 | 87.27
Bob Yantosca (GCST) | holyjacob02.rc.fas.harvard.edu; ifort 11.1 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 2494.444 | 12 | 52928.482 s (14:33:49) | 5600.497 s (01:33:20) | 9.4507 | 78.76
Bob Yantosca (GCST) | holyjacob06.rc.fas.harvard.edu; ifort 11.1 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 2494.120 | 12 | 53075.95 s (14:26:37) | 5612.76 s (01:33:33) | 9.4563 | 78.8
Jiawei Zhuang (Harvard) | r591i3n3 (Pleiades Haswell node); ifort 16.0.2 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 2501.000 | 12 | 63960.11 s (17:46:00) | 6102.58 s (01:41:43) | 10.4808 | 87.34
Jiawei Zhuang (Harvard) | r459i3n6 (Pleiades Ivy-Bridge node); ifort 16.0.2 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz | 2801.000 | 12 | 64005.95 s (17:46:46) | 6166.78 s (01:42:47) | 10.3792 | 86.49
Bob Yantosca (GCST) | holyjacob03.rc.fas.harvard.edu; gfortran 4.8.2 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 2494.097 | 12 | 47826.4 s (13:17:06) | 6205.87 s (01:43:26) | 7.7066 | 64.22
Jiawei Zhuang (Harvard) | r305i1n13 (Pleiades Sandy-Bridge node); ifort 16.0.2 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz | 2601.000 | 12 | 75891.12 s (21:04:51) | 7334.51 s (02:02:15) | 10.3471 | 86.23

Results from 7-model-day time tests using 8 CPUs
Jenny Fisher (NCI) | r3866; ifort 17.0.1 (with -xCORE-AVX2) | GenuineIntel | Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz | 2601.000 | 8 | 38550.7 s (10:42:31) | 5594.28 s (01:33:14) | 6.8911 | 86.14
Jenny Fisher (NCI) | r3713; ifort 16.0.3 (without -xCORE-AVX2) | GenuineIntel | Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz | 2601.000 | 8 | 42235.58 s (11:43:56) | 5834.34 s (01:37:14) | 7.2391 | 90.49
Daniel Rothenberg (MIT) | c094, svante cluster; ifort 17.0.1 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz | 2599.84 | 8 | 44324.83 s (12:18:44) | 6357.22 s (01:45:57) | 6.9724 | 87.16
Melissa Sulprizio (GCST) | holyjacob05.rc.fas.harvard.edu; ifort 11.1 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 2494.177 | 8 | 50711.89 s (14:05:11) | 7673.93 s (02:07:56) | 6.6083 | 82.6
Bob Yantosca (GCST) | holyjacob03.rc.fas.harvard.edu; ifort 11.1 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 2494.097 | 8 | 51955.61 s (14:25:56) | 7917.931 s (02:11:58) | 6.6188 | 82.74
Bob Yantosca (GCST) | holyjacob04.rc.fas.harvard.edu; gfortran 4.8.2 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 2494.196 | 8 | 47629.5 s (13:13:50) | 8290.15 s (02:18:10) | 5.7453 | 71.82
Jenny Fisher (NCI) | r52; ifort 12.1.3 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz | 2601.000 | 8 | 57825.7 s (16:03:46) | 8812.05 s (02:26:52) | 6.5621 | 82.03
Bob Yantosca (GCST) | regal16.rc.fas.harvard.edu; ifort 11.1 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz | 2199.915 | 8 | 62554.07 s (17:22:34) | 9355.80 s (02:35:59) | 6.6861 | 83.58

Results from 7-model-day time tests using 6 CPUs
Bob Yantosca (GCST) | holyjacob06.rc.fas.harvard.edu | GenuineIntel | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 2494.120 | 6 | 43571.73 s (12:06:12) | 8351.52 s (02:19:12) | 5.2172 | 86.95

Results from 7-model-day time tests using 4 CPUs
Daniel Rothenberg (MIT) | c094, svante cluster; ifort 17.0.1 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz | 2599.84 | 4 | 35147.97 s (09:45:47) | 9455.95 s (02:37:35) | 3.717 | 92.93
Bob Yantosca (GCST) | holyjacob03.rc.fas.harvard.edu; ifort 11.1 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 2494.097 | 4 | 42667.423 s (11:51:07) | 11810.496 s (03:16:50) | 3.6346 | 90.87

Results from 7-model-day time tests using 2 CPUs
TBD

Results from 7-model-day time tests using 1 CPU
TBD

--Bob Yantosca (talk) 00:11, 14 February 2017 (UTC)

Graph of 7-model-day run times

This plot shows the "wall clock" run time vs. the number of CPUs for each entry in the table above.

V11-01-time-test-results.png

As you can see, wall clock time decreases steadily as CPUs are added, up to about 16 CPUs. Beyond that point, the additional overhead (i.e. more CPUs having to communicate with each other and with memory) starts to degrade performance.
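You can quantify this falloff from the table itself. For example, comparing the two ifort 11.1 runs on the holyjacob nodes (4 CPUs vs. 12 CPUs), tripling the CPU count yields only about a 2.1x speedup:

```shell
# Scaling check using two ifort 11.1 wall times from the table above
awk 'BEGIN {
  w4  = 11810.496        # wall time [s] with  4 CPUs (holyjacob03)
  w12 =  5600.497        # wall time [s] with 12 CPUs (holyjacob02)
  speedup    = w4 / w12                 # measured speedup going 4 -> 12 CPUs
  efficiency = 100 * speedup / (12/4)   # relative to the ideal 3x speedup
  printf "Speedup    : %.4f\n", speedup
  printf "Efficiency : %.2f%%\n", efficiency
}'
```

That is, going from 4 to 12 CPUs recovers only about 70% of the ideal 3x speedup on these nodes.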

--Bob Yantosca (talk) 17:42, 29 March 2017 (UTC)

Time spent in each GEOS-Chem operation

You can view the amount of time spent in each GEOS-Chem operation by enabling GEOS-Chem timers. Adding TIMERS=1 to your make command will tell GEOS-Chem to enable timers and print a timer summary at the end of your log file.
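Following the build commands shown earlier, a timer-enabled build might look like this (a sketch; mpbuild and the log redirection are as in the Compilation section above):

```shell
cd geosfp_4x5_gc_timing
make realclean
# TIMERS=1 enables per-operation timers; a summary prints at the end of the log
make -j4 TIMERS=1 mpbuild 2>&1 | tee log.build
```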

Comparing ifort 11 vs. gfortran 6.2.0

For some timing information comparing the Intel Fortran Compiler version 11 vs. the GNU Fortran compiler version 6.2.0, please see this post on our GNU Fortran compiler wiki page.

--Bob Yantosca (talk) 16:55, 20 April 2017 (UTC)

Comparing ifort 11 vs. gfortran 4.8.2

NOTE: This analysis was done with v11-01 and GNU Fortran 4.8.2. Newer versions of GNU Fortran (e.g. 6.2.0) might show different results.

Bob Yantosca wrote:

For reference, I was able to compare how long each GC operation takes from two timing-test runs on the Intel Xeon nodes of odyssey.rc.fas.harvard.edu. The operations where gfortran takes significantly longer than ifort (e.g. aerosol chemistry, transport, convection) stand out in the table below:
                               ifort 11           gfortran 4.8.2
                               12 CPUs            12 CPUs
                               holyjacob06        holyjacob03 
                               SLURM: 77647117    SLURM: 77631200 

      Timer name               hh:mm:ss.SSS       hh:mm:ss.SSS
    ----------------------------------------------------------------
      GEOS-Chem             :  01:33:30.437       01:43:22.906
      Initialization        :  00:00:07.390       00:00:04.062       
      Timesteps             :  01:33:21.906       01:43:18.062
      HEMCO                 :  00:17:03.859       00:15:21.625
      All chemistry         :  00:37:55.312       00:38:45.875
      => Strat chem         :  00:00:39.140       00:00:38.781
      => Gas-phase chem     :  00:33:06.609       00:30:21.562
      => All aerosol chem   :  00:03:40.921       00:07:07.343
      Transport             :  00:07:58.062       00:12:25.000
      Convection            :  00:21:48.296       00:26:19.812
      Dry deposition        :  00:00:25.062       00:00:30.375       
      Wet deposition        :  00:03:41.156       00:03:38.343       
      Diagnostics           :  00:02:23.421       00:03:51.343
      Reading met fields    :  00:00:06.265       00:00:08.437       
      Reading restart file  :  00:00:00.312       00:00:00.343       
      Writing restart file  :  00:01:13.703       00:01:32.843
I wonder if some of this comes down to either OpenMP or math optimizations. In any case, we know that gfortran is slower in general, but it is interesting that for some operations it is very comparable to ifort. Note that the results above come from only two individual tests; we can improve the statistics by averaging the results of several tests, and will do so as time allows.

--Bob Yantosca (talk) 17:22, 22 December 2016 (UTC)

For more information

We invite you to see our Guide to GEOS-Chem performance for more information about GEOS-Chem parallelization, performance, and scalability.

--Bob Yantosca (talk) 19:30, 25 June 2019 (UTC)