Timing tests with GEOS-Chem v11-01
On this page we show the results of timing tests done with GEOS-Chem v11-01.
Overview
The GEOS-Chem Support Team has created a timing test package that you can use to determine the performance of GEOS-Chem on your system. The timing test runs the GEOS-Chem v11-01 provisional release code for 7 model days with the "Standard" chemistry mechanism. Our experience has shown that a 7-day simulation gives a more accurate timing result than a 1-day simulation, because much of the file I/O (e.g. HEMCO reading annual or monthly-mean emissions fields) occurs on the first day of a run.
Installation
If you haven't already, download the GEOS-Chem v11-01 source code using:
git clone https://bitbucket.org/gcst/geos-chem Code.v11-01
Create your run directory using the GEOS-Chem UnitTest by following these instructions. To copy the gc_timing run directory, make sure you have the following lines in your CopyRunDirs.input file:
#--------|-----------|------|------------|------------|------------|---------|
# MET    | GRID      | NEST | SIMULATION | START DATE | END DATE   | EXTRA?  |
#--------|-----------|------|------------|------------|------------|---------|
  geosfp   4x5         -      gc_timing    2013070100   2013070800   -
Compilation
To build the code, follow these steps:
cd geosfp_4x5_gc_timing
make realclean
make -j4 mpbuild 2>&1 | tee log.build
Information about the options used for the compilation (as well as the compiler version) will be printed to the file lastbuild.mp.
Performing the timing test
To run the code, follow the instructions in the
geosfp_4x5_gc_timing/README
file. We have provided sample run scripts that you can use to submit jobs:
geosfp_4x5_gc_timing/doTimeTest        # Submit job directly
geosfp_4x5_gc_timing/doTimeTest.slurm  # Submit job using the SLURM scheduler
The regular GEOS-Chem output as well as timing information will be sent to a log file named:
doTimeTest.log.ID
where ID is either the SLURM job ID # or the process ID.
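The suffix convention is easy to reproduce in your own wrapper scripts. A minimal sketch (not taken from the doTimeTest scripts themselves; the `ID` and `LOG` variable names are illustrative):

```shell
# Pick the SLURM job ID when running under SLURM; otherwise fall back
# to the shell's process ID. Build the log-file name from it.
ID=${SLURM_JOB_ID:-$$}
LOG="doTimeTest.log.${ID}"
echo "GEOS-Chem output will be appended to ${LOG}"
```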
Displaying the test results
You can print out the timing results with the printTime script:
cd geosfp_4x5_gc_timing
./printTime doTimeTest.log.ID
which will display results similar to this:
GEOS-Chem 7-Model-Day Time Test Results
===============================================================================

Machine information
-------------------------------------------------------------------------------
Machine or node name  : holyjacob03.rc.fas.harvard.edu
CPU vendor            : GenuineIntel
CPU model name        : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
CPU speed [MHz]       : 2494.097

GEOS-Chem information
-------------------------------------------------------------------------------
GEOS-Chem Version     : v11-01
Last commit           : Now use correct species DB FullName for TOMAS DUST* species
Commit date           : Tue Dec 13 18:05:28 2016 -0500
Compiler version      : ifort 11.1
Compilation options   : geosfp 4x5 standard no_reduced UCX timers

Simulation information
-------------------------------------------------------------------------------
Simulation start date : 20130701 000000
Simulation end date   : 20130708 000000
Number of CPUs used   : 4
Total CPU time [s]    : 42926.897
Wall clock time [s]   : 11810.496
CPU / Wall ratio      : 3.6346
% of ideal performance: 90.87
You can then use these results to fill in the table below.
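For reference, the two derived statistics in the summary follow directly from the other entries: the CPU / Wall ratio is the total CPU time divided by the wall clock time, and % of ideal is that ratio divided by the number of CPUs. A quick check with the sample values above:

```shell
# Recompute the derived statistics from the sample summary:
#   CPU / Wall ratio = Total CPU time / Wall clock time
#   % of ideal       = 100 * (CPU / Wall ratio) / (Number of CPUs)
awk 'BEGIN {
  cpu   = 42926.897   # Total CPU time [s]
  wall  = 11810.496   # Wall clock time [s]
  ncpus = 4           # Number of CPUs used
  ratio = cpu / wall
  printf "CPU / Wall ratio : %.4f\n", ratio                 # 3.6346
  printf "%% of ideal       : %.2f\n", 100 * ratio / ncpus  # 90.87
}'
```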
--Bob Yantosca (talk) 18:12, 19 December 2016 (UTC)
Table of 7-model-day run times
The following timing tests were done with the "out-of-the-box" GEOS-Chem v11-01 provisional release code configuration.
- All jobs used GEOS-FP meteorology at 4° x 5° resolution.
- Jobs started on model date 2013/07/01 00:00 GMT and finished on 2013/07/08 00:00 GMT.
- The code was compiled from the run directory (geosfp_4x5_gc_timing) with the standard option make -j4 mpbuild. This sets the following compilation variables:
- MET=geosfp GRID=4x5 CHEM=benchmark UCX=y NO_REDUCED=n TRACEBACK=n BOUNDS=n FPE=n DEBUG=n NO_ISO=n NEST=n
- Wall clock times are listed from fastest to slowest, for the same number of CPUs.
- It's OK to round CPU and wall clock times to the nearest second, for clarity.
| Submitter | Machine or node; compiler | CPU vendor | CPU model | Speed [MHz] | # of CPUs | CPU time | Wall time | CPU / Wall ratio | % of ideal |
|---|---|---|---|---|---|---|---|---|---|
| Results from 7-model-day time tests using 32 CPUs | | | | | | | | | |
| Daniel Rothenberg (MIT) | c094 (svante cluster); ifort 17.0.1 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz | 2599.84 | 32 | 95693.52 s (26:34:53) | 3999.24 s (01:06:39) | 23.9279 | 74.77 |
| Results from 7-model-day time tests using 28 CPUs | | | | | | | | | |
| Jenny Fisher (NCI) | r4122; ifort 16.0.3 (without -xCORE-AVX2) | GenuineIntel | Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz | 2601.000 | 28 | 86045.92 s (23:54:06) | 3687.29 s (01:01:27) | 23.3358 | 83.34 |
| Results from 7-model-day time tests using 24 CPUs | | | | | | | | | |
| Daniel Rothenberg (MIT) | c094 (svante cluster); ifort 17.0.1 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz | 2599.84 | 24 | 79975.84 s (22:12:55) | 4170.86 s (01:09:30) | 19.1749 | 79.9 |
| Results from 7-model-day time tests using 16 CPUs | | | | | | | | | |
| Daniel Rothenberg (MIT) | c094 (svante cluster); ifort 17.0.1 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz | 2599.84 | 16 | 61617.87 s (17:06:57) | 4678.14 s (01:17:58) | 13.1714 | 82.32 |
| Jenny Fisher (NCI) | r1808; ifort 12.1.3 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz | 2601.000 | 16 | 73955.44 s (20:32:35) | 6129.04 s (01:42:09) | 12.0664 | 75.41 |
| Results from 7-model-day time tests using 14 CPUs | | | | | | | | | |
| Jenny Fisher (NCI) | r3709; ifort 17.0.1 (with -xCORE-AVX2) | GenuineIntel | Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz | 2601.000 | 14 | 49676.31 s (13:47:56) | 4386.56 s (01:13:07) | 11.3247 | 80.89 |
| Jenny Fisher (NCI) | r3713; ifort 16.0.3 (without -xCORE-AVX2) | GenuineIntel | Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz | 2601.000 | 14 | 55131.59 s (15:18:52) | 4767.78 s (01:19:28) | 11.5634 | 82.6 |
| Results from 7-model-day time tests using 12 CPUs | | | | | | | | | |
| Andy Jacobson & Ken Schuldt (NOAA ESRL) | t0442 (NOAA "theia" platform); ifort 15.0.1 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz | 2601.000 | 12 | 56505.83 s (15:41:45) | 4932.59 s (01:22:12) | 11.4556 | 95.46 |
| Daniel Rothenberg (MIT) | c094 (svante cluster); ifort 17.0.1 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz | 2599.84 | 12 | 51007.05 s (14:10:07) | 4984.12 s (01:23:04) | 10.2339 | 85.28 |
| Jiawei Zhuang (Harvard) | r305i1n13 (Pleiades Haswell node); ifort 16.0.2 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 2501.000 | 12 | 99915.79 s (27:45:16) | 4998.65 s (01:23:18) | 19.9886 | 83.29 |
| Jiawei Zhuang (Harvard) | r629i6n1 (Pleiades Broadwell node); ifort 16.0.2 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 2401.000 | 12 | 54498.33 s (15:08:18) | 5203.77 s (01:26:44) | 10.4729 | 87.27 |
| Bob Yantosca (GCST) | holyjacob02.rc.fas.harvard.edu; ifort 11.1 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 2494.444 | 12 | 52928.482 s (14:33:49) | 5600.497 s (01:33:20) | 9.4507 | 78.76 |
| Bob Yantosca (GCST) | holyjacob06.rc.fas.harvard.edu; ifort 11.1 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 2494.120 | 12 | 53075.95 s (14:26:37) | 5612.76 s (01:33:33) | 9.4563 | 78.8 |
| Jiawei Zhuang (Harvard) | r591i3n3 (Pleiades Haswell node); ifort 16.0.2 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 2501.000 | 12 | 63960.11 s (17:46:00) | 6102.58 s (01:41:43) | 10.4808 | 87.34 |
| Jiawei Zhuang (Harvard) | r459i3n6 (Pleiades Ivy-Bridge node); ifort 16.0.2 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz | 2801.000 | 12 | 64005.95 s (17:46:46) | 6166.78 s (01:42:47) | 10.3792 | 86.49 |
| Bob Yantosca (GCST) | holyjacob03.rc.fas.harvard.edu; gfortran 4.8.2 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 2494.097 | 12 | 47826.4 s (13:17:06) | 6205.87 s (01:43:26) | 7.7066 | 64.22 |
| Jiawei Zhuang (Harvard) | r305i1n13 (Pleiades Sandy-Bridge node); ifort 16.0.2 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz | 2601.000 | 12 | 75891.12 s (21:04:51) | 7334.51 s (02:02:15) | 10.3471 | 86.23 |
| Results from 7-model-day time tests using 8 CPUs | | | | | | | | | |
| Jenny Fisher (NCI) | r3866; ifort 17.0.1 (with -xCORE-AVX2) | GenuineIntel | Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz | 2601.000 | 8 | 38550.7 s (10:42:31) | 5594.28 s (01:33:14) | 6.8911 | 86.14 |
| Jenny Fisher (NCI) | r3713; ifort 16.0.3 (without -xCORE-AVX2) | GenuineIntel | Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz | 2601.000 | 8 | 42235.58 s (11:43:56) | 5834.34 s (01:37:14) | 7.2391 | 90.49 |
| Daniel Rothenberg (MIT) | c094 (svante cluster); ifort 17.0.1 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz | 2599.84 | 8 | 44324.83 s (12:18:44) | 6357.22 s (01:45:57) | 6.9724 | 87.16 |
| Melissa Sulprizio (GCST) | holyjacob05.rc.fas.harvard.edu; ifort 11.1 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 2494.177 | 8 | 50711.89 s (14:05:11) | 7673.93 s (02:07:56) | 6.6083 | 82.6 |
| Bob Yantosca (GCST) | holyjacob03.rc.fas.harvard.edu; ifort 11.1 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 2494.097 | 8 | 51955.61 s (14:25:56) | 7917.931 s (02:11:58) | 6.6188 | 82.74 |
| Bob Yantosca (GCST) | holyjacob04.rc.fas.harvard.edu; gfortran 4.8.2 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 2494.196 | 8 | 47629.5 s (13:13:50) | 8290.15 s (02:18:10) | 5.7453 | 71.82 |
| Jenny Fisher (NCI) | r52; ifort 12.1.3 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz | 2601.000 | 8 | 57825.7 s (16:03:46) | 8812.05 s (02:26:52) | 6.5621 | 82.03 |
| Bob Yantosca (GCST) | regal16.rc.fas.harvard.edu; ifort 11.1 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz | 2199.915 | 8 | 62554.07 s (17:22:34) | 9355.80 s (02:35:59) | 6.6861 | 83.58 |
| Results from 7-model-day time tests using 6 CPUs | | | | | | | | | |
| Bob Yantosca (GCST) | holyjacob06.rc.fas.harvard.edu | GenuineIntel | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 2494.120 | 6 | 43571.73 s (12:06:12) | 8351.52 s (02:19:12) | 5.2172 | 86.95 |
| Results from 7-model-day time tests using 4 CPUs | | | | | | | | | |
| Daniel Rothenberg (MIT) | c094 (svante cluster); ifort 17.0.1 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz | 2599.84 | 4 | 35147.97 s (09:45:47) | 9455.95 s (02:37:35) | 3.717 | 92.93 |
| Bob Yantosca (GCST) | holyjacob03.rc.fas.harvard.edu; ifort 11.1 | GenuineIntel | Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz | 2494.097 | 4 | 42667.423 s (11:51:07) | 11810.496 s (03:16:50) | 3.6346 | 90.87 |
| Results from 7-model-day time tests using 2 CPUs | | | | | | | | | |
| TBD | | | | | | | | | |
| Results from 7-model-day time tests using 1 CPU | | | | | | | | | |
| TBD | | | | | | | | | |
--Bob Yantosca (talk) 00:11, 14 February 2017 (UTC)
Graph of 7-model-day run times
This plot displays the "wall clock" run time vs. # of CPUs for each entry in the table above.
--Bob Yantosca (talk) 18:56, 27 January 2017 (UTC)
Time spent in each GEOS-Chem operation
You can view the amount of time spent in each GEOS-Chem operation by enabling GEOS-Chem timers. Adding TIMERS=1 to your make command will tell GEOS-Chem to enable timers and print a timer summary at the end of your log file.
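For example, following the build commands shown earlier (TIMERS=1 is the only addition):

```shell
# Rebuild with GEOS-Chem timers enabled; a timer summary will then be
# printed at the end of the log file.
cd geosfp_4x5_gc_timing
make realclean
make -j4 mpbuild TIMERS=1 2>&1 | tee log.build
```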
Comparison of ifort and gfortran
Bob Yantosca wrote:
For reference, I was able to compare how long each GC operation takes from 2 timetest runs on the Intel Xeon nodes of odyssey.rc.fas.harvard.edu. The colored lines indicate where gfortran takes significantly longer than ifort:
                          ifort 11           gfortran 4.8.2
                          12 CPUs            12 CPUs
                          holyjacob06        holyjacob03
                          SLURM: 77647117    SLURM: 77631200
 Timer name               hh:mm:ss.SSS       hh:mm:ss.SSS
 ----------------------------------------------------------------
 GEOS-Chem            :   01:33:30.437       01:43:22.906
 Initialization       :   00:00:07.390       00:00:04.062
 Timesteps            :   01:33:21.906       01:43:18.062
 HEMCO                :   00:17:03.859       00:15:21.625
 All chemistry        :   00:37:55.312       00:38:45.875
 => Strat chem        :   00:00:39.140       00:00:38.781
 => Gas-phase chem    :   00:33:06.609       00:30:21.562
 => All aerosol chem  :   00:03:40.921       00:07:07.343
 Transport            :   00:07:58.062       00:12:25.000
 Convection           :   00:21:48.296       00:26:19.812
 Dry deposition       :   00:00:25.062       00:00:30.375
 Wet deposition       :   00:03:41.156       00:03:38.343
 Diagnostics          :   00:02:23.421       00:03:51.343
 Reading met fields   :   00:00:06.265       00:00:08.437
 Reading restart file :   00:00:00.312       00:00:00.343
 Writing restart file :   00:01:13.703       00:01:32.843
I wonder if some of this comes down to either OpenMP or math optimizations. In any case, we know that gfortran is slower in general but it’s interesting to see that for some operations it’s very comparable to ifort. Note that the results above are only from 2 individual tests; we can improve the statistics by averaging the results of several tests. We will do this as time allows.
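One way to quantify such differences is to convert the hh:mm:ss.SSS timer entries to seconds and take per-operation ratios. A small sketch using the Transport row above (the `to_sec` helper is illustrative, not part of the timer output):

```shell
# Convert an hh:mm:ss.SSS timer entry to seconds, then compare the two
# compilers as a ratio (Transport values taken from the table above).
to_sec() { echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'; }

ifort_s=$(to_sec 00:07:58.062)     # ifort 11       -> 478.062 s
gfortran_s=$(to_sec 00:12:25.000)  # gfortran 4.8.2 -> 745 s
awk -v a="$ifort_s" -v b="$gfortran_s" \
    'BEGIN { printf "gfortran / ifort Transport ratio: %.2f\n", b / a }'
```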
--Bob Yantosca (talk) 17:22, 22 December 2016 (UTC)
For more information
We invite you to consult the following resources for more information about GEOS-Chem's performance:
--Bob Yantosca (talk) 21:36, 15 December 2016 (UTC)