GEOS-Chem performance

On this page we post information about GEOS-Chem performance and timing results.
== 7-day time tests ==

=== Overview ===
The GEOS-Chem Support Team has created a timing test package that you can use to determine the performance of GEOS-Chem on your system.  The time test runs the GEOS-Chem v10-01 public release code for 7 model days with the "benchmark" chemistry mechanism.  Our experience has shown that a 7-day simulation gives a more accurate timing result than a 1-day simulation, because much of the file I/O (e.g. HEMCO reading annual or monthly-mean emissions fields) occurs on the first day of a run.

To install the time test package on your system:

  wget "ftp://ftp.as.harvard.edu/pub/exchange/bmy/gc_timing.tar.gz"
  tar xvzf gc_timing.tar.gz
To build the code, follow these steps:

  cd gc_timing/run.v10-01
  make realclean
  make -j4 mpbuild > log.build
To run the code, follow the instructions in the

  gc_timing/run.v10-01/README

file.  We have provided sample run scripts that you can use to submit jobs:

  gc_timing/run.v10-01/doTimeTest          # Submit job directly
  gc_timing/run.v10-01/doTimeTest.slurm    # Submit job using the SLURM scheduler
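If your cluster does not use SLURM, the essential steps of a run script can be sketched as below.  This is a sketch only, under two assumptions: the build is the standard OpenMP build, and the compiled executable is named <tt>geos</tt> (the usual default in v10-01 run directories).  Check your own run directory, and prefer the provided scripts where they apply.

```shell
# Minimal run-script sketch (assumptions: OpenMP build, executable named "geos").
# GEOS-Chem's shared-memory parallelism uses OpenMP, so the CPU count for the
# timing test is controlled by OMP_NUM_THREADS.
export OMP_NUM_THREADS=8            # use 8 CPUs for this test
cd gc_timing/run.v10-01
./geos > doTimeTest.log.$$ 2>&1     # $$ is the shell's process ID
```

Naming the log with <tt>$$</tt> reproduces the <tt>doTimeTest.log.ID</tt> convention described below when no scheduler assigns a job ID.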
The regular GEOS-Chem output as well as timing information will be sent to a log file named:

  doTimeTest.log.ID

where ID is either the SLURM job ID # or the process ID.  You can print out the timing results with the <tt>printTime</tt> script:

  cd gc_timing/run.v10-01
  ./printTime doTimeTest.log.ID

which will display results similar to this:
  GEOS-Chem Time Test output
  ====================================================================
  Machine or node name  : holyseas04.rc.fas.harvard.edu
  CPU vendor            : AuthenticAMD
  CPU model name        : AMD Opteron(tm) Processor 6376
  CPU speed [MHz]       : 2300.078
  Number of CPUs used   : 8
  Simulation start date : 20130701 000000
  Simulation end date   : 20130708 000000
  Total CPU time  [s]   : 55287.61
  Wall clock time [s]   : 7999.61
  CPU / Wall ratio      : 6.9113
  % of ideal performance: 86.39
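The last two lines are derived from the times above: the CPU / Wall ratio is the total CPU time divided by the wall clock time, and the % of ideal performance is that ratio divided by the number of CPUs (100% would mean perfect scaling).  A quick sketch of the arithmetic, using the sample numbers above (plain <tt>awk</tt>, nothing from the package):

```shell
# Recompute printTime's derived metrics from the sample run above.
awk -v cpu=55287.61 -v wall=7999.61 -v ncpus=8 'BEGIN {
    ratio = cpu / wall               # CPU / Wall ratio
    ideal = 100 * ratio / ncpus      # % of ideal (perfect scaling = 100%)
    printf "CPU / Wall ratio      : %.4f\n", ratio
    printf "%% of ideal performance: %.2f\n", ideal
}'
# prints 6.9113 and 86.39, matching the printTime output above
```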
You can then use these results to fill in [[#Table of 7-model-day run times|the table below]].

--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 19:06, 30 November 2015 (UTC)
=== Table of 7-model-day run times ===
The following timing test results were done with the "out-of-the-box" GEOS-Chem v10-01 public release code configuration.

*All jobs used [[GEOS-FP]] meteorology at 4&deg; x 5&deg; resolution.
*Jobs started on model date <tt>2013/07/01 00:00 GMT</tt> and finished on <tt>2013/07/08 00:00 GMT</tt>.
*The code was compiled from the run directory (<tt>run.v10-01</tt>) with the standard option <tt>make -j4 mpbuild</tt>.  This sets the following compilation variables:
**<tt>MET=geosfp GRID=4x5 CHEM=benchmark UCX=y NO_REDUCED=n TRACEBACK=n BOUNDS=n FPE=n DEBUG=n NO_ISO=n NEST=n</tt>
*Wall clock times are listed from fastest to slowest.
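Those variables are applied automatically by the <tt>mpbuild</tt> target.  If you want to override one of them (for example, to turn on <tt>TRACEBACK</tt> while debugging), a plausible invocation, assuming the v10-01 Makefile takes variables on the command line in the usual GNU make fashion, would be:

```shell
cd gc_timing/run.v10-01
make realclean
# Command-line variables override defaults set inside a Makefile (standard
# GNU make behavior); this example enables error tracebacks for debugging.
make -j4 TRACEBACK=y mpbuild > log.build
```

Note that changing these variables changes what you are timing, so results compiled with non-standard options should not be entered in the table below.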
{| border=1 cellspacing=0 cellpadding=5
|-bgcolor="#CCCCCC"
!width="150px"|Submitter
!width="175px"|Machine or Node<br>and Compiler
!width="100px"|CPU vendor
!width="150px"|CPU model
!width="75px"|Speed [MHz]
!width="50px"|# of<br>CPUs
!width="85px"|CPU time
!width="85px"|Wall time
!width="75px"|CPU / Wall<br>ratio
!width="75px"|% of ideal
|-valign="top" align="center" bgcolor="#CCFFFF"
!colspan="10"|Machines with Intel CPUs

|-valign="top"
|Mat Evans (York/NCAS)
|earth0.york.ac.uk <br> ifort Version 13.0.1.117
|GenuineIntel / SGI UV-2000
|Intel(R) Xeon(R) CPU E5-4650L 0 @ 2.60GHz
|2600.153
|32
|49170.2 s<br>13:39:30
|1775.27 s<br>'''00:29:34'''
|27.6973
|86.55

|-valign="top"
|Mat Evans (York/NCAS)
|earth0.york.ac.uk <br> ifort Version 13.0.1.117
|GenuineIntel / SGI UV-2000
|Intel(R) Xeon(R) CPU E5-4650L 0 @ 2.60GHz
|2600.153
|64
|98821.79 s<br>27:27:01
|1841.46 s<br>'''00:30:41'''
|53.6649
|83.85

|-valign="top"
|Mat Evans (York/NCAS)
|earth0.york.ac.uk <br> ifort Version 13.0.1.117
|GenuineIntel / SGI UV-2000
|Intel(R) Xeon(R) CPU E5-4650L 0 @ 2.60GHz
|2600.153
|16
|29962.15 s<br>08:19:22
|2088.55 s<br>'''00:34:48'''
|14.3459
|89.66
|-valign="top"
|Mat Evans (York/NCAS)
|earth0.york.ac.uk <br> ifort Version 13.0.1.117
|GenuineIntel / SGI UV-2000
|Intel(R) Xeon(R) CPU E5-4650L 0 @ 2.60GHz
|2600.153
|8
|20082 s<br>05:34:42
|2681 s<br>'''00:44:40'''
|7.4884
|93.61

|-valign="top"
|Melissa Sulprizio (GCST)
|fry-02.as.harvard.edu<br>ifort 11.1.069
|GenuineIntel
|Westmere E56xx/L56xx/X56xx (Nehalem-C)
|2925.998
|16
|35221 s<br>09:47:01
|2734 s<br>'''00:45:34'''
|12.8978
|80.61
|-valign="top"
|Melissa Sulprizio (GCST)
|fry-01.as.harvard.edu<br>ifort 11.1.069
|GenuineIntel
|Westmere E56xx/L56xx/X56xx (Nehalem-C)
|2925.998
|8
|23048 s<br>06:24:08
|3312 s<br>'''00:55:12'''
|6.9611
|87.01

|-valign="top"
|Bob Yantosca (GCST)
|fry-01.as.harvard.edu<br>ifort 11.1.069
|GenuineIntel
|Westmere E56xx/L56xx/X56xx (Nehalem-C)
|2925.998
|8
|24234 s<br>06:43:54
|3456 s<br>'''00:57:36'''
|7.0114
|87.64
|-valign="top"
|Bob Yantosca (GCST)
|fry-02.as.harvard.edu<br>ifort 11.1.069
|GenuineIntel
|Westmere E56xx/L56xx/X56xx (Nehalem-C)
|2925.998
|8
|25222 s<br>07:00:22
|3583 s<br>'''00:59:43'''
|7.0397
|88.0

|-valign="top"
|Melissa Sulprizio (GCST)
|regal17.rc.fas.harvard.edu<br>ifort 11.1.069
|GenuineIntel
|Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20 GHz
|2199.849
|8
|20398 s<br>05:39:58
|2837 s<br>'''00:47:17'''
|7.2045
|90.06
|-valign="top"
|Yanko Davila (CU Boulder)
|node39<br>ifort 11.1.069
|GenuineIntel
|Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
|2400.00
|32
|40000.39 s<br>11:06:40
|1641.58 s<br>'''00:27:22'''
|24.367
|76.15

|-valign="top"
|Yanko Davila (CU Boulder)
|node30<br>ifort 11.1.069
|GenuineIntel
|Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
|2670.00
|24
|42881.39 s<br>11:54:41
|2262.19 s<br>'''00:37:42'''
|18.9557
|78.98
|-valign="top"
|Jenny Fisher (U. Wollongong / NCI)
|r3199 (Raijin @ NCI)<br>ifort 12.1.9.293
|GenuineIntel
|Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
|2601.00
|16
|22150.52 s<br>06:09:11
|1781.05 s<br>'''00:29:41'''
|12.4368
|77.73

|-valign="top"
|Jenny Fisher (U. Wollongong / NCI)
|r105 (Raijin @ NCI)<br>ifort 12.1.9.293
|GenuineIntel
|Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
|2601.00
|8
|18535.98 s<br>05:08:56
|2660.78 s<br>'''00:44:21'''
|6.9664
|87.08
|-valign="top" align="center" bgcolor="#CCFFFF"
!colspan="10"|Machines with Advanced Micro Devices (AMD) CPUs

|-valign="top"
|Bob Yantosca (GCST)
|holyseas03.rc.fas.harvard.edu<br>ifort 11.1.069
|AuthenticAMD
|AMD Opteron(tm) Processor 6376
|2300.024
|8
|32972 s<br>09:09:32
|5054 s<br>'''01:24:14'''
|6.5241
|81.55
|-valign="top"
|Bob Yantosca (GCST)
|holyseas02.rc.fas.harvard.edu<br>ifort 11.1.069
|AuthenticAMD
|AMD Opteron(tm) Processor 6376
|2300.054
|8
|33722 s<br>09:22:02
|5281 s<br>'''01:28:01'''
|6.385
|79.81

|-valign="top"
|Melissa Sulprizio (GCST)
|holyseas01.rc.fas.harvard.edu<br>ifort 11.1.069
|AuthenticAMD
|AMD Opteron(tm) Processor 6376
|2299.936
|8
|37379 s<br>10:22:59
|5477 s<br>'''01:31:17'''
|6.8353
|85.44
|-valign="top"
|Bob Yantosca (GCST)
|holyseas04.rc.fas.harvard.edu<br>ifort 11.1.069
|AuthenticAMD
|AMD Opteron(tm) Processor 6376
|2300.078
|8
|55288 s<br>15:21:28
|8000 s<br>'''02:13:20'''
|6.9113
|86.39

|-valign="top"
|Bob Yantosca (GCST)
|holyseas02.rc.fas.harvard.edu<br>pgf90 14.10-0
|AuthenticAMD
|AMD Opteron(tm) Processor 6376
|2300.054
|8
| ?
| ?
| ?
| ?
|-valign="top"
|Jenny Fisher (U. Wollongong)
|hpcn11.local<br>ifort 2015
|AuthenticAMD
|AMD Opteron(tm) Processor 6376
|2299.992
|16
|50992.52 s<br>14:09:53
|3725.06 s<br>'''01:02:05'''
|13.689
|85.56

|-valign="top"
|Jenny Fisher (U. Wollongong)
|hpcn01.local<br>ifort 2015
|AuthenticAMD
|AMD Opteron(tm) Processor 6376
|2299.983
|8
|37536.33 s<br>10:25:36
|5146.54 s<br>'''01:25:47'''
|7.2935
|91.17
|}

--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 22:22, 1 December 2015 (UTC)
== GEOS-Chem scalability ==

=== Overview ===
'''''[mailto:heald@atmos.colostate.edu Colette Heald] wrote:'''''

:I was wondering if you could give me your thoughts on GEOS-Chem scalability?  I'm about to purchase some new servers, and the default would be 6 dual core servers, so 12 processors total.  I see a huge difference in my 4p vs. 8p machines, but I'm wondering if there's much advantage going beyond that to 12p. My sense from past discussions is that GC does not scale very well.
'''''[mailto:jhy@as.harvard.edu Jack Yatteau] replied:'''''

:First, if you’re getting Intel processors with hyperthreading, your 2-socket hex-core system will look as if it has 24 processors under Linux.  We’re currently using 2-socket quad-core systems that appear to have 16 processors under Linux.  Codes run almost twice as fast up to 8 threads, and run at about the same speed at 16 threads.  This means that an 8p job will run faster on the newer system than on an older 2-socket quad-core system at about the same clock speed, but two 8p jobs running simultaneously will each run at about the same speed as they would on the older systems.  So the system appears to slow down as you add more than 8 threads.  On a hex-core system, the threshold would be at 12 threads.  You’ll therefore have a difficult time making sense of timing tests if you use one of the newer systems unless you disable hyperthreading, in which case you might as well limit the number of threads to 12 and leave hyperthreading enabled.

:I measured scaling 5 years ago using a 16-processor Origin 2000 and a 12-processor Altix, and you can see the results and my analysis of them in [http://acmg.seas.harvard.edu/geos/meetings/2005/ppt/Model_Overview/GEOS-CHEM_users_mtg_san2.ppt this PowerPoint presentation].

:Since then I’ve run tests at 4x5 resolution on dual-core Opteron processors up to 16 cores and on modern Xeon systems up to 8p.  GEOS-Chem still runs about 1.5-1.6 times faster on 8 threads than on 4 threads.  In our environment, it matters more how many runs get completed.  Even if we could get a job to run 25% faster on 16 threads than on 8 threads, we’d be better off running 2 simultaneous jobs each using 8 threads.  Also, be aware that at 2x2.5 resolution GEOS-Chem doesn’t scale as well, since more time is spent doing transport, and the transport code doesn’t scale as well as the chemistry code.
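A rough way to square these numbers is Amdahl's law, S(n) = 1 / (f + (1 - f)/n), where f is the serial fraction of the code.  This is an assumption layered on top of the measurements above (it ignores the memory-bandwidth effects discussed here), but the observed ~1.55x gain from 4 to 8 threads pins down f and bounds what 16 threads could buy:

```shell
# Amdahl's-law sketch (a simplifying assumption, not a measurement from
# this page): infer the serial fraction f from the observed 4->8 thread
# speedup, then predict the best-case gain from 8 to 16 threads.
awk 'BEGIN {
    r = 1.55                              # observed S(8)/S(4) from the tests above
    f = (r/8 - 0.25) / (0.75 - 0.875*r)   # solve S(8)/S(4) = r for f
    s = (f + (1-f)/8) / (f + (1-f)/16)    # implied S(16)/S(8)
    printf "implied serial fraction  : %.3f\n", f
    printf "predicted 16t/8t speedup : %.2f\n", s
}'
```

With f around 0.09, 16 threads would buy at most ~1.4x over 8 even on perfect hardware, and the measured gains were smaller still, which is why two simultaneous 8-thread jobs win on throughput.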
:Finally, we’ve done very well using dual-socket systems, since for the past several years computers have been designed with high bandwidth to memory for pairs of processors.  With more than 2 sockets (e.g. 4-socket quad-core or hex-core systems), the bandwidth between 2 or more pairs of processors drops, and I’d expect that to slow down SMP jobs whose threads don’t all run on the same pair.  So my recommendation would be to stick to 2-socket systems and use the savings to add more of them.  Plus, maybe I’ve convinced you that you’ll be getting a machine with 3 times the capability of an older dual quad-core system.
--[[User:Bmy|Bob Y.]] 13:09, 12 August 2010 (EDT)
=== Benchmarking results from MIT user group ===
'''''[mailto:heald@mit.edu Colette Heald] wrote:'''''

:I have been benchmarking GEOS-Chem on my new system here at MIT and I thought you might be interested in seeing the results for the scaling.  This is with a dual hex-core Xeon 3.07 GHz chip & 48 GB RAM from Thinkmate.

::[[Image:mit_gc_benchmark.png]]

'''''[mailto:jhy@seas.harvard.edu Jack Yatteau] replied:'''''

:Note that there is no difference between 12 and 24 threads.  It’s not just scaling.  With 12 real cores, jobs run faster than on old non-hyperthreaded cores at the same clock speed, but once you start relying on hyperthreading (>12 threads) you don’t gain speed, even if the job scales.  But you could run two 12-core jobs at about the speed of the older processors, which is what we do when the cluster is busy.

'''''[mailto:heald@mit.edu Colette Heald] wrote:'''''

:Yup, hyperthreading doesn't appear to buy me anything.  I did test submitting two 12-core jobs to the same machine, and the run time went from 36 min (one job) to 58 min (two jobs).  Not quite a doubling, I suppose, but it didn't seem like a worthwhile experiment on my system.

--[[User:Bmy|Bob Y.]] 12:00, 17 April 2012 (EDT)
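The throughput arithmetic behind this exchange is worth making explicit (the numbers come from the messages above):

```shell
# Two 12-core jobs: ~36 min each when run alone, ~58 min when run side-by-side.
awk 'BEGIN {
    serial   = 2 * 36   # minutes to finish two jobs back-to-back
    together = 58       # minutes to finish two jobs simultaneously
    printf "back-to-back   : %d min\n", serial
    printf "simultaneous   : %d min\n", together
    printf "throughput gain: %.0f%%\n", 100 * (serial/together - 1)
}'
```

So even though each job slows down, running the pair concurrently still finishes both about 24% sooner than running them one after the other, which is the same trade-off Jack describes for a busy cluster.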
=== Benchmarking results from University of Liege user group ===

'''''[mailto:emahieu@seas.harvard.edu Manu Mahieu] wrote:'''''

:[[GEOS-Chem v9-01-03]] is now installed and running at ULg.  I have performed a few benchmark runs on our server involving from 4 up to 32 CPUs.  [http://acmg.seas.harvard.edu/geos/wiki_docs/machines/Timing_report_speedy_ULg.pdf This PDF document] provides information about our server, the OS, and the compiler used, as well as the running times for the various configurations tested so far.  By far the best performance was obtained when submitting the GC simulation to all available CPUs (i.e. 32).

--[[User:Bmy|Bob Y.]] 11:44, 19 June 2013 (EDT)
=== Benchmarking results from University of York user group ===

[mailto:mat.evans@york.ac.uk Mat Evans] and his group at the University of York have done an analysis of how [[GEOS-Chem v9-02]] performs when compared to the previous version, [[GEOS-Chem v9-01-03]].  Please follow [[GEOS-Chem v9-02#Performance|this post on our ''GEOS-Chem'' v9-02 wiki page]] to view the results.

--[[User:Bmy|Bob Y.]] 10:57, 19 November 2013 (EST)
== Timing information from older GEOS-Chem versions ==

<span style="color:red">'''''The following information is mostly out-of-date.  We shall keep it here for future reference.'''''</span>
=== Adding additional tracers ===
'''''[mailto:ccarouge@seas.harvard.edu Claire Carouge] wrote:'''''

:I ran an ensemble of 5 identical runs for 43 and 54 tracers with [[GEOS-Chem v8-02-04]], compiled with IFORT 11.1.069, using GEOS-5 meteorology at 4x5 resolution.  For each set of tracers, I ran once with everything turned on and then again with the chemistry turned off.

:Here are the times (simulation length: 4 days).
<blockquote>
{| border=1 cellpadding=5 cellspacing=0
|-bgcolor="#CCCCCC"
!# of tracers
!Avg total time (s)
!Avg chemistry time (s)
!Avg transport time (s)
|-align="center"
|54 (SOA chemistry)
|760.27
|457.93
|302.34
|-align="center"
|43 (no SOA chemistry)
|709.92
|427.55
|282.37
|-align="center"
|Diff 54-43 tracers
| +50.35
| +30.38
| +19.97
|}
</blockquote>
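The headline percentages quoted below follow directly from this table:

```shell
# Percent increases when going from 43 to 54 tracers (values from the table).
awk 'BEGIN {
    t43 = 709.92; t54 = 760.27      # average total times [s]
    pct = 100 * (t54 - t43) / t43   # overall increase
    printf "total time increase : %.1f%%\n", pct
    printf "per added tracer    : %.2f%%\n", pct / 11
}'
```

That is about 7% overall, or roughly 0.6% per added tracer, which is why the 1%-per-tracer rule of thumb is a deliberately high estimate.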
:So adding 11 tracers increases the transport, chemistry, and total times by about 7%.  The additional transport time is therefore not strictly linear in the number of tracers, but a linear estimate (1% additional time per additional tracer) gives a high-end estimate of the additional cost.

:The additional time in chemistry depends strongly on the type of tracer you add (aerosol, gas-phase tracer with modifications to globchem.dat, etc.), so the 7% increase in time is probably very particular to the SOA tracers.

--[[User:Bmy|Bob Y.]] 10:51, 15 April 2010 (EDT)
=== Intel Fortran Compiler ===

Please see the following links for some timing comparisons between the various versions of the Intel Fortran Compiler (aka "IFORT" compiler):

* [[Intel Fortran Compiler#Timing results: IFORT 11 vs. IFORT 10|IFORT 11 vs. IFORT 10]]
* [[Intel Fortran Compiler#Comparison between IFORT 9.1 and IFORT 10.1|IFORT 10 vs. IFORT 9]]

--[[User:Bmy|Bob Y.]] 10:51, 15 April 2010 (EDT)
=== Timing results from 1-month benchmarks ===

Please see our [[GEOS-Chem supported platforms and compilers]] page for a user-submitted list of timing results from GEOS-Chem 1-month benchmark simulations.  Several platform/compiler combinations are listed on that page.

--[[User:Bmy|Bob Y.]] 10:57, 15 April 2010 (EDT)

Latest revision as of 19:47, 25 June 2019