Difference between revisions of "GEOS-Chem benchmarking"

Revision as of 19:31, 22 May 2024

Objectives

Benchmarking supports the maintenance of GEOS-Chem as a robust state-of-the-science facility with a nimble grass-roots approach and strong version control. Benchmarking has four main objectives:

  1. Document a consistent GEOS-Chem model configuration, and the expected characteristics of that configuration.
  2. Support version control through traceability, and by confirming the expected behavior of model developments submitted by the community.
  3. Track the evolution of the model over the years.
  4. Promote scientific transparency of GEOS-Chem.

Types of benchmark simulations

The GEOS-Chem Support Team performs the following benchmark simulations.

NOTES for the sections below:

  1. GEOS-Chem uses semantic versioning (i.e. X.Y.Z version labels).
  2. Development branches are dev/X.Y.Z and dev/no-diff-to-benchmark.
  3. An alpha tag is a Git tag using the format X.Y.Z-alpha.N, where X.Y.Z is the version number and N is a sequential index starting at 0. These are used to indicate the locations in the Git revision history where 1-month benchmarks were run.
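The naming conventions in the notes above can be checked mechanically. The following is a minimal sketch, not part of any GEOS-Chem tooling: the names `parse_tag`, `VersionTag`, and `is_dev_branch` are hypothetical, and the regular expression assumes purely numeric X, Y, Z, and N fields.

```python
import re
from typing import NamedTuple, Optional

# Matches X.Y.Z with an optional -alpha.N suffix (numeric fields are an assumption).
_TAG_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:-alpha\.(\d+))?")


class VersionTag(NamedTuple):
    major: int            # X
    minor: int            # Y
    patch: int            # Z
    alpha: Optional[int]  # N, or None for a plain X.Y.Z label


def parse_tag(tag: str) -> Optional[VersionTag]:
    """Parse 'X.Y.Z' or 'X.Y.Z-alpha.N'; return None if the string does not match."""
    m = _TAG_RE.fullmatch(tag)
    if m is None:
        return None
    x, y, z, n = m.groups()
    return VersionTag(int(x), int(y), int(z), None if n is None else int(n))


def is_dev_branch(branch: str) -> bool:
    """True for dev/X.Y.Z and dev/no-diff-to-benchmark (note 2)."""
    if branch == "dev/no-diff-to-benchmark":
        return True
    if not branch.startswith("dev/"):
        return False
    parsed = parse_tag(branch[len("dev/"):])
    return parsed is not None and parsed.alpha is None
```

For example, `parse_tag("14.4.0-alpha.3")` yields a tag with `alpha == 3`, while `is_dev_branch("main")` is false.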

1-hour benchmarks

1-hour benchmarks primarily serve as sanity checks. They are useful in determining if two successive updates to GEOS-Chem result in identical model output. These are triggered when:

  1. A commit is pushed to any development branch (see note 2) in the geoschem/GCClassic "superproject" repository.
  2. A commit is pushed to any development branch (see note 2) in the geoschem/GCHP "superproject" repository.

Evaluation tables are posted to gc-dashboard.org upon successful completion of each 1-hour benchmark simulation. The evaluation tables include information on OH metrics, emissions totals, global mass, and a summary table.

Automatic 1-hour benchmarks are only performed for the full-chemistry simulation.

1-month benchmarks

1-month benchmarks (aka alpha benchmarks) are primarily used to quantify the changes in model output that occur when adding a new science feature into GEOS-Chem. These are triggered when:

  1. An alpha tag (see note 3) is pushed to any development branch (see note 2) in the geoschem/GCClassic "superproject" repository.
  2. An alpha tag (see note 3) is pushed to any development branch (see note 2) in the geoschem/GCHP "superproject" repository.

Evaluation plots and tables are posted to gc-dashboard.org upon successful completion of each 1-month benchmark simulation. These include comparison plots of species concentrations, emissions, aerosol optical depth, and J-values, as well as the same tables produced for the 1-hour benchmarks.

Automatic 1-month benchmarks are only performed for the full-chemistry simulation.
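The trigger rules for the automatic 1-hour and 1-month benchmarks can be summarized as a small decision function. This is only an illustrative sketch: the source does not describe how the automation is implemented, and the function name `triggered_benchmark` and its arguments are hypothetical.

```python
import re
from typing import Optional

# Development branches are dev/X.Y.Z and dev/no-diff-to-benchmark.
DEV_BRANCH_RE = re.compile(r"dev/(\d+\.\d+\.\d+|no-diff-to-benchmark)")
# Alpha tags have the form X.Y.Z-alpha.N.
ALPHA_TAG_RE = re.compile(r"\d+\.\d+\.\d+-alpha\.\d+")
# The two "superproject" repositories named in the text.
SUPERPROJECTS = {"geoschem/GCClassic", "geoschem/GCHP"}


def triggered_benchmark(repo: str, branch: str, tag: Optional[str] = None) -> Optional[str]:
    """Return '1-hour', '1-month', or None for a push event (hypothetical model)."""
    if repo not in SUPERPROJECTS or DEV_BRANCH_RE.fullmatch(branch) is None:
        return None
    if tag is None:
        return "1-hour"       # plain commit pushed to a development branch
    if ALPHA_TAG_RE.fullmatch(tag):
        return "1-month"      # alpha tag pushed to a development branch
    return None               # other tags trigger nothing in this sketch
```

For instance, `triggered_benchmark("geoschem/GCHP", "dev/14.5.0", tag="14.5.0-alpha.2")` returns `"1-month"`.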

1-year benchmarks

1-year benchmarks are performed before every feature version (X.Y.0) release. They are used to compare the version currently in preparation against the previous feature version. Due to the size of the output and the length of the simulation, the GCST runs 1-year benchmark simulations on the Harvard Cannon cluster.

1-year benchmarks may be run for either the full-chemistry simulation or the TransportTracers simulation. Full-chemistry 1-year benchmarks are done for every new feature version (X.Y.0). 1-year TransportTracers benchmarks, on the other hand, are only performed for feature versions containing changes to transport and/or wet deposition. 1-year TransportTracers benchmarks are spun up for 10 years before the evaluation year to ensure that the model atmosphere is in steady state.

Ad-hoc 1-year benchmarks for the Carbon simulation may also be performed in order to assess scientific updates made to that particular simulation.
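The rules for which 1-year benchmarks accompany a feature version can be written as a short helper. This is a sketch under stated assumptions: the function names, the boolean flags, and the simulation labels are hypothetical, and the spin-up arithmetic simply subtracts the 10-year spin-up from the evaluation year.

```python
from typing import List


def one_year_benchmarks(changes_transport_or_wetdep: bool,
                        carbon_update: bool = False) -> List[str]:
    """List the 1-year benchmarks to run for a feature version (hypothetical labels)."""
    sims = ["fullchem"]                    # always run for a new X.Y.0 version
    if changes_transport_or_wetdep:
        sims.append("TransportTracers")    # only for transport / wet deposition changes
    if carbon_update:
        sims.append("carbon")              # ad-hoc, for updates to the carbon simulation
    return sims


def spinup_start_year(evaluation_year: int, spinup_years: int = 10) -> int:
    """First year of the TransportTracers spin-up preceding the evaluation year."""
    return evaluation_year - spinup_years
```

With a 2019 evaluation year, `spinup_start_year(2019)` gives 2009 as the first spin-up year.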

Benchmark output consists of plots and tables similar to those of the 1-month benchmark, but for January, April, July, and October 2019, plus an annual mean.

10-year benchmarks

10-year benchmarks are performed before every major version (X.0.0) release. These benchmarks are intended to evaluate how well the GEOS-Chem full-chemistry simulation performs in the stratosphere. Oxidant fields and prod/loss rates from the 10-year benchmarks are often used by other GEOS-Chem specialty simulations.

Procedure

The GEOS-Chem benchmarking procedure is described below. GEOS-Chem uses semantic versioning (i.e. X.Y.Z version labels).

  1. Any update to the GEOS-Chem source code or run directories will change the GEOS-Chem version number (X.Y.Z).
  2. Z versions will be released at intervals determined by the GEOS-Chem Support Team (GCST) and may include bug fixes or updates that do not impact the full-chemistry simulation.
  3. Any change impacting the standard full-chemistry simulation will require a Y version change and a dedicated 1-month benchmark. The benchmark results will be posted on the wiki and an email will be sent to the developer(s) and the GEOS-Chem Steering Committee (GCSC).
  4. The developer(s) and GCSC will assess the benchmark results and review a benchmark assessment form on the wiki. If there are any concerns about the benchmark results, the GCST will be notified and further investigation and/or benchmarking may be required.
  5. If the update is for a specialty simulation (e.g. CO2, CH4, Hg), then a further benchmark may be conducted by the appropriate Working Group.
  6. Once the developer is satisfied with the changes in the 1-month benchmark, GEOS-Chem Model Scientist Daniel Jacob will review the results and approve the new internal version.
  7. 1-year full-chemistry and/or transport tracer benchmarks for Y versions will be conducted only if justifiably requested by the developer or by GEOS-Chem Steering Committee members.
  8. Each new major version release (i.e. X version) will be subject to a 1-year benchmark to be inspected by the GEOS-Chem Steering Committee before approval.
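Steps 2 and 3 of the procedure imply a simple version-bump rule, sketched below. The function name `next_version` is hypothetical, resetting Z to 0 on a Y bump is an assumption based on the X.Y.0 feature-version labels, and X (major) bumps are left out because the procedure gives no criterion for them.

```python
from typing import Tuple


def next_version(current: str, impacts_fullchem: bool) -> Tuple[str, bool]:
    """Return (new version label, whether a 1-month benchmark is required)."""
    x, y, z = (int(part) for part in current.split("."))
    if impacts_fullchem:
        # Change affects the standard full-chemistry simulation:
        # Y version change plus a dedicated 1-month benchmark.
        return f"{x}.{y + 1}.0", True
    # Bug fix or update that does not impact full-chemistry: Z version change.
    return f"{x}.{y}.{z + 1}", False
```

For example, `next_version("14.3.1", True)` returns `("14.4.0", True)`.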

List of GEOS-Chem benchmarks

Links to past 1-month and 1-year benchmark simulations can be found on the GEOS-Chem versions wiki page.

Benchmark output archive

Output files and evaluation plots for 1-month and 1-year benchmark simulations are archived at Harvard as summarized below. GEOS-Chem users may use this output for comparisons against their own simulations.

Directory: https://gc-dashboard.org/search?searchString=&1Hr=1Hr&GCHP=GCHP&GCC=GCC
Description: Contains the following data from the 1-hour benchmarks used to evaluate GEOS-Chem:
  • Evaluation plots & tables
  • Run log
  • Run directory (tarball)
  • Diagnostic files (tarball)
  • Restart files (tarball)

Directory: https://gc-dashboard.org/search?searchString=&1Mon=1Mon&GCHP=GCHP&GCC=GCC
Description: Contains the following data from the 1-month benchmarks used to evaluate GEOS-Chem:
  • Evaluation plots & tables
  • Run log
  • Run directory (tarball)
  • Diagnostic files (tarball)
  • Restart files (tarball)

Directory: http://ftp.as.harvard.edu/gcgrid/geos-chem/1yr_benchmarks/
Description: Contains the following data from the 1-year benchmarks used to evaluate GEOS-Chem:
  • Evaluation plots
  • Restart files (tarball)
  • Model output (tarball)
  • Log files (tarball)
  • Input files (tarball)

Directory: http://ftp.as.harvard.edu/gcgrid/geos-chem/10yr_benchmarks/
Description: Contains the following data from the 10-year benchmarks used to evaluate GEOS-Chem:
  • Evaluation plots & tables
  • Restart files (tarball)
  • Model output (tarball)
  • Log files (tarball)
  • Input files (tarball)
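The two gc-dashboard.org links above differ only in their query string, so they can be rebuilt programmatically. The sketch below reproduces exactly those two URLs; the helper name `dashboard_search_url` is hypothetical, and whether the dashboard accepts other parameter combinations is an assumption not confirmed by the source.

```python
from typing import Sequence
from urllib.parse import urlencode

BASE_URL = "https://gc-dashboard.org/search"


def dashboard_search_url(duration: str,
                         models: Sequence[str] = ("GCHP", "GCC"),
                         search_string: str = "") -> str:
    """Build a search URL; `duration` is '1Hr' or '1Mon' as seen in the links above."""
    # Each filter appears in the observed URLs as key=value with identical key and value.
    params = [("searchString", search_string), (duration, duration)]
    params += [(model, model) for model in models]
    return f"{BASE_URL}?{urlencode(params)}"
```

Calling `dashboard_search_url("1Mon")` yields the 1-month link listed above.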

NOTE: "tarball" refers to a *.tar.gz file. This is an archive of files & folders created with tar cvzf and can be extracted with tar xzvf.
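For users who prefer to handle tarballs from Python rather than the command line, the standard-library tarfile module offers equivalents of tar cvzf and tar xzvf. This is a minimal sketch; the function names and paths are placeholders, and archives should only be extracted when they come from a trusted source.

```python
import tarfile
from pathlib import Path


def make_tarball(source_dir: str, tarball_path: str) -> None:
    """Create a gzip-compressed archive of source_dir (like `tar cvzf`)."""
    src = Path(source_dir)
    with tarfile.open(tarball_path, "w:gz") as tar:
        # arcname keeps only the directory name, not the full path, in the archive.
        tar.add(src, arcname=src.name)


def extract_tarball(tarball_path: str, dest_dir: str) -> None:
    """Extract a *.tar.gz archive into dest_dir (like `tar xzvf`)."""
    with tarfile.open(tarball_path, "r:gz") as tar:
        tar.extractall(dest_dir)
```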

Benchmark plotting routines

The benchmark plotting routines are included with GCPy, a Python toolkit for GEOS-Chem.