GEOS-Chem benchmarking

Objectives

Benchmarking supports the maintenance of GEOS-Chem as a robust state-of-the-science facility with a nimble grass-roots approach and strong version control. Benchmarking has four main objectives:

  1. Document a consistent GEOS-Chem model configuration, and the expected characteristics of that configuration.
  2. Support version control through traceability, and by confirming the expected behavior of model developments submitted by the community.
  3. Track the evolution of the model over the years.
  4. Promote scientific transparency of GEOS-Chem.

Types of benchmark simulations

The GEOS-Chem Support Team (GCST) runs several types of benchmark simulations in order to assess the performance of GEOS-Chem. These are described in the following sections.

1-hour benchmarks

1-hour benchmarks primarily serve as sanity checks. They are useful in determining if two successive updates to GEOS-Chem result in identical model output. These are triggered when:

  1. A commit is pushed to any development branch (see Note 2) in the geoschem/GCClassic "superproject" repository.
  2. A commit is pushed to any development branch (see Note 2) in the geoschem/GCHP "superproject" repository.

Evaluation tables are posted to gc-dashboard.org upon successful completion of each 1-hour benchmark simulation. The evaluation tables include information on OH metrics, emissions totals, global mass, and a summary table.

Automatic 1-hour benchmarks are only performed for the full-chemistry simulation.

1-month benchmarks

1-month benchmarks (aka alpha benchmarks) are primarily used to quantify the changes in model output that occur when adding a new science feature into GEOS-Chem. These are triggered when:

  1. An alpha tag (see Note 3) is pushed to any development branch (see Note 2) in the geoschem/GCClassic "superproject" repository.
  2. An alpha tag (see Note 3) is pushed to any development branch (see Note 2) in the geoschem/GCHP "superproject" repository.

Evaluation plots and tables are posted to gc-dashboard.org upon successful completion of each 1-month benchmark simulation. These include comparison plots of species concentrations, emissions, aerosol optical depth, and J-values, as well as the same tables produced for the 1-hour benchmarks.

Automatic 1-month benchmarks are only performed for the full-chemistry simulation.

1-year benchmarks

1-year benchmarks are performed before every feature version (X.Y.0) release. They are used to compare the version currently in preparation against the previous feature version. Due to the size of the output and the length of the simulation, the GCST runs 1-year benchmark simulations on the Harvard Cannon cluster.

1-year benchmarks may be run for either the full-chemistry simulation or the TransportTracers simulation. Ad-hoc 1-year benchmarks for the Carbon simulation may also be performed in order to assess scientific updates made to that particular simulation.

Full-chemistry 1-year benchmarks are performed before each feature version (X.Y.0) release. By contrast, 1-year TransportTracers benchmarks are only performed for feature versions containing changes to transport and/or wet deposition. 1-year TransportTracers benchmarks are spun up for 10 years before the evaluation year to make sure the model atmosphere is in steady state.

Benchmark output consists of plots and tables similar to those from the 1-month simulation, but for January, April, July, and October 2019, plus annual means.

10-year benchmarks

10-year benchmarks are performed before every major version (X.0.0) release. These benchmarks are intended to evaluate how well the GEOS-Chem full-chemistry simulation is performing in the stratosphere. Oxidant fields and prod/loss rates from the 10-year benchmarks are also used as input to some GEOS-Chem specialty simulations (such as the Carbon simulation and Tagged O3 simulation).

Notes

  1. GEOS-Chem uses semantic versioning (i.e. X.Y.Z version labels).
  2. Development branches are dev/X.Y.0 and dev/no-diff-to-benchmark.
  3. An alpha tag is a Git tag using the format X.Y.Z-alpha.N, where X.Y.Z is the version number and N is a sequential index starting at 0.
    • Alpha tags indicate the locations in the Git revision history where 1-month full-chemistry benchmarks were run.
    • Alpha tags are used to link changes in 1-month full-chemistry benchmark simulation results to a specific update (or group of updates).
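
The naming rules in Notes 2 and 3 can be checked mechanically. The following is a minimal sketch, assuming the formats are exactly as written above (dev/X.Y.0, dev/no-diff-to-benchmark, and X.Y.Z-alpha.N). The helper names parse_alpha_tag and is_dev_branch are hypothetical and are not part of GEOS-Chem or GCPy; the version numbers in the usage lines are made up.

```python
import re

# Patterns inferred from Notes 2 and 3; illustrative only, not an official tool.
ALPHA_TAG = re.compile(r"(\d+)\.(\d+)\.(\d+)-alpha\.(\d+)")
DEV_BRANCH = re.compile(r"dev/(\d+\.\d+\.0|no-diff-to-benchmark)")


def parse_alpha_tag(tag):
    """Return (X, Y, Z, N) as integers, or None if `tag` is not an alpha tag."""
    match = ALPHA_TAG.fullmatch(tag)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_dev_branch(branch):
    """Return True for dev/X.Y.0 and dev/no-diff-to-benchmark, else False."""
    return DEV_BRANCH.fullmatch(branch) is not None


# Hypothetical version numbers, used only to exercise the helpers.
print(parse_alpha_tag("14.5.0-alpha.3"))  # (14, 5, 0, 3)
print(parse_alpha_tag("14.5.0-rc.0"))     # None
print(is_dev_branch("dev/14.5.0"))        # True
```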

Procedure

The GEOS-Chem benchmarking procedure is described below. GEOS-Chem uses semantic versioning (i.e. X.Y.Z version labels).

  1. Any update to GEOS-Chem source code, input data, or run directories must be evaluated with a benchmark simulation.

  2. Updates to GEOS-Chem source code, input data, or run directories impacting the full-chemistry simulation are considered to be science updates.
    • Science updates are pushed to the dev/X.Y.0 branch of the geoschem/geos-chem repository.
    • Corresponding submodule hash updates are added to the dev/X.Y.0 branches of the geoschem/GCClassic and geoschem/GCHP repositories.
    • Benchmark results are automatically uploaded to gc-dashboard.org.
    • The 1-hour GEOS-Chem Classic and GCHP benchmarks are examined to ensure they executed properly.

  3. Updates not impacting the full-chemistry simulation (including updates to specialty simulations) are considered to be no-diff updates.
    • No-diff updates are pushed to the dev/no-diff-to-benchmark branch of the geoschem/geos-chem repository.
    • Corresponding submodule hash updates are added to the dev/X.Y.0 branches of the geoschem/GCClassic and geoschem/GCHP repositories.
    • This triggers automatic 1-hour benchmarks for GEOS-Chem Classic and GCHP, which run on the AWS cloud.
    • Benchmark results are automatically uploaded to gc-dashboard.org.
    • The 1-hour GEOS-Chem Classic and GCHP benchmarks are examined to ensure they finished without errors.
    • No-diff updates are considered to be mergeable at any time.

  4. Once it is determined that the 1-hour benchmarks for GEOS-Chem Classic and GCHP corresponding to a particular science update have executed properly, the 1-month benchmark simulations can be run.
    • An alpha tag (X.Y.0-alpha.N; see Note 3) is pushed to the dev/X.Y.0 branches of the geoschem/GCClassic and geoschem/GCHP repositories.
    • This will trigger 1-month benchmarks for GEOS-Chem Classic and GCHP.
    • Benchmark results are automatically uploaded to gc-dashboard.org.
    • The GCST will note the changes in model output from each 1-month alpha benchmark in a spreadsheet.

  5. Several alpha tags (see Note 3) are bundled into a feature version (X.Y.0).
    • Feature versions are released quarterly, roughly coinciding with GCSC meetings.
    • The last alpha tag before a planned feature version release is referred to as a release candidate.
    • No-diff updates can be merged into the next feature version or could be released into a bugfix version (X.Y.Z), as circumstances dictate.

  6. The GCST will post the links to the 1-month release candidate benchmark plots and tables on the GEOS-Chem X.Y.0 wiki page.
    • The GCST will add a benchmark assessment form to the wiki, with information about the benchmark setup and a summary of observed changes.

  7. The developer(s) and GCSC will assess the 1-month release candidate benchmark results and review the benchmark assessment form on the wiki.
    • If the update is for a specialty simulation (e.g. carbon, Hg, etc.), then a further benchmark may be conducted by the appropriate Working Group.
    • If there are no concerns about the results, the GEOS-Chem Model Scientist will approve the results.
    • A release candidate tag (X.Y.0-rc.0) is pushed to the geoschem/GCClassic and geoschem/GCHP repositories.
    • A 1-year full-chemistry benchmark is run for release candidate X.Y.0-rc.0.
    • A 1-year TransportTracers benchmark will be run only if the transport, wet deposition, or met field inputs are impacted.
      • Due to the large amount of output produced, 1-year benchmark(s) will be run locally on a computer cluster instead of on AWS.

  8. Plots and tables from the 1-year benchmark(s) for release candidate X.Y.0-rc.0 will be added to the GEOS-Chem X.Y.0 wiki page.
    • Developers, the GCSC, the GEOS-Chem Model Scientist, and GEOS-Chem Co-Model Scientist will evaluate the benchmark results.
    • If there are any concerns about the benchmark results, the GCST will be notified and further investigation and/or benchmarking may be required.
      • This may result in one or more additional release candidates (X.Y.0-rc.N) being considered.
      • New 1-year benchmark(s) will be prepared for approval.
    • If there are no concerns about the results, the GEOS-Chem Model Scientist will approve the release candidate.
    • The GCST will proceed to release feature version (X.Y.0), and make any other preparations (e.g. updating documentation).

  9. A major version (X.0.0) will be issued whenever a science update breaks backwards compatibility with the previous feature version.
    • Each major version will be evaluated with 1-month and 1-year benchmarks as described above.
    • The major version will also be evaluated with a 10-year benchmark.
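
As a rough summary of steps 1 through 9, the sketch below maps a version label to the benchmarks this page associates with it. The function name benchmarks_for_release and the transport_changed flag are hypothetical and invented for illustration; they are not part of GEOS-Chem or GCPy, and the actual decision rests with the GCST and the GCSC. The sketch also assumes that bugfix versions (Z > 0) only receive the automatic 1-hour benchmarks, which this page implies but does not state outright.

```python
def benchmarks_for_release(version, transport_changed=False):
    """List the benchmarks this page associates with a version label X.Y.Z.

    `transport_changed` stands in for "transport, wet deposition, or met
    field inputs are impacted" (hypothetical flag, for illustration only).
    """
    major, minor, patch = (int(part) for part in version.split("."))

    # Every commit to a development branch triggers 1-hour benchmarks.
    required = ["1-hour"]

    if patch == 0:
        # Feature versions (X.Y.0) and major versions (X.0.0).
        required += ["1-month", "1-year full-chemistry"]
        if transport_changed:
            required.append("1-year TransportTracers")
        if minor == 0:
            # Major versions are additionally evaluated over 10 years.
            required.append("10-year")

    return required


# Made-up version numbers, used only to exercise the function.
print(benchmarks_for_release("14.5.1"))
print(benchmarks_for_release("14.5.0"))
print(benchmarks_for_release("15.0.0", transport_changed=True))
```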

List of GEOS-Chem benchmarks

Links to past 1-month and 1-year benchmark simulations can be found on the GEOS-Chem versions wiki page.

Benchmark output archive

Output files and evaluation plots for 1-month and 1-year benchmark simulations are archived at Harvard as summarized below. GEOS-Chem users may use this output for comparisons against their own simulations.

Directory: https://gc-dashboard.org/search?searchString=&1Hr=1Hr&GCHP=GCHP&GCC=GCC
Description: Contains the following data from the 1-hour benchmarks used to evaluate GEOS-Chem:
  • Evaluation plots & tables
  • Run log
  • Run directory (tarball)
  • Diagnostic files (tarball)
  • Restart files (tarball)

Directory: https://gc-dashboard.org/search?searchString=&1Mon=1Mon&GCHP=GCHP&GCC=GCC
Description: Contains the following data from the 1-month benchmarks used to evaluate GEOS-Chem:
  • Evaluation plots & tables
  • Run log
  • Run directory (tarball)
  • Diagnostic files (tarball)
  • Restart files (tarball)

Directory: http://ftp.as.harvard.edu/gcgrid/geos-chem/1yr_benchmarks/
Description: Contains the following data from the 1-year benchmarks used to evaluate GEOS-Chem:
  • Evaluation plots
  • Restart files (tarball)
  • Model output (tarball)
  • Log files (tarball)
  • Input files (tarball)

Directory: http://ftp.as.harvard.edu/gcgrid/geos-chem/10yr_benchmarks/
Description: Contains the following data from the 10-year benchmarks used to evaluate GEOS-Chem:
  • Evaluation plots & tables
  • Restart files (tarball)
  • Model output (tarball)
  • Log files (tarball)
  • Input files (tarball)
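
The two gc-dashboard.org links above differ only in one query parameter (1Hr versus 1Mon). The sketch below rebuilds them with the Python standard library, which may be convenient in a script. The function name dashboard_search_url is hypothetical, and whether the dashboard accepts a non-empty searchString or other parameter values is an assumption that this page does not confirm.

```python
from urllib.parse import urlencode

DASHBOARD_SEARCH = "https://gc-dashboard.org/search"


def dashboard_search_url(duration, search_string=""):
    """Rebuild the dashboard search links listed above.

    `duration` must be "1Hr" or "1Mon", the only two values that appear
    in the links on this page.
    """
    if duration not in ("1Hr", "1Mon"):
        raise ValueError("duration must be '1Hr' or '1Mon'")

    # Parameter order mirrors the links above.
    params = [
        ("searchString", search_string),
        (duration, duration),
        ("GCHP", "GCHP"),
        ("GCC", "GCC"),
    ]
    return f"{DASHBOARD_SEARCH}?{urlencode(params)}"


print(dashboard_search_url("1Hr"))
print(dashboard_search_url("1Mon"))
```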

NOTE: "tarball" refers to a *.tar.gz file. This is an archive of files & folders created with tar cvzf and can be extracted with tar xzvf.
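
For users who prefer to unpack benchmark tarballs from a script, the sketch below shows rough Python counterparts of the two tar commands in the note above, using only the standard-library tarfile module. The function names and any file names passed to them are hypothetical. As with tar itself, only extract archives from a source you trust.

```python
import tarfile
from pathlib import Path


def create_tarball(source_dir, tarball_path):
    """Rough counterpart of `tar cvzf tarball_path source_dir`."""
    source = Path(source_dir)
    with tarfile.open(tarball_path, "w:gz") as archive:
        # Store paths relative to the directory name, as tar would.
        archive.add(source, arcname=source.name)


def extract_tarball(tarball_path, dest_dir):
    """Rough counterpart of `tar xzvf tarball_path -C dest_dir`.

    Returns the list of member names, similar to what the `v` flag prints.
    """
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(dest_dir)
    return names
```

A call such as extract_tarball("RunDir.tar.gz", "unpacked") (file name made up) would unpack the archive into the unpacked folder and return the paths it contained.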

Benchmark plotting routines

The benchmark plotting routines are included with GCPy, a Python toolkit available for GEOS-Chem.