Performing Difference Tests with GEOS-Chem

On this page we show you how to use the GEOS-Chem Unit Tester to generate and use difference tests to evaluate modifications to GEOS-Chem.

Overview

A difference test compares two versions of GEOS-Chem. It checks model output from a version of GEOS-Chem in which you have made updates (aka the development code, or "Dev") against output from a version of GEOS-Chem with known behavior (aka the reference code, or "Ref").

Typically, you can expect a difference test to pass only if the Dev run produces results identical to the Ref run. This will be true if Dev differs from Ref only by structural changes (e.g. modifications in how data gets passed from one place to another, or replacing old Fortran code with more modern equivalent code). If Dev contains scientific changes (new chemistry reactions, addition of tracers, new photolysis reactions, etc.), then you can still run a difference test, but it will not pass. However, there are additional tools for exploring the differences that may be useful for validation.

DiffTests are not meant to replace benchmarking. The GEOS-Chem Support Team will continue to rely on the 1-month and 1-year benchmarks to validate the code.

--Melissa Sulprizio (talk) 17:47, 12 January 2017 (UTC)

Downloading the GEOS-Chem Unit Tester

First, make sure that your system has these software packages installed. (Most of these come standard with your Unix-based operating system.)

Next, clone the GEOS-Chem Unit Tester package with the command:

git clone https://bitbucket.org/gcst/geos-chem-unittest UT

This will create a copy of the GEOS-Chem Unit Tester package in a directory named UT for short.

NOTE: The Git clone process may take a few minutes to complete depending on your connection speed.

--Melissa Sulprizio (talk) 17:47, 12 January 2017 (UTC)

Editing the DiffTest.input file

Once you have downloaded the GEOS-Chem Unit Tester to your disk space, switch to the perl/ directory:

cd UT/perl

In this directory there is a Perl script named gcCreateDiffTest that you will use to generate fresh copies of GEOS-Chem difference test run directories. This script uses an input file named DiffTest.input, which is also located in the perl directory.

Your DiffTest.input file will look something like this:

#------------------------------------------------------------------------------
#                  GEOS-Chem Global Chemical Transport Model                  !
#------------------------------------------------------------------------------
#BOP
#
# !DESCRIPTION: Input file that specifies configuration for creating a
#  DiffTest directory from the UnitTester. 
#\\
#\\
# !REMARKS:
#  For a complete description of how to customize the settings in the
#  INPUTS and RUNS sections, see the following wiki posts:
#
#   wiki.geos-chem.org/Creating_GEOS-Chem_run_directories#Section_1:_INPUTS
#   wiki.geos-chem.org/Creating_GEOS-Chem_run_directories#Section_2:_RUNS
#
# !REVISION HISTORY: 
#  01 Oct 2015 - E. Lundgren - Initial version, based on CopyRunDirs.input
#EOP
#------------------------------------------------------------------------------
#
# !INPUTS:
#
# %%% ID tags %%%
#
   VERSION        : v11-01
   DESCRIPTION    : Create run directory from UnitTest
#
# %%% Data path and HEMCO settings %%%
#
   DATA_ROOT      : /n/holylfs/EXTERNAL_REPOS/GEOS-CHEM/gcgrid/data/ExtData
   HEMCO_ROOT     : {DATAROOT}/HEMCO
   VERBOSE        : 0
   WARNINGS       : 1
#
# %%% Unit tester path names %%%
#
   UNIT_TEST_ROOT : {HOME}/UT
   RUN_ROOT       : {UTROOT}/runs
   RUN_DIR        : {RUNROOT}/{RUNDIR}
   PERL_DIR       : {UTROOT}/perl
#
# %%% Reference (Ref) and Development (Dev) codes %%%
#
   CODE_REF       : {HOME}/GC/Code.Ref
   CODE_DEV       : {HOME}/GC/Code.Dev
#
# %%% Target directory and copy command %%%
#
   COPY_PATH      : {HOME}/GC/DiffTest/v11-01
   COPY_CMD       : cp -rfL
#
# !RUNS:
#  Specify the runs directories that you want to make DiffTests out of.
#  Here we provide a few examples, but you may copy additional entries from
#  UnitTest.input and modify the dates as needed. You can deactivate
#  certain directories by commenting them out with "#".
#
#--------|-----------|------|------------|------------|------------|---------|
# MET    | GRID      | NEST | SIMULATION | START DATE | END DATE   | EXTRA?  |
#--------|-----------|------|------------|------------|------------|---------|
   geosfp   4x5         -      standard     2013070100   2013070101   -
#  geosfp   4x5         -      tropchem     2013070100   2013070101   -
#  geosfp   4x5         -      soa          2013070100   2013070101   -
#  geosfp   4x5         -      soa_svpoa    2013070100   2013070101   -
#  geosfp   4x5         -      aciduptake   2013070100   2013070101   -
#  geosfp   4x5         -      UCX          2013070100   2013070101   -
#  geosfp   4x5         -      RRTMG        2013070100   2013070101   -
#  geosfp   4x5         -      RnPbBe       2013070100   2013070101   -
#  geosfp   4x5         -      Hg           2013010100   2013010101   -
#  geosfp   4x5         -      POPs         2013070100   2013070101   -
#  geosfp   4x5         -      TOMAS40      2013070100   2013070101   -
#  geosfp   4x5         -      CH4          2013070100   2013070101   -
#  geosfp   4x5         -      tagO3        2013070100   2013070101   -
#  geosfp   4x5         -      tagCO        2013070100   2013070101   -
#  geosfp   2x25        -      CO2          2013070100   2013070101   -
#  geosfp   4x5         -      aerosol      2013070100   2013070101   - 
#  geosfp   025x03125   ch     tropchem     2013070100   201307010010 -
#  geosfp   025x03125   na     tropchem     2013070100   201307010010 -
!END OF RUNS:
#EOP
#------------------------------------------------------------------------------

NOTE: Lines starting with a # character will be treated as comments.

The DiffTest.input file has a layout that is very similar to the CopyRunDirs.input file located in the same directory. The only difference is the addition of CODE_REF and CODE_DEV lines in the INPUTS section. Those lines can be used to specify the paths of the reference (Ref) and development (Dev) codes that you would like to compare.
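
Before running the script, it can be useful to confirm which Ref and Dev paths the input file actually names. The following is a minimal sketch and is not part of the Unit Tester: get_setting and expand_home are hypothetical helper names, and the sketch assumes that each setting sits on one line in "KEY : value" form and that {HOME} stands for your home directory.

```shell
# Hypothetical helper: print the value of one "KEY : value" setting from a
# DiffTest.input-style file. Comment lines (starting with #) are ignored.
get_setting() {
    key="$1"
    file="$2"
    sed -n -e '/^[[:space:]]*#/d' \
           -e "s/^[[:space:]]*${key}[[:space:]]*:[[:space:]]*//p" "$file" \
        | sed -e 's/[[:space:]]*$//' \
        | head -n 1
}

# Hypothetical helper: replace the {HOME} token with the value of $HOME.
# (Assumes $HOME contains no "|" or "&" characters.)
expand_home() {
    printf '%s\n' "$1" | sed "s|{HOME}|$HOME|g"
}
```

For example, expand_home "$(get_setting CODE_REF DiffTest.input)" would print the Ref code path with {HOME} filled in, which you can then check with ls before creating the DiffTest directory.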

--Melissa Sulprizio (talk) 17:47, 12 January 2017 (UTC)

Generating a GEOS-Chem DiffTest Directory

Once you have edited the DiffTest.input file to your liking, you can use it to generate fresh copies of GEOS-Chem DiffTest directories. Make sure you are in the perl directory, and then type:

./gcCreateDiffTest

If you do not pass a file name to gcCreateDiffTest, then the gcCreateDiffTest script will use the DiffTest.input file that you just modified.

Executing gcCreateDiffTest will create a new GEOS-Chem DiffTest directory corresponding to each entry that you specified in the input file RUNS section. Each run directory will be created as a subdirectory of COPY_PATH that you specified in the input file INPUTS section.
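
To preview which directories a given input file would produce, you could list its active RUNS entries. The sketch below is an illustration only: list_difftest_dirs is a hypothetical helper, and the DiffTest_{met}_{grid}_{sim} naming is an assumption inferred from the sample directory name DiffTest_geosfp_4x5_standard (names for nested-grid runs may differ).

```shell
# Hypothetical helper: print one assumed directory name per active
# (uncommented) entry in the RUNS section of a DiffTest.input-style file.
list_difftest_dirs() {
    awk '
        # Stop reading entries at the end-of-section marker
        /^!END OF RUNS:/                 { in_runs = 0 }
        # Active entries: inside RUNS, not a comment, all columns present
        in_runs && !/^[ \t]*#/ && NF >= 6 { print "DiffTest_" $1 "_" $2 "_" $4 }
        # Start reading entries after the "# !RUNS:" header
        /^#[ \t]*!RUNS:/                 { in_runs = 1 }
    ' "$1"
}
```

Running list_difftest_dirs DiffTest.input on the example file above would print a single name, because only the standard simulation is uncommented.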

Let's examine the contents of a sample DiffTest_geosfp_4x5_standard run directory. Issue the following commands:

cd ~/GC/DiffTest/v11-01/DiffTest_geosfp_4x5_standard  # Change to the sample DiffTest directory (under COPY_PATH)

make fileclean                                        # Remove any files left over from previous difference test runs

ls -1                                                 # Get directory listing

And you will see this directory listing:

ctm_summarizediff.pro
Dev/
locateDiagDiffs.sh
logs/
Makefile
plots/
README
Ref/
summarizeDiagDiffs.sh

These files and subdirectories are described below in more detail.

Subdirectories:

  • Dev/: Run directory containing input files for the simulation using the development (Dev) code.
  • Ref/: Run directory containing input files for the simulation using the reference (Ref) code. This directory contains mostly symbolic links to the Dev/ directory. If you have updated any of the input files in your Dev run, you will need to remove the symbolic link and manually place a copy of that input file without your modifications in this directory.
  • logs/: Directory where the log files for the Dev and Ref simulations are stored. If you run a complete difference test, a file with the results will be saved here as well.
  • plots/: Directory containing files used to create plots comparing the Dev and Ref simulations. The IDL routine plot_diff.pro is set up to create species concentration maps, difference maps, ratio maps, zonal difference plots, and zonal concentration plots. That routine uses the input file PlotDiffs.input, which is set up to compare output in the Dev/ and Ref/ run directories.

Files:

  • README: Text file containing a description of the DiffTest directory and instructions for running difference tests.
  • Makefile: Makefile used to drive the entire difference test. For more information, use the command make help.
  • summarizeDiagDiffs.sh: Bash script used to compare binary punch output files using the IDL routine ctm_summarizediff.pro. Output is printed to the screen and saved to a file in the logs/ directory.
  • ctm_summarizediff.pro: IDL script used to locate datablocks that differ in two binary punch files. This routine is called from summarizeDiagDiffs.sh.
  • locateDiagDiffs.sh: Bash script used to compare binary punch output files using the IDL routine ctm_locatediff.pro. Output is printed to the screen and saved to a file in the logs/ directory. We recommend using summarizeDiagDiffs.sh first for an overview of the differences, because the differences returned by this script may be numerous, resulting in a very large log file.

--Melissa Sulprizio (talk) 17:47, 12 January 2017 (UTC)

Setup instructions

In the top-level Makefile of the DiffTest directory, check the following lines:

# Reference code directory
ifndef CODE_REF
 CODE_REF   :=$(HOME)/GC/Code.Ref
endif

# Development code directory
ifndef CODE_DEV
 CODE_DEV   :=$(HOME)/GC/Code.Dev
endif

Where:

  • CODE_REF specifies the GEOS-Chem source code directory for the Ref code.
  • CODE_DEV specifies the GEOS-Chem source code directory for the Dev code.

These lines are set according to CODE_REF and CODE_DEV in the DiffTest.input file used to create your DiffTest directory. You can change the paths manually here if needed.

In many cases, the Dev and Ref codes will only differ from each other by a few commits. To set up a Ref code directory, you can make a copy of the Dev code directory and then open a new Git branch at the appropriate commit in the past. (See this wiki post for instructions on how to revert to an older state of the code with Git.)
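
The copy-and-branch approach can be sketched as follows. This is only an illustration under stated assumptions: make_ref_code is a hypothetical helper, the branch name ref_code is an arbitrary choice, and you supply the commit that represents your reference state.

```shell
# Hypothetical helper: copy the Dev source tree to a new Ref directory and,
# in the copy, open a new branch at an older commit.
#   $1 = Dev code directory, $2 = Ref code directory (must not exist yet),
#   $3 = commit (hash or tag) to use as the reference state
make_ref_code() {
    dev_dir="$1"
    ref_dir="$2"
    commit="$3"

    # Refuse to overwrite an existing Ref directory
    if [ -e "$ref_dir" ]; then
        echo "ERROR: $ref_dir already exists" >&2
        return 1
    fi

    # Copy the whole tree, including the .git folder
    cp -r "$dev_dir" "$ref_dir" || return 1

    # Create and switch to a branch that starts at the chosen commit
    git -C "$ref_dir" checkout -q -b ref_code "$commit"
}
```

A call such as make_ref_code ~/GC/Code.Dev ~/GC/Code.Ref <commit> leaves the Dev directory untouched. Note that uncommitted changes in Dev are copied along with everything else, so it is safest to commit or stash them before making the copy.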

Finally, make sure that the input.geos file in the Dev/ directory has all of the proper settings for your run. The Ref/ directory links to this input.geos file to ensure that your run settings are the same for both Dev and Ref.

--Melissa Sulprizio (talk) 17:47, 12 January 2017 (UTC)

Running difference tests

From the top-level DiffTest directory, type:

make superclean
make logclean

The make superclean command will do a make realclean in both the Dev and Ref code directories and remove all output files from the Dev/ and Ref/ subdirectories. Log files from DiffTests are stored in the logs/ directory and are removed using make logclean. NOTE: You don't have to do make superclean before each difference test but it is a good idea to do it periodically.

After cleaning the DiffTest directory, type the following to compile and run the Dev and Ref versions of GEOS-Chem and perform a difference test:

make -j4 difftest

NOTE: ISORROPIA has been known to produce numerical noise in GEOS-Chem model output, so it is turned off by default in difference tests (NO_ISO=y). You can add debug flags (BOUNDS=y DEBUG=y FPE=y ...) to the make command if you wish.

The DiffTest Makefile has several other options for compiling and running Dev and Ref. Options include (other compile options are omitted for clarity):

make -j4 refonly      # run/compile Ref
make -j4 devonly      # run/compile Dev
make -j4 devlib       # compile Dev
make -j4 devrun       # run Dev
make -j4 devcheck     # run/compile Dev and compare to Ref (previously run)
make -j4 check        # compare Dev against Ref (both previously run)

You can also compile and run Ref and Dev code with the Totalview debugger using the commands below. The tvsp option refers to single-processor mode while tvmp refers to multi-processor mode.

make -j4 DEBUG=y TRACEBACK=y FPE=y BOUNDS=y tvsp_ref 
make -j4 DEBUG=y TRACEBACK=y FPE=y BOUNDS=y tvmp_ref  
make -j4 DEBUG=y TRACEBACK=y FPE=y BOUNDS=y tvsp_dev
make -j4 DEBUG=y TRACEBACK=y FPE=y BOUNDS=y tvmp_dev

There are several additional clean options in the Makefile that allow you to target what you would like to clean up. These include:

make refclean         # calls make realclean on Ref code only
make devclean         # calls make realclean on Dev code only
make realclean        # calls make realclean on both Ref and Dev code
make fileclean        # removes all output files from Ref/ and Dev/
make logclean         # removes all log files from logs/
make superclean       # calls make realclean and make fileclean (keeps logs)

To find out information about your run, you can print information to the screen using:

make printruninforef  # print Ref run information
make printruninfodev  # print Dev run information
make printbuildinfo   # print common compile settings for Ref and Dev
make printallinforef  # print run and compile information for Ref
make printallinfodev  # print run and compile information for Dev

Finally, to see a summary of make options outlined above, type:

make help

--Melissa Sulprizio (talk) 17:47, 12 January 2017 (UTC)

Examining the results

The results of the difference test will be sent to a file in the logs/ directory called log.{met}_{grid}_{sim}.Results. The Dev code passed the difference test if all Ref and Dev files are identical, as indicated by validation output that lists IDENTICAL for sizes, checksums, and diffs for the GEOS-Chem diagnostic files, GEOS-Chem restart files, and HEMCO restart file.
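
To make the three checks concrete, here is a minimal sketch of comparing one Ref file against one Dev file by size, checksum, and content. It is not the code the DiffTest Makefile runs: compare_pair is a hypothetical helper, and its output layout is invented for illustration and will not match the Results log.

```shell
# Hypothetical helper: compare a Ref file and a Dev file three ways and
# report IDENTICAL or DIFFERENT for each. Returns 0 only if all checks pass.
compare_pair() {
    local ref_file="$1" dev_file="$2" status=0

    # 1. File sizes in bytes
    if [ "$(wc -c < "$ref_file")" = "$(wc -c < "$dev_file")" ]; then
        echo "sizes     : IDENTICAL"
    else
        echo "sizes     : DIFFERENT"; status=1
    fi

    # 2. CRC checksums (cksum is a standard POSIX utility)
    if [ "$(cksum < "$ref_file")" = "$(cksum < "$dev_file")" ]; then
        echo "checksums : IDENTICAL"
    else
        echo "checksums : DIFFERENT"; status=1
    fi

    # 3. Byte-by-byte comparison
    if cmp -s "$ref_file" "$dev_file"; then
        echo "contents  : IDENTICAL"
    else
        echo "contents  : DIFFERENT"; status=1
    fi

    return $status
}
```

You would call it with matching output files from the two run directories, for example the Ref/ and Dev/ copies of the same restart file.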

If the restart files (GEOSChem_restart.YYYYMMDDhhmm.nc.*) are identical, but the diagnostic files (trac_avg.YYYYMMDDhhmm*) differ, then there is a problem in the diagnostic output that needs to be further addressed.

You can explore differences in output files by using the summarizeDiagDiffs.sh and locateDiagDiffs.sh shell scripts located in the top-level directory. These scripts use IDL routines to compare diagnostic and restart output files. We recommend running summarizeDiagDiffs.sh first to view a summary of the differences, because if the differences are extensive then the output of locateDiagDiffs.sh will be very large.

--Melissa Sulprizio (talk) 17:47, 12 January 2017 (UTC)

Create plots of the results

The plots/ directory contains files used to create plots comparing the Dev and Ref simulations. The IDL routine plot_diff.pro is set up to create species concentration maps, difference maps, ratio maps, zonal difference plots, and zonal concentration plots. That routine uses the input file PlotDiffs.input, which is set up to compare output in the Dev/ and Ref/ run directories. To create the plots, open IDL and type:

IDL> plot_diffs, /dyn

This will generate PDF files comparing concentrations in Dev and Ref for each species in your simulation.

--Melissa Sulprizio (talk) 17:45, 22 February 2017 (UTC)