Difference between revisions of "Downloading data with the GEOS-Chem dry-run option"

Revision as of 15:54, 23 November 2020

Previous | Next | Getting Started with GEOS-Chem

  1. Minimum system requirements
  2. Installing required software
  3. Configuring your computational environment
  4. Downloading source code
  5. Downloading data directories
  6. Creating run directories
  7. Configuring runs
  8. Compiling
  9. Running
  10. Output files
  11. Visualizing and processing output
  12. Coding and debugging
  13. Further reading

Featured tutorial videos

A tutorial video on the GEOS-Chem dry-run option is posted at our GEOS-Chem YouTube channel: youtube.geos-chem.org


The GEOS-Chem dry-run option is available in GEOS-Chem 12.7.0 and later versions.

What is a GEOS-Chem dry-run?

A "dry-run" is a GEOS-Chem "Classic" simulation that steps through time, but does not perform computations or read data files from disk. Instead, a dry-run simulation prints a list of all data files that a regular GEOS-Chem simulation would have read. The dry-run output also denotes whether each data file was found on disk, or if it is missing.

Why should I perform a GEOS-Chem dry-run?

A GEOS-Chem dry-run is a good way for you to check if you have properly configured your computational environment. This is especially important if you are porting GEOS-Chem to run on a new computer system, or on the AWS cloud.

For example:

Problem encountered in dry-run: GEOS-Chem "Classic" does not compile
What this can indicate:
  • Your system does not have a supported compiler
  • Your system is missing one or more required software libraries
  • You have invalid settings in your Unix startup scripts (e.g. .bashrc, etc.)

Problem encountered in dry-run: GEOS-Chem "Classic" compiles but does not run
What this can indicate:
  • Your GEOS-Chem configuration files (input.geos, HEMCO_Config.rc, etc.) contain errors
  • You have invalid settings in your Unix startup scripts (e.g. .bashrc, etc.)

Once you have successfully finished a GEOS-Chem dry-run, you can have confidence that GEOS-Chem can run on your system.

More importantly, output from the GEOS-Chem dry-run simulation can be used to download required met field and emissions data for a GEOS-Chem simulation (see next section).

What can I do with the output of a GEOS-Chem dry-run?

When you run GEOS-Chem in dry-run mode, you must pipe the output to a log file (as you would do for any other GEOS-Chem simulation). The log file containing dry-run output looks similar to a regular GEOS-Chem log file, but will also contain text such as:

HEMCO: Opening /path/to/ExtData/HEMCO/EDGARv43/v2016-11/EDGAR_v43.BC.POW.0.1x0.1.nc
HEMCO: REQUIRED FILE NOT FOUND /path/to/ExtData/HEMCO/EDGARv43/v2016-11/EDGAR_v43.BC.POW.0.1x0.1.nc

NOTE: /path/to/ExtData denotes the full pathname of the ExtData folder. This is the root of the directory tree containing all GEOS-Chem met fields and emissions data. This will of course be different on each system.

This text lets you know if GEOS-Chem was able to find each input file on disk or not. This information can be parsed by a Python script (download_data.py, which is included in each run directory) to produce:

  1. A unique list of required data files (with all duplicates removed). This can be useful for documentation purposes.
  2. A bash script that will download all MISSING data files from one of the GEOS-Chem data repositories:
    • Compute Canada repository
    • Amazon Web Services s3://gcgrid repository
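To make the parsing step concrete, here is a minimal Python sketch that sorts dry-run log lines into found and missing files. It is not the code used by download_data.py. It assumes only the two line patterns shown in the sample above ("Opening" and "REQUIRED FILE NOT FOUND"), and it assumes that the file path is the last token on the line. The function name classify_dryrun_lines is hypothetical.

```python
from typing import Iterable


def classify_dryrun_lines(lines: Iterable[str]) -> tuple[list[str], list[str]]:
    """Return (found, missing) file paths in first-seen order, without duplicates.

    Assumption: the path is the last whitespace-separated token on each line,
    as in the sample log text shown above.
    """
    found: dict[str, None] = {}    # dicts keep insertion order and drop repeats
    missing: dict[str, None] = {}
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        path = tokens[-1]
        if "REQUIRED FILE NOT FOUND" in line:
            missing[path] = None
        elif "Opening" in line:
            found[path] = None
    # A path that was reported missing is not counted as found.
    found_only = [p for p in found if p not in missing]
    return found_only, list(missing)
```

Lines that match neither pattern are ignored, so the rest of a regular GEOS-Chem log passes through without effect. A real dry-run log may use other message prefixes that this sketch does not recognize.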

In the following sections, we will describe the commands that are needed to execute a complete GEOS-Chem dry-run workflow.

Which GEOS-Chem simulations can execute in dry-run mode?

Any of the supported GEOS-Chem "Classic" simulations (e.g. standard, benchmark, complexSOA, CH4, TransportTracers, etc.) can be executed in dry-run mode.

As of this writing (Jan 2020), GCHP cannot execute in dry-run mode. Dry-run functionality may be added in the future, but this will require modifications to the NASA MAPL software library. Because GCHP uses the same data files as GEOS-Chem "Classic", you can use a GEOS-Chem "Classic" dry-run to facilitate downloading met fields and emissions data for a GCHP simulation.

--Bob Yantosca (talk) 17:00, 6 January 2020 (UTC)

Executing GEOS-Chem in dry-run mode

Follow these steps to perform a GEOS-Chem dry-run:

  1. Create a run directory for the type of GEOS-Chem simulation that you wish to perform (e.g. geosfp_4x5_standard, merra2_2x25_tropchem, etc.). For detailed instructions, please see our Creating run directories chapter.

  2. Change to the run directory (e.g. cd geosfp_4x5_standard, etc.) and compile GEOS-Chem as you normally would. For detailed instructions, please see our Compiling chapter.

  3. Check to see if these settings in the SIMULATION MENU and the GRID MENU of the input.geos file are as you expect:


  4. Make sure that the ROOT and METDIR settings in HEMCO_Config.rc use the same root data directory as is specified in input.geos:


  5. Make sure to select the emission inventories and extensions that you wish to use for your simulation:

    You can reduce the amount of data that needs to be downloaded for your GEOS-Chem simulation by turning off inventories that you don't need.

  6. Run GEOS-Chem with the --dryrun command-line argument, and pipe the output to a log file. You can type either:

    ./geos --dryrun > log.dryrun
    which will send the output to the log.dryrun file, or:
    ./geos --dryrun | tee log.dryrun
    which will not only write the output to log.dryrun, but will also show it on the screen.

    The log.dryrun file will look somewhat like a regular GEOS-Chem log file but will also contain a list of data files and whether each file was found on disk or not.

    Also note, you can use whatever name you like for the dry-run output log file (we prefer log.dryrun).
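If you prefer to launch the dry-run from a Python driver script rather than from the shell, the tee behavior described above can be sketched as follows. This is an illustration only: the function name run_and_tee is hypothetical, and unlike the shell commands above, this sketch also captures error messages (stderr) in the log.

```python
import subprocess
import sys
from typing import Sequence


def run_and_tee(cmd: Sequence[str], log_path: str) -> int:
    """Run cmd, echoing each output line to the screen and to log_path.

    Returns the exit code of the command. stderr is merged into stdout,
    which differs from a plain "cmd | tee log" in the shell.
    """
    with open(log_path, "w") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        assert proc.stdout is not None
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
        return proc.wait()


# Example call, from inside a compiled run directory:
#   run_and_tee(["./geos", "--dryrun"], "log.dryrun")
```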

--Bob Yantosca (talk) 17:04, 6 January 2020 (UTC)

Downloading data from dry-run output

Once you have successfully executed a GEOS-Chem dry-run (see previous section), you can use the output from the dry-run (contained in the log.dryrun file) to download the data files that GEOS-Chem will need to perform the corresponding "production" simulation. Follow one of these three options:

Downloading data from Compute Canada

If you are using GEOS-Chem on your institutional computer cluster, we recommend that you download data from the Compute Canada data repository (http://geoschemdata.computecanada.ca). Change to your GEOS-Chem run directory where you executed the dry-run and type:

./download_data.py log.dryrun --cc

The download_data.py Python program is included in each GEOS-Chem run directory that you create for GEOS-Chem 12.7.0 and later versions. It uses base Python 3 packages, and does not necessarily have to run within a Conda environment. This download_data.py program creates and executes a temporary bash script containing the appropriate wget commands to download the data files. (We have found that this is the fastest method.)
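The idea of turning a list of missing files into a bash script of wget commands can be sketched as follows. This is not the code in download_data.py. It assumes that the remote server mirrors the local ExtData directory tree under the URL stored in REMOTE_ROOT, and the function name build_wget_script is hypothetical.

```python
import shlex
from pathlib import PurePosixPath

# Assumption: the remote repository mirrors the ExtData tree under this URL.
REMOTE_ROOT = "http://geoschemdata.computecanada.ca/ExtData"


def build_wget_script(missing_paths, local_root):
    """Return the text of a bash script that fetches each missing file.

    local_root is the local ExtData folder (the /path/to/ExtData prefix).
    Paths that are not under local_root are skipped.
    """
    lines = ["#!/bin/bash", ""]
    root = PurePosixPath(local_root)
    for path in missing_paths:
        p = PurePosixPath(path)
        try:
            rel = p.relative_to(root)
        except ValueError:
            continue  # not inside the ExtData tree
        url = f"{REMOTE_ROOT}/{rel.as_posix()}"
        # -N: skip files that are already up to date; -P: directory to save into
        lines.append(
            f"wget -N -P {shlex.quote(str(p.parent))} {shlex.quote(url)}"
        )
    return "\n".join(lines) + "\n"
```

Writing the commands to a script first, rather than running them one by one, lets you inspect exactly what will be downloaded before anything is transferred.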

The download_data.py program will also generate a log of unique data files (i.e. with all duplicate listings removed), which looks similar to this:

!!! Start Date       : 20160701 000000
!!! End Date         : 20160701 010000
!!! Simulation       : standard
!!! Meteorology      : GEOSFP
!!! Grid Resolution  : 4.0x5.0
./GEOSChem.Restart.20160701_0000z.nc4 --> /n/holylfs/EXTERNAL_REPOS/GEOS-CHEM/gcgrid/data/ExtData/GEOSCHEM_RESTARTS/v2018-11/initial_GEOSChem_rst.4x5_standard.nc
... etc ...

The name of this "unique" log file will be the same as the log file with dry-run output, with .unique appended. In our above example, we passed log.dryrun to download_data.py, so the "unique" log file will be named log.dryrun.unique. This "unique" log file can be very useful for documentation purposes.
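The naming rule and the duplicate removal can be illustrated with a short Python sketch. This is a simplified stand-in, not the logic of download_data.py: it removes repeated lines from any text file and does not produce the "!!!" header lines shown in the example above. The function name write_unique_log is hypothetical.

```python
from pathlib import Path


def write_unique_log(log_path: str) -> str:
    """Write the de-duplicated, non-blank lines of log_path to log_path + '.unique'.

    Lines keep the order in which they first appear. Returns the new file name.
    """
    seen = set()
    unique = []
    for line in Path(log_path).read_text().splitlines():
        entry = line.strip()
        if entry and entry not in seen:
            seen.add(entry)
            unique.append(entry)
    out_path = log_path + ".unique"
    Path(out_path).write_text("\n".join(unique) + "\n")
    return out_path
```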

Downloading data from AWS s3://gcgrid

If you are running GEOS-Chem on the Amazon Web Services cloud, you can quickly download the necessary data for your GEOS-Chem simulation from the s3://gcgrid bucket to the Elastic Block Store (EBS) volume attached to your cloud instance. Change to your GEOS-Chem run directory and type:

./download_data.py log.dryrun --aws

This will start the data download process using the aws s3 cp commands, which should execute much more quickly than if you were to download the data from Compute Canada. It will also produce the log of unique data files as described in the previous section.

NOTE: Copying data from s3://gcgrid to the EBS volume of an AWS cloud instance is always free. But if you download data from s3://gcgrid to your own local computer cluster, you will incur an egress fee (~ $90/TB). Use with caution!
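To get a rough sense of the egress fee before downloading to a local cluster, the arithmetic can be sketched in Python. The rate is the approximate figure quoted in the note above and may change over time. The sketch assumes decimal terabytes (10^12 bytes), and the function name estimate_egress_cost is hypothetical.

```python
EGRESS_USD_PER_TB = 90.0  # approximate rate quoted in the note above
BYTES_PER_TB = 10**12     # assumption: decimal terabytes


def estimate_egress_cost(total_bytes: int) -> float:
    """Return the approximate egress fee in US dollars for total_bytes of data."""
    return total_bytes / BYTES_PER_TB * EGRESS_USD_PER_TB
```

For example, estimate_egress_cost(250 * 10**9) gives about 22.5, so downloading 250 GB of data to a local cluster would cost roughly $22.50 at that rate.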

Create the log of unique data files without downloading data

If you wish to only produce the log of unique data files (as described above) without downloading any data, then type the following command from within your GEOS-Chem run directory:

./download_data.py log.dryrun --skip-download

or for short:

./download_data.py log.dryrun --skip

This can be useful if you already have the necessary data downloaded to your system but wish to create the log of unique files for documentation purposes (such as for benchmark simulations).

--Bob Yantosca (talk) 18:05, 6 January 2020 (UTC)

Further reading

  1. Using GEOS-Chem on the AWS cloud (cloud.geos-chem.org)
