Downloading data with the GEOS-Chem dry-run option
- Minimum system requirements
- Configuring your computational environment
- Downloading source code
- Downloading data directories
- Creating run directories
- Configuring runs
- Output files
- Visualizing and processing output
- Coding and debugging
- Further reading
- 1 Overview
- 2 Executing GEOS-Chem in dry-run mode
- 3 Downloading data from dry-run output
- 4 Further reading
The GEOS-Chem dry-run option will be available in GEOS-Chem 12.7.0 and later versions. This version should be released by mid-to-late January 2020 (pending successful benchmarking).
What is a GEOS-Chem dry-run?
A "dry-run" is a GEOS-Chem "Classic" simulation that steps through time, but does not perform computations or read data files from disk. Instead, a dry-run simulation prints a list of all data files that a regular GEOS-Chem simulation would have read. The dry-run output also denotes whether each data file was found on disk, or if it is missing.
Why should I perform a GEOS-Chem dry-run?
A GEOS-Chem dry-run is a good way for you to check if you have properly configured your computational environment. This is especially important if you are porting GEOS-Chem to run on a new computer system, or on the AWS cloud.
|Problem encountered in dry-run|What this can indicate|
|---|---|
|GEOS-Chem "Classic" does not compile| |
|GEOS-Chem "Classic" compiles but does not run| |
Once you have successfully finished a GEOS-Chem dry-run, you can have confidence that GEOS-Chem will run on your system.
More importantly, output from the GEOS-Chem dry-run simulation can be used to download required met field and emissions data for a GEOS-Chem simulation (see next section).
What can I do with the output of a GEOS-Chem dry-run?
When you run GEOS-Chem in dry-run mode, you must pipe the output to a log file (as you would do for any other GEOS-Chem simulation). The log file containing dry-run output looks similar to a regular GEOS-Chem log file, but will also contain text such as:
...
HEMCO: Opening /path/to/ExtData/HEMCO/EDGARv43/v2016-11/EDGAR_v43.BC.POW.0.1x0.1.nc
...
HEMCO: REQUIRED FILE NOT FOUND /path/to/ExtData/HEMCO/EDGARv43/v2016-11/EDGAR_v43.BC.POW.0.1x0.1.nc
...
NOTE: /path/to/ExtData denotes the full pathname of the ExtData folder. This is the root of the directory tree containing all GEOS-Chem met fields and emissions data. This will of course be different on each system.
This text lets you know if GEOS-Chem was able to find each input file on disk or not. This information can be parsed by a Python script (download_data.py, which is included in each run directory) to produce:
- A unique list of required data files (with all duplicates removed). This can be useful for documentation purposes.
- A bash script that will download all MISSING data files from one of the GEOS-Chem data repositories:
- Compute Canada repository
- Amazon Web Services s3://gcgrid repository
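To make the parsing idea concrete, here is a minimal sketch (in Python) of how such a script could extract the unique and missing file lists from dry-run output and build download commands. This is an illustration only, not the actual download_data.py that ships with the run directory; the function names and the remote URL layout shown are assumptions:

```python
# Minimal sketch of parsing GEOS-Chem dry-run output (illustrative only;
# the real download_data.py handles more cases, e.g. restart files and
# portal selection).

def parse_dryrun_log(lines):
    """Return (unique_files, missing_files) from dry-run log lines."""
    unique, missing = set(), set()
    for line in lines:
        if "Opening" in line:
            # File was found on disk; last token is the full path
            unique.add(line.split()[-1])
        elif "REQUIRED FILE NOT FOUND" in line:
            # File is missing; record it in both lists
            path = line.split()[-1]
            unique.add(path)
            missing.add(path)
    return sorted(unique), sorted(missing)


def wget_commands(missing,
                  remote_root="http://geoschemdata.computecanada.ca/ExtData",
                  local_root="/path/to/ExtData"):
    """Build wget commands that mirror each missing file's path locally.
    The remote_root/local_root defaults are placeholder assumptions."""
    cmds = []
    for path in missing:
        # Keep the portion of the path below ExtData/ so the local
        # directory tree mirrors the remote one
        rel = path.split("/ExtData/", 1)[-1]
        cmds.append(f"wget -nc {remote_root}/{rel} -O {local_root}/{rel}")
    return cmds
```

The key design point is deduplication: a multi-month simulation opens the same climatology files repeatedly, so collecting paths into a set before writing the download script avoids redundant transfers.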
In the following sections, we describe the commands needed to execute a complete GEOS-Chem dry-run workflow.
Which GEOS-Chem simulations can execute in dry-run mode?
Any of the supported GEOS-Chem "Classic" simulations (e.g. standard, benchmark, complexSOA, CH4, TransportTracers, etc.) can be executed in dry-run mode.
As of this writing (Jan 2020), GCHP cannot execute in dry-run mode. Dry-run functionality may be added in the future, but this will require modifications to the NASA MAPL software library. Because GCHP uses the same data files as GEOS-Chem "Classic", you can use a GEOS-Chem "Classic" dry-run to facilitate downloading met fields and emissions data for a GCHP simulation.
Executing GEOS-Chem in dry-run mode
Follow these steps to perform a GEOS-Chem dry-run:
Create a run directory for the type of GEOS-Chem simulation that you wish to perform (e.g. geosfp_4x5_standard, merra2_2x25_tropchem, etc.). For detailed instructions, please see our Creating run directories chapter.
Change to the run directory (e.g. cd geosfp_4x5_standard, etc.) and compile GEOS-Chem as you normally would. For detailed instructions, please see our Compiling chapter.
Make sure that the ROOT and METDIR settings in HEMCO_Config.rc use the same root data directory as is specified in input.geos:
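For example, the settings should line up like this (the paths below are placeholders; substitute your system's actual ExtData location):

```
# In HEMCO_Config.rc (illustrative paths):
ROOT:    /path/to/ExtData/HEMCO
METDIR:  /path/to/ExtData/GEOS_4x5/GEOS_FP

# In input.geos, the root data directory should point to the same tree:
Root data directory : /path/to/ExtData
```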
Make sure to select the emission inventories and extensions that you wish to use for your simulation:
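Schematically, the extension switches section of HEMCO_Config.rc contains on/off toggles like the following (the inventory and extension names shown here are examples only; consult your own HEMCO_Config.rc for the full list and exact layout):

```
# EXTENSION SWITCHES section of HEMCO_Config.rc (schematic excerpt)
0   Base    : on  *
    --> CEDS      : true     # enable this base inventory
    --> EDGARv43  : false    # disable this one
103 ParaNOx : on  NO/NO2/O3/HNO3
```

Only files belonging to enabled inventories and extensions appear in the dry-run output, so setting these switches before the dry-run ensures you download exactly the data your production simulation needs.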
Run GEOS-Chem with the --dryrun command-line argument, and pipe the output to a log file. You can type either:
./geos --dryrun > log.dryrun
which will send the output to the log.dryrun file, or:
./geos --dryrun | tee log.dryrun
which will pipe the output to log.dryrun and also display it on the screen.
The log.dryrun file will look somewhat like a regular GEOS-Chem log file but will also contain a list of files and whether each file was found on disk or not.
Also note, you can use whatever name you like for the dry-run output log file (we prefer log.dryrun).
Downloading data from dry-run output
Once you have successfully executed a GEOS-Chem dry-run (see previous section), you can use the output from the dry-run (contained in the log.dryrun file) to download the data files that GEOS-Chem will need to perform the corresponding "production" simulation. Follow one of these three options:
Downloading data from Compute Canada
If you are using GEOS-Chem on your institutional computer cluster, we recommend that you download data from the Compute Canada data repository (http://geoschemdata.computecanada.ca). From the GEOS-Chem run directory, run the Python program download_data.py as follows:
./download_data.py log.dryrun --cc
The download_data.py program is included in each GEOS-Chem run directory that you create (for GEOS-Chem 12.7.0 and later versions). It uses base Python 3 packages, and does not need to be run from within a Conda environment. This Python program creates and executes a temporary bash script containing the appropriate wget commands to download the data files. (We have found that this is the fastest method.)
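The "temporary bash script" approach can be sketched like this (a hypothetical helper, not the real download_data.py internals; details such as error handling differ):

```python
# Sketch: write download commands to a temporary bash script, run it,
# then clean up. Illustrative only.
import os
import stat
import subprocess
import tempfile


def run_download_script(cmds):
    """Write shell commands to a temporary bash script and execute it."""
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write("#!/bin/bash\n")
        f.write("\n".join(cmds) + "\n")
        script = f.name
    # Make the script executable for the owner
    os.chmod(script, os.stat(script).st_mode | stat.S_IXUSR)
    try:
        subprocess.run(["bash", script], check=True)
    finally:
        os.remove(script)
```

Batching all wget commands into one script (rather than spawning a process per file) keeps the download loop simple and lets bash handle sequencing.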
The download_data.py program will also generate a log of unique data files (i.e. with all duplicate listings removed), which looks similar to this:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!! LIST OF (UNIQUE) FILES REQUIRED FOR THE SIMULATION
!!! Start Date      : 20160701 000000
!!! End Date        : 20160701 010000
!!! Simulation      : standard
!!! Meteorology     : GEOSFP
!!! Grid Resolution : 4.0x5.0
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
./GEOSChem.Restart.20160701_0000z.nc4 --> /n/holylfs/EXTERNAL_REPOS/GEOS-CHEM/gcgrid/data/ExtData/GEOSCHEM_RESTARTS/v2018-11/initial_GEOSChem_rst.4x5_standard.nc
./HEMCO_Config.rc
./HEMCO_Diagn.rc
./HEMCO_restart.201607010000.nc
./HISTORY.rc
./input.geos
/n/holylfs/EXTERNAL_REPOS/GEOS-CHEM/gcgrid/data/ExtData/CHEM_INPUTS/FAST_JX/v2019-10/FJX_j2j.dat
/n/holylfs/EXTERNAL_REPOS/GEOS-CHEM/gcgrid/data/ExtData/CHEM_INPUTS/FAST_JX/v2019-10/FJX_spec.dat
/n/holylfs/EXTERNAL_REPOS/GEOS-CHEM/gcgrid/data/ExtData/CHEM_INPUTS/FAST_JX/v2019-10/dust.dat
/n/holylfs/EXTERNAL_REPOS/GEOS-CHEM/gcgrid/data/ExtData/CHEM_INPUTS/FAST_JX/v2019-10/h2so4.dat
/n/holylfs/EXTERNAL_REPOS/GEOS-CHEM/gcgrid/data/ExtData/CHEM_INPUTS/FAST_JX/v2019-10/jv_spec_mie.dat
... etc ...
The name of this "unique" log file will be the same as the log file with dry-run output, with .unique appended. In our above example, we passed log.dryrun to download_data.py, so the "unique" log file will be named log.dryrun.unique. This "unique" log file can be very useful for documentation purposes.
Downloading data from AWS s3://gcgrid
If you are running GEOS-Chem on the Amazon Web Services cloud, you can quickly download the necessary data for your GEOS-Chem simulation from the s3://gcgrid bucket to the Elastic Block Storage (EBS) volume attached to your cloud instance. Change to your GEOS-Chem run directory and type:
./download_data.py log.dryrun --aws
This will start the data download process using the aws s3 cp commands, which should execute much more quickly than if you were to download the data from Compute Canada. It will also produce the log of unique data files as described in the previous section.
NOTE: Copying data from s3://gcgrid to the EBS volume of an AWS cloud instance is always free. But if you download data from s3://gcgrid to your own local computer cluster, you will incur an egress fee (~ $90/TB). Use with caution!
Skip downloading to produce the list of unique files