Setting up the ExtData directory

On this page we describe the top-level root directory for all GEOS-Chem data files, which is called ExtData.


Data directory structure prior to v10-01

Emissions and meteorological data for GEOS-Chem are arranged in a directory tree. In versions of GEOS-Chem prior to v10-01, you told GEOS-Chem where to find the emissions and meteorological field files for your chosen horizontal resolution by setting this line in your input.geos file:

Root data directory     : /dir/to/data/GEOS_4x5

In this case the root-level directory, or top of the directory tree, is /dir/to/data/ and the specific data directory for the 4° x 5° resolution data is GEOS_4x5/.

NOTE: The example root-level directory /dir/to/data will vary from system to system. For example, on the Harvard data servers, the root-level directory is /as/data/geos. If you are not sure what the root-level directory is on your disk server, then ask your sysadmin or IT staff.

This setting indicated that you were running GEOS-Chem at 4° x 5° resolution. GEOS-Chem would then look for the 4° x 5° meteorological field data in these paths (all of which are subfolders of the root-level directory /dir/to/data/):

/dir/to/data/GEOS_4x5/GEOS_FP/YYYY/MM       # 4 x 5 GEOS-FP met data
/dir/to/data/GEOS_4x5/MERRA/YYYY/MM         # 4 x 5 MERRA met data
/dir/to/data/GEOS_4x5/GEOS_5/YYYY/MM        # 4 x 5 GEOS-5 met data
/dir/to/data/GEOS_4x5/GEOS_4_v4/YYYY/MM     # 4 x 5 GEOS-4 met data
/dir/to/data/GCAP_4x5/AGRID/YYYY/MM         # 4 x 5 GCAP met data

If you wanted to run GEOS-Chem at 2° x 2.5° resolution, you would use this setting for the root data directory:

Root data directory     : /dir/to/data/GEOS_2x2.5/

and GEOS-Chem would read the 2° x 2.5° meteorological data via these paths:

/dir/to/data/GEOS_2x2.5/GEOS_FP/YYYY/MM     # 2 x 2.5 GEOS-FP met data
/dir/to/data/GEOS_2x2.5/MERRA/YYYY/MM       # 2 x 2.5 MERRA met data
/dir/to/data/GEOS_2x2.5/GEOS_5/YYYY/MM      # 2 x 2.5 GEOS-5 met data
/dir/to/data/GEOS_2x2.5/GEOS_4_v4/YYYY/MM   # 2 x 2.5 GEOS-4 met data

etc. for other horizontal resolutions.
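The pre-v10-01 lookup described above amounts to joining the root data directory, the met-field subdirectory, and the year and month. The POSIX shell sketch below illustrates this; met_dir is a hypothetical helper written for this page (not a GEOS-Chem routine), and the dates in the example calls are placeholders.

```shell
#!/bin/sh
# Hypothetical helper (not part of GEOS-Chem): print the directory that a
# pre-v10-01 run would read met fields from for a given year and month.
# Usage: met_dir ROOT_DATA_DIR MET_SUBDIR YYYY MM
met_dir() {
    root=${1%/}    # drop a trailing slash, e.g. /dir/to/data/GEOS_2x2.5/
    printf '%s/%s/%s/%s\n' "$root" "$2" "$3" "$4"
}

met_dir /dir/to/data/GEOS_4x5 GEOS_FP 2014 07     # prints /dir/to/data/GEOS_4x5/GEOS_FP/2014/07
met_dir /dir/to/data/GEOS_2x2.5/ MERRA 2013 01    # prints /dir/to/data/GEOS_2x2.5/MERRA/2013/01
```

Only the root data directory changes between resolutions; the met-field subdirectory and the YYYY/MM part stay the same.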

In addition to the root data directory, you also had to specify the GEOS_NATIVE directory (formerly known as the GEOS_1x1 directory). This directory, which sits alongside the resolution-specific directories (e.g. GEOS_4x5/) under /dir/to/data/, contains emissions and other data files at 1° x 1° or finer horizontal resolution. You specified the GEOS_NATIVE directory by setting this line in input.geos:

Dir w/ 1x1 emissions etc: /dir/to/data/GEOS_NATIVE/

GEOS-Chem would store this value in the Input_Opt%DATA_DIR_1x1 variable.

In addition to the root data directory and the GEOS_NATIVE directory, you had to specify two more data directories: one containing archived OH concentrations for the offline GEOS-Chem specialty simulations, and one containing the O3 production/loss rates used by the tagged O3 simulation:

Dir w/ archived OH files: /dir/to/data/GEOS_MEAN/OHmerge/v5-07-08/

Dir w/ O3 P/L rate files: /dir/to/data/GEOS_MEAN/O3_PROD_LOSS/2003.v6-01-05/

In summary, these data directories were arranged as:

/dir/to/data/                               # Top of the GEOS-Chem directory tree
/dir/to/data/GEOS_4x5/                      # Directory for 4 x 5 data
/dir/to/data/GEOS_2x2.5/                    # Directory for 2 x 2.5 data
... etc. for other horizontal resolutions ...
/dir/to/data/GEOS_NATIVE/                   # 1 x 1 or finer data (formerly GEOS_1x1)
/dir/to/data/GEOS_MEAN/                     # Archived OH & O3 P/L rate files

--Bob Y. 15:49, 6 April 2015 (EDT)

ExtData: a new top-level directory tree

Several code updates made in GEOS-Chem v10-01 and later versions required a change in the data directory naming structure:

  • The HEMCO emissions component makes it possible to read emissions inventories and other relevant data sets at much higher resolution than 4° x 5° or 2° x 2.5°.
  • In addition, HEMCO has the capability to regrid data from its native resolution to the resolution of your GEOS-Chem simulation. We no longer need to store separate copies of emissions data on multiple grids.
  • Under these circumstances, referring to a top-level directory named GEOS_4x5 or GEOS_2x2.5 can lead to confusion.
  • When running GEOS-Chem in the ESMF/MAPL environment, accepted practice is to read data from a directory tree where all of the data folders are subdirectories of a folder named ExtData.

For all of these reasons, we have decided to restructure the GEOS-Chem data directory tree. Starting with GEOS-Chem v10-01, all of the GEOS-Chem data directories will be subdirectories of the ExtData directory. As explained in the next section, you can make symbolic links from ExtData to the existing GEOS-Chem data directories.

Note: Using wget to download directories within ExtData will not work for symbolically linked directories, such as those in the ExtData/CHEM_INPUTS/ folder. When downloading data for the first time, you will need to point wget at the original (non-symbolically-linked) data directories instead. The locations of these directories for ExtData/CHEM_INPUTS/ are listed below.

Creating the ExtData directory structure

Follow these instructions to create the ExtData directory tree on your system. You will need to have write permission in your root data directory. If you don't have this permission, then ask your sysadmin or IT staff for assistance. Note that if you are downloading data for the first time, some of the directories referred to below will not yet exist on your system; see the notes in steps 2 and 8 for what to do in that case.

1. Change to the root-level data directory. For our example, we will call this /dir/to/data.

  > cd /dir/to/data

2. Get a listing of all the subdirectories in /dir/to/data:

  > ls -1F

Your actual listing will differ, depending on the data you have stored on your disk server. In the output of ls -1F, a trailing / denotes a directory and a trailing @ denotes a symbolic link.

NOTE: If none of these data directories have been previously downloaded to your disk server, then you will have to download them from one of the GEOS-Chem data archives. See our Downloading GEOS-Chem source code and data wiki page for more instructions.

3. Cut-and-paste the directory output from Step 2 into a text editor. You'll need it again in Step 5.

4. Create the /dir/to/data/ExtData subdirectory and switch to it.

  > mkdir ExtData
  > cd ExtData
  > pwd

5. Create a symbolic link from ExtData to each directory in the listing that you saved from Step 2.

  > ln -s ../GEOS_0.25x0.3125_CH
  > ln -s ../GEOS_0.25x0.3125_NA
  > ln -s ../GEOS_0.25x0.3125_NA.d
  > ln -s ../GEOS_0.5x0.666_CH
  > ln -s ../GEOS_0.5x0.666_CH.d
  > ln -s ../GEOS_0.5x0.666_NA
  > ln -s ../GEOS_0.5x0.666_NA.d
  > ln -s ../GEOS_2x2.5
  > ln -s ../GEOS_2x2.5.d
  > ln -s ../GEOS_4x5
  > ln -s ../GEOS_4x5.d
  > ln -s ../GEOS_MEAN
  > ln -s ../GEOS_NATIVE

6. Create the subdirectory ExtData/CHEM_INPUTS and switch to it. This directory will hold various input files needed for various chemistry modules.

  > mkdir CHEM_INPUTS
  > cd CHEM_INPUTS
  > pwd

7. If you already have the GEOS_NATIVE data directory on your system, create symbolic links from ExtData/CHEM_INPUTS to the following directories in ../GEOS_NATIVE:

  > ln -s ../GEOS_NATIVE/FastJ_201204
  > ln -s ../GEOS_NATIVE/Linoz_200910
  > ln -s ../GEOS_NATIVE/MODIS_LAI_201204
  > ln -s ../GEOS_NATIVE/Olson_Land_Map_201203
  > ln -s ../GEOS_NATIVE/TOMAS_201402
  > ln -s ../GEOS_NATIVE/UCX_201403

8. If you are downloading data for the first time and do not have the GEOS_NATIVE directory on your system, use wget to retrieve the following directories in ExtData/CHEM_INPUTS/.

     > wget -r -nH --cut-dirs=3 ""
     > wget -r -nH --cut-dirs=3 ""
     > wget -r -nH --cut-dirs=3 ""
     > wget -r -nH --cut-dirs=3 ""
     > wget -r -nH --cut-dirs=3 ""
     > wget -r -nH --cut-dirs=3 ""
     > wget -r -nH --cut-dirs=3 ""
     > wget -r -nH --cut-dirs=3 ""
     > wget -r -nH --cut-dirs=3 ""
     > wget -r -nH --cut-dirs=3 ""
     > wget -r -nH --cut-dirs=3 ""
     > wget -r -nH --cut-dirs=3 ""

8a. If you plan to use the RRTMG radiative transfer model option in GEOS-Chem, then download the following data directories, which are new to ExtData/CHEM_INPUTS:

     > wget -r -nH --cut-dirs=3 "" 
     > wget -r -nH --cut-dirs=3 ""

8b. If you use GEOS-4 or GCAP met fields, then also download this data directory to ExtData/CHEM_INPUTS:

     > wget -r -nH --cut-dirs=3 ""

This data directory contains netCDF versions of the annual mean tropopause data files used for the GEOS-4 and GCAP simulations. If you do not use these met fields, then you can skip this step.

9. Switch back to the ExtData directory:

  > cd ..
  > pwd

10. Download the HEMCO data directories into ExtData. You can do this with our hemco_data_download package, which can be obtained via Git.

10a. Obtain the hemco_data_download package by following these directions.
10b. Add the path /dir/to/data/ExtData/HEMCO to the hemcoDataDownload.rc configuration file, on the line beginning with Your HEMCO data path, as shown below:

     #                                                                             #
     #  Specify the remote and local HEMCO data paths, plus other options.         #
     #                                                                             #

     Remote HEMCO data path |
     Your HEMCO data path   | /dir/to/data/ExtData/HEMCO
     Verbose output         | yes
     Dryrun only?           | no

10c. Look at this list of emission inventories and data sets that you can use with HEMCO. Decide which of these you would like to download. Modify the hemcoDataDownload.rc configuration file according to these instructions.
10d. Once you have set up your configuration file, follow these instructions to download the HEMCO data directories to /dir/to/data/ExtData.
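Steps 5 and 7 can also be scripted rather than typed one link at a time. The POSIX shell sketch below assumes the directory layout used in this example (/dir/to/data containing ExtData and GEOS_NATIVE); link_parent_dirs and link_chem_inputs are hypothetical helper names, not part of GEOS-Chem or the hemco_data_download package. Review the listing from Step 2 first, since link_parent_dirs links every directory it finds in /dir/to/data.

```shell
#!/bin/sh
# Hypothetical helpers for steps 5 and 7; not part of GEOS-Chem.

# Step 5: run from inside /dir/to/data/ExtData. Creates a relative link
# for each directory (or link to a directory) in the parent folder,
# skipping ExtData itself and any name that already exists here.
link_parent_dirs() {
    for target in ../*/; do
        [ -d "$target" ] || continue            # glob matched nothing
        name=$(basename "$target")              # e.g. GEOS_4x5
        [ "$name" = "ExtData" ] && continue     # never link ExtData into itself
        if [ -e "$name" ] || [ -L "$name" ]; then
            echo "skipping $name (already exists)"
            continue
        fi
        ln -s "../$name" "$name"
    done
}

# Step 7: run from inside /dir/to/data/ExtData/CHEM_INPUTS. Links each
# listed GEOS_NATIVE subdirectory and reports missing ones instead of
# creating dangling links.
link_chem_inputs() {
    for dir in FastJ_201204 Linoz_200910 MODIS_LAI_201204 \
               Olson_Land_Map_201203 TOMAS_201402 UCX_201403; do
        if [ ! -d "../GEOS_NATIVE/$dir" ]; then
            echo "missing: ../GEOS_NATIVE/$dir" >&2
        elif [ -e "$dir" ] || [ -L "$dir" ]; then
            echo "skipping $dir (already exists)"
        else
            ln -s "../GEOS_NATIVE/$dir" "$dir"
        fi
    done
}
```

After sourcing the script, run link_parent_dirs from /dir/to/data/ExtData and link_chem_inputs from /dir/to/data/ExtData/CHEM_INPUTS. Both functions skip names that already exist, so running them a second time makes no changes.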

--Bob Y. 17:29, 6 April 2015 (EDT)

Data directories

Here is a list of the data directories in the ExtData structure. NOTE: Your listing may differ, depending on which met field data sets you have stored on your disk server.

CHEM_INPUTS
  Non-emissions data for GEOS-Chem chemistry modules, which cannot be read via HEMCO.
  The following data directories are ONLY required if you are using the RRTMG radiative transfer model in GEOS-Chem:
  • RRTMG_201411/
    • Directory containing climatological N2O and CH4 profiles for input into RRTMG.
  • modis_surf_201210/
    • Directory containing surface albedo & emissivity for input into RRTMG.

GEOS_0.25x0.3125_CH and GEOS_0.25x0.3125_CH.d
  Symbolic links to directories that store data on the GEOS-FP 0.25° x 0.3125° China nested grid:
  • ../GEOS_0.25x0.3125_CH
  • ../GEOS_0.25x0.3125_CH.d

GEOS_0.25x0.3125_EU and GEOS_0.25x0.3125_EU.d
  Symbolic links to directories that store data on the 0.25° x 0.3125° Europe nested grid:
  • ../GEOS_0.25x0.3125_EU
  • ../GEOS_0.25x0.3125_EU.d

GEOS_0.25x0.3125_NA and GEOS_0.25x0.3125_NA.d
  Symbolic links to directories that store data on the GEOS-FP 0.25° x 0.3125° North America nested grid:
  • ../GEOS_0.25x0.3125_NA
  • ../GEOS_0.25x0.3125_NA.d

GEOS_2x2.5 and GEOS_2x2.5.d
  Symbolic links to directories that store data on the GEOS-Chem 2° x 2.5° grid:
  • ../GEOS_2x2.5
  • ../GEOS_2x2.5.d

GEOS_4x5 and GEOS_4x5.d
  Symbolic links to directories that store data on the GEOS-Chem 4° x 5° grid:
  • ../GEOS_4x5
  • ../GEOS_4x5.d

GEOS_NATIVE
  Symbolic link to the ../GEOS_NATIVE directory.

GEOS_MEAN
  Symbolic link to the ../GEOS_MEAN directory.

HEMCO
  Emissions inventories and other data sets for use with HEMCO.

NOTE: Directories ending in .d (such as GEOS_4x5.d) contain only met field data. They are reached via symbolic links from the corresponding directories without the .d suffix. For example, GEOS_4x5/GEOS_FP/ links to GEOS_4x5.d/GEOS_FP. Historically, this was done to separate the met field data (which can be several GB or TB in size) from the other, non-met data files (e.g. emissions), in order to facilitate disk management. Please see this wiki post for more information.
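To see which entries in a resolution directory are links into the corresponding .d directory, you can list the symbolic links and their targets. This is a minimal POSIX shell sketch; show_met_links is a hypothetical helper, and the targets it prints are whatever was recorded when the links were created on your system (relative or absolute).

```shell
#!/bin/sh
# Hypothetical helper: for a resolution directory such as
# /dir/to/data/ExtData/GEOS_4x5, print every entry that is a symbolic
# link together with the target stored in that link. Regular
# directories and files are skipped.
show_met_links() {
    for entry in "$1"/*; do
        if [ -L "$entry" ]; then
            printf '%s -> %s\n' "$(basename "$entry")" "$(readlink "$entry")"
        fi
    done
}
```

For example, show_met_links /dir/to/data/ExtData/GEOS_4x5 should list GEOS_FP and the other met-field folders that point into GEOS_4x5.d.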

--Bob Y. 11:25, 10 April 2015 (EDT)

Setting directories in input.geos

Please add the following settings in the SIMULATION MENU section of your input.geos file:

  Root data directory     : /dir/to/data/ExtData/
   => GEOS-FP subdir      : GEOS_FP/YYYY/MM/
   => MERRA-2   subdir    : MERRA2/YYYY/MM/

The Root data directory should be set to /dir/to/data/ExtData. Our example text /dir/to/data refers to the root-level data directory on your system.

GEOS-Chem will prefix the proper resolution-specific directory to the GEOS-FP and MERRA-2 subdirectories. For example, if you compile GEOS-Chem with

  make GRID=4x5 ...etc...

then you should see the following output in the log file:

G E O S - C H E M   U S E R   I N P U T

READ_INPUT_FILE: Reading input.geos 

Start time of run           : 20160701 000000
End time of run             : 20160801 000000
Run directory               : ./
Data Directory              : /dir/to/data/ExtData/
CHEM_INPUTS directory       : /dir/to/data/ExtData/CHEM_INPUTS/
Resolution-specific dir     : GEOS_4x5/
GEOS-FP    sub-directory    : GEOS_4x5/GEOS_FP/YYYY/MM/
MERRA-2    sub-directory    : GEOS_4x5/MERRA2/YYYY/MM/
... etc ...

GEOS-Chem will also automatically create a variable (Input_Opt%CHEM_INPUTS) that points to the ExtData/CHEM_INPUTS directory. You can use this instead of Input_Opt%DATA_DIR_1x1 where necessary.
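The following POSIX shell sketch illustrates how the resolution-specific directory is prefixed to the GEOS-FP and MERRA-2 subdirectories, producing lines like those in the log output above. It is an illustration only: res_dir is a hypothetical helper, and its GRID-to-directory mapping covers only the two global grids discussed on this page rather than reproducing GEOS-Chem's own logic.

```shell
#!/bin/sh
# Illustration only, not GEOS-Chem source code: map a GRID value to the
# resolution-specific directory and prefix it to the met-field
# subdirectories from input.geos. Only the two global grids discussed on
# this page are handled (an assumption made for brevity).
res_dir() {
    case "$1" in
        4x5)   echo "GEOS_4x5/" ;;
        2x2.5) echo "GEOS_2x2.5/" ;;
        *)     echo "unsupported grid: $1" >&2; return 1 ;;
    esac
}

root=/dir/to/data/ExtData/          # "Root data directory" from input.geos
grid=4x5                            # as in: make GRID=4x5 ...
rdir=$(res_dir "$grid") || exit 1

echo "Data Directory              : $root"
echo "CHEM_INPUTS directory       : ${root}CHEM_INPUTS/"
echo "Resolution-specific dir     : $rdir"
echo "GEOS-FP    sub-directory    : ${rdir}GEOS_FP/YYYY/MM/"
echo "MERRA-2    sub-directory    : ${rdir}MERRA2/YYYY/MM/"
```

Because only the resolution-specific prefix depends on the grid, the same input.geos settings work unchanged for a 2° x 2.5° build.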

--Bob Yantosca (talk) 19:10, 16 May 2018 (UTC)