Downloading GEOS-Chem data directories

Revision as of 22:37, 16 February 2011

This page describes where you can obtain the GEOS-Chem source code and required data files.

What you need to download before you can run GEOS-Chem

When setting up GEOS–Chem on your system, you will need to install the following components:

  1. The GEOS–Chem source code directory. This is the directory where the Fortran-90 source code files (i.e. *.f, *.f90 files) and Makefiles reside. Your Fortran compiler will create an executable from these source code files.
  2. A GEOS–Chem run directory. Here is where you will run the compiled GEOS–Chem executable. Each run directory contains:
    1. Various input files that you can modify in order to select different options for your GEOS–Chem simulation
    2. Files that define GEOS–Chem's chemical and photolysis mechanisms
    3. "Restart files" that hold the initial conditions for your GEOS–Chem simulation
  3. The GEOS–Chem shared data directories. This is the directory tree where the following types of data are stored:
    1. Meteorological data (a.k.a. the "met fields") used to drive GEOS–Chem
    2. Emissions inventories for GEOS–Chem
    3. Scale factors used to scale emissions from a base year to a given year
    4. Oxidant (OH, O3) concentrations for both full-chemistry and offline simulations
    5. IPCC future scenarios (for GCAP simulations)
    6. Other GEOS–Chem specific data files.

The GEOS–Chem source code and run directories are small enough to download directly to your own disk space in your Unix account. You can download these with the Git version control software.

On the other hand, the GEOS–Chem shared data directories contain many large files that probably cannot all fit into your own personal disk quota. Therefore, you (or your sysadmin) should download the shared data directories to a common disk space where all GEOS–Chem users in your group can access them. The volume of data contained in the shared data directories precludes using Git; you must instead download these files via FTP, wget, or similar file transfer programs.

For complete downloading instructions, please see:

  1. GEOS-Chem User's Guide: Chapter 2.2, Downloading the GEOS-Chem source code
  2. GEOS-Chem User's Guide: Chapter 2.3, Downloading the GEOS-Chem run directories
  3. GEOS-Chem User's Guide: Chapter 2.4, Downloading the GEOS-Chem shared data directories

For more information on the files contained in each of these directories, please see:

  1. GEOS-Chem User's Guide: Chapter 3, Compiling the GEOS-Chem source code
  2. GEOS-Chem User's Guide: Chapter 4, The GEOS-Chem shared data directories
  3. GEOS-Chem User's Guide: Chapter 5, The GEOS-Chem run directories

--Bob Y. 11:14, 10 February 2011 (EST)

TARBALL downloads are no longer supported

In the past, the GEOS–Chem source code and run directories were distributed to the user community as a series of TARBALL (i.e. *.tar.gz) files via anonymous FTP. The advantage of this method was that one would only have to download a single file. However, as the number of GEOS–Chem users (and submitted source code modifications) grew, this method became unwieldy. For example, if only a single file needed to be updated, the entire TARBALL file would have to be regenerated. This often became a source of confusion and error.

Given the large number of user code submissions, robust source code management techniques must be employed in order to ensure the integrity of the GEOS–Chem code. Therefore, the GEOS–Chem Support Team has selected the Git version control software for managing the GEOS–Chem source code and run directories.

--Bob Y. 11:14, 10 February 2011 (EST)

Data Directory Access

The GEOS–Chem shared data directories contain the various met fields, emissions, and other data that GEOS–Chem will read in during the course of a simulation. You must download the shared data directories via FTP or a similar utility (e.g. wget, FireFTP, SecureFX, etc.). The large volume of data makes it impossible to track this directory structure with Git.

If there are several GEOS–Chem users in your research group, then chances are someone (perhaps your sysadmin) has already downloaded the GEOS–Chem shared directory structure to a common disk space on your servers. Nevertheless, you should still check to make sure that the most recent subdirectories are present. See Chapter 4 for complete details about the contents of the GEOS–Chem shared data directories.
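A quick way to check an existing local copy is to test for each expected subdirectory. The following is a minimal sketch, not an official tool: the helper name missing_dirs is hypothetical, the local root /data/gc is an assumed path, and the subdirectory names are taken from the Harvard directory listing on this page.

```shell
#!/bin/sh
# Sketch: report which expected subdirectories are missing under a local
# copy of the shared data directories.
# Usage: missing_dirs ROOT NAME [NAME ...]
# (missing_dirs is a hypothetical helper, not part of GEOS-Chem.)
missing_dirs() {
    root=$1
    shift
    for name in "$@"; do
        # Print the name only if the directory is absent.
        [ -d "$root/$name" ] || echo "$name"
    done
}

# Example call; /data/gc is an assumed local root, adjust it to your system.
missing_dirs /data/gc GEOS_4x5 GEOS_2x2.5 GEOS_1x1 GEOS_MEAN aerosol_optics
```

Any name printed by the call is a directory that still needs to be downloaded; no output means all listed directories are present.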

There are currently two data archives from which you may download the GEOS–Chem shared data directories:

  1. Harvard archive (ftp.as.harvard.edu)
  2. Dalhousie archive (rain.ucis.dal.ca)

The Dalhousie archive is not as comprehensive as the Harvard archive. However, the Dalhousie archive stores data files for the various nested grids and global 1° x 1.25° grids that are not available on the Harvard archive.

You may access either of these archives via the following methods:

  1. wget (Recommended)
  2. Anonymous FTP

Primary download site

The primary GEOS-Chem data download site is located at:

ftp ftp.as.harvard.edu
cd pub/geos-chem

We recommend that you use the wget utility to download these directories instead of anonymous FTP. Wget allows you to download multiple directories at once.

Downtime

The Harvard FTP archive will be unavailable during the weekly maintenance period every Monday between 7-10 AM ET. Please plan your data downloads accordingly.
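If you start downloads from a script, you may want to skip the maintenance window automatically. The sketch below assumes that "7-10 AM ET" means the hours 7, 8 and 9 in US Eastern time, and that the system has the America/New_York time zone installed; the function name in_maintenance_window is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed (exit status 0) if a given weekday/hour falls inside the
# Monday 7-10 AM Eastern maintenance window.
# Arguments: $1 = ISO weekday (1 = Monday), $2 = hour of day (0-23).
in_maintenance_window() {
    [ "$1" -eq 1 ] && [ "$2" -ge 7 ] && [ "$2" -lt 10 ]
}

# Current weekday and hour in US Eastern time (assumes the tz database
# provides the America/New_York zone).
dow=$(TZ=America/New_York date +%u)
hour=$(TZ=America/New_York date +%H)

# Strip a leading zero from the hour ("07" becomes "7") before comparing.
if in_maintenance_window "$dow" "${hour#0}"; then
    echo "Harvard FTP archive is likely down for maintenance; try later."
else
    echo "Outside the maintenance window."
fi
```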

Directory structure

The pub/geos-chem directory is further divided into the following subdirectories:

1month_plots/
1yr_benchmarks/
NRT/
NRT-ARCTAS/
NRT_archive/
beta_releases/
dao/
data/
downloads/
mean_OH/
patches/
public_releases/

The data/ subdirectory is the most important. This is the root under which the GEOS–Chem shared data directories reside. This is where you will find the GEOS-Chem met fields and emissions data.

Here is a quick look at the contents of these subdirectories of pub/geos-chem/:

Data Directories under pub/geos-chem/ Description
data/ Root Data Directory
data/aerosol_optics/ Contains files which specify the aerosol optical properties for the FAST-J photolysis mechanism.
data/GEOS_MEAN/ Contains P(O3), L(O3) and mean OH data for offline simulations.
0.5° x 0.666° Data Directories Description
data/GEOS_0.5x0.666_CH/ Emissions etc. files for the China/SE Asia 0.5° x 0.666° nested-grid simulation
data/GEOS_0.5x0.666_CH/GEOS_5/YYYY/MM/ GEOS-5 met data for the China/SE Asia 0.5° x 0.666° nested grid simulation
data/GEOS_0.5x0.666_NA/ Emissions etc. files for the North American 0.5° x 0.666° nested grid simulation
data/GEOS_0.5x0.666_NA/GEOS_5/YYYY/MM/ GEOS-5 met data for the North American 0.5° x 0.666° nested grid simulation
1° x 1° Data Directories Description
data/GEOS_1x1/ 1° x 1° emissions etc. data files for use with GEOS-Chem global simulations
2° x 2.5° Data Directories Description
data/GEOS_2x2.5/ Emissions etc. data for GEOS-Chem 2° x 2.5° global simulations
data/GEOS_2x2.5/GEOS_3/YYYY/MM/ GEOS-3 met data for 2° x 2.5° global simulations
data/GEOS_2x2.5/GEOS_4_v4/YYYY/MM/ GEOS-4 met data for 2° x 2.5° global simulations (late-look)
data/GEOS_2x2.5/GEOS_5/YYYY/MM/ GEOS-5 met data for 2° x 2.5° global simulations
4° x 5° Data Directories Description
data/GEOS_4x5/ Emissions etc. data for GEOS-Chem 4° x 5° global simulations
data/GEOS_4x5/GEOS_3/YYYY/MM/ GEOS-3 met data for 4° x 5° global simulations
data/GEOS_4x5/GEOS_4_v4/YYYY/MM/ GEOS-4 met data for 4° x 5° global simulations (late-look)
data/GEOS_4x5/GEOS_5/YYYY/MM/ GEOS-5 met data for 4° x 5° global simulations
data/GEOS_4x5/MERRA/YYYY/MM/ MERRA met data for 4° x 5° global simulations
Other Subdirectories under pub/geos-chem/ Description
1yr_benchmarks/ Contains the following types of data from the 1-year benchmarks used to evaluate GEOS-Chem.
  • Restart files
  • Model output (bpch and netCDF formats)
  • Log files
  • Input files
  • Evaluation plots
dao/ Internal use only
mean_OH/ Directory containing 3-D mean OH fields archived from previous GEOS-Chem simulations.

Please view the catalog of met data at the Harvard archive to determine if the data period you wish to download is available.

--Bob Y. 15:39, 10 February 2011 (EST)

Obsolete data directories

The following data directories contain files that are now considered obsolete.

Obsolete directories under pub/geos-chem/ Description
1month_plots/ Formerly used to store plots from 1-month benchmark simulations. We now use Git to manage the benchmark run directories.
beta_releases/ Contains TARBALL files with source code and run directories for GEOS-Chem beta versions prior to v8-03-01. We now use Git to manage the GEOS-Chem source code & run directories.
data/GEOS_1x1_CH/ Emissions etc. data for the China/SE Asia 1° x 1° simulation. This simulation has been superseded by the GEOS-5 0.5° x 0.666° nested simulation.
data/GEOS_1x1_CH/GEOS_3/YYYY/MM/ GEOS-3 met data for the China/SE Asia 1° x 1° nested grid simulation. This simulation has been superseded by the GEOS-5 0.5° x 0.666° nested simulation.
data/GEOS_1x1_NA/ Emissions etc. data for the North American 1° x 1° nested grid simulation. This simulation has been superseded by the GEOS-5 0.5° x 0.666° nested simulation.
data/GEOS_1x1_NA/GEOS_3/YYYY/MM/ GEOS-3 met data for the North American 1° x 1° nested grid simulation. This simulation has been superseded by the GEOS-5 0.5° x 0.666° nested simulation.
data/GEOS_2x2.5/GEOS_4_flk/YYYY/MM/ GEOS-4 met data for 2° x 2.5° global simulations (1st-look). These data were only used for the ITCT/2k2 campaign. They have been replaced by the data in GEOS_4_v4.
data/GEOS_4x5/GEOS_4_flk/YYYY/MM/ GEOS-4 met data for 4° x 5° global simulations (1st-look). These data were only used for the ITCT/2k2 campaign. They have been replaced by the data in GEOS_4_v4.
downloads/ Formerly used to contain the following code packages:
  • Code to read HDF4 and HDF4-EOS data files
  • Code to read HDF5 and HDF5-EOS data files
  • Code to read netCDF data files
  • Code to process GEOS-5 met data

These are now maintained as Git repositories. Contact Bob Yantosca for more information.

NRT/ Contains some data from the GEOS-Chem near-real-time simulations for ICARTT/ITCT 2k2. We maintain this directory for archival purposes. Not all data have been preserved.
NRT-ARCTAS/ Contains output from the GEOS-Chem Near-Real-Time simulations for ARCTAS. We maintain this directory for archival purposes.
patches/ Directory that formerly contained bug-fix software patches (if necessary). We now use Git to issue GEOS-Chem software patches.
public_releases/ Contains TARBALL files with source code and run directories for GEOS-Chem public versions prior to v8-03-01. We now use Git to manage the GEOS-Chem source code & run directories.

--Bob Y. 15:39, 10 February 2011 (EST)

Alternative download site

The GEOS-Chem data and meteorological fields used by Dalhousie University are also available via anonymous FTP from:

ftp rain.ucis.dal.ca

We recommend that you use the wget utility to download these directories instead of anonymous FTP. Wget allows you to download multiple directories at once.

Directory structure

The Dalhousie archive overlaps with many of the above directories from the Harvard site, but it is not as extensive. This site, however, additionally hosts the following unique datasets:

0.5° x 0.666° Data Directories Description
/GEOS_0.5x0.666_EU/ 1/2 x 2/3 European nested grid emission etc files
/GEOS_0.5x0.666_EU.d/ 1/2 x 2/3 European nested grid met fields (GEOS-5)
/GEOS_0.5x0.666_NA/ 1/2 x 2/3 North American nested grid emission etc files
/GEOS_0.5x0.666_NA.d/ 1/2 x 2/3 North American nested grid met fields (GEOS-5)
1° x 1.25° Data Directories Description
/GEOS_1x1.25/ 1 x 1.25 Global GEOS4 emission etc files
/GEOS_1x1.25.d/ 1 x 1.25 Global GEOS4 met fields

A catalog of available data may be found HERE.

--Bob Y. 11:49, 10 February 2011 (EST)

For GCAP users

For those users who wish to run GEOS-Chem with the GISS/GCAP met fields, please contact Loretta Mickley.

--Bob Y. 09:52, 8 March 2010 (EST)

Question about directory structure

Shanna Shaked wrote:

We are working again on trying to run GEOS-Chem. However, we are encountering some errors that may be due to the directory structure. We find a discrepancy between the directory structure described in the GEOS-Chem manual and that available on the ftp site.
The GEOS-Chem manual describes a directory structure of:
   data/GEOS_4x5/GEOS_5/YYYY/MM
However, on the ftp site, we find a directory structure with an extra '.d':
   data/GEOS_4x5.d/GEOS_5/YYYY/MM
(the GEOS_5 folder is in GEOS_4x5.d rather than GEOS_4x5). There does exist a GEOS_4x5 that contains many of the emissions data, but does not contain GEOS_5.
If we leave the structure as is, and enter ../data/GEOS_4x5/ as our root data directory in input.geos, we get a file not found error when it looks for GEOS_5 within this directory (obviously).
If we instead enter ../data/GEOS_4x5.d as our root data directory, we get a file not found error when the program looks for emissions within this directory (lightning NOx emissions, in this case).
QUESTION: To solve this problem, we have moved the GEOS_5 folder into the GEOS_4x5 directory. [Is this] okay?

Bob Yantosca replied:

The only difference on our system between e.g. GEOS_4x5 and GEOS_4x5.d is that our sysadmin (Jack Yatteau) set up the ".d" directories separately so that they only contain met data (which is much larger than the emissions etc. data). That way he could separate the disks that just had met data from the disks that have the emissions data to facilitate our configuration here. There are symbolic links from GEOS_4x5 to GEOS_4x5.d etc. (i.e. the directory GEOS_4x5/GEOS_5 is actually a symbolic link to the corresponding directory in GEOS_4x5.d/GEOS_5/, and so on for the other met field resolutions & directories).
You don't necessarily have to do this on your end, but this is what we did here. You can just make the GEOS_4x5/GEOS_5 etc. real subdirectories and not symbolic links and store the data there. The solution you picked above is OK.
Also to facilitate FTP file transfer, you could do the following:
  • Write a script or an FTP macro
  • Use a 3rd-party GUI program like FireFTP in Mozilla Firefox.
  • Or even better yet, use the Unix wget utility (see below)
Each user is responsible for their own file transfers.

--Bob Y. 11:04, 5 February 2009 (EST)
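For readers who want to reproduce the ".d" layout described in the reply above, the following is a minimal sketch. It builds the two directory trees in a scratch location and links them; DATA_ROOT is an illustrative variable, not the path used at Harvard, and the directories are created empty purely for demonstration.

```shell
#!/bin/sh
# Sketch of the ".d" layout: met data in GEOS_4x5.d, everything else in
# GEOS_4x5, with a symbolic link joining the two.
# DATA_ROOT is an assumed, illustrative location (a temp dir by default).
DATA_ROOT=${DATA_ROOT:-$(mktemp -d)}

# Met fields live in the ".d" tree; emissions etc. live in the plain tree.
mkdir -p "$DATA_ROOT/GEOS_4x5.d/GEOS_5" "$DATA_ROOT/GEOS_4x5"

# Make GEOS_4x5/GEOS_5 a symbolic link to GEOS_4x5.d/GEOS_5.
# The relative target keeps the link valid if DATA_ROOT is moved as a whole.
ln -s ../GEOS_4x5.d/GEOS_5 "$DATA_ROOT/GEOS_4x5/GEOS_5"

# Show the result; GEOS_5 should appear as a link.
ls -l "$DATA_ROOT/GEOS_4x5"
```

With this layout, GEOS_4x5 can serve as the root data directory for both the emissions files and the GEOS_5 met fields, which is the situation the question above was trying to reach by moving the folder.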

Using wget to download files

Probably the simplest way to download the GEOS-Chem emissions and met field data is to use the Unix wget utility. This allows you to download multiple files and directories at a time.

The wget utility is free and open-source (published by GNU), and comes standard with most *nix systems (e.g. Linux distributions such as Ubuntu, Fedora, and CentOS). You can check out the user manual for more information.

Syntax

Most of the time, the syntax you will use to download multiple directories is as follows:

Downloading data from Harvard:

wget -r -nH "ftp://ftp.as.harvard.edu/pub/geos-chem/data/DIRECTORY_NAME/"

Downloading data from Dalhousie:

wget -r -nH "ftp://rain.ucis.dal.ca/DIRECTORY_NAME/"

The options to wget are as follows:

-r   Specifies recursive directory transfer (i.e. will download all subdirectories)
-nH  Disables host-prefixed directories (i.e. files are stored under pub/geos-chem/data/DIRECTORY_NAME, not ftp.as.harvard.edu/pub/geos-chem/data/DIRECTORY_NAME)

If you wish to trim the name of the downloaded directory (i.e., so it downloads as DIRECTORY_NAME, not pub/geos-chem/data/DIRECTORY_NAME), then use the --cut-dirs option:

wget -r -nH --cut-dirs=X "ftp://ftp.as.harvard.edu/pub/geos-chem/data/DIRECTORY_NAME/"

where X is the number of directories to trim.

NOTE: The URL must be enclosed in quotes for file transfer to occur. If you omit the quotes then wget will just return a directory listing in a file named index.html without any files being downloaded.
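The value of X for --cut-dirs can be worked out by counting the directories in the URL path that come before the one you want to keep. The sketch below does this counting in plain shell; the function name cut_dirs_for is hypothetical, and it assumes you want to keep only the last directory in the URL.

```shell
#!/bin/sh
# Hypothetical helper: print how many leading directories --cut-dirs must
# strip so that only the last directory of the URL path is kept locally.
cut_dirs_for() {
    url=$1
    # Drop the scheme ("ftp://") and the host name, leaving the remote path.
    path=${url#*://}
    path=${path#*/}
    # Drop a trailing slash, if present.
    path=${path%/}
    # Each remaining "/" separates one directory to cut from what follows.
    n=0
    rest=$path
    while [ "${rest#*/}" != "$rest" ]; do
        rest=${rest#*/}
        n=$((n + 1))
    done
    echo "$n"
}

# Prints 4: pub, geos-chem, data and GEOS_4x5 are cut.
cut_dirs_for "ftp://ftp.as.harvard.edu/pub/geos-chem/data/GEOS_4x5/lightning_NOx_201101"
```

For a top-level data directory such as pub/geos-chem/data/DIRECTORY_NAME/, the same count gives X = 3.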



Examples

1. Download all emissions files in the GEOS_2x2.5/ data directory structure:

wget -r -nH "ftp://ftp.as.harvard.edu/pub/geos-chem/data/GEOS_2x2.5/" &

The & character runs the file transfer in the background, so you can keep using the shell while it proceeds.


2. Download all available GEOS-Chem 2° x 2.5° met field data files in the GEOS_2x2.5.d directory structure:

wget -r -nH "ftp://ftp.as.harvard.edu/pub/geos-chem/data/GEOS_2x2.5.d/" &

NOTE: Due to the huge volume of data involved, this is not recommended, as the file downloads may swamp your system. It's better to do this:


3. Download all GEOS-5 met data at 2° x 2.5° resolution:

wget -r -nH "ftp://ftp.as.harvard.edu/pub/geos-chem/data/GEOS_2x2.5.d/GEOS_5/" &

And it may even be better to download one year at a time:


4. Download all GEOS-5 met data at 2° x 2.5° resolution for 2008:

wget -r -nH "ftp://ftp.as.harvard.edu/pub/geos-chem/data/GEOS_2x2.5.d/GEOS_5/2008/" &


5. Download the lightning_NOx_201101 folder to an existing local data directory /data/gc/GEOS_4x5 using the --cut-dirs option:

cd /data/gc/GEOS_4x5 
wget -r -nH --cut-dirs=4 "ftp://ftp.as.harvard.edu/pub/geos-chem/data/GEOS_4x5/lightning_NOx_201101"

etc.
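Building on example 4, the year-by-year downloads can be run in sequence from a small script. This is a sketch under stated assumptions: the year list is illustrative (check the met data catalog for the periods actually available), and the DRY_RUN switch and the helper names met_url and fetch_year are hypothetical additions, not wget features.

```shell
#!/bin/sh
# Sketch: fetch GEOS-5 2 x 2.5 met fields one year at a time, in sequence.
# The year list and the DRY_RUN switch are illustrative assumptions.
BASE_URL="ftp://ftp.as.harvard.edu/pub/geos-chem/data/GEOS_2x2.5.d/GEOS_5"
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands; 0 = actually run wget

# Build the URL for one year of met data (trailing slash included).
met_url() {
    echo "$BASE_URL/$1/"
}

# Print or run the wget command for one year, depending on DRY_RUN.
fetch_year() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "wget -r -nH \"$(met_url "$1")\""
    else
        wget -r -nH "$(met_url "$1")"
    fi
}

# Years are examples only; edit the list to match the data you need.
for year in 2007 2008; do
    fetch_year "$year" || echo "download failed for $year" >&2
done
```

Running the script as-is only prints the wget commands. Setting DRY_RUN=0 in the environment starts the transfers, one year after another, which keeps the load on your system lower than a single recursive download of the whole GEOS_5 tree.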

--Bob Y. 14:27, 8 September 2010 (EDT) --Daven 17:35, 16 February 2011 (EST)