Difference between revisions of "Preparing data files for use with HEMCO"

Revision as of 19:38, 12 September 2018

On this page we discuss how you can generate netCDF data files in the proper format for HEMCO.

The COARDS netCDF standard

Overview

The HEMCO emissions component reads data stored in the netCDF file format, which is a common data format used in atmospheric and climate sciences. NetCDF files contain data arrays as well as the metadata—that is, a description of the data, such as its units, the axes on which it is defined, if there are missing data values, etc.

Several netCDF conventions have been developed in order to facilitate data exchange and visualization. The Cooperative Ocean/Atmosphere Research Data Service (COARDS) standard defines regular conventions for naming dimensions as well as the attributes describing the data. You will find more information about these conventions in the sections below. HEMCO requires its input data to adhere to the COARDS standard.

NOTE: The Climate and Forecast (CF) Convention, which generalizes and extends the COARDS standard, is also compatible with HEMCO.

--Bob Y. 16:27, 2 March 2015 (EST)

Examining the contents of a netCDF file

An easy way to examine the contents of a netCDF file is to use this command:

ncdump -ct my_sample_data_file.1x1

You will see output similar to this:

netcdf my_sample_data_file.1x1 {
dimensions:
        time = UNLIMITED ; // (12 currently)
        lev = 72 ;
        lat = 181 ;
        lon = 360 ;      
variables:
        float time(time) ;
                time:long_name = "time" ;
                time:units = "hours since 1985-01-01 00:00:00" ;
                time:calendar = "standard" ;
                time:axis = "T";
        int lev(lev) ;
                lev:long_name = "GEOS-Chem level" ;
                lev:units = "level" ;
                lev:positive = "up" ;
                lev:axis = "Z" ;
        float lat(lat) ;
                lat:long_name = "Latitude" ;
                lat:units = "degrees_north" ;
                lat:axis = "Y" ;
        float lon(lon) ;
                lon:long_name = "Longitude" ;
                lon:units = "degrees_east" ;
                lon:axis = "X" ;
        float PRPE(time, lev, lat, lon) ;
                PRPE:long_name = "Propene" ;
                PRPE:units = "kgC/m2/s" ;
                PRPE:add_offset = 0.f ;
                PRPE:scale_factor = 1.f ;
                PRPE:_FillValue = 1.e+15f ;
                PRPE:missing_value = 1.e+15f ;
                PRPE:gamap_category = "ANTHSRCE" ;
        float CO(time, lev, lat, lon) ;
                CO:long_name = "CO" ;
                CO:units = "kg/m2/s" ;                
                CO:add_offset = 0.f ;
                CO:scale_factor = 1.f ;
                CO:_FillValue = 1.e+15f ;
                CO:missing_value = 1.e+15f ;
                CO:gamap_category = "ANTHSRCE" ;

         .... etc ....

// global attributes:              
               :Title = "COARDS/netCDF file containing X data" ;
               :Contact = "GEOS-Chem Support Team (geos-chem-support@as.harvard.edu)" ;
               :References = "www.geos-chem.org; wiki.geos-chem.org" ;
               :Conventions = "COARDS" ;
               :Filename = "my_sample_data_file.1x1" ;
               :History = "Mon Mar 17 16:18:09 2014 GMT" ;
               :ProductionDateTime = "File generated on: Mon Mar 17 16:18:09 2014 GMT" ;
               :ModificationDateTime = "File generated on: Mon Mar 17 16:18:09 2014 GMT" ;
               :VersionID = "1.2" ;
               :Format = "NetCDF-3" ;
               :Model = "GEOS5" ;
               :Grid = "GEOS_1x1" ;
               :Delta_Lon = 1.f ;
               :Delta_Lat = 1.f ;
               :SpatialCoverage = "global" ;
               :NLayers = 72 ;            
               :Start_Date = 20050101 ;
               :Start_Time = 00:00:00.0 ;
               :End_Date = 20051231 ;
               :End_Time = 23:59:59.99999 ;

data:

 lon = -180, -179, -178, -177,  ...

 lat = -90, -89, -88, -87, -86, ...

 lev = 1, 2, 3, 4, 5 ...

 time = "2005-01-01", "2005-02-01", "2005-03-01", "2005-04-01", "2005-05-01", 
    "2005-06-01", "2005-07-01", "2005-08-01", "2005-09-01", "2005-10-01", 
    "2005-11-01", "2005-12-01" ;

These will be explained in more detail in the sections below.

--Bob Y. 16:12, 2 March 2015 (EST)

COARDS dimensions

The dimensions of a netCDF file define how many grid boxes there are along a given direction. While the COARDS standard does not require any specific names for dimensions, accepted practice is to use these names:

Dimension Description
time Specifies the number of points along the time (T) axis.
lev Specifies the number of points along the vertical level (Z) axis.
lat Specifies the number of points along the latitude (Y) axis.
lon Specifies the number of points along the longitude (X) axis.

NOTES:

  1. The time dimension must always be specified. When you create the netCDF file, you may declare time to be UNLIMITED and then later define its size. This allows you to append further time points into the file later on.
  2. The lev dimension may be omitted if all of the data arrays in the netCDF file only contain a single level of data.
  3. The recommended ordering of dimensions in the file is time, lev (if necessary), lat, lon. Also see the Ordering of the data section below.

--Bob Y. 16:45, 2 March 2015 (EST)

COARDS coordinate vectors

Coordinate vectors (aka index variables or axis variables) are 1-dimensional arrays that define the values along each axis—the longitudes, latitudes, levels, and times in the file.

The only COARDS requirements for coordinate vectors are these:

  1. Each coordinate vector must be given the same name as the dimension that is used to define it.
  2. All of the values contained within a coordinate vector must be either monotonically increasing or monotonically decreasing.

In practice, coordinate vectors in COARDS-compliant netCDF files generally use the same naming convention as for dimensions: time, lev, lon, lat. These are discussed in more detail below.
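
The monotonicity requirement is easy to check before writing a file. The following is a minimal sketch in Python (standard library only); the function name `is_strictly_monotonic` is hypothetical, and treating repeated values as invalid is an assumption made here, not something stated in the text above.

```python
from typing import Sequence


def is_strictly_monotonic(values: Sequence[float]) -> bool:
    """Return True if the values strictly increase or strictly decrease."""
    pairs = list(zip(values, values[1:]))
    increasing = all(a < b for a, b in pairs)
    decreasing = all(a > b for a, b in pairs)
    return increasing or decreasing


# Latitudes of the 1x1 sample grid: -90, -89, ..., 90 (181 points)
lat = [-90.0 + i for i in range(181)]
print(is_strictly_monotonic(lat))
print(is_strictly_monotonic([0.0, 10.0, 5.0]))
```

A coordinate vector that fails this check should be sorted (together with the data along that axis) before the file is used.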

time

In our example above, the time coordinate vector has these features:

dimensions:
        time = UNLIMITED ; // (12 currently)
variables:
        float time(time) ;
                time:long_name = "time" ;
                time:units = "hours since 1985-01-01 00:00:00" ;
                time:calendar = "standard" ;
                time:axis = "T";

Here, time is a 4-byte floating point (aka REAL*4) of dimension time = 12 time points. You can also declare time to be an 8-byte floating point array (aka REAL*8) if you so choose.

The COARDS standard also defines several netCDF attributes for use with the time coordinate vector:

Attribute Type Description
long_name REQUIRED Gives a detailed description of the contents of this array.
  • Set this to Time.
units REQUIRED Specifies the number of hours, minutes, seconds, etc. that has elapsed with respect to a reference time. Set this to one of the following:
  • hours since YYYY-MM-DD 00:00:00 (RECOMMENDED)
  • minutes since YYYY-MM-DD 00:00:00
  • seconds since YYYY-MM-DD 00:00:00
  • days since YYYY-MM-DD 00:00:00

We recommend that you choose the reference time YYYY-MM-DD to correspond with the first time value contained in the file. The first element of the time coordinate array is then zero, i.e. time(0) = 0.

For more information about the COARDS time standard, please visit the COARDS web page.

calendar REQUIRED Specifies the calendar used to define the time system.
  • Set this to one of the following values:
    • standard
    • gregorian
standard_name OPTIONAL Can be used instead of long_name.
axis OPTIONAL (but recommended) Identifies which axis (X,Y,Z,T) this array corresponds to. Many software packages use this attribute to facilitate plotting.
  • Set this to T.
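
As an illustration of the units attribute, the sketch below computes "hours since" values for twelve monthly time stamps, with the reference time set to the first time stamp as recommended above. It is a minimal Python example using only the standard library; the helper name `hours_since` is hypothetical.

```python
from datetime import datetime
from typing import List, Sequence


def hours_since(reference: datetime, stamps: Sequence[datetime]) -> List[float]:
    """Return the elapsed hours between the reference time and each time stamp."""
    return [(stamp - reference).total_seconds() / 3600.0 for stamp in stamps]


# Monthly time stamps for 2005; the reference time is the first time stamp.
reference = datetime(2005, 1, 1)
stamps = [datetime(2005, month, 1) for month in range(1, 13)]

values = hours_since(reference, stamps)
units = f"hours since {reference:%Y-%m-%d %H:%M:%S}"

print(units)       # hours since 2005-01-01 00:00:00
print(values[:3])  # [0.0, 744.0, 1416.0]
```

Because the reference time matches the first time stamp, the first value is 0.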

--Bob Y. 10:48, 27 February 2015 (EST)

lev

In our example above, the lev coordinate vector has these features:

 dimensions:
         lev = 72 ;
 variables:
         int lev(lev) ;
                 lev:long_name = "GEOS-Chem level" ;
                 lev:units = "level" ;
                 lev:positive = "up" ;
                 lev:axis = "Z" ;

Here, lev is a 4-byte integer of dimension lev = 72 levels. You can also declare lev to be a 4-byte (aka REAL*4) or 8-byte floating point array (aka REAL*8) if you so choose.

The COARDS standard also defines several netCDF attributes for use with the lev coordinate vector:

Attribute Type Description
long_name REQUIRED Gives a detailed description of the contents of this array.
  • Set this to one of the following:
    • GEOS-Chem levels
    • Eta Centers
    • Sigma centers
  • Your choice will depend on how you have defined the vertical axis.

IMPORTANT! If you give long_name the value of GEOS-Chem levels, then HEMCO will be able to vertically regrid the data from one GEOS-Chem grid to another.

units REQUIRED Specifies the units of the vertical coordinate.
  • Set this to one of the following:
    • level
    • eta_level
    • sigma_level
  • Your choice will depend on how you have defined the vertical axis.

IMPORTANT! If you give units the value of level, then HEMCO will be able to vertically regrid the data from one GEOS-Chem grid to another.

standard_name OPTIONAL You may use this instead of long_name.
axis OPTIONAL (but recommended) Identifies which axis (X,Y,Z,T) this array corresponds to. Many software packages use this attribute to facilitate plotting.
  • Set this to Z.
positive OPTIONAL (but recommended) Specifies the direction in which the vertical levels are indexed. Many software packages use this attribute to facilitate plotting.
  • Set this to up. Most data used by HEMCO is indexed from the surface upwards.
  • NOTE: GEOS-Chem HP needs this attribute to identify the vertical ordering of the data.

--Bob Yantosca (talk) 21:57, 27 February 2017 (UTC)

lat

In our example above, the lat coordinate vector has these features:

dimensions:
        lat = 181 ;
variables:
        float lat(lat) ;
                lat:long_name = "Latitude" ;
                lat:units = "degrees_north" ;
                lat:axis = "Y" ;

Here, lat is a 4-byte floating point (aka REAL*4) of dimension lat = 181 latitudes. You can also declare lat to be an 8-byte floating point array (aka REAL*8) if you so choose.

The COARDS standard also defines several netCDF attributes for use with the lat coordinate vector:

Attribute Type Description
long_name REQUIRED Gives a detailed description of the contents of this array.
  • Set this to Latitude.
units REQUIRED Specifies the units of latitude.
  • Set this to degrees_north.
standard_name OPTIONAL You may use this attribute instead of long_name.
axis OPTIONAL (but recommended) Identifies which axis (X,Y,Z,T) this array corresponds to. Many software packages use this attribute to facilitate plotting.
  • Set this to Y.

--Bob Y. 16:30, 2 March 2015 (EST)

lon

In the example above, the lon coordinate vector has these features:

dimensions:
        lon = 360 ; 
variables:
        float lon(lon) ;
                lon:long_name = "Longitude" ;
                lon:units = "degrees_east" ;
                lon:axis = "X" ;

Here, lon is a 4-byte floating point (aka REAL*4) of dimension lon = 360 longitudes. You can also declare lon to be an 8-byte floating point array (aka REAL*8) if you so choose.

The COARDS standard also defines several netCDF attributes for use with the lon coordinate vector:

Attribute Type Description
long_name REQUIRED Gives a detailed description of the contents of this array.
  • Set this to Longitude.
units REQUIRED Specifies the units of longitude.
  • Set this to degrees_east.
standard_name OPTIONAL You may use this instead of long_name.
axis OPTIONAL (but recommended) Identifies which axis (X,Y,Z,T) this array corresponds to. Many software packages use this attribute to facilitate plotting.
  • Set this to X.

Longitudes may be represented modulo 360. Thus, for example, -180, 180, and 540 are all valid representations of the International Dateline and 0 and 360 are both valid representations of the Prime Meridian. Note, however, that the sequence of numerical longitude values stored in the netCDF file must be monotonic in a non-modulo sense.

Practical guidelines:

  1. If your grid begins at the International Dateline (-180°), then place your longitudes into the range -180..180.
  2. If your grid begins at the Prime Meridian (0°), then place your longitudes into the range 0..360.
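
The modulo-360 representation and the two practical guidelines can be sketched as follows. This is an illustrative Python example using only the standard library; the function names are hypothetical, and the use of half-open intervals (so that 180 maps to -180 and 360 maps to 0) is an assumption made here.

```python
def to_dateline_range(lon: float) -> float:
    """Map a longitude in degrees into the half-open interval [-180, 180)."""
    return (lon + 180.0) % 360.0 - 180.0


def to_meridian_range(lon: float) -> float:
    """Map a longitude in degrees into the half-open interval [0, 360)."""
    return lon % 360.0


# -180, 180, and 540 all represent the International Dateline.
print([to_dateline_range(x) for x in (-180.0, 180.0, 540.0)])  # [-180.0, -180.0, -180.0]

# 0 and 360 both represent the Prime Meridian.
print([to_meridian_range(x) for x in (0.0, 360.0)])            # [0.0, 0.0]
```

After remapping, the longitude vector (and the data along the longitude axis) may need to be reordered so that the stored values remain monotonic.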

--Bob Y. 16:31, 2 March 2015 (EST)

COARDS data arrays

A COARDS-compliant netCDF file may contain several data arrays. In our example file shown above, there are two data arrays:

dimensions:
        time = UNLIMITED ; // (12 currently)
        lev = 72 ;
        lat = 181 ;
        lon = 360 ;      
variables:
        float PRPE(time, lev, lat, lon) ;
                PRPE:long_name = "Propene" ;
                PRPE:units = "kgC/m2/s" ;
                PRPE:add_offset = 0.f ;
                PRPE:scale_factor = 1.f ;
                PRPE:_FillValue = 1.e+15f ;
                PRPE:missing_value = 1.e+15f ;
                PRPE:gamap_category = "ANTHSRCE" ;
        float CO(time, lev, lat, lon) ;
                CO:long_name = "CO" ;
                CO:units = "kg/m2/s" ;                
                CO:add_offset = 0.f ;
                CO:scale_factor = 1.f ;
                CO:_FillValue = 1.e+15f ;
                CO:missing_value = 1.e+15f ;
                CO:gamap_category = "ANTHSRCE" ;

These arrays contain emissions for GEOS-Chem tracers PRPE (lumped >= C3 alkenes) and CO.

Attributes

Data arrays in COARDS-compliant netCDF files typically use these netCDF attributes:

Attribute Type Description
long_name REQUIRED Gives a detailed description of the contents of the array.
units REQUIRED Specifies the units in the array. In general, SI units are preferred.

Special usage for HEMCO:

  1. Emissions fluxes
    • For species such as PRPE that are tracked as equivalent carbons, use kgC/m2/s or kgC m-2 s-1.
    • For all other species, use kg/m2/s or kg m-2 s-1.
  2. Concentration data
    • Use kg/m3 or kg m-3
  3. Dimensionless data
    • Use 1.
    • Do not use unitless, which is non-standard. (HEMCO will recognize it, but it is not recommended.)
add_offset OPTIONAL (but recommended) Specifies an offset used to store floating-point data as packed integer. Ignored otherwise.
  • Set this to 0.
scale_factor OPTIONAL (but recommended) Specifies the scale factor used to store floating-point data as packed integer. Ignored otherwise.
  • Set this to 1.
standard_name OPTIONAL (but recommended) You may use this instead of long_name.
missing_value OPTIONAL (but recommended) Specifies the value that represents missing data. This should be set to a number that will not be mistaken for a valid data value. Typical missing data values are 1e15, +/-1e32, or +/-1e-32.

NOTE: The missing_value attribute should stay within the range that 4-byte (aka REAL*4) precision can represent (i.e. roughly +/-1e32 or +/-1e-32). This avoids floating-point errors in HEMCO caused by type conversion.

_FillValue OPTIONAL (but recommended) Synonym for missing_value. It is recommended to set both missing_value and _FillValue attributes to the same value. Some data visualization packages look for the missing_value attribute, while others look for _FillValue.
gamap_category OPTIONAL Specifies the GAMAP diagnostic category name. This makes it easier for the GAMAP visualization package to read the file.
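
To check whether a candidate missing_value fits in 4-byte precision, you can try to pack it as an IEEE 754 single-precision number. The sketch below is a minimal Python example using only the standard library; the function name `fits_in_real4` is hypothetical. It tests only for overflow (the largest finite single-precision value is about 3.4e38), which is a looser bound than the conservative values quoted above.

```python
import struct


def fits_in_real4(value: float) -> bool:
    """Return True if the value can be packed as a 4-byte float without overflow."""
    try:
        struct.pack("<f", value)
    except OverflowError:
        return False
    return True


for candidate in (1e15, 1e32, 1e39):
    print(candidate, fits_in_real4(candidate))
```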

--Bob Y. (talk) 17:56, 1 June 2015 (UTC)

Ordering of the data

2D and 3D array variables in netCDF files must have a specific dimension order. If the order is incorrect, you will encounter the netCDF read error "start+count exceeds dimension bound". You can check the dimension ordering of your arrays by using ncdump with the -h option, e.g. ncdump file.nc -h. Be sure to check the dimensions listed next to the array name rather than the ordering of the dimensions listed at the top of the ncdump output.
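
If you need to check many files, the dimensions listed next to each array name can be pulled out of saved ncdump -h output with a short script. The following is a rough Python sketch using only the standard library; the function name `variable_dimensions` is hypothetical, and the regular expression is an assumption that covers only the classic netCDF types (float, double, int, short, byte, char).

```python
import re
from typing import Dict, List

# Matches declarations such as "float PRPE(time, lev, lat, lon) ;"
DECLARATION = re.compile(
    r"^\s*(?:float|double|int|short|byte|char)\s+(\w+)\(([^)]*)\)\s*;"
)


def variable_dimensions(header: str) -> Dict[str, List[str]]:
    """Map each variable name to the dimensions listed next to it."""
    result: Dict[str, List[str]] = {}
    for line in header.splitlines():
        match = DECLARATION.match(line)
        if match:
            name, dims = match.groups()
            result[name] = [dim.strip() for dim in dims.split(",")]
    return result


# A fragment of header text in the style of the sample file shown earlier.
header = """
dimensions:
        time = UNLIMITED ; // (12 currently)
        lev = 72 ;
variables:
        float time(time) ;
        float PRPE(time, lev, lat, lon) ;
"""

print(variable_dimensions(header)["PRPE"])  # ['time', 'lev', 'lat', 'lon']
```

Note that the lines in the dimensions: block are not matched, so only the per-variable ordering is reported.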

The following dimension orders are acceptable:

    array(time,lat,lon)
    array(time,lev,lat,lon)

The rest of this section explains why the dimension ordering of arrays matters.

When you use the ncdump utility to examine the contents of a netCDF file, you will notice that it displays the dimensions of the data in the opposite order with respect to Fortran. In our sample file, ncdump says that the CO and PRPE arrays have these dimensions:

    CO(time,lev,lat,lon)
    PRPE(time,lev,lat,lon)

But if you tried to read this netCDF file into GEOS-Chem (or any other program written in Fortran), you must use data arrays that have these dimensions:

    CO(lon,lat,lev,time)
    PRPE(lon,lat,lev,time)

Here's why:

Fortran is a column-major language, which means that arrays are stored in memory by columns first, then by rows. If you have declared arrays such as:

    INTEGER            :: I, J, L, T
    INTEGER, PARAMETER :: N_LON  = 360
    INTEGER, PARAMETER :: N_LAT  = 181
    INTEGER, PARAMETER :: N_LEV  = 72
    INTEGER, PARAMETER :: N_TIME = 12
    REAL*4             :: CO  (N_LON,N_LAT,N_LEV,N_TIME)
    REAL*4             :: PRPE(N_LON,N_LAT,N_LEV,N_TIME)

then for optimal efficiency, the leftmost dimension (I) needs to vary the fastest, and needs to be accessed by the innermost DO-loop. Then the next leftmost dimension (J) should be accessed by the next innermost DO-loop, and so on. Therefore, the proper way to loop over these arrays is:

    DO T = 1, N_TIME
    DO L = 1, N_LEV
    DO J = 1, N_LAT
    DO I = 1, N_LON
       CO  (I,J,L,T) = ...
       PRPE(I,J,L,T) = ...
    ENDDO
    ENDDO
    ENDDO
    ENDDO

Note that the I index is varying most often, since it is the innermost DO-loop, then J, L, and T. This is opposite to how a car's odometer reads.

If you loop through an array in this fashion, with leftmost indices varying fastest, then the code minimizes the number of times it has to load subsections of the array into cache memory. In this optimal manner of execution, all of the array elements sitting in the cache memory are read in the proper order before the next array subsection needs to be loaded into the cache. But if you step through array elements in the wrong order, the number of cache loads is proportionally increased. Because it takes a finite amount of time to reload array elements into cache memory, the more times you have to access the cache, the longer it will take the code to execute. This can slow down the code dramatically.

On the other hand, C is a row-major language, which means that arrays are stored by rows first, then by columns. This means that the rightmost index varies the fastest in memory. This is identical to how a car's odometer reads.

If you use a Fortran program to write data to disk, and then try to read that data from disk into a program written in C (or NCL), then unless you reverse the order of the array indices, you will be reading the array in the wrong order. In C you would have to use this ordering scheme (using Fortran-style syntax to illustrate the point), in which the rightmost index (I) is accessed by the innermost loop:

    DO T = 1, N_TIME
    DO L = 1, N_LEV
    DO J = 1, N_LAT
    DO I = 1, N_LON
       CO(T,L,J,I)   = ...
       PRPE(T,L,J,I) = ...
    ENDDO
    ENDDO
    ENDDO
    ENDDO

Because ncdump is written in C, the order of the array appears opposite with respect to Fortran. The same goes for any code written in the NCAR command language (NCL), which is also written in C.
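
The relationship between the two orderings can be verified with a little arithmetic. The sketch below (Python, standard library only, with hypothetical function names and zero-based indices) computes the position of an array element in memory under each convention. It illustrates that a Fortran array dimensioned (lon,lat,lev,time) and a C array dimensioned (time,lev,lat,lon) place the same element at the same position.

```python
from typing import Sequence


def column_major_offset(index: Sequence[int], shape: Sequence[int]) -> int:
    """Fortran convention: the leftmost index varies fastest in memory."""
    offset, stride = 0, 1
    for i, n in zip(index, shape):
        offset += i * stride
        stride *= n
    return offset


def row_major_offset(index: Sequence[int], shape: Sequence[int]) -> int:
    """C convention: the rightmost index varies fastest in memory."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset


fortran_shape = (360, 181, 72, 12)        # (lon, lat, lev, time)
c_shape = tuple(reversed(fortran_shape))  # (time, lev, lat, lon)

fortran_index = (10, 20, 3, 1)            # zero-based (lon, lat, lev, time)
c_index = tuple(reversed(fortran_index))  # zero-based (time, lev, lat, lon)

print(column_major_offset(fortran_index, fortran_shape))
print(row_major_offset(c_index, c_shape))
```

Both calls print the same offset, which is why ncdump can list the dimensions in reverse order without the data on disk being any different.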

--Bob Y. 17:27, 26 February 2015 (EST)

COARDS Global attributes

Global attributes are netCDF attributes that contain information about a netCDF file, as opposed to information about an individual data array. From our example above, the output from ncdump showed that our sample netCDF file has several global attributes:

// global attributes:               
               :Title = "COARDS/netCDF file containing X data" ;
               :Contact = "GEOS-Chem Support Team (geos-chem-support@as.harvard.edu)" ;
               :References = "www.geos-chem.org; wiki.geos-chem.org" ;
               :Conventions = "COARDS" ;
               :Filename = "my_sample_data_file.1x1" ;
               :History = "Mon Mar 17 16:18:09 2014 GMT" ;
               :ProductionDateTime = "File generated on: Mon Mar 17 16:18:09 2014 GMT" ;
               :ModificationDateTime = "File generated on: Mon Mar 17 16:18:09 2014 GMT" ;
               :VersionID = "1.2" ;
               :Format = "NetCDF-3" ;
               :Model = "GEOS5" ;
               :Grid = "GEOS_1x1" ;
               :Delta_Lon = 1.f ;
               :Delta_Lat = 1.f ;
               :SpatialCoverage = "global" ;
               :NLayers = 72 ;            
               :Start_Date = 20050101 ;
               :Start_Time = 00:00:00.0 ;
               :End_Date = 20051231 ;
               :End_Time = 23:59:59.99999 ;
              

You can add as many global attributes as you wish. The following are the most commonly used:

Attribute Type Description
Title REQUIRED Provides a short description of the file.
  • If the file was converted from binary punch format by GAMAP routine BPCH2COARDS, then Title will be set to COARDS/netCDF file created by BPCH2COARDS (GAMAP v2-17+).
Contact OPTIONAL (but recommended) Provides contact information about the person(s) who created the netCDF file.
References OPTIONAL (but recommended) Provides references (or links to a web or wiki page) for the data contained in the netCDF file.
Conventions REQUIRED Indicates if the netCDF file adheres to a standard (e.g. COARDS, CF, etc.)
  • Set this to COARDS.
Filename OPTIONAL (but recommended) Specifies the name of the netCDF file.
History OPTIONAL (but recommended) Lists the date of file creation, and subsequent dates of modification.
  • If you use the netCDF operators (NCO) or Climate Data Operators (CDO) to modify the file, the History attribute will be modified to display the commands that were used to modify the file.
ProductionDateTime OPTIONAL (but recommended) Specifies the date and time on which the file was originally created.
ModificationDateTime OPTIONAL (but recommended) Specifies the dates and times on which the file was modified.
VersionID OPTIONAL (but recommended) Specifies a version number corresponding to the data in the netCDF file.
  • For example, GMAO met field files use this attribute to denote the version number of the GEOS-DAS system (e.g. 5.7.2, 5.13.1) that was used to create the data.
Format OPTIONAL (but recommended) Specifies the format of the netCDF file. Possible options are:
  • NetCDF-3
  • NetCDF-4
Model OPTIONAL Specifies the vertical grid (e.g. GEOS-5, MERRA, GEOS-FP) of the GEOS-Chem simulation that was used to generate this data.
  • This attribute is added by GAMAP routine BPCH2COARDS.
  • For GMAO met field data, this indicates the version of the data assimilation system (e.g. GEOS-5) used to assimilate the data.
Delta_Lat OPTIONAL Specifies the spacing between points along the latitude axis.
  • This attribute is added by GAMAP routine BPCH2COARDS.
Delta_Lon OPTIONAL Specifies the spacing between points along the longitude axis.
  • This attribute is added by GAMAP routine BPCH2COARDS.
SpatialCoverage OPTIONAL Specifies the horizontal extent of the data. Possible values are:
  • global
  • regional
NLayers OPTIONAL Specifies the number of vertical levels in the grid.
  • This attribute is added by GAMAP routine BPCH2COARDS.
  • If the file contains only surface data, then BPCH2COARDS sets NLayers to 1.
  • Sometimes you will see this attribute named Nlayers.
Start_Date OPTIONAL Specifies the starting date of the data in the file.
  • This attribute is added by GAMAP routine BPCH2COARDS.
  • You can also manually add this attribute.
End_Date OPTIONAL Specifies the ending date of the data in the file.
  • This attribute is added by GAMAP routine BPCH2COARDS.
  • You can also manually add this attribute.
Start_Time OPTIONAL Specifies the starting time of the data in the file.
  • This attribute is added by GAMAP routine BPCH2COARDS.
  • You can also manually add this attribute.
  • This attribute often has the value of 00:00:00.0.
End_Time OPTIONAL Specifies the ending time of the data in the file.
  • This attribute is added by GAMAP routine BPCH2COARDS.
  • You can also manually add this attribute.
  • This attribute often has the value of 23:59:59.9.

--Bob Y. 12:52, 3 March 2015 (EST)

Converting files from binary punch format to netCDF

GEOS-Chem versions prior to GEOS-Chem v10-01 read emissions data from binary punch data format. You will need to convert these data files to COARDS-compliant netCDF for use with HEMCO. Follow these instructions:

Step 1: Use GAMAP routine BPCH2COARDS

You can use the GAMAP routine BPCH2COARDS to create netCDF files from a GEOS-Chem binary punch file. For example, start IDL and then type this command at the IDL prompt:

IDL> bpch2coards, 'uvalbedo.geos.2x25', 'uvalbedo.geos.2x25.%DATE%.nc' 

This will create the following netCDF files:

uvalbedo.geos.2x25.19850101.nc
uvalbedo.geos.2x25.19850201.nc
uvalbedo.geos.2x25.19850301.nc
uvalbedo.geos.2x25.19850401.nc
uvalbedo.geos.2x25.19850501.nc
uvalbedo.geos.2x25.19850601.nc
uvalbedo.geos.2x25.19850701.nc
uvalbedo.geos.2x25.19850801.nc
uvalbedo.geos.2x25.19850901.nc
uvalbedo.geos.2x25.19851001.nc
uvalbedo.geos.2x25.19851101.nc
uvalbedo.geos.2x25.19851201.nc

Note that BPCH2COARDS will create a new file for each time slice. The %DATE% token in the output file name will be replaced with the year-month-day value for each time stamp. In the above example, the binary punch file uvalbedo.geos.2x25 contains monthly data, therefore BPCH2COARDS will create 12 individual netCDF files.
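
Before concatenating, it can be useful to list the file names you expect to find. The sketch below mimics the %DATE% substitution described above; it is a Python illustration with a hypothetical helper name (`expand_date_token`) and is not the BPCH2COARDS implementation.

```python
from datetime import date
from typing import List, Sequence


def expand_date_token(template: str, stamps: Sequence[date]) -> List[str]:
    """Replace the %DATE% token with YYYYMMDD for each time stamp."""
    return [template.replace("%DATE%", f"{stamp:%Y%m%d}") for stamp in stamps]


# Monthly time stamps for 1985, as in the uvalbedo example.
stamps = [date(1985, month, 1) for month in range(1, 13)]
names = expand_date_token("uvalbedo.geos.2x25.%DATE%.nc", stamps)

for name in names:
    print(name)
```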

Special note for timeseries data: To use BPCH2COARDS to convert timeseries (e.g. hourly, 3-hourly, etc) data to netCDF format, add the %TIME% token to the netCDF file name. For example:

IDL> bpch2coards, 'timeseries.geos.2x25', 'timeseries.geos.2x25.%DATE%.%TIME%.nc'

This will create one new netCDF file for each timestamp in the bpch file. You can then proceed to Step 2 and Step 3 below to concatenate these files into a single netCDF file.

--Bob Y. (talk) 18:11, 1 June 2015 (UTC)

Step 2: Concatenate the netCDF files

You can use the ncrcat command of the netCDF Operators (NCO) to concatenate the 12 individual files created by BPCH2COARDS into a single netCDF file. Make sure you have exited IDL, and then type the following command at the Unix prompt:

ncrcat -hO uvalbedo.geos.2x25.1985*.nc  uvalbedo.geos.2x25.nc

You can then discard the uvalbedo.geos.2x25.1985*.nc files that were created directly by BPCH2COARDS.

--Bob Y. 12:10, 3 March 2015 (EST)

Step 3: Edit variable names and attributes according to COARDS conventions

After following Step 1 and Step 2 above, you may still need to hand-edit the variable names and attributes in the netCDF file to make it COARDS-compatible. Christoph Keller has provided these useful commands for editing netCDF files. He writes:

NetCDF files must always adhere to the COARDS conventions and some hand-editing may be required before using them in HEMCO. Fortunately, there are a number of excellent software toolkits available to quickly and efficiently manipulate netCDF files, such as nco, cdo, and ncl. Below is a short list of commands that I use almost every day:

Display the header and the coordinate values of a netCDF file, with the time coordinate displayed in human-readable format:

  ncdump -ct file.nc

Compress a netCDF file. This can considerably reduce the file size!

  nccopy -d0 in.nc out.nc # No compression
  nccopy -d1 in.nc out.nc # Minimum compression (sufficient for most purposes)
  nccopy -d5 in.nc out.nc # Medium compression
  nccopy -d9 in.nc out.nc # Maximum compression

Change variable name from IJ_AVG_S__NO to NO:

  ncrename -v IJ_AVG_S__NO,NO file.nc

Change the date from 1 Jan 1985 to 1 Jan 2000:

  cdo setdate,2000-01-01 in.nc out.nc

Set all missing values to zero:

  cdo setmisstoc,0 in.nc out.nc

Add/change the long_name attribute of the vertical coordinate (lev) to "GEOS-Chem levels". This will ensure that HEMCO recognizes the vertical levels of the input file as GEOS-Chem model levels:

  ncatted -a long_name,lev,o,c,"GEOS-Chem levels" file.nc

Add/change the units attribute of the latitude coordinate (lat) to degrees_north:

  ncatted -a units,lat,o,c,"degrees_north" file.nc

Add/change the References global attribute:

  ncatted -a References,global,o,c,"www.geos-chem.org; wiki.geos-chem.org" file.nc

Add a time dimension to a file with missing time dimension:

  ncap2 -h -s 'defdim("time",1);time[time]=0.0;time@long_name="time";time@calendar="standard";time@units="days since 2007-01-01 00:00:00"' -O in.nc out.nc

Convert the units of the CHLA variable from mg/m3 to kg/m3 (i.e. divide by 1e6):

  ncap2 -v -s "CHLA=CHLA/1000000.0f" in.nc out.nc
  ncatted -a units,CHLA,o,c,"kg/m3" out.nc
  mv out.nc in.nc

Here are some specific commands that we used on the uvalbedo.geos.2x25.nc file from our example in the preceding sections. If you need to apply these commands to more than one file, you can place them into a script.

Make the data array compatible with GAMAP:

  ncrename -v UVALBEDO__UVALBD,UVALBD uvalbedo.geos.2x25.nc

  ncatted -a gamap_category,UVALBD,o,c,"UVALBEDO" uvalbedo.geos.2x25.nc

Add several global attributes that BPCH2COARDS did not create (or that you want to change):

  ncatted -a History,global,o,c,"Tue Mar  3 12:18:38 EST 2015" uvalbedo.geos.2x25.nc
 
  ncatted -a ProductionDateTime,global,o,c,"Tue Mar  3 12:18:38 EST 2015" uvalbedo.geos.2x25.nc

  ncatted -a ModificationDateTime,global,o,c,"Tue Mar  3 12:18:38 EST 2015" uvalbedo.geos.2x25.nc

  ncatted -a Contact,global,o,c,"GEOS-Chem Support Team (geos-chem-support@as.harvard.edu)" uvalbedo.geos.2x25.nc

  ncatted -a References,global,o,c,"www.geos-chem.org; wiki.geos-chem.org" uvalbedo.geos.2x25.nc

Replace the default Title global attribute that was written by BPCH2COARDS:

  ncatted -a Title,global,o,c,"UV albedo data from Hermann & Celarier (1997)" uvalbedo.geos.2x25.nc

--Bob Y. 11:55, 5 March 2015 (EST)

Chunking and deflating the netCDF file to improve I/O

We recommend that you chunk the data in your netCDF file. Chunking specifies the order in which the data will be read from disk. The Unidata web site has a good overview of why chunking a netCDF file matters.

For GEOS-Chem with the high-performance option (aka GCHP), the best file I/O performance occurs when the file is split into one chunk per layer. This allows each individual vertical layer of data to be read in parallel.

You can use the nccopy command, which is distributed with the netCDF library utilities, to do the chunking. For example, say you have a netCDF file called myfile.nc with these dimensions:

dimensions:
        time = UNLIMITED ; // (12 currently)
        lev = 72 ;
        lat = 181 ;
        lon = 360 ;      

Then you can issue this command to apply the optimal chunking along levels:

nccopy -c lon/360,lat/181,lev/1,time/1 -d1 myfile.nc tmp.nc
mv tmp.nc myfile.nc

This will create a new file called tmp.nc that has the proper chunking. We then replace myfile.nc with this temporary file.

You can specify the chunk sizes that will be applied to the variables in the netCDF file with the -c argument to nccopy. To obtain the optimal chunking, the lon chunk size must be identical to the number of values along the longitude dimension (e.g. lon/360), and the lat chunk size must be equal to the number of points in the latitude dimension (e.g. lat/181).
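
As a rough sketch, the chunk specification can be built from the dimension sizes reported by ncdump instead of being typed by hand. The helper names below (dim_size, chunk_spec, chunk_by_level) are hypothetical, and the sketch assumes that the file uses the dimension names lon, lat, lev, and time, as in the example above.

```shell
#!/bin/sh
# Sketch: chunk netCDF files with one chunk per level and time slice.
# Assumes ncdump and nccopy are on the search path and that the
# dimensions are named lon, lat, lev, and time.

# Print the size of a fixed-length dimension ($1) in file ($2),
# parsed from the "name = N ;" lines of the ncdump header.
dim_size() {
    ncdump -h "$2" | sed -n "s/^[[:space:]]*$1 = \([0-9][0-9]*\) ;.*/\1/p" | head -n 1
}

# Build the argument for "nccopy -c" from the lon ($1) and lat ($2) sizes.
chunk_spec() {
    printf 'lon/%s,lat/%s,lev/1,time/1' "$1" "$2"
}

# Chunk and deflate (level 1) one file, then replace the original.
chunk_by_level() {
    file="$1"
    spec="$(chunk_spec "$(dim_size lon "$file")" "$(dim_size lat "$file")")"
    nccopy -c "$spec" -d1 "$file" "$file.tmp" && mv "$file.tmp" "$file"
}

for f in "$@"; do
    chunk_by_level "$f"
done
```

For the myfile.nc example, chunk_spec would produce lon/360,lat/181,lev/1,time/1, which is the same specification used in the nccopy command above.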

We also recommend that you deflate (i.e. compress) the netCDF data variables at the same time you apply the chunking. Deflating can substantially reduce the file size, especially for emissions data that are only defined over the land but not over the oceans. You can deflate the data in a netCDF file by specifying the -d argument to nccopy. There are 10 possible deflation levels, ranging from 0 (no deflation) to 9 (max deflation). For most purposes, a deflation level of 1 (-d1) is sufficient.

You can use the ncdump -cts myfile.nc command to view the chunk size and deflation level in the file. After applying the chunking and compression to myfile.nc, you would see output such as this:

dimensions:
        time = UNLIMITED ; // (12 currently)
        lev = 72 ;
        lat = 181 ;
        lon = 360 ;      
variables:
        float PRPE(time, lev, lat, lon) ;
                PRPE:long_name = "Propene" ;
                PRPE:units = "kgC/m2/s" ;
                PRPE:add_offset = 0.f ;
                PRPE:scale_factor = 1.f ;
                PRPE:_FillValue = 1.e+15f ;
                PRPE:missing_value = 1.e+15f ;
                PRPE:gamap_category = "ANTHSRCE" ;
                PRPE:_Storage = "chunked" ;
                PRPE:_ChunkSizes = 1, 1, 181, 360 ;
                PRPE:_DeflateLevel = 1 ;
                PRPE:_Endianness = "little" ;
        float CO(time, lev, lat, lon) ;
                CO:long_name = "CO" ;
                CO:units = "kg/m2/s" ;                
                CO:add_offset = 0.f ;
                CO:scale_factor = 1.f ;
                CO:_FillValue = 1.e+15f ;
                CO:missing_value = 1.e+15f ;
                CO:gamap_category = "ANTHSRCE" ;
                CO:_Storage = "chunked" ;
                CO:_ChunkSizes = 1, 1, 181, 360 ;
                CO:_DeflateLevel = 1 ;
                CO:_Endianness = "little" ;

The attributes that begin with an _ character and describe how the data are stored (_Storage, _ChunkSizes, _DeflateLevel, and _Endianness) are "hidden" netCDF attributes. They represent file properties instead of user-defined properties (like the long name, units, etc.). The "hidden" attributes can be shown by adding the -s argument to ncdump.
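
To check the storage settings without reading through the whole header, one option is to filter the ncdump output. This is only a sketch; the function name show_storage is made up for illustration.

```shell
#!/bin/sh
# Sketch: print only the storage-related "hidden" attributes of each file.
# Assumes ncdump is on the search path. show_storage is a hypothetical name.

show_storage() {
    # -h prints the header only; -s adds the special (hidden) attributes
    ncdump -h -s "$1" | grep -E '_Storage|_ChunkSizes|_DeflateLevel'
}

for f in "$@"; do
    echo "== $f"
    show_storage "$f"
done
```

Each variable that was chunked and deflated as described above should then show a _ChunkSizes line ending in the full lat and lon sizes and a _DeflateLevel of 1.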

--Bob Yantosca (talk) 15:31, 13 April 2018 (UTC)

Required information when submitting data to the HEMCO data repository

If you are submitting a new emissions inventory or atmospheric data set for inclusion into the standard HEMCO data repository, then please send the following information to the GEOS-Chem Support Team:

  1. Is this data meant to replace another existing data set? Or is it meant to be used as a research option?
  2. What scale factors (annual, monthly, daily, weekday/weekend) need to be applied to the data?
  3. What is the source of the data? Provide citations.
  4. If the data is regional, please provide the relevant mask files.
  5. Provide any relevant citations that describe the data.
  6. Provide a README file with specific information about how the data files were prepared. You can follow this example.

We also recommend that you check your netCDF files with the isCoards script (described in more detail below) before submitting them to the GCST.

--Bob Y. 11:58, 5 March 2015 (EST)

Script for determining if a netCDF file is COARDS-compliant

The isCoards script will ship with GEOS-Chem v11-01 and higher versions.

The GEOS-Chem Support Team has created a script named isCoards that will let you easily determine if a netCDF file is COARDS-compliant. You may obtain this script from our NcdfUtilities repository. We also recommend that you copy isCoards into a folder that is in your search path (such as ~/bin) so that it will be available to you in whatever directory you are working in.

git clone https://github.com/geoschem/ncdfutil NcdfUtil
cp NcdfUtil/perl/isCoards ~/bin

Starting with GEOS-Chem v11-01, isCoards will also be added to the NcdfUtil/perl subfolder of the GEOS-Chem source code directory.

The isCoards script will give you detailed output showing which elements of a netCDF file are COARDS-compliant and which are not. Here is an example:

> cd /mnt/gcgrid/data/ExtData/HEMCO/GFED4/v2015-10/2013
> isCoards GFED4_3hrfrac_gen.025x025.201301.nc

===========================================================================
Filename: GFED4_3hrfrac_gen.025x025.201301.nc
===========================================================================

The following items adhere to the COARDS standard:
---------------------------------------------------------------------------
-> time(time)
-> time is monotonically increasing
-> time:units = "hours since 1985-01-01 00:00:00" 
-> lon(lon)
-> lon is monotonically increasing
-> lon:units = "degrees_east" 
-> lat(lat)
-> lat is monotonically increasing
-> lat:units = "degrees_north" 
-> GFED_FRAC3HR(time,lat,lon)
-> GFED_FRAC3HR:units = "1" 

The following items DO NOT ADHERE to the COARDS standard:
---------------------------------------------------------------------------
-> time:calendar is missing
-> time:long_name (or time:standard_name) is missing
-> lon:long_name (or lon:standard_name) is missing
-> lat:long_name (or lat:standard_name) is missing
-> GFED_FRAC3HR:long_name (or GFED_FRAC3HR:standard_name) is missing
-> The "Conventions" global attribute is missing
-> The "History" global attribute is missing
-> The "Title" global attribute is missing

The following optional items are RECOMMENDED:
---------------------------------------------------------------------------
-> Consider adding time:axis = "T"
-> Consider adding lon:axis ="X"
-> Consider adding lat:axis = "Y"
-> Consider adding GFED_FRAC3HR:_FillValue
-> Consider adding GFED_FRAC3HR:missing_value
-> Consider adding GFED_FRAC3HR:add_offset
-> Consider adding GFED_FRAC3HR:scale_factor
-> Consider adding the "Format" global attribute
-> Consider adding the "References" global attribute

For more information how to fix non COARDS-compliant items, see:
http://wiki.geos-chem.org/Preparing_data_files_for_use_with_HEMCO
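
When checking many files, it can help to print only the non-compliant items. The following sketch filters the isCoards report for that section. It assumes that isCoards is on your search path and that its output follows the layout shown in the example above; the function name coards_problems is hypothetical.

```shell
#!/bin/sh
# Sketch: list only the items that isCoards reports as non-compliant.
# Assumes the report layout shown in the example output above.
# coards_problems is a hypothetical helper name.

coards_problems() {
    isCoards "$1" | awk '
        /DO NOT ADHERE/  { keep = 1; next }   # start of the problem list
        /RECOMMENDED/    { keep = 0 }         # start of the optional list
        keep && /^->/    { print }            # keep only the item lines
    '
}

for f in "$@"; do
    echo "== $f"
    coards_problems "$f"
done
```

A file that prints no lines under its name has nothing listed in the "DO NOT ADHERE" section, although the recommended items may still be worth adding.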

--Bob Yantosca (talk) 16:58, 6 January 2016 (UTC)