Difference between revisions of "The COARDS netCDF conventions for earth science data"

__FORCETOC__
 
'''''[[Use Spack to install netCDF on your system|Previous]] | [[Working with netCDF data files|Next]] | [[Guide to netCDF in GEOS-Chem]]'''''
  
 
#[[Introduction to netCDF]]
#[[Check if netCDF is already installed on your system]]
#[[Use Spack to install netCDF on your system]]
#<span style="color:blue">'''The COARDS netCDF conventions for earth science data'''</span>
#[[Working with netCDF data files]]
#[[Creating netCDF data files for GEOS-Chem]]
#[[Other libraries used by GEOS-Chem]]
  
  
== Overview ==
+
This content has been migrated to the [https://geos-chem.readthedocs.io/en/latest/geos-chem-shared-docs/supplemental-guides/coards-guide.html '''Prepare COARDS-compliant netCDF files''' guide at <tt>geos-chem.readthedocs.io</tt>].
  
=== What are the COARDS conventions? ===
 
  
Both [[Getting Started with GEOS-Chem|GEOS-Chem "Classic"]] and [[GEOS-Chem HP|GCHP]] read data stored in [http://www.unidata.ucar.edu/software/netcdf/docs/faq.html#whatisit the netCDF file format], which is a common data format used in atmospheric and climate sciences.  NetCDF files contain data arrays as well as metadata, that is, a description of the data, such as its units, the axes on which it is defined, and whether it contains missing data values.

Several netCDF conventions have been developed in order to facilitate data exchange and visualization.  The [http://ferret.wrc.noaa.gov/noaa_coop/coop_cdf_profile.html '''C'''ooperative '''O'''cean/'''A'''tmosphere '''R'''esearch '''D'''ata '''S'''ervice (COARDS) conventions] define regular conventions for naming dimensions as well as the [https://www.unidata.ucar.edu/software/netcdf/docs/netcdf/Attributes.html attributes] describing the data.  You will find more information about these conventions in the sections below.  The COARDS conventions have become a standard for sharing Earth Science data in netCDF files.
+
 
+
Some netCDF files also use the [http://cfconventions.org/ Climate and Forecast (CF) Convention], which generalizes and extends the COARDS conventions by defining standard names for variables and attributes.  Any netCDF data file that is CF-compliant by definition is also COARDS-compliant.
+
 
+
Both GEOS-Chem and GCHP require that all input data in netCDF format adhere at least to the COARDS conventions (the CF conventions are optional).
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 15:17, 12 June 2019 (UTC)
+
 
+
=== Examining the variables and attributes in a netCDF file ===
+
 
+
The GEOS-Chem Support Team has also prepared a script called <tt>isCoards</tt>, which is a quick way to determine if a netCDF file adheres to the COARDS conventions.  [[#Determining if a netCDF file is COARDS-compliant|Please see this section]] for more information about <tt>isCoards</tt>.
+
 
+
You can also [[Working_with_netCDF_data_files#Examining_the_contents_of_a_netCDF_file|use the <tt>ncdump</tt> command]] to examine the contents of a netCDF file visually.
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 15:45, 12 June 2019 (UTC)
+
 
+
== COARDS dimensions ==
+
 
+
=== Cartesian grids ===
+
 
+
The '''dimensions''' of a netCDF file define how many grid boxes there are along a given direction.  While the COARDS standard does not require any specific names for dimensions, accepted practice is to use these names for cartesian (lat-lon) grids:
+
 
+
{| border=1 cellspacing=0 cellpadding=5
+
|-bgcolor="#CCCCCC"
+
!width="10px"|Dimension
+
!width="500px"|Description
+
|-valign="top"
+
|<tt>time</tt>
+
|Specifies the number of points along the time (<tt>T</tt>) axis.
+
|-valign="top"
+
|<tt>lev</tt>
+
|Specifies the number of points along the vertical level (<tt>Z</tt>) axis.
+
|-valign="top"
+
|<tt>lat</tt>
+
|Specifies the number of points along the latitude (<tt>Y</tt>) axis.
+
|-valign="top"
+
|<tt>lon</tt>
+
|Specifies the number of points along the longitude (<tt>X</tt>) axis.
+
|-valign="top"
+
|}
+
 
+
NOTES:
+
#The <tt>time</tt> dimension must always be specified.  When you create the netCDF file, you may declare <tt>time</tt> to be <tt>UNLIMITED</tt> and then later define its size.  This allows you to append further time points into the file later on.
+
#The <tt>lev</tt> dimension may be omitted if all of the data arrays in the netCDF file only contain a single level of data.
+
#The recommended ordering of dimensions in the file is <tt>time</tt>, <tt>lev</tt> (if necessary), <tt>lat</tt>, <tt>lon</tt>.  Also see the [[#Ordering of the data|''Ordering of the data'' section below]].
+
 
+
=== Non-cartesian grids ===
+
 
+
Non-cartesian grids (e.g. cubed-sphere grids) might use generic '''dimension''' names, because the horizontal dimensions for such grids typically do not correspond to longitude or latitude:
+
 
+
{| border=1 cellspacing=0 cellpadding=5
+
|-bgcolor="#CCCCCC"
+
!width="10px"|Dimension
+
!width="500px"|Description
+
|-valign="top"
+
|<tt>time</tt>
+
|Specifies the number of points along the time (<tt>T</tt>) axis.
+
|-valign="top"
+
|<tt>Z</tt>
+
|Specifies the number of points along the vertical (<tt>Z</tt>) axis.
+
|-valign="top"
+
|<tt>Y</tt>
+
|Specifies the number of points along the horizontal (<tt>Y</tt>) axis.
+
|-valign="top"
+
|<tt>X</tt>
+
|Specifies the number of points along the horizontal (<tt>X</tt>) axis.
+
|-valign="top"
+
|}
+
 
+
The choice of dimension names is typically implementation-dependent.
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 15:06, 12 June 2019 (UTC)
+
 
+
== COARDS coordinate vectors ==
+
 
+
'''Coordinate vectors''' (aka index variables or axis variables) are 1-dimensional arrays that define the values along each axis&mdash;the longitudes, latitudes, levels, and times in the file. 
+
 
+
The only COARDS requirements for coordinate vectors are these:
+
 
+
#Each coordinate vector must be given the same name as the dimension that is used to define it.
+
#All of the values contained within a coordinate vector must be either monotonically increasing or monotonically decreasing.
+
 
+
In practice, coordinate vectors in COARDS-compliant netCDF files generally use the same naming convention as for dimensions: <tt>time</tt>, <tt>lev</tt>, <tt>lon</tt>, <tt>lat</tt>.  These are discussed in more detail below.
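
The monotonicity requirement above can be checked with a few lines of code.  The following Python sketch is illustrative only and is not part of GEOS-Chem.  The function name <tt>is_strictly_monotonic</tt> is hypothetical, and the sketch assumes that the coordinate values have already been read into a plain list and that repeated values are not allowed.

```python
def is_strictly_monotonic(values):
    """Return True if values are strictly increasing or strictly decreasing."""
    pairs = list(zip(values, values[1:]))
    increasing = all(a < b for a, b in pairs)
    decreasing = all(a > b for a, b in pairs)
    return increasing or decreasing


# Hypothetical 1-degree latitude axis running from -90 to +90 (181 points).
lat = [-90.0 + i for i in range(181)]
print(is_strictly_monotonic(lat))

# A coordinate vector that changes direction fails the check.
print(is_strictly_monotonic([0.0, 10.0, 5.0]))
```

A vector with a single element passes trivially, because there are no neighboring pairs to compare.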
+
 
+
=== time ===
+
 
+
The <tt>time</tt> coordinate vector will typically have these features:
+
 
+
dimensions:
+
        time = UNLIMITED ; // (12 currently)  # or however many timestamps there are along this dimension
+
variables:
+
        float time(time) ;
+
                time:long_name = "time" ;
+
                time:units = "hours since 1985-01-01 00:00:00" ;
+
                time:calendar = "standard" ;
+
                time:axis = "T" ;
+
 
+
Here, <tt>time</tt> is a 4-byte floating-point array (aka <tt>REAL*4</tt>) of dimension <tt>time</tt> = 12 time points.  You can also declare <tt>time</tt> to be an 8-byte floating-point array (aka <tt>REAL*8</tt>) if you so choose.
+
 
+
*''NOTE: The name time is typically used regardless of whether the data is placed on a cartesian or non-cartesian grid.''
+
 
+
The COARDS standard also defines several [https://www.unidata.ucar.edu/software/netcdf/docs/netcdf/Attributes.html netCDF attributes] for use with the <tt>time</tt> coordinate vector:
+
 
+
{| border=1 cellspacing=0 cellpadding=5
+
|-bgcolor="#CCCCCC"
+
!width="175px"|Attribute
+
!width="100px"|Type
+
!width="750px"|Description
+
|-valign="top"
+
|<tt>long_name</tt>
+
|REQUIRED
+
|Gives a detailed description of the contents of this array. 
+
*Set this to <tt>Time</tt>.
+
|-valign="top"
+
|<tt>units</tt>
+
|REQUIRED
+
|Specifies the number of hours, minutes, seconds, etc. that have elapsed with respect to a reference time.  Set this to one of the following:
+
*<tt>hours since YYYY-MM-DD 00:00:00</tt>  '''''(RECOMMENDED)'''''
+
*<tt>minutes since YYYY-MM-DD 00:00:00</tt>
+
*<tt>seconds since YYYY-MM-DD 00:00:00</tt>
+
*<tt>days since YYYY-MM-DD 00:00:00</tt>
+
 
+
We recommend that you choose the reference time <tt>YYYY-MM-DD</tt> to correspond with the first time value contained in the file.  This makes the first element of the time coordinate array equal to zero, i.e. <tt>time(0) = 0</tt>.
+
 
+
For more information about the COARDS time standard, [http://ferret.wrc.noaa.gov/noaa_coop/coop_cdf_profile.html please visit the COARDS web page].
+
|-valign="top"
+
|<tt>calendar</tt>
+
|REQUIRED
+
|Specifies the calendar used to define the time system.
+
*Set this to one of the following values:
+
**<tt>standard</tt>
+
**<tt>gregorian</tt>
+
|-valign="top"
+
|<tt>standard_name</tt>
+
|OPTIONAL
+
|Can be used instead of <tt>long_name</tt>.
+
|-valign="top"
+
|<tt>axis</tt>
+
|OPTIONAL (but recommended)
+
|Identifies which axis (X,Y,Z,T) this array corresponds to.  Many software packages use this attribute to facilitate plotting.
+
*Set this to <tt>T</tt>.
+
|}
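
The relationship between the <tt>units</tt> string and the values stored in the <tt>time</tt> array can be sketched as follows.  This Python example is an illustration under stated assumptions, not GEOS-Chem code: the helper <tt>hours_since</tt> is hypothetical, the three timestamps are made up, and Python's <tt>datetime</tt> (a proleptic Gregorian calendar) is assumed to agree with the <tt>standard</tt> calendar for modern dates.

```python
from datetime import datetime


def hours_since(reference, timestamps):
    """Convert datetimes to elapsed hours relative to a reference time."""
    return [(t - reference).total_seconds() / 3600.0 for t in timestamps]


# Reference time chosen to match the first time value in the file,
# so that the first element of the time array is zero.
reference = datetime(1985, 1, 1)
stamps = [datetime(1985, 1, 1), datetime(1985, 2, 1), datetime(1985, 3, 1)]

units = "hours since {:%Y-%m-%d %H:%M:%S}".format(reference)
values = hours_since(reference, stamps)

print(units)
print(values)
```

Choosing the reference time equal to the first timestamp is what makes the first stored value zero, as recommended in the table above.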
+
 
+
=== lev ===
+
 
+
The <tt>lev</tt> coordinate vector will typically have the features shown below.
+
 
+
  dimensions:
+
          lev = 72 ;  # or however many points there are along this dimension
+
  variables:
+
          int lev(lev) ;
+
                  lev:long_name = "GEOS-Chem level" ;  # Or whatever the level name is
+
                  lev:units = "level" ;
+
                  lev:positive = "up" ;
+
                  lev:axis = "Z" ;
+
 
+
Here, <tt>lev</tt> is a 4-byte integer array of dimension <tt>lev</tt> = 72 levels.  You can also declare <tt>lev</tt> to be a 4-byte (aka <tt>REAL*4</tt>) or 8-byte (aka <tt>REAL*8</tt>) floating-point array if you so choose.
+
 
+
*''NOTE: For non-cartesian grids, the variable name "lev" might not be used.  If this is the case, the name of this coordinate variable would have the same name as the name of the dimension used to define it, e.g.'' <code>int Z(Z) ;</code>
+
 
+
The COARDS standard also defines several [https://www.unidata.ucar.edu/software/netcdf/docs/netcdf/Attributes.html netCDF attributes] for use with the <tt>lev</tt> coordinate vector:
+
 
+
{| border=1 cellspacing=0 cellpadding=5
+
|-bgcolor="#CCCCCC"
+
!width="175px"|Attribute
+
!width="100px"|Type
+
!width="750px"|Description
+
|-valign="top"
+
|<tt>long_name</tt>
+
|REQUIRED
+
|Gives a detailed description of the contents of this array. 
+
*Set this to one of the following:
+
**<tt>GEOS-Chem levels</tt>
+
**<tt>Eta Centers</tt>
+
**<tt>Sigma centers</tt>
+
*Your choice will depend on how you have defined the vertical axis.
+
 
+
'''''IMPORTANT!'''''  If you give <tt>long_name</tt> the value of <tt>GEOS-Chem levels</tt>, then HEMCO will be able to vertically regrid the data from one GEOS-Chem grid to another.
+
|-valign="top"
+
|<tt>units</tt>
+
|REQUIRED
+
|Specifies the units of the vertical coordinate.
+
*Set this to one of the following:
+
**<tt>level</tt>
+
**<tt>eta_level</tt>
+
**<tt>sigma_level</tt>
+
*Your choice will depend on how you have defined the vertical axis.
+
 
+
'''''IMPORTANT!'''''  If you give <tt>units</tt> the value of <tt>level</tt>, then HEMCO will be able to vertically regrid the data from one GEOS-Chem grid to another.
+
|-valign="top"
+
|<tt>standard_name</tt>
+
|OPTIONAL
+
|You may use this instead of <tt>long_name</tt>.
+
|-valign="top"
+
|<tt>axis</tt>
+
|OPTIONAL (but recommended)
+
|Identifies which axis (X,Y,Z,T) this array corresponds to.  Many software packages use this attribute to facilitate plotting.
+
*Set this to <tt>Z</tt>.
+
|-valign="top"
+
|<tt>positive</tt>
+
|OPTIONAL (but recommended)
+
|Specifies the direction in which the vertical levels are indexed.  Many software packages use this attribute to facilitate plotting.
+
*Set this to <tt>up</tt>.  Most data used by HEMCO is indexed from the surface upwards.
+
*NOTE: [[GEOS-Chem HP]] needs this attribute to identify the vertical ordering of the data.
+
|}
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 15:29, 12 June 2019 (UTC)
+
 
+
=== lat ===
+
 
+
The <tt>lat</tt> coordinate vector typically has these features:
+
 
+
dimensions:
+
        lat = 181 ;  # or however many points there are in that dimension
+
variables:
+
        float lat(lat) ;
+
                lat:long_name = "Latitude" ;
+
                lat:units = "degrees_north" ;
+
                lat:axis = "Y" ;
+
 
+
Here, <tt>lat</tt> is a 4-byte floating-point array (aka <tt>REAL*4</tt>) of dimension <tt>lat</tt> = 181 latitudes.  You can also declare <tt>lat</tt> to be an 8-byte floating-point array (aka <tt>REAL*8</tt>) if you so choose.
+
 
+
*''NOTE: For non-cartesian grids, the variable name "lat" might not be used.  If this is the case, the name of this coordinate variable would have the same name as the name of the dimension used to define it, e.g.'' <code>float Y(Y) ;</code>
+
 
+
The COARDS standard also defines several [https://www.unidata.ucar.edu/software/netcdf/docs/netcdf/Attributes.html netCDF attributes] for use with the <tt>lat</tt> coordinate vector:
+
 
+
{| border=1 cellspacing=0 cellpadding=5
+
|-bgcolor="#CCCCCC"
+
!width="175px"|Attribute
+
!width="100px"|Type
+
!width="750px"|Description
+
|-valign="top"
+
|<tt>long_name</tt>
+
|REQUIRED
+
|Gives a detailed description of the contents of this array. 
+
*Set this to <tt>Latitude</tt>.
+
|-valign="top"
+
|<tt>units</tt>
+
|REQUIRED
+
|Specifies the units of latitude. 
+
*Set this to <tt>degrees_north</tt>.
+
|-valign="top"
+
|<tt>standard_name</tt>
+
|OPTIONAL
+
|You may use this attribute instead of <tt>long_name</tt>.
+
|-valign="top"
+
|<tt>axis</tt>
+
|OPTIONAL (but recommended)
+
|Identifies which axis (X,Y,Z,T) this array corresponds to.  Many software packages use this attribute to facilitate plotting.
+
*Set this to <tt>Y</tt>.
+
|}
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 15:06, 12 June 2019 (UTC)
+
 
+
=== lon ===
+
 
+
The <tt>lon</tt> coordinate vector typically has these features:
+
 
+
dimensions:
+
        lon = 360 ;  # or however many points there are along this dimension
+
variables:
+
        float lon(lon) ;
+
                lon:long_name = "Longitude" ;
+
                lon:units = "degrees_east" ;
+
                lon:axis = "X" ;
+
 
+
Here, <tt>lon</tt> is a 4-byte floating-point array (aka <tt>REAL*4</tt>) of dimension <tt>lon</tt> = 360 longitudes.  You can also declare <tt>lon</tt> to be an 8-byte floating-point array (aka <tt>REAL*8</tt>) if you so choose.
+
 
+
*''NOTE: For non-cartesian grids, the variable name "lon" might not be used.  If this is the case, the name of this coordinate variable would have the same name as the name of the dimension used to define it, e.g.'' <code>float X(X) ;</code>
+
 
+
The COARDS standard also defines several [https://www.unidata.ucar.edu/software/netcdf/docs/netcdf/Attributes.html netCDF attributes] for use with the <tt>lon</tt> coordinate vector:
+
 
+
{| border=1 cellspacing=0 cellpadding=5
+
|-bgcolor="#CCCCCC"
+
!width="175px"|Attribute
+
!width="100px"|Type
+
!width="750px"|Description
+
|-valign="top"
+
|<tt>long_name</tt>
+
|REQUIRED
+
|Gives a detailed description of the contents of this array. 
+
*Set this to <tt>Longitude</tt>.
+
|-valign="top"
+
|<tt>units</tt>
+
|REQUIRED
+
|Specifies the units of longitude.
+
*Set this to <tt>degrees_east</tt>.
+
|-valign="top"
+
|<tt>standard_name</tt>
+
|OPTIONAL
+
|You may use this instead of <tt>long_name</tt>.
+
|-valign="top"
+
|<tt>axis</tt>
+
|OPTIONAL (but recommended)
+
|Identifies which axis (X,Y,Z,T) this array corresponds to.  Many software packages use this attribute to facilitate plotting.
+
*Set this to <tt>X</tt>.
+
|}
+
 
+
Longitudes may be represented modulo 360. Thus, for example, -180, 180, and 540 are all valid representations of the International Dateline and 0 and 360 are both valid representations of the Prime Meridian.  Note, however, that the sequence of numerical longitude values stored in the netCDF file must be monotonic in a non-modulo sense.
+
 
+
Practical guidelines:
+
# If your grid begins at the International Dateline (-180&deg;), then place your longitudes into the range -180..180.
+
# If your grid begins at the Prime Meridian (0&deg;), then place your longitudes into the range 0..360. 
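
The practical guidelines above can be sketched in Python.  This is a minimal illustration, not GEOS-Chem code: the helper names <tt>wrap_longitudes</tt> and <tt>wrap_and_sort</tt> are hypothetical, and the sketch assumes that the longitudes are held in a plain list.  Because wrapping alone can break monotonicity, the second helper also returns the index order that would have to be applied to the data along the longitude axis.

```python
def wrap_longitudes(lons, start=-180.0):
    """Map each longitude into the half-open range [start, start + 360)."""
    return [((lon - start) % 360.0) + start for lon in lons]


def wrap_and_sort(lons, start=-180.0):
    """Wrap longitudes, then sort them so the result is monotonically increasing.

    Returns the sorted longitudes and the index order, which must also be
    applied to the data arrays along their longitude axis.
    """
    wrapped = wrap_longitudes(lons, start)
    order = sorted(range(len(wrapped)), key=lambda i: wrapped[i])
    return [wrapped[i] for i in order], order


# Hypothetical coarse grid that begins at the Prime Meridian (0..360 style).
lons_0_360 = [0.0, 90.0, 180.0, 270.0]

# Re-express it as a grid that begins at the International Dateline.
lons_180, order = wrap_and_sort(lons_0_360, start=-180.0)
print(lons_180)
print(order)
```

Passing <tt>start=0.0</tt> instead converts a -180..180 grid into the 0..360 form.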
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 15:06, 12 June 2019 (UTC)
+
 
+
== COARDS data arrays ==
+
 
+
A COARDS-compliant netCDF file may contain several '''data arrays''', as shown here:
+
 
+
dimensions:
+
        time = UNLIMITED ; // (12 currently)
+
        lev = 72 ;
+
        lat = 181 ;
+
        lon = 360 ;     
+
variables:
+
        float PRPE(time, lev, lat, lon) ;
+
                PRPE:long_name = "Propene" ;
+
                PRPE:units = "kgC/m2/s" ;
+
                PRPE:add_offset = 0.f ;
+
                PRPE:scale_factor = 1.f ;
+
                PRPE:_FillValue = 1.e+15f ;
+
                PRPE:missing_value = 1.e+15f ;
+
                PRPE:gamap_category = "ANTHSRCE" ;
+
        float CO(time, lev, lat, lon) ;
+
                CO:long_name = "CO" ;
+
                CO:units = "kg/m2/s" ;               
+
                CO:add_offset = 0.f ;
+
                CO:scale_factor = 1.f ;
+
                CO:_FillValue = 1.e+15f ;
+
                CO:missing_value = 1.e+15f ;
+
                CO:gamap_category = "ANTHSRCE" ;
+
+
These arrays contain emissions for GEOS-Chem tracers PRPE (lumped &ge; C3 alkenes) and CO.
+
 
+
=== Attributes ===
+
 
+
Data arrays in COARDS-compliant netCDF files typically use these netCDF attributes.
+
 
+
{| border=1 cellspacing=0 cellpadding=5
+
|-bgcolor="#CCCCCC"
+
!width="175px"|Attribute
+
!width="100px"|Type
+
!width="750px"|Description
+
|-valign="top"
+
|<tt>long_name</tt>
+
|REQUIRED
+
|Gives a detailed description of the contents of the array.
+
|-valign="top"
+
|<tt>units</tt>
+
|REQUIRED
+
|Specifies the units in the array.  In general, SI units are preferred. 
+
 
+
Special usage for HEMCO:
+
#Emissions fluxes
+
#*For species such as PRPE that are tracked as equivalent carbons, use <tt>kgC/m2/s</tt> or <tt>kgC m-2 s-1</tt>.
+
#*For all other species, use <tt>kg/m2/s</tt> or <tt>kg m-2 s-1</tt>.
+
#Concentration data
+
#*Use <tt>kg/m3</tt> or <tt>kg m-3</tt>
+
#Dimensionless data
+
#*Use <tt>1</tt>. 
+
#*Do not use <tt>unitless</tt>, which is non-standard.  (HEMCO will recognize it, but it is not recommended.)
+
|-valign="top"
+
|<tt>add_offset</tt>
+
|OPTIONAL (but recommended)
+
|Specifies an offset used to store floating-point data as packed integer.  Ignored otherwise.
+
*Set this to <tt>0</tt>.
+
|-valign="top"
+
|<tt>scale_factor</tt>
+
|OPTIONAL (but recommended)
+
|Specifies the scale factor used to store floating-point data as packed integer.  Ignored otherwise.
+
*Set this to <tt>1</tt>. 
+
|-valign="top"
+
|<tt>standard_name</tt>
+
|OPTIONAL (but recommended)
+
|You may use this instead of <tt>long_name</tt>.
+
|-valign="top"
+
|<tt>missing_value</tt>
+
|OPTIONAL (but recommended)
+
|Specifies the value that represents missing data.  This should be set to a number that will not be mistaken for a valid data value.  Typical missing data values are 1e15, +/-1e32, or +/-1e-32.
'''''NOTE: The missing_value attribute must be representable in 4-byte (aka REAL*4) precision, whose largest finite magnitude is about 3.4e38 and whose smallest normalized magnitude is about 1.2e-38.  Values such as +/-1e32 or +/-1e-32 are safely within this range.  Staying within it avoids floating-point errors in HEMCO caused by type conversion.'''''
+
|-valign="top"
+
|<tt>_FillValue</tt>
+
|OPTIONAL (but recommended)
+
|Synonym for <tt>missing_value</tt>.  It is recommended to set both <tt>missing_value</tt> and <tt>_FillValue</tt> attributes to the same value.  Some data visualization packages look for the <tt>missing_value</tt> attribute, while others look for <tt>_FillValue</tt>.
+
|-valign="top"
+
|<tt>gamap_category</tt>
+
|OPTIONAL
+
|Specifies the GAMAP diagnostic category name.  This makes it easier for the [http://acmg.seas.harvard.edu/gamap/ GAMAP visualization package] to read the file.
+
|}
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 15:06, 12 June 2019 (UTC)
+
 
+
=== Ordering of the data ===
+
 
+
2D and 3D array variables in netCDF files must have a specific dimension order. If the order is incorrect, you will encounter the netCDF read error "start+count exceeds dimension bound". You can check the dimension ordering of your arrays by using <tt>ncdump</tt> with the <tt>-h</tt> option, e.g. <tt>ncdump -h file.nc</tt>. Be sure to check the dimensions listed next to the array name rather than the ordering of the dimensions listed at the top of the <tt>ncdump</tt> output.
+
 
+
The following dimension orders are acceptable:
+
 
+
    array(time,lat,lon)
+
    array(time,lev,lat,lon)
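
A dimension-order check of this kind can be sketched in Python.  The example below is hypothetical and is not a GEOS-Chem tool: it assumes that you have copied variable declaration lines from the <tt>ncdump -h</tt> output, and it accepts only the recommended <tt>time</tt>, <tt>lev</tt> (if present), <tt>lat</tt>, <tt>lon</tt> ordering described in the notes on COARDS dimensions.  The names <tt>parse_declaration</tt> and <tt>has_recommended_order</tt> are made up for illustration.

```python
import re

# Orders that follow the recommended time, lev (if present), lat, lon ordering.
ALLOWED_ORDERS = {("time", "lat", "lon"), ("time", "lev", "lat", "lon")}


def parse_declaration(line):
    """Extract (name, dimensions) from a line such as 'float CO(time, lev, lat, lon) ;'."""
    match = re.match(r"\s*\w+\s+(\w+)\s*\(([^)]*)\)", line)
    if match is None:
        raise ValueError("not a variable declaration: " + line)
    dims = tuple(d.strip() for d in match.group(2).split(","))
    return match.group(1), dims


def has_recommended_order(dims):
    """Return True if the dimensions appear in one of the allowed orders."""
    return tuple(dims) in ALLOWED_ORDERS


# Declaration lines in the style printed by ncdump; the second one is made up.
for line in ["float CO(time, lev, lat, lon) ;", "float BAD(time, lat, lon, lev) ;"]:
    name, dims = parse_declaration(line)
    print(name, dims, has_recommended_order(dims))
```

The check looks at the dimensions listed next to each array name, not at the order of the <tt>dimensions:</tt> section at the top of the <tt>ncdump</tt> output.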
+
 
+
The rest of this section explains why the dimension ordering of arrays matters.
+
 
+
When you use the <tt>ncdump</tt> utility to examine the contents of a netCDF file, you will notice that it displays the dimensions of the data in the opposite order with respect to Fortran.  In our sample file, <tt>ncdump</tt> says that the CO and PRPE arrays have these dimensions:
+
 
+
    CO(time,lev,lat,lon)
+
    PRPE(time,lev,lat,lon)
+
 
+
But if you read this netCDF file into GEOS-Chem (or any other program written in Fortran), you must use data arrays that have these dimensions:
+
 
+
    CO(lon,lat,lev,time)
+
    PRPE(lon,lat,lev,time)
+
 
+
Here's why:
+
 
+
Fortran is a column-major language, which means that arrays are stored in memory by columns first, then by rows. If you have declared arrays such as:
+
    INTEGER            :: I, J, L, T
+
    INTEGER, PARAMETER :: N_LON  = 360
+
    INTEGER, PARAMETER :: N_LAT  = 181
+
    INTEGER, PARAMETER :: N_LEV  = 72
+
    INTEGER, PARAMETER :: N_TIME = 12
+
    REAL*4            :: CO  (N_LON,N_LAT,N_LEV,N_TIME)
+
    REAL*4            :: PRPE(N_LON,N_LAT,N_LEV,N_TIME)
+
 
+
then for optimal efficiency, the leftmost dimension (<tt>I</tt>) needs to vary the fastest, and needs to be accessed by the innermost DO-loop. Then the next leftmost dimension (<tt>J</tt>) should be accessed by the next innermost DO-loop, and so on.  Therefore, the proper way to loop over these arrays is:
+
 
+
    DO T = 1, N_TIME
+
    DO L = 1, N_LEV
+
    DO J = 1, N_LAT
+
    DO I = 1, N_LON
+
        CO  (I,J,L,T) = ...
        PRPE(I,J,L,T) = ...
+
    ENDDO
+
    ENDDO
+
    ENDDO
+
    ENDDO
+
 
+
Note that the <tt>I</tt> index is varying most often, since it is the innermost DO-loop, then <tt>J</tt>, <tt>L</tt>, and <tt>T</tt>. This is opposite to how a car's odometer reads.
+
 
+
If you loop through an array in this fashion, with leftmost indices varying fastest, then the code minimizes the number of times it has to load subsections of the array into cache memory. In this optimal manner of execution, all of the array elements sitting in the cache memory are read in the proper order before the next array subsection needs to be loaded into the cache. But if you step through array elements in the wrong order, the number of cache loads is proportionally increased. Because it takes a finite amount of time to reload array elements into cache memory, the more times you have to access the cache, the longer it will take the code to execute. This can slow down the code dramatically.
+
 
+
On the other hand, C is a row-major language, which means that arrays are stored by rows first, then by columns.  This means that the rightmost index varies the fastest in memory.  This is identical to how a car's odometer reads.

If you use a Fortran program to write data to disk, and then try to read that data from disk into a program written in C (or NCL), then unless you reverse the order of the array indices, you will be reading the array in the wrong order.  In C you would have to use this ordering scheme (using Fortran-style syntax to illustrate the point).  The loop nesting is unchanged; only the order of the array indices is reversed, so that the fastest-varying index (<tt>I</tt>) is still accessed by the innermost loop:

    DO T = 1, N_TIME
    DO L = 1, N_LEV
    DO J = 1, N_LAT
    DO I = 1, N_LON
        CO  (T,L,J,I) = ...
        PRPE(T,L,J,I) = ...
    ENDDO
    ENDDO
    ENDDO
    ENDDO
+
 
+
Because <tt>ncdump</tt> is written in C, the order of the array dimensions appears reversed with respect to Fortran.  The same goes for the NCAR Command Language (NCL), which is also written in C.
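
The reversal of dimension order can be verified with a little arithmetic.  The Python sketch below is an illustration only: the function names are hypothetical, and all indices are 0-based for simplicity.  It computes the position of an element in memory under the column-major (Fortran) and row-major (C) conventions and shows that a Fortran array of shape <tt>(lon,lat,lev,time)</tt> and a C array of shape <tt>(time,lev,lat,lon)</tt> describe the same memory layout.

```python
def column_major_offset(index, shape):
    """Memory offset when the leftmost index varies fastest (Fortran)."""
    offset, stride = 0, 1
    for i, n in zip(index, shape):
        offset += i * stride
        stride *= n
    return offset


def row_major_offset(index, shape):
    """Memory offset when the rightmost index varies fastest (C)."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset


# Array sizes from the sample file: lon, lat, lev, time.
fortran_shape = (360, 181, 72, 12)
c_shape = tuple(reversed(fortran_shape))

# An arbitrary element, using 0-based indices.
i, j, l, t = 5, 100, 10, 3

print(column_major_offset((i, j, l, t), fortran_shape))
print(row_major_offset((t, l, j, i), c_shape))

# Stepping the longitude index moves to the adjacent memory location.
print(column_major_offset((i + 1, j, l, t), fortran_shape)
      - column_major_offset((i, j, l, t), fortran_shape))
```

Both offsets agree, which is why the data do not need to be rearranged on disk: only the order in which the indices are written changes between the two languages.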
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 15:06, 12 June 2019 (UTC)
+
 
+
== COARDS global attributes ==
+
 
+
'''Global attributes''' are [https://www.unidata.ucar.edu/software/netcdf/docs/netcdf/Attributes.html netCDF attributes] that contain information about a netCDF file, as opposed to information about an individual data array.  [[Working_with_netCDF_data_files#Examining_the_contents_of_a_netCDF_file|In our <tt>ncdump</tt> example]], the output showed that our sample netCDF file has several global attributes:
+
 
+
// global attributes:             
+
                :Title = "COARDS/netCDF file containing X data" ;
+
                :Contact = "GEOS-Chem Support Team (geos-chem-support@as.harvard.edu)" ;
+
                :References = "www.geos-chem.org; wiki.geos-chem.org" ;
+
                :Conventions = "COARDS" ;
+
                :Filename = "my_sample_data_file.1x1" ;
+
                :History = "Mon Mar 17 16:18:09 2014 GMT" ;
+
                :ProductionDateTime = "File generated on: Mon Mar 17 16:18:09 2014 GMT" ;
+
                :ModificationDateTime = "File generated on: Mon Mar 17 16:18:09 2014 GMT" ;
+
                :VersionID = "1.2" ;
+
                :Format = "NetCDF-3" ;
+
                :Model = "GEOS5" ;
+
                :Grid = "GEOS_1x1" ;
+
                :Delta_Lon = 1.f ;
+
                :Delta_Lat = 1.f ;
+
                :SpatialCoverage = "global" ;
+
                :NLayers = 72 ;           
+
                :Start_Date = 20050101 ;
+
                :Start_Time = "00:00:00.0" ;
+
                :End_Date = 20051231 ;
+
                :End_Time = "23:59:59.99999" ;
+
             
+
You can add as many global attributes as you wish.  The following are the most commonly used:
+
 
+
{| border=1 cellspacing=0 cellpadding=5
+
|-bgcolor="#CCCCCC"
+
!width="175px"|Attribute
+
!width="100px"|Type
+
!width="750px"|Description
+
|-valign="top"
+
|<tt>Title</tt>
+
|REQUIRED
+
|Provides a short description of the file.
+
*If the file was converted from binary punch format by [http://acmg.seas.harvard.edu/gamap/doc/by_alphabet/gamap_b.html#BPCH2COARDS GAMAP routine <tt>BPCH2COARDS</tt>], then <tt>Title</tt> will be set to <tt>COARDS/netCDF file created by BPCH2COARDS (GAMAP v2-17+)</tt>.
+
|-valign="top"
+
|<tt>Contact</tt>
+
|OPTIONAL (but recommended)
+
|Provides contact information about the person(s) who created the netCDF file.
+
|-valign="top"
+
|<tt>References</tt>
+
|OPTIONAL (but recommended)
+
|Provides references (or links to a web or wiki page) for the data contained in the netCDF file.
+
|-valign="top"
+
|<tt>Conventions</tt>
+
|REQUIRED
+
|Indicates if the netCDF file adheres to a standard (e.g. COARDS, CF, etc.)
+
*Set this to <tt>COARDS</tt>.
+
|-valign="top"
+
|<tt>Filename</tt>
+
|OPTIONAL (but recommended)
+
|Specifies the name of the netCDF file.
+
|-valign="top"
+
|<tt>History</tt>
+
|OPTIONAL (but recommended)
+
|Lists the date of file creation, and subsequent dates of modification.
+
*If you use the netCDF operators (NCO) or Climate Data Operators (CDO) to modify the file, the <tt>History</tt> attribute will be modified to display the commands that were used to modify the file.
+
|-valign="top"
+
|<tt>ProductionDateTime</tt>
+
|OPTIONAL (but recommended)
+
|Specifies the date and time on which the file was originally created.
+
|-valign="top"
+
|<tt>ModificationDateTime</tt>
+
|OPTIONAL (but recommended)
+
|Specifies the dates and times on which the file was modified.
+
|-valign="top"
+
|<tt>VersionID</tt>
+
|OPTIONAL (but recommended)
+
|Specifies a version number corresponding to the data in the netCDF file. 
+
*For example, GMAO met field files use this attribute to denote the version number of the GEOS-DAS system (e.g. 5.7.2, 5.13.1) that was used to create the data.
+
|-valign="top"
+
|<tt>Format</tt>
+
|OPTIONAL (but recommended)
+
|Specifies the format of the netCDF file.  Possible options are:
+
*<tt>NetCDF-3</tt>
+
*<tt>NetCDF-4</tt>
+
|-valign="top"
+
|<tt>Model</tt>
+
|OPTIONAL
+
|Specifies the vertical grid (e.g. GEOS-5, MERRA, GEOS-FP) of the GEOS-Chem simulation that was used to generate this data.
+
*This attribute is added by GAMAP routine <tt>BPCH2COARDS</tt>.
+
*For GMAO met field data, this indicates the version of the assimilation system (e.g. <tt>GEOS-5</tt>) used to assimilate the data.
+
|-valign="top"
+
|<tt>Delta_Lat</tt>
+
|OPTIONAL
+
|Specifies the spacing between points along the latitude axis.
+
*This attribute is added by GAMAP routine <tt>BPCH2COARDS</tt>.
+
|-valign="top"
+
|<tt>Delta_Lon</tt>
+
|OPTIONAL
+
|Specifies the spacing between points along the longitude axis.
+
*This attribute is added by GAMAP routine <tt>BPCH2COARDS</tt>.
+
|-valign="top"
+
|<tt>SpatialCoverage</tt>
+
|OPTIONAL
+
|Specifies the horizontal extent of the data.  Possible values are:
+
*<tt>global</tt>
+
*<tt>regional</tt>
+
|-valign="top"
+
|<tt>NLayers</tt>
+
|OPTIONAL
+
|Specifies the number of vertical levels in the grid.
+
*This attribute is added by GAMAP routine <tt>BPCH2COARDS</tt>.
+
*If the file contains only surface data, then <tt>BPCH2COARDS</tt> sets <tt>NLayers</tt> to 1.
+
*Sometimes you will see this attribute named <tt>Nlayers</tt>.
+
|-valign="top"
+
|<tt>Start_Date</tt>
+
|OPTIONAL
+
|Specifies the starting date of the data in the file.
+
*This attribute is added by GAMAP routine <tt>BPCH2COARDS</tt>.
+
*You can also manually add this attribute.
+
|-valign="top"
+
|<tt>End_Date</tt>
+
|OPTIONAL
+
|Specifies the ending date of the data in the file.
+
*This attribute is added by GAMAP routine <tt>BPCH2COARDS</tt>.
+
*You can also manually add this attribute.
+
|-valign="top"
+
|<tt>Start_Time</tt>
+
|OPTIONAL
+
|Specifies the starting time of the data in the file.
+
*This attribute is added by GAMAP routine <tt>BPCH2COARDS</tt>.
+
*You can also manually add this attribute.
+
*This attribute often has the value of <tt>00:00:00.0</tt>.
+
|-valign="top"
+
|<tt>End_Time</tt>
|OPTIONAL
|Specifies the ending time of the data in the file.
+
*This attribute is added by GAMAP routine <tt>BPCH2COARDS</tt>.
+
*You can also manually add this attribute.
+
*This attribute often has the value of <tt>23:59:59.9</tt>.
+
|}
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 15:06, 12 June 2019 (UTC)
+
 
+
== Determining if a netCDF file is COARDS-compliant ==
+
 
+
The [[GEOS-Chem Support Team]] has created a script named <tt>isCoards</tt> that will let you easily determine if a netCDF file is COARDS-compliant.  In [[GEOS-Chem v11-01]] and later versions, <tt>isCoards</tt> is included in the <tt>NcdfUtil/perl</tt> subfolder of the GEOS-Chem source code directory.
+
 
+
The <tt>isCoards</tt> script will give you detailed output of which elements of a netCDF file are COARDS-compliant and which are not.  Here is an example:
+
 
+
> cd /mnt/gcgrid/data/ExtData/HEMCO/GFED4/v2015-10/2013
+
> isCoards GFED4_3hrfrac_gen.025x025.201301.nc
+
+
===========================================================================
+
Filename: GFED4_3hrfrac_gen.025x025.201301.nc
+
===========================================================================
+
+
The following items adhere to the COARDS standard:
+
---------------------------------------------------------------------------
+
-> time(time)
+
-> time is monotonically increasing
+
-> time:units = "hours since 1985-01-01 00:00:00"
+
-> lon(lon)
+
-> lon is monotonically increasing
+
-> lon:units = "degrees_east"
+
-> lat(lat)
+
-> lat is monotonically increasing
+
-> lat:units = "degrees_north"
+
-> GFED_FRAC3HR(time,lat,lon)
+
-> GFED_FRAC3HR:units = "1"
+
+
The following items DO NOT ADHERE to the COARDS standard:
+
---------------------------------------------------------------------------
+
-> time:calendar is missing
+
-> time:long_name (or time:standard_name) is missing
+
-> lon:long_name (or lon:standard_name) is missing
+
-> lat:long_name (or lat:standard_name) is missing
+
-> GFED_FRAC3HR:long_name (or GFED_FRAC3HR:standard_name) is missing
+
-> The "Conventions" global attribute is missing
+
-> The "History" global attribute is missing
+
-> The "Title" global attribute is missing
+
+
The following optional items are RECOMMENDED:
+
---------------------------------------------------------------------------
+
-> Consider adding time:axis = "T"
+
-> Consider adding lon:axis ="X"
+
-> Consider adding lat:axis = "Y"
+
-> Consider adding GFED_FRAC3HR:_FillValue
+
-> Consider adding GFED_FRAC3HR:missing_value
+
-> Consider adding GFED_FRAC3HR:add_offset
+
-> Consider adding GFED_FRAC3HR:scale_factor
+
-> Consider adding the "Format" global attribute
+
-> Consider adding the "References" global attribute
+
+
For more information how to fix non COARDS-compliant items, see:
+
<nowiki>http://wiki.geos-chem.org/Preparing_data_files_for_use_with_HEMCO</nowiki>
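
To give a feel for the kind of checks reported above, here is a minimal Python sketch.  It is not the <tt>isCoards</tt> script (which is written in Perl) and does not reproduce its logic.  The function names are hypothetical, the attribute dictionaries are made-up stand-ins for metadata that you would read from a file, and the messages merely imitate the style of the sample output.

```python
def check_variable(name, attrs):
    """List missing attributes for one variable, given a dict of its attributes."""
    problems = []
    if "long_name" not in attrs and "standard_name" not in attrs:
        problems.append(f"{name}:long_name (or {name}:standard_name) is missing")
    if "units" not in attrs:
        problems.append(f"{name}:units is missing")
    return problems


def check_global(attrs):
    """List missing global attributes, following the items flagged in the sample output."""
    required = ("Conventions", "History", "Title")
    return [f'The "{key}" global attribute is missing'
            for key in required if key not in attrs]


# Hypothetical metadata, e.g. transcribed from "ncdump -h" output.
variables = {
    "lon": {"units": "degrees_east"},
    "lat": {"units": "degrees_north", "long_name": "Latitude"},
}
global_attrs = {"Title": "Example file"}

problems = []
for name, attrs in variables.items():
    problems.extend(check_variable(name, attrs))
problems.extend(check_global(global_attrs))

for problem in problems:
    print("->", problem)
```

For real files, use <tt>isCoards</tt> itself; the sketch only shows how such a report can be assembled from attribute names.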
+
 
+
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 16:58, 6 January 2016 (UTC)
+
 
+
 
+

Latest revision as of 20:28, 4 August 2022
