Working with netCDF data files

From Geos-chem
Revision as of 21:06, 20 December 2019 by Lizzie Lundgren (Talk | contribs) (Regridding netCDF files)

Jump to: navigation, search

Previous | Next | Guide to netCDF in GEOS-Chem

  1. Introduction to netCDF
  2. Check if netCDF is already installed on your system
  3. Use Spack to install netCDF on your system
  4. The COARDS netCDF conventions for earth science data
  5. Working with netCDF data files
  6. Creating netCDF data files for GEOS-Chem
  7. Other libraries used by GEOS-Chem

On this page we provide some useful information about working with data files in netCDF format.

Useful tools

There are many free and open-source software packages readily available for visualizing and manipulating netCDF files. These tools will reduce the need for the GEOS-Chem user community to rely on IDL (and GAMAP), which can be prohibitively expensive for some user groups. Some recommend tools are listed below.

Name Description
ncdump This command-line tool generates a text representation of netCDF data and can be used to quickly view the variables contained in a netCDF file. The ncdump utility is installed with your netCDF library distribution.
ncview Visualization package for netCDF files. Ncview has limited features, but is great for getting a quick look at the contents of netCDF files..
Panoply Data viewer for netCDF files. This package offers an alternative to ncview. From our experience, Panoply works nicely when installed on the desktop, but is slow to respond in the Linux environment.
nco and cdo Command-line tools for manipulating and analyzing netCDF files. Useful for renaming variables and attributes, and for regridding data.
xarray Python package that lets you read the contents of a netCDF file into a data structure. The data can then be further manipulated or converted to e.g. numpy arrays for further processing.
xbpch Python package that lets you read the contents of a binary punch file into an xarray Dataset object.
Panoply Data viewer for netCDF files. This package offers an alternative to ncview. From our experience, Panoply works nicely when installed on the desktop, but is slow to respond in the Linux environment.
GCPy Python based package for visualizing and analyzing GEOS-Chem output. Currently under development. GCPy will soon be used to produce the plots from GEOS-Chem 1-month and 1-year benchmark output.
GAMAP Data visualzation package written in IDL. Although GAMAP is currently being phased out, GAMAP routine BPCH2COARDS is still useful for converting data stored in the GEOS-Chem "binary punch" format to netCDF format. See our Converting files from binary punch format to netCDF section below:

Some of the tools listed above, such as ncdump and ncview, may come pre-installed on your system. Others may need to be installed or loaded (e.g. via the module load command). Check with your system administrator or IT staff to see what is available on your system.

--Bob Yantosca (talk) 21:49, 29 November 2018 (UTC)

Converting files from binary punch format to netCDF

GEOS-Chem versions prior to GEOS-Chem v10-01 read emissions data from binary punch data format. You will need to convert these data files to COARDS-compliant netCDF for use with HEMCO.

Using Python

Perhaps the simplest way to create a netCDF file from a bpch file is to use the xbpch and xarray Python packages. (If you would like to change the variable names, then you will also need our gcpy package.) This can be done in only a few lines of Python! Please see our example script.

NOTE: The xbpch package might not be able to properly read the ND21 optical depth diagnostics (GAMAP category OD-MAP-$). This might be due to the fact that this diagnostic is only archived up to the top of the stratosphere. We recommend that you use the IDL BPCH2COARDS program to convert this data to netCDF for the time being.

Using IDL

You can use the GAMAP routine BPCH2COARDS to create netCDF files from a GEOS-Chem binary punch file. For example, start IDL and then type this command at the IDL prompt:

IDL> bpch2coards, 'uvalbedo.geos.2x25', '' 

will create the following netCDF files:

Note that BPCH2COARDS will create a new file for each time slice. The %DATE% token in the output file name will be replaced with the year-month-day value for each time stamp. In the above example, the binary punch file uvalbedo.geos.2x25 contains monthly data, therefore BPCH2COARDS will create 12 individual netCDF files.

NOTE: You might sometimes have better luck using the BPCH_SEP routine to split the bpch files into smaller bpch files (e.g. one per month) band then using bpch2coards on the smaller files.

Special note for timeseries data: To use BPCH2COARDS to convert timeseries (e.g. hourly, 3-hourly, etc) data to netCDF format, add the %TIME% token to the netCDF file name. For example:

IDL> bpch2coards, 'timeseries.geos.2x25', ''

This will create one new netCDF file for each timestamp in the bpch file. You can then proceed to Step 2 and Step 3 below to concatenate these files into a single netCDF file.

Once you have created these individual netCDF files, you can then concatenate them together into a single file.

--Bob Yantosca (talk) 14:55, 10 July 2019 (UTC)

Further Edit variable names and attributes

Whether you use Python or IDL to create a netCDF file from a bpch file, you will still need to edit the variable attributes in order to make the file COARDS-compliant. Please see this section below for more information.

--Bob Yantosca (talk) 15:20, 3 December 2018 (UTC)

Examining the contents of a netCDF file

An easy way to examine the contents of a netCDF file is to use this command:

ncdump -cts EMEP.geos.1x1

You will see output similar to this:

netcdf EMEP.geos.1x1 {
        lon = 360 ;
        lat = 181 ;
        time = UNLIMITED ; // (17 currently)
        float lon(lon) ;
                lon:standard_name = "longitude" ;
                lon:long_name = "Longitude" ;
                lon:units = "degrees_east" ;
                lon:axis = "X" ;
                lon:_Storage = "chunked" ;
                lon:_ChunkSizes = 360 ;
                lon:_DeflateLevel = 1 ;
        float lat(lat) ;
                lat:standard_name = "latitude" ;
                lat:long_name = "Latitude" ;
                lat:units = "degrees_north" ;
                lat:axis = "Y" ;
                lat:_Storage = "chunked" ;
                lat:_ChunkSizes = 181 ;
                lat:_DeflateLevel = 1 ;
        double time(time) ;
                time:standard_name = "time" ;
                time:units = "hours since 1985-01-01 00:00:00" ;
                time:calendar = "standard" ;
                time:_Storage = "chunked" ;
                time:_ChunkSizes = 524288 ;
                time:_DeflateLevel = 1 ;
        float PRPE(time, lat, lon) ;
                PRPE:long_name = "Propene" ;
                PRPE:units = "kgC/m2/s" ;
                PRPE:gamap_category = "ANTHSRCE" ;
                PRPE:_Storage = "chunked" ;
                PRPE:_ChunkSizes = 1, 181, 360 ;
                PRPE:_DeflateLevel = 1 ;
        float ALK4(time, lat, lon) ;
                ALK4:long_name = "Alkanes(>C4)" ;
                ALK4:units = "kgC/m2/s" ;
                ALK4:gamap_category = "ANTHSRCE" ;
                ALK4:_Storage = "chunked" ;
                ALK4:_ChunkSizes = 1, 181, 360 ;
                ALK4:_DeflateLevel = 1 ;
        ... etc ...
// global attributes:
                :CDI = "Climate Data Interface version 1.5.5 (" ;
                :Conventions = "COARDS" ;
                :history = "Wed Apr 23 17:36:28 2014: cdo mulc,10000\n",                       
                :Title = "COARDS/netCDF file created by BPCH2COARDS (GAMAP v2-03+)" ;
                :Model = "GEOS3" ;
                :Grid = "GEOS_1x1" ;
                :Delta_Lon = 1.f ;
                :Delta_Lat = 1.f ;
                :NLayers = 48 ;
                :Start_Date = 19800101 ;
                :Start_Time = 0 ;
                :End_Date = 19810101 ;
                :End_Time = 0 ;
                :Delta_Time = 240000 ;
                :Temp_Res = "CONSTANT" ;
                :CDO = "Climate Data Operators version 1.5.5 (" ;

 lon = 180.5, 181.5, 182.5 ... etc... ;

 lat = -89.75, -89, -88, -87 ... etc ... ;

 time = "1980-01-01", "1985-01-01", "1986-01-01", "1987-01-01", "1988-01-01", 
    "1989-01-01", "1990-01-01", "1991-01-01", "1992-01-01", "1993-01-01", 
    "1994-01-01", "1995-01-01", "1996-01-01", "1997-01-01", "1998-01-01", 
    "1999-01-01", "2000-01-01" ;

You can also use ncdump to display the data values for a given variable in the netCDF file. This command will display the values in the SpeciesRst_NO variable to the screen:

ncdump -v SpeciesRst_NO GEOSChem_restart.20160701_0000z.nc4 | less

Or you can redirect the output to a file:

ncdump -v SpeciesRst_NO GEOSChem_restart.20160701_0000z.nc4

--Bob Yantosca (talk) 21:06, 29 November 2018 (UTC)

Reading the contents of a netCDF file

Reading data in Python

The easiest way to read a netCDF file is to use the xarray Python package.

#!/usr/bin/env python

# Imports
import xarray as xr    
import numpy as np

# Read a restart file into an xarray Dataset object
ds = xr.open_dataset("GEOSChem.Restart.20160101_0000z.nc4")
# Print the contents of the DataSet

# Print the units of the SpeciesRst_O3 field

# Convert the SpeciesRst_O3 (O3 concentration) to 
# a numpy array so that we can take the sum
O3_values = ds["SpeciesRst_O3"].values

# Take the sum of SpeciesRst_O3
sum_O3 = np.sum(O3_values)
print("Sum of SpeciesRst_O3: {}".format(sum_O3))

... etc ...

This above script will print the following output:

Dimensions:              (lat: 46, lev: 72, lon: 72, time: 1)
  * lon                  (lon) float64 -180.0 -175.0 -170.0 -165.0 -160.0 ...
  * lat                  (lat) float64 -89.0 -86.0 -82.0 -78.0 -74.0 -70.0 ...
  * lev                  (lev) float64 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 ...
  * time                 (time) datetime64[ns] 2016-07-01
Data variables:
    AREA                 (lat, lon) float64 ...
    SpeciesRst_RCOOH     (time, lev, lat, lon) float32 ...
    SpeciesRst_O2        (time, lev, lat, lon) float32 ...
    ... etc...
    SpeciesRst_O3        (time, lev, lat, lon) float32 ...
    SpeciesRst_NO        (time, lev, lat, lon) float32 ...
    title:        GEOSChem  restart
    history:      Created by routine NC_CREATE (in ncdf_mod.F90)
    format:       NetCDF-4
    conventions:  COARDS
Units of SpeciesRst_O3: mol/mol
Sum of SpeciesRst_O3: 0.40381380915641785

--Bob Yantosca (talk) 15:25, 4 December 2018 (UTC)

Reading data from multiple files in Python

The xarray package will also let you read data from multiple files into a single Dataset object. This is done with the open_mfdataset (open multi-file-dataset) function as shown below:

#!/usr/bin/env python
# Imports
import xarray as xr    
# Create a list of files to open
filelist = ['GEOSChem.SpeciesConc.20160101_0000z.nc4', 'GEOSChem.SpeciesConc_20160201_0000z.nc4', ...]

# Read a restart file into an xarray Dataset object
ds = xr.open_mfdataset(filelist)

--Bob Yantosca (talk) 14:11, 24 April 2019 (UTC)

Determining if a netCDF file is COARDS-compliant

This section has been moved to The COARDS conventions for earth science data wiki page.

--Bob Yantosca (talk) 15:50, 12 June 2019 (UTC)

Edit variable names and attributes

If you have obtained a netCDF file from a data archive (or have created a netCDF file from an data file in bpch format), then you will probably have to further edit certain attributes and variable names in order to make your file COARDS-compliant. You can use the isCoards script mentioned above to determine which elements of your netCDF file need to be edited.

Christoph Keller has provided these useful commands for editing netCDF files. He writes:

NetCDF files must always adhere to the COARDS conventions and some hand-editing may be required before using them in HEMCO. Fortunately, there are a number of excellent software toolkits available to quickly and efficiently manipulate netCDF files, such as nco, cdo, and ncl. Below is a short list of commands that I use almost every day:
Operation Command
Display the header and the coordinate values of a netCDF file, with the time coordinate displayed in human-readable format: ncdump -cts
Compress a netCDF File. This can considerably reduce the file size!

NOTE: Also see our section on compressing and chunking netCDF files!

nccopy -d0 # No compression
nccopy -d1 # Minimum compression (sufficent for most purposes)
nccopy -d5 # Medium compression
nccopy -d9 # Maximum compression
Change variable name from IJ_AVG_S__NO to NO: ncrename -v IJ_AVG_S__NO,NO
Change the time from 1 Jan 1985 to 1 Jan 2000: cdo settime,2000-01-01
Set all missing values to zero: cdo setmisstoc,0
Add/change the long_name attribute of the vertical coordinate (lev) to "GEOS-Chem levels". This will ensure that HEMCO recognizes the vertical levels of the input file as GEOS-Chem model levels. ncatted -a long_name,lev,o,c,"GEOS-Chem levels"
Add/change the axis and positive attributes to the the vertical coordinate (lev) ncatted -a axis,lev,o,c,"Z"
ncatted -a positive,lev,o,c,"up"
Add/change the units attribute of the latitude coordinate (lat) to degrees_north: ncatted -a units,lat,o,c,"degrees_north"
Add/change the references global attribute: ncatted -a references,global,o,c,";"
Remove the references global attribute: ncatted -a references,global,d,,
Add a time dimension to a file with missing time dimension: ncap2 -h -s 'defdim(“time”,1);time[time]=0.0;time@long_name=“time”;time@calendar=“standard”;time@units=“days since 2007-01-01 00:00:00”' -O
Convert the units of the CHLA variable from mg/m3 to kg/m3 (i.e. divide by 1e6): ncap2 -v -s "CHLA=CHLA/1000000.0f"
ncatted -a units,CHLA,o,c,"kg/m3"

Here are some further examples. If you need to apply these commands to more than one file, you can place them into a script.

  # Add history global attribute
  ncatted -a history,global,o,c,"Tue Mar  3 12:18:38 EST 2015"
  # Add contact info
  ncatted -a contact,global,o,c,"GEOS-Chem Support Team ("

  # Add references
  ncatted -a references,global,o,c,";"

  # Update title
  ncatted -a title,global,o,c,"UV albedo data from Hermann & Celarier (1997)"

--Bob Yantosca (talk) 22:02, 29 November 2018 (UTC)

Concatenating netCDF files

There are a couple of ways to concatenate multiple netCDF files into a single netCDF file, as shown in the sections below.

Concatenating with the netCDF operators

You can use the ncrcat commmand of the netCDF Operators (nco) to concatenate the 12 individual files created by BPCH2COARDS into a single netCDF file. Make sure you have exited IDL, and then type the following command at the Unix prompt:

ncrcat -hO uvalbedo.geos.2x25.1985*.nc

You can then discard the uvalbedo.geos.2x25.1985*.nc files that were created directly by BPCH2COARDS.

Concatenating with Python

You can use the xarray Python package to create a single netCDF file from multiple files. Click HERE to view a sample Python script that does this.

--Bob Yantosca (talk) 14:50, 10 July 2019 (UTC)

Regridding netCDF files

The following tools can be used to regrid netCDF data files (such as GEOS-Chem restart files and GEOS-Chem diagnostic files):

Name Description
cdo The Climate Data Operators include tools for regridding netCDF files. For example:
     # Apply distance-weighted regridding:
     cdo remapdis,gridfile

     # Apply bicubic interpolation:
     cdo remapbic,gridfile

For gridfile, you can use the files in that end in .grid. See the Interpolation section in the CDO Guide for more information. See also useful resource.

nco The netCDF Operators also include tools for regridding. See the Regridding section of the NCO User Guide for more information.
xESMF Jiawei Zhuang has created xESMF, universal regridding tool for geospatial data, which is written in Python. It can be used to regrid data not only on cartesian grids, but also on cubed-sphere and unstructured grids. more information, see:
  • NOTE: xESMF only handles horizontal regridding.
xarray 1d "regridding" can be easily done by xarray's built-in interpolation method. It wraps Scipy's interpolation module.
  • NOTE: This can be used for vertical regridding.

--Bob Yantosca (talk) 17:47, 22 July 2019 (UTC)

Issue with CDO remapdis regridding tool

Bram Maasakkers wrote:

I have noticed a problem regridding a 4x5 restart file to 2x2.5 using cdo 1.9.4. When I use:
     cdo remapdis,geos.2x25.grid
The last latitudinal band (-89.5) remains empty and gets filled with the standard missing value of cdo, which is really large. This leads to immediate problems in the methane simulation as enormous concentrations enter the domain from the South Pole. For now I’ve solved this problem by just using bicubic interpolation
     cdo remapbic,geos.2x25.grid

--Bob Yantosca (talk) 20:02, 29 November 2018 (UTC)

Cropping netCDF files

If needed, regrid a coarse netCDF file (such as a GEOS-Chem restart file) can be cropped to a subset of the globe with the nco or cdo utilities (as mentioned above).

For example, cdo has a SELBOX operator for selecting a box by specifying the lat/lon bounds:

cdo sellonlatbox,lon1,lon2,lat1,lat2 infile outfile

See page 44 of the CDO guide for more information.

--Melissa Sulprizio (talk) 19:04, 3 May 2017 (UTC)

Adding a new variable to a netCDF file

You have a couple of options for adding a new variable to a netCDF file (for example, when having to add a new species to an existing GEOS-Chem restart file

(1) You can use cdo and nco to copy the the data from one variable to another variable. For example:

   module load nco
   module load cdo

   # Extract field SpeciesRst_PMN from the original restart file
   cdo selvar,SpeciesRst_PMN NPMN.nc4

   # Rename selected field to SpeciesRst_NPMN
   ncrename -h -v SpeciesRst_PMN,Species_Rst_NPMN NMPN.nc4

   # Append new species to existing restart file
   ncks -h -A -M NMPN.nc4

(2) Sal Farina wrote a simple Python script for adding a new species to a netCDF restart file:

   #!/usr/bin/env python
   import netCDF4 as nc
   import sys
   import os

   for nam in sys.argv[1:]:
       f = nc.Dataset(nam,mode='a')
               o = f['SpeciesRst_OCPI']
               print "SpeciesRst_OCPI not defined"

       soap = f['SpeciesRst_SOAP']
       soap[:] = 0.0
       soap.long_name= 'SOAP species'
       soap.units =  o.units
       soap.add_offset = 0.0
       soap.scale_factor = 1.0
       soap.missing_value = 1.0e30


--Melissa Sulprizio (talk) 23:33, 7 November 2017 (UTC)

Chunking and deflating a netCDF file to improve I/O

We recommend that you chunk the data in your netCDF file. Chunking specifies the order in along which the data will be read from disk. The Unidata web site has a good overview of why chunking a netCDF file matters.

For GEOS-Chem with the high-performance option (aka GCHP), the best file I/O performance occurs when the file is split into one chunk per level (assuming your data has a lev dimension). This allows each individual vertical level of data to be read in parallel,

You can use the nccopy command of the netCDF Operators (NCO) library to do the chunking. For example, say you have a netCDF file called with these dimensions:

        time = UNLIMITED ; // (12 currently)
        lev = 72 ;
        lat = 181 ;
        lon = 360 ;      

Then you can issue this command to apply the optimal chunking along levels:

nccopy -c lon/360,lat/181,lev/1,time/1 -d1

This will create a new file called that has the proper chunking. We then replace with this temporary file.

You can specify the chunk sizes that will be applied to the variables in the netCDF file with the -c argument to nccopy. To obtain the optimal chunking, the lon chunksize must be identical to the number of values along the longitude dimension (e.g. lon/360 and the lat chunksize must be equal to the number of points in the latitude dimension (e.g. lat/181).

We also recommend that you deflate (i.e. compress) the netCDF data variables at the same time you apply the chunking. Deflating can substantially reduce the file size, especially for emissions data that are only defined over the land but not over the oceans. You can deflate the data in a netCDF file by specifying the -d argumetnt to nccopy. There are 10 possible deflation levels, ranging from 0 (no deflation) to 9 (max deflation). For most purposes, a deflation level of 1 -d1 is sufficient.

The GEOS-Chem Support Team has created a script named that will automatically chunk and compress data for you. You may obtain this script from our NcdfUtilities repository. We also recommend that you copy into a folder that is in your search path (such as ~/bin) so that it will be available to you in whatever directory you are working in.

git clone NcdfUtil
cp NcdfUtil/perl/ ~/bin

To use the script, type:    # Chunk netCDF file 1  # Chunk and compress file using deflate level 1

You can use the ncdump -cts command to view the chunk size and deflation level in the file. After applying the chunking and compression to, you would see output such as this:

        time = UNLIMITED ; // (12 currently)
        lev = 72 ;
        lat = 181 ;
        lon = 360 ;      
        float PRPE(time, lev, lat, lon) ;
                PRPE:long_name = "Propene" ;
                PRPE:units = "kgC/m2/s" ;
                PRPE:add_offset = 0.f ;
                PRPE:scale_factor = 1.f ;
                PRPE:_FillValue = 1.e+15f ;
                PRPE:missing_value = 1.e+15f ;
                PRPE:gamap_category = "ANTHSRCE" ;
                PRPE:_Storage = "chunked" ;
                PRPE:_ChunkSizes = 1, 1, 181, 360 ;
                PRPE:_DeflateLevel = 1 ;
                PRPE:_Endianness = "little" ;
        float CO(time, lev, lat, lon) ;
                CO:long_name = "CO" ;
                CO:units = "kg/m2/s" ;                
                CO:add_offset = 0.f ;
                CO:scale_factor = 1.f ;
                CO:_FillValue = 1.e+15f ;
                CO:missing_value = 1.e+15f ;
                CO:gamap_category = "ANTHSRCE" ;
                CO:_Storage = "chunked" ;
                CO:_ChunkSizes = 1, 1, 181, 360 ;
                CO:_DeflateLevel = 1 ;
                CO:_Endianness = "little" ;

The attributes listed in BLUE, and which begin with an _ character are "hidden" netCDF attributes. They represent file properties instead of user-defined properties (like the long name, units, etc.). The "hidden" attributes can be shown by adding the -s argument to ncdump.

--Bob Yantosca (talk) 15:31, 13 April 2018 (UTC)

Further reading

Please also see the following references for more information:

  1. Vertical coordinates in netCDF files produced by GEOS-Chem
  2. Guide to GEOS-Chem History diagnostics
  3. Guide to HEMCO emissions diagnostics
  4. Guide to visualization and analysis tools for GEOS-Chem

Previous | Next | Guide to netCDF in GEOS-Chem