The NcdfUtilities package

The NcdfUtilities package contains Fortran modules that you can use to write data to and read data from netCDF files. This package is contained within GEOS-Chem (in the NcdfUtil/ folder), but may also be downloaded as a separate standalone package.

List of modules

NcdfUtilities contains the Fortran source code files listed below. The same files are used both in GEOS-Chem and in the standalone distribution.

Module                      Description
charpak_mod.F               Contains routines from the CHARPAK string and character manipulation package.
julday_mod.F                Contains routines used to convert from month/day/year to Astronomical Julian Date and back again.
m_netcdf_io_checks.F90      Contains routines to check if a netCDF file contains a specified variable.
m_netcdf_io_close.F90       Contains routines to close a netCDF file.
m_netcdf_io_create.F90      Contains routines for creating and synchronizing netCDF files.
m_netcdf_io_define.F90      Contains netCDF utility routines to define dimensions, variables, and attributes.
m_netcdf_io_get_dimlen.F90  Contains routines to obtain the length of a given dimension.
m_netcdf_io_handle_err.F90  Contains routines to handle error messages.
m_netcdf_io_open.F90        Contains routines to open a netCDF file.
m_netcdf_io_readattr.F90    Contains netCDF utility routines to read both netCDF global attributes and variable attributes.
m_netcdf_io_read.F90        Contains routines to read variables from a netCDF file.
m_netcdf_io_write.F90       Contains routines to write variables into a netCDF file.
ncdf_mod.F90                Contains routines to read data from and write data to a netCDF file. These routines are convenience wrappers for the routines in the m_netcdf*.F90 modules listed above.

--Bob Yantosca (talk) 19:30, 8 March 2017 (UTC)

NcdfUtilities within GEOS-Chem

The NcdfUtilities code is used by GEOS-Chem "Classic" simulations to perform netCDF file I/O. The source code modules listed above are contained in the NcdfUtil folder of the GEOS-Chem source code directory. They are compiled along with the rest of GEOS-Chem.

NOTE: When using the NcdfUtilities within GEOS-Chem, you must make sure to set the proper environment variables in your .bashrc or .cshrc.

--Bob Yantosca (talk) 19:27, 8 March 2017 (UTC)

NcdfUtilities as a standalone distribution

The following sections describe how you can download and run the NcdfUtilities as a standalone package that you can incorporate into your own Fortran programs.

Setting environment variables for the standalone NcdfUtilities distribution

The NcdfUtilities library requires that you set the environment variables listed below in your system startup file (e.g. .bashrc or .cshrc).

NOTE: The environment variable names for the standalone NcdfUtilities distribution are different from the ones you need to set if you use NcdfUtilities within GEOS-Chem.

ALSO NOTE: If you load a netCDF library into your Unix environment with the module command, then very often the root path to the netCDF library will be automatically set for you. Then you can use this to define the variables listed below. Ask your IT staff for more information.

Variable        Description
NETCDF_BIN      The bin/ folder of the netCDF installation, where utilities such as nc-config, ncdump, etc. are stored.
NETCDF_INCLUDE  The include/ folder of the netCDF installation, where netcdf.inc and netcdf_mod.F90 are found.
NETCDF_LIB      The lib/ or lib64/ folder of the netCDF installation, where the netCDF library files (e.g. libnetcdf.a) are found.

NOTE: In netCDF-4.2 and higher versions, the netCDF-Fortran libraries are built from a separate distribution. If the netCDF-Fortran libraries on your system have been installed into a different folder than the rest of the netCDF libraries, you will also need to set these environment variables in your system startup file:

Variable                Description
NETCDF_FORTRAN_BIN      The bin/ folder of the netCDF-Fortran installation, where utilities such as nf-config are stored.
NETCDF_FORTRAN_INCLUDE  The include/ folder of the netCDF-Fortran installation, where netcdf.inc and netcdf_mod.F90 are found.
NETCDF_FORTRAN_LIB      The lib/ or lib64/ folder of the netCDF-Fortran installation, where the netCDF-Fortran library files (e.g. libnetcdff.a) are found.
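
As a minimal sketch, the variables in the two tables above could be set in .bashrc as follows. The helper names NETCDF_HOME and NETCDF_FORTRAN_HOME and the fallback path /opt/netcdf are assumptions made for illustration only; they are not read by NcdfUtilities itself. Substitute the root paths that your module command or IT staff provide.

```shell
# Hypothetical helper variables (NOT read by NcdfUtilities itself).
# If nc-config is on the PATH, ask it for the netCDF root folder.
if [ -z "$NETCDF_HOME" ] && command -v nc-config >/dev/null 2>&1; then
    NETCDF_HOME=$(nc-config --prefix)
fi
NETCDF_HOME="${NETCDF_HOME:-/opt/netcdf}"      # placeholder fallback path

# Assume netCDF-Fortran lives in the same folder unless told otherwise.
NETCDF_FORTRAN_HOME="${NETCDF_FORTRAN_HOME:-$NETCDF_HOME}"

# Variables used by the standalone NcdfUtilities distribution
export NETCDF_BIN="$NETCDF_HOME/bin"
export NETCDF_INCLUDE="$NETCDF_HOME/include"
export NETCDF_LIB="$NETCDF_HOME/lib"           # may be lib64 on some systems

# Only needed when netCDF-Fortran is installed in a separate folder
export NETCDF_FORTRAN_BIN="$NETCDF_FORTRAN_HOME/bin"
export NETCDF_FORTRAN_INCLUDE="$NETCDF_FORTRAN_HOME/include"
export NETCDF_FORTRAN_LIB="$NETCDF_FORTRAN_HOME/lib"
```

In .cshrc, the equivalent form of each export line is setenv NAME value.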

--Bob Yantosca (talk) 18:43, 8 March 2017 (UTC)

Downloading the standalone NcdfUtilities distribution

You can download a copy of the NcdfUtilities standalone package with the Git source code management system. The master NcdfUtilities code repository is hosted on Bitbucket.org. To download the code, type:

git clone https://bitbucket.org/gcst/ncdfutilities.git NcdfUtil

This will download the NcdfUtilities into a folder named NcdfUtil in your disk space.

--Bob Yantosca (talk) 18:56, 8 March 2017 (UTC)

Directory structure of the standalone NcdfUtilities distribution

If you download the NcdfUtilities as a standalone package, the root-level directory will contain the following sub-directories:

Sub-directory  Description
Code/          The Fortran source code files (*.F, *.F90) reside here.
bin/           The TestNcdfUtilities.x executable will be created here.
doc/           The NcdfUtilities documentation will be created here.
lib/           The NcdfUtilities library file (libNcUtils.a) will be created here.
mod/           Compiled module files (*.mod) will be created here.
perl/          Several Perl scripts that can be useful in creating netCDF files are contained here.

System requirements for using NcdfUtilities

  1. In order to use NcdfUtilities, first check whether the netCDF library is installed on your system. You may find several netCDF library versions to select from; otherwise, you can ask your IT staff to build a version for you.

  2. In order to build the reference documents (described below), you must have the LaTeX utilities (i.e. latex, dvips, dvipdf) installed on your system.
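
One way to check both requirements is to test whether the relevant commands are on your PATH. The sketch below assumes that a working netCDF installation exposes nc-config; the function name check_tools is illustrative and is not part of NcdfUtilities.

```shell
# Report whether each named tool can be found on the PATH.
# Prints one line per tool and returns non-zero if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# nc-config ships with netCDF; the LaTeX tools are only needed for "make doc".
check_tools nc-config latex dvips dvipdf || echo "Some tools were not found."
```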

Compiling the standalone NcdfUtilities distribution

The NcdfUtilities/Code directory contains the Fortran source code modules as well as two Makefiles (named Makefile and Makefile_header.mk).

The file Makefile_header.mk is a sub-makefile which is used to define the compilation options for different compilers. At present, the Intel Fortran Compiler, GNU Fortran compiler, and PGI Fortran compiler are supported.

Once you have set the proper environment variables for your system (as described above), you are ready to build the library. Make sure you are in the Code/ subdirectory and type:

make lib

This should start building the source code and create a library file named libNcUtils.a in the lib/ subdirectory.

If you would like to build the NcdfUtilities for an HPC environment, then type:

make lib HPC=yes

This will compile the code using the mpif90 compiler wrapper rather than the underlying compilers themselves. This will activate the various MPI settings.
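
Once libNcUtils.a exists, a program of your own can be linked against it. The following is a sketch only: the compiler (gfortran), the source file name my_prog.F90, and the NCDFUTIL_ROOT variable are assumptions, and the exact netCDF link flags vary by installation (where available, nf-config --flibs reports them). The function prints the command instead of running it, so that you can inspect it first.

```shell
# NCDFUTIL_ROOT is an illustrative name for the folder created by "git clone".
NCDFUTIL_ROOT="${NCDFUTIL_ROOT:-$HOME/NcdfUtil}"

# Print a link command for a program that uses the NcdfUtilities modules.
#   $1 = Fortran source file, $2 = name of the executable to create
link_command() {
    echo "gfortran $1 -o $2" \
         "-I$NCDFUTIL_ROOT/mod -I$NETCDF_INCLUDE" \
         "-L$NCDFUTIL_ROOT/lib -lNcUtils" \
         "-L$NETCDF_FORTRAN_LIB -lnetcdff" \
         "-L$NETCDF_LIB -lnetcdf"
}

# my_prog.F90 is a hypothetical file name.
link_command my_prog.F90 my_prog.x
```

The libraries are listed from most to least dependent (libNcUtils.a, then netCDF-Fortran, then netCDF), which is the order a static link normally needs.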

--Bob Yantosca (talk) 18:55, 8 March 2017 (UTC)

Ensuring that the NcdfUtilities code was correctly compiled

Once the libNcUtils.a file has been created in the lib/ subdirectory, you can test whether the library was built correctly and can link to the netCDF library. Type:

make check

This will create an executable file named TestNcdfUtilities.x in the bin subdirectory, and will also execute the file.

If you would like to compile and run the test for HPC environments, then type:

make check HPC=yes

which will use the mpif90 compiler wrapper.

The output of TestNcdfUtilities.x should look similar to this:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%  Testing libNcdfUtilities.a  %%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
=== Begin netCDF file creation test ===
Writing time  (dim     ) to netCDF file
Writing lev   (dim     ) to netCDF file
Writing lat   (dim     ) to netCDF file
Writing lon   (dim     ) to netCDF file
Writing cdim1 (dim     ) to netCDF file
Writing cdim2 (dim     ) to netCDF file
Testing re-opening of define mode
Writing lon   (1D array) to netCDF file
Writing lat   (1D array) to netCDF file
Writing lev   (1D array) to netCDF file
Writing time  (1D array) to netCDF file
Writing PS    (3D array) to netCDF file
Writing T     (4D array) to netCDF file
Writing DESC  (2D char ) to netCDF file
=== End netCDF file creation test ===
=== Begin netCDF file reading test ===
Reading lon   (dim  )  back from netCDF file...........PASSED
Reading lat   (dim  )  back from netCDF file...........PASSED
Reading lev   (dim  )  back from netCDF file...........PASSED
Reading time  (dim  )  back from netCDF file...........PASSED
Reading cdim1 (dim  )  back from netCDF file...........PASSED
Reading cdim2 (dim  )  back from netCDF file...........PASSED
Reading lon   (array)  back from netCDF file...........PASSED
Reading lat   (array)  back from netCDF file...........PASSED
Reading lev   (array)  back from netCDF file...........PASSED
Reading time  (array)  back from netCDF file...........PASSED
Reading PS             back from netCDF file...........PASSED
Reading PS:units       back from netCDF file...........PASSED
Reading PS:long_name   back from netCDF file...........PASSED
Reading PS:_FillValue  back from netCDF file...........PASSED
Reading PS:valid_range back from netCDF file...........PASSED
Reading T              back from netCDF file...........PASSED
Reading T:units        back from netCDF file...........PASSED
Reading T:long_name    back from netCDF file...........PASSED
Reading T:_FillValue   back from netCDF file...........PASSED
Reading T:valid_range  back from netCDF file...........PASSED
Reading DESC           back from netCDF file...........PASSED
Reading DESC:units     back from netCDF file...........PASSED
Reading DESC:long_name back from netCDF file...........PASSED
Reading title          back from netCDF file...........PASSED
Reading start_date     back from netCDF file...........PASSED
Reading start_time     back from netCDF file...........PASSED
=== End of netCDF file read test! ===

If all of the tests return with "PASSED" then the libNcUtils.a file was created correctly.
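
With this many lines, it can be easier to let the shell count the results. The sketch below assumes that you have saved the test output to a file and that every result line begins with "Reading", as in the sample above. The function name summarize_log and the file name check.log are illustrative.

```shell
# Count the read-test lines in a saved log and check that all report PASSED.
# Returns 0 only if at least one test line exists and every one passed.
summarize_log() {
    total=$(grep -c '^ *Reading' "$1")
    passed=$(grep -c '^ *Reading.*PASSED' "$1")
    echo "$passed of $total read tests passed"
    [ "$total" -gt 0 ] && [ "$passed" -eq "$total" ]
}

# Usage (check.log is a hypothetical file name):
#   make check > check.log 2>&1
#   summarize_log check.log && echo "library looks OK"
```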

Setting up automatic checks for several netCDF library installations

The GEOS-Chem Support Team has created a set of scripts to check several netCDF configurations by using the TestNcdfUtilities.x program. For more information, please see the README file at our NcdfUnitTest code repository on Bitbucket.

--Bob Yantosca (talk) 19:00, 8 March 2017 (UTC)

Generating reference documentation

The NcdfUtilities Fortran source code and Makefiles use the ProTeX automatic documentation system. This enables you to create reference documents in *.pdf and *.ps format from the comments in the subroutine headers.

To build the reference documents, make sure you are in the doc/ subdirectory, then type:

  make doc

This will create the following documents in the doc/ subdirectory:

  NcdfUtilities.pdf
  NcdfUtilities.ps
  NcdfUtilities.tex
      -- Reference document for the NcdfUtilities Fortran code
         in *.pdf, *.ps, and LaTeX formats

  NcdfUtilities_Makefiles.pdf
  NcdfUtilities_Makefiles.ps
  NcdfUtilities_Makefiles.tex
      -- Reference document for the NcdfUtilities Makefiles
         in *.pdf, *.ps, and LaTeX formats

The reference documents contain a description of each subroutine and function, the variables that are passed to it as input & output arguments, and the revision history. The Makefile reference document displays the full text of the Makefiles. These documents will come in handy if you need to modify or update the Fortran code or Makefiles.

If you wish to remove the NcdfUtilities reference documentation files, then make sure you are in the doc directory and type:

  make clean

--Bob Yantosca (talk) 18:45, 8 March 2017 (UTC)

Cleaning up

To remove all of the *.o, *.mod and executable files in the Code/ subdirectory only, type:

  make clean

However, if you wish to also remove the contents of the bin/ and lib/ subdirectories (as well as removing the *.ps, *.pdf, and *.txt files from the doc/ subdirectory), then type:

  make realclean

--Bob Yantosca (talk) 18:46, 8 March 2017 (UTC)

Previous issues that have since been resolved

Routine DO_ERR_OUT now returns a non-zero error code

This update was included in v11-02a and approved on 12 May 2017.

Andy Jacobson (NOAA) wrote:

GEOS-Chem shouldn’t exit with status 0 when something goes wrong, but the current NcdfUtil/m_do_err_out.F90 does just that. May I suggest that the existing

       if (err_do_stop) then
          stop 'Code stopped from Do_Err_Out.'
       end if

(which does return 0 to my shell) be replaced. Consider this a needed bandaid for the interim. Also, I believe that the STOP statement behavior is system- and compiler-dependent, so maybe it’s just our ifort that returns 0 as is.

Bob Yantosca wrote:

Thanks for letting us know about the netCDF exit issue. That was in a part of GEOS-Chem that we originally inherited from NASA. I don’t know if you have a very new compiler version, but it could be that the default behavior of STOP was changed recently w/r/t older compiler versions. I've fixed it in both the standalone NcdfUtilities and also in the GEOS-Chem code with this check:

       if (err_do_stop) then
   !-------------------------------------------------------------------
   ! Prior to 3/7/17:
   ! Call the EXIT function with a non-zero error code (bmy, 3/7/17) 
   !       stop "Code stopped from Do_Err_Out."
   !-------------------------------------------------------------------
         WRITE( 6, 100 )
   100   FORMAT( 'Code stopped from DO_ERR_OUT (in module m_do_err_out.F90)' )
         CALL EXIT( 999 )
       end if   

This will for sure return a non-zero error code, which your run script can trap.

--Bob Yantosca (talk) 19:16, 8 March 2017 (UTC)
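
A run script can trap this error code along the following lines. This is a sketch only: the function name run_and_check and the executable name ./geos are assumptions. Note that on most Unix systems the exit status is truncated to 8 bits, so the value 999 passed to EXIT would typically appear in the shell as 231. It is still non-zero, which is all the check below relies on.

```shell
# Run a command, report a failure on stderr, and return the same exit status.
run_and_check() {
    "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "Run failed with exit status $status" >&2
    fi
    return "$status"
}

# Usage (./geos is a hypothetical executable name):
#   run_and_check ./geos > run.log || exit 1
```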

Enable compression in netCDF-4 output files

This update was included in v11-02a and approved on 12 May 2017.

Chris Holmes wrote:

I noticed that netCDF-4 files created with GEOS-Chem were not using compression, which is one of the major benefits of netCDF-4. I don’t know if that was intentional, but I added this feature.

Along the way I found that the files called netCDF-4 in the HEMCO comments were actually netCDF-3 files with 64bit support (i.e. large file support), so I made the files real netCDF-4 classic model then added compression support.

I have sent a patch to the GCST (applied to v11-01-public-release) that enables these changes. NetCDF3 users should see no effect.

With the lowest level of compression enabled, the restart files are about half of their previous size, so a big benefit.

     Enable compression for netCDF-4 files
     
     NetCDF-4 files created by HEMCO now have lossless compression enabled.
     Uses lowest compression level (deflate_level=1).
     Informal testing and netCDF discussion forums suggest that higher compression
     provides little additional benefit, but slower file writing.
   
     Restart files are about 50% smaller.
     Write time increases about 1 second out of about 5 seconds total.
   
     Non-fatal errors are displayed if compression doesn't work.
     No error is displayed for netCDF-3 files that don't support compression.

The GCST has added an extra check on top of Chris Holmes' update in order to prevent errors if the netCDF library cannot support file compression.

Bob Yantosca replied:

Some netCDF-4 library installations might not have been built with compression enabled. We now first check the include file netcdf.inc (which is in the netCDF include folder) to see if the function nf_def_var_deflate is defined. If it is, then we set a C-preprocessor switch named NC_HAS_COMPRESSION, which will activate the code to compress the netCDF output files. Otherwise the compression code is left disabled. This workaround was necessary in order to avoid compile-time errors.

Tests with the geosfp_4x5_standard simulation show a decrease in file size from approx. 200 MB (uncompressed) to 120 MB (compressed).

We also display a message at the top of the log file indicating if this netCDF library build supports file compression.

--Bob Yantosca (talk) 21:44, 1 March 2017 (UTC)
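
The check for nf_def_var_deflate described above can be approximated from the shell. The sketch below is not the code used in the GEOS-Chem Makefiles. The function name has_compression and the variable EXTRA_FLAGS are illustrative, and the location of netcdf.inc is an assumption (if netCDF-Fortran is installed separately, look in NETCDF_FORTRAN_INCLUDE instead).

```shell
# Return 0 if the given netcdf.inc mentions nf_def_var_deflate.
# A missing or unreadable file counts as "no compression support".
has_compression() {
    grep -qi 'nf_def_var_deflate' "$1" 2>/dev/null
}

# Choose the C-preprocessor switch accordingly.
if has_compression "$NETCDF_INCLUDE/netcdf.inc"; then
    EXTRA_FLAGS="-DNC_HAS_COMPRESSION"
else
    EXTRA_FLAGS=""
fi
echo "compression flag: ${EXTRA_FLAGS:-none}"
```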

Improve write speed of netCDF output files

This update was included in the GEOS-Chem v11-01 public release.

Chris Holmes wrote:

I have found that GEOS-Chem v11-01 requires a very long time to write netCDF restart files at the end of a simulation. On my system it takes over 8 minutes to write a 170MB restart file. I’m doing 1-hour simulations for development and everything else in the simulation (chemistry, transport, emissions, etc.) requires just 2 minutes.

For comparison the trac_avg file, which is bpch format, requires just 1-3 seconds to write 111MB. Clearly the long write times are specific to the netCDF output, not my hardware speed, since the bpch output is fast. I’ve looked through the netCDF output subroutines, but there are so many layers involving multiple libraries that I can’t get an overall view of which steps might be the bottleneck.

[This update] resolves the slow restart write times by minimizing the number of times that an open netcdf file must switch between define and data modes. Only a few lines of code were changed. Most of the changes you will see in the patch are simple indentation changes. I verified that the restart files were bitwise identical before and after my changes.

With these changes the write time for the restart file dropped from 6-8 minutes to 4 seconds on my system. 100X faster!

I don’t exactly know why switching between define and data modes is so slow, but the netcdf library documentation explains that metadata is not written to disk until define mode ends. On our system we have good sustained write speeds for large files, but relatively slow write speeds for many tiny files. I think that’s why we see a big benefit on our system.

--Melissa Sulprizio (talk) 15:03, 23 January 2017 (UTC)