The NcdfUtilities package
Revision as of 20:10, 22 May 2017
The NcdfUtilities package contains Fortran modules that you can use to write data to and read data from netCDF files. This package is contained within GEOS-Chem (in the NcdfUtil/ folder), but may also be downloaded as a separate standalone package.
- 1 List of modules
- 2 NcdfUtilities within GEOS-Chem
- 3 NcdfUtilities as a standalone distribution
- 3.1 Setting environment variables for the standalone NcdfUtilities distribution
- 3.2 Downloading the standalone NcdfUtilities distribution
- 3.3 Directory structure of the standalone NcdfUtilities distribution
- 3.4 System requirements for using NcdfUtilities
- 3.5 Compiling the standalone NcdfUtilities distribution
- 3.6 Ensuring that the NcdfUtilities code was correctly compiled
- 3.7 Generating reference documentation
- 3.8 Cleaning up
- 4 Previous issues that have since been resolved
List of modules
NcdfUtilities contains the Fortran source code files listed below. These same files are used both in GEOS-Chem, and as a standalone distribution.
|charpak_mod.F||Contains routines from the CHARPAK string and character manipulation package.|
|julday_mod.F||Contains routines used to convert from month/day/year to Astronomical Julian Date and back again.|
|m_netcdf_io_checks.F90||Contains routines to check if a netCDF file contains a specified variable.|
|m_netcdf_io_close.F90||Contains routines to close a netCDF file.|
|m_netcdf_io_create.F90||Contains routines for creating and synchronizing netCDF files.|
|m_netcdf_io_define.F90||Contains netCDF utility routines to define dimensions, variables, and attributes.|
|m_netcdf_io_get_dimlen.F90||Contains routines to obtain the length of a given dimension.|
|m_netcdf_io_handle_err.F90||Contains routines to handle error messages.|
|m_netcdf_io_open.F90||Contains routines to open a netCDF file.|
|m_netcdf_io_readattr.F90||Contains netCDF utility routines to read both netCDF global attributes and variable attributes.|
|m_netcdf_io_read.F90||Contains routines to read variables from a netCDF file.|
|m_netcdf_io_write.F90||Contains routines to write variables into a netCDF file.|
|ncdf_mod.F90||Contains routines to read data from and write data to a netCDF file. These routines are convenience wrappers for the routines in the m_netcdf*.F90 modules listed above.|
NcdfUtilities within GEOS-Chem
The NcdfUtilities code is used by GEOS-Chem "Classic" simulations to perform netCDF file I/O. The source code modules listed above are contained in the NcdfUtil folder of the GEOS-Chem source code directory. They are compiled along with the rest of GEOS-Chem.
NOTE: When using the NcdfUtilities within GEOS-Chem, you must make sure to set the proper environment variables in your .bashrc or .cshrc.
NcdfUtilities as a standalone distribution
The following sections describe how you can download and run the NcdfUtilities as a standalone package that you can incorporate into your own Fortran programs.
Setting environment variables for the standalone NcdfUtilities distribution
The NcdfUtilities library requires that you set the environment variables listed below in your system startup file (e.g. .bashrc or .cshrc).
NOTE: The environment variable names for the standalone NcdfUtilities distribution are different from the ones you need to set if you use NcdfUtilities within GEOS-Chem.
ALSO NOTE: If you load a netCDF library into your Unix environment with the module command, then very often the root path to the netCDF library will be automatically set for you. Then you can use this to define the variables listed below. Ask your IT staff for more information.
|NETCDF_BIN||The bin/ folder of the netCDF installation, where utilities such as nc-config, ncdump, etc. are stored.|
|NETCDF_INCLUDE||The include/ folder of the netCDF installation, where the netcdf.inc and netcdf_mod.F90 are found.|
|NETCDF_LIB||The lib/ or lib64/ folder of the netCDF installation, where the netCDF library files (e.g. libnetcdf.a) are found.|
NOTE: In netCDF-4.2 and higher versions, the netCDF-Fortran libraries are built from a separate distribution. If the netCDF-Fortran libraries on your system have been installed into a different folder than the rest of the netCDF libraries, you will also need to set these environment variables in your system startup file:
|NETCDF_FORTRAN_BIN||The bin/ folder of the netCDF-Fortran installation, where utilities such as nf-config are stored.|
|NETCDF_FORTRAN_INCLUDE||The include/ folder of the netCDF-Fortran installation, where the netcdf.inc and netcdf_mod.F90 are found.|
|NETCDF_FORTRAN_LIB||The lib/ or lib64/ folder of the netCDF-Fortran installation, where the netCDF-Fortran library files (e.g. libnetcdff.a) are found.|
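The variables in the two tables above could be set in a .bashrc along the following lines. This is only a sketch: NETCDF_HOME and NETCDF_FORTRAN_HOME are hypothetical helper names (not required by NcdfUtilities), and /opt/netcdf is a placeholder path that you would replace with the root path of your own installation (for example, the one defined by your module command).

```shell
# Hypothetical root of the netCDF installation; replace with your own path.
NETCDF_HOME="${NETCDF_HOME:-/opt/netcdf}"

# Hypothetical root of the netCDF-Fortran installation. It defaults to the
# same folder, which is the case when both were installed together.
NETCDF_FORTRAN_HOME="${NETCDF_FORTRAN_HOME:-$NETCDF_HOME}"

# Helper (an assumption for this sketch): prefer lib64/ if it exists,
# otherwise fall back to lib/.
pick_libdir() {
    if [ -d "$1/lib64" ]; then
        echo "$1/lib64"
    else
        echo "$1/lib"
    fi
}

# Variables for the main netCDF installation
export NETCDF_BIN="$NETCDF_HOME/bin"
export NETCDF_INCLUDE="$NETCDF_HOME/include"
export NETCDF_LIB="$(pick_libdir "$NETCDF_HOME")"

# Variables for a separately installed netCDF-Fortran library
export NETCDF_FORTRAN_BIN="$NETCDF_FORTRAN_HOME/bin"
export NETCDF_FORTRAN_INCLUDE="$NETCDF_FORTRAN_HOME/include"
export NETCDF_FORTRAN_LIB="$(pick_libdir "$NETCDF_FORTRAN_HOME")"
```

Users of .cshrc would need the equivalent setenv statements instead.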
Downloading the standalone NcdfUtilities distribution
You can download a copy of the NcdfUtilities standalone package with the Git source code management system. The master NcdfUtilities code repository is hosted on Bitbucket.org. To download the code, type:
git clone https://bitbucket.org/gcst/ncdfutilities.git NcdfUtil
This will download the NcdfUtilities into a folder named NcdfUtil in your disk space.
Directory structure of the standalone NcdfUtilities distribution
If you download the NcdfUtilities as a standalone package, the root-level directory will contain the following sub-directories:
|Code/||The Fortran source code files (*.F *.F90) reside here.|
|bin/||The TestNcdfUtilities.x executable will be created here.|
|doc/||The NcdfUtilities documentation will be created here.|
|lib/||The NcdfUtilities library file (libNcUtils.a) will be created here.|
|mod/||Compiled module files (*.mod) will be created here.|
|perl/||Several perl scripts that can be useful in creating netCDF files are contained here.|
System requirements for using NcdfUtilities
In order to use NcdfUtilities, you will first have to check to see if the netCDF library is installed on your system. You may find that there are several netCDF library versions to select from. Or you can ask your IT staff to build you a version.
In order to build the reference documents (described below), you must have the LaTeX utilities (i.e. latex, dvips, dvipdf) installed on your system.
Compiling the standalone NcdfUtilities distribution
The NcdfUtilities/Code directory contains the Fortran source code modules as well as two Makefiles (named Makefile and Makefile_header.mk).
The file Makefile_header.mk is a sub-makefile which is used to define the compilation options for different compilers. At present, the Intel Fortran Compiler, GNU Fortran compiler, and PGI Fortran compiler are supported.
Once you have set the proper environment variables for your system (as described above), you are ready to build the library. Make sure you are in the Code/ subdirectory and type:
make lib
This should start building the source code and create a library file named libNcUtils.a in the lib/ subdirectory.
If you would like to build the NcdfUtilities for an HPC environment, then type:
make lib HPC=yes
This will compile the code using the mpif90 compiler wrapper rather than the underlying compilers themselves. This will activate the various MPI settings.
Ensuring that the NcdfUtilities code was correctly compiled
Once the libNcUtils.a file has been created in the lib/ subdirectory, you can test whether the library was built correctly and can link to the netCDF library. Type:
make check
This will create an executable file named TestNcdfUtilities.x in the bin subdirectory, and will also execute the file.
If you would like to compile and run the test for HPC environments, then type:
make check HPC=yes
which will use the mpif90 compiler wrapper.
The output of TestNcdfUtilities.x should look similar to this:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Testing libNcdfUtilities.a %%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
=== Begin netCDF file creation test ===
Writing time (dim ) to netCDF file
Writing lev (dim ) to netCDF file
Writing lat (dim ) to netCDF file
Writing lon (dim ) to netCDF file
Writing cdim1 (dim ) to netCDF file
Writing cdim2 (dim ) to netCDF file
Testing re-opening of define mode
Writing lon (1D array) to netCDF file
Writing lat (1D array) to netCDF file
Writing lev (1D array) to netCDF file
Writing time (1D array) to netCDF file
Writing PS (3D array) to netCDF file
Writing T (4D array) to netCDF file
Writing DESC (2D char ) to netCDF file
=== End netCDF file creation test ===
=== Begin netCDF file reading test ===
Reading lon (dim ) back from netCDF file...........PASSED
Reading lat (dim ) back from netCDF file...........PASSED
Reading lev (dim ) back from netCDF file...........PASSED
Reading time (dim ) back from netCDF file...........PASSED
Reading cdim1 (dim ) back from netCDF file...........PASSED
Reading cdim2 (dim ) back from netCDF file...........PASSED
Reading lon (array) back from netCDF file...........PASSED
Reading lat (array) back from netCDF file...........PASSED
Reading lev (array) back from netCDF file...........PASSED
Reading time (array) back from netCDF file...........PASSED
Reading PS back from netCDF file...........PASSED
Reading PS:units back from netCDF file...........PASSED
Reading PS:long_name back from netCDF file...........PASSED
Reading PS:_FillValue back from netCDF file...........PASSED
Reading PS:valid_range back from netCDF file...........PASSED
Reading T back from netCDF file...........PASSED
Reading T:units back from netCDF file...........PASSED
Reading T:long_name back from netCDF file...........PASSED
Reading T:_FillValue back from netCDF file...........PASSED
Reading T:valid_range back from netCDF file...........PASSED
Reading DESC back from netCDF file...........PASSED
Reading DESC:units back from netCDF file...........PASSED
Reading DESC:long_name back from netCDF file...........PASSED
Reading title back from netCDF file...........PASSED
Reading start_date back from netCDF file...........PASSED
Reading start_time back from netCDF file...........PASSED
=== End of netCDF file read test! ===
If all of the tests return with "PASSED" then the libNcUtils.a file was created correctly.
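If you save the test output to a file, the "PASSED" check can be automated. The sketch below assumes that each read test prints a line containing "back from netCDF file" as in the sample output above. The function name check_ncdf_test_log and the temporary sample log are hypothetical and exist only for illustration.

```shell
# Count the read-test lines in a saved log and how many of them PASSED.
# Prints "passed/total" and returns 0 only if every test line passed.
check_ncdf_test_log() {
    log_file="$1"
    total=$(grep -c 'back from netCDF file' "$log_file" || true)
    passed=$(grep -c 'back from netCDF file\.*PASSED' "$log_file" || true)
    echo "$passed/$total"
    [ "$total" -gt 0 ] && [ "$passed" -eq "$total" ]
}

# Illustration with a two-line sample log (a stand-in for real test output)
sample_log=$(mktemp)
printf '%s\n' \
    'Reading lon (dim ) back from netCDF file...........PASSED' \
    'Reading lat (dim ) back from netCDF file...........PASSED' \
    > "$sample_log"

if check_ncdf_test_log "$sample_log"; then
    echo "All read tests passed"
else
    echo "At least one read test did not pass"
fi
rm -f "$sample_log"
```

With a real log you might capture the output first, e.g. with make check > check.log 2>&1 (the file name is arbitrary), and then call check_ncdf_test_log check.log.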
Setting up automatic checks for several netCDF library installations
The GEOS-Chem Support Team has created a set of scripts to check several netCDF configurations by using the TestNcdfUtilities.x program. For more information, please see the README file at our NcdfUnitTest code repository on Bitbucket.
Generating reference documentation
The NcdfUtilities Fortran source code and Makefiles use the ProTeX automatic documentation system. This enables you to create reference documents in *.pdf and *.ps format from the comments in the subroutine headers.
To build the reference documents, make sure you are in the doc/ subdirectory, then type:
This will create the following documents in the doc/ subdirectory:
NcdfUtilities.pdf NcdfUtilities.ps NcdfUtilities.tex
-- Reference document for the NcdfUtilities Fortran code
in *.pdf, *.ps, and LaTeX formats
NcdfUtilities_Makefiles.pdf NcdfUtilities_Makefiles.ps NcdfUtilities_Makefiles.tex
-- Reference document for the NcdfUtilities Makefiles
in *.pdf, *.ps, and LaTeX formats
The reference documents contain a description of each subroutine and function, the variables that are passed to it as input & output arguments, and the revision history. The Makefile reference document displays the full text of the Makefiles. These documents will come in handy if you need to modify or update the Fortran code or Makefiles.
If you wish to remove the NcdfUtilities reference documentation files, then make sure you are in the doc directory and type:
To remove all of the *.o and *.mod files and the executable files in the Code/ subdirectory only, type:
However, if you wish to also remove the contents of the bin/ and lib/ subdirectories (as well as removing the *.ps, *.pdf, and *.txt files from the doc/ subdirectory), then type:
Previous issues that have since been resolved
Routine DO_ERR_OUT now returns a non-zero error code
This update will be included in v11-02a.
Andy Jacobson (NOAA) wrote:
GEOS-Chem shouldn’t exit with status 0 when something goes wrong, but the current NcdfUtil/m_do_err_out.F90 does just that. May I suggest that the existing
if (err_do_stop) then
   stop 'Code stopped from Do_Err_Out.'
end if
(which does return 0 to my shell) be replaced. Consider this a needed bandaid for the interim. Also, I believe that the STOP statement behavior is system- and compiler-dependent, so maybe it’s just our ifort that returns 0 as is.
Bob Yantosca wrote:
Thanks for letting us know about the netCDF exit issue. That was in a part of GEOS-Chem that we originally inherited from NASA. I don’t know if you have a very new compiler version, but it could be that the default behavior of STOP was changed recently w/r/t older compiler versions. I've fixed it in both the standalone NcdfUtilities and also in the GEOS-Chem code with this check:
if (err_do_stop) then
   !-------------------------------------------------------------------
   ! Prior to 3/7/17:
   ! Call the EXIT function with a non-zero error code (bmy, 3/7/17)
   ! stop "Code stopped from Do_Err_Out."
   !-------------------------------------------------------------------
   WRITE( 6, 100 )
100 FORMAT( 'Code stopped from DO_ERR_OUT (in module m_do_err_out.F90)' )
   CALL EXIT( 999 )
end if
This will for sure return a non-zero error code, which your run script can trap.
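A run script could trap the error code roughly as follows. This is a sketch rather than part of the GEOS-Chem run scripts: run_and_trap is a hypothetical helper, and sh -c 'exit 231' is a stand-in for the model executable. On most Unix-like systems the shell only sees the low 8 bits of the exit status, so the value 999 passed to EXIT would typically show up as 231 (999 mod 256), which is still non-zero.

```shell
# Run a command, report a non-zero exit status on stderr,
# and pass that status back to the caller.
run_and_trap() {
    "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "Run failed with exit status $status" >&2
    fi
    return "$status"
}

# Stand-in for the model executable: a child shell that exits with 231.
if run_and_trap sh -c 'exit 231'; then
    echo "Run completed normally"
else
    echo "Run stopped with an error; skipping post-processing"
fi
```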
Enable compression in netCDF-4 output files
This update was included in v11-02a and approved on 12 May 2017.
Chris Holmes wrote:
I noticed that netCDF-4 files created with GEOS-Chem were not using compression, which is one of the major benefits of netCDF-4. I don’t know if that was intentional, but I added this feature.
Along the way I found that the files called netCDF-4 in the HEMCO comments were actually netCDF-3 files with 64-bit support (i.e. large file support), so I made the files real netCDF-4 classic model files, then added compression support.
I have sent a patch to the GCST (applied to v11-01-public-release) that enables these changes. NetCDF-3 users should see no effect. With the lowest level of compression enabled, the restart files are about half of their previous size, so a big benefit.
Enable compression for netCDF-4 files

NetCDF-4 files created by HEMCO now have lossless compression enabled.
Uses lowest compression level (deflate_level=1). Informal testing and netCDF discussion forums suggest that higher compression provides little additional benefit, but slower file writing.
Restart files are about 50% smaller. Write time increases about 1 second out of about 5 seconds total.
Non-fatal errors are displayed if compression doesn't work. No error is displayed for netCDF-3 files that don't support compression.
The GCST has added an extra check on top of Chris Holmes' update in order to prevent errors if the netCDF library cannot support file compression.
Bob Yantosca replied:
Some netCDF-4 library installations might not have been built with compression enabled. We now first check the include file netcdf.inc (which is in the netCDF include folder) to see if the function nf_def_var_deflate is defined. If it is, then we set a C-preprocessor switch named NC_HAS_COMPRESSION, which will activate the code to compress the netCDF output files. Otherwise the compression code is left disabled. This workaround was necessary in order to avoid compile-time errors.
Tests with the geosfp_4x5_standard simulation show a decrease in file size from approx. 200 MB (uncompressed) to 120 MB (compressed). We also display a message at the top of the log file indicating if this netCDF library build supports file compression.
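The check of netcdf.inc described above can be sketched in shell as shown below. This is not the actual GEOS-Chem Makefile logic: the function name nc_compression_flag is hypothetical, and passing the switch as -DNC_HAS_COMPRESSION is an assumption based on the usual compiler convention for defining C-preprocessor switches.

```shell
# Print the preprocessor flag if netcdf.inc in the given include folder
# mentions nf_def_var_deflate; print nothing otherwise.
nc_compression_flag() {
    inc_file="$1/netcdf.inc"
    if [ -f "$inc_file" ] && grep -qi 'nf_def_var_deflate' "$inc_file"; then
        echo "-DNC_HAS_COMPRESSION"
    fi
}

# Illustration: look in NETCDF_INCLUDE, or in a placeholder path if unset.
include_dir="${NETCDF_INCLUDE:-/usr/include}"
flag=$(nc_compression_flag "$include_dir")
if [ -n "$flag" ]; then
    echo "netCDF compression is supported; would compile with $flag"
else
    echo "nf_def_var_deflate not found; compression code stays disabled"
fi
```

The check is case-insensitive because Fortran include files may declare the function in either upper or lower case.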
Improve write speed of netCDF output files
This update was included in the GEOS-Chem v11-01 public release.
Chris Holmes wrote:
I have found that GEOS-Chem v11-01 requires a very long time to write netCDF restart files at the end of a simulation. On my system it takes over 8 minutes to write a 170MB restart file. I’m doing 1-hour simulations for development and everything else in the simulation (chemistry, transport, emissions, etc.) requires just 2 minutes.
For comparison the trac_avg file, which is bpch format, requires just 1-3 seconds to write 111MB. Clearly the long write times are specific to the netCDF output, not my hardware speed, since the bpch output is fast. I’ve looked through the netCDF output subroutines, but there are so many layers involving multiple libraries that I can’t get an overall view of which steps might be the bottleneck.
[This update] resolves the slow restart write times by minimizing the number of times that an open netcdf file must switch between define and data modes. Only a few lines of code were changed. Most of the changes you will see in the patch are simple indentation changes. I verified that the restart files were bitwise identical before and after my changes.
With these changes the write time for the restart file dropped from 6-8 minutes to 4 seconds on my system. 100X faster! I don’t exactly know why switching between define and data modes is so slow, but the netcdf library documentation explains that metadata is not written to disk until define mode ends. On our system we have good sustained write speeds for large files, but relatively slow write speeds for many tiny files. I think that’s why we see a big benefit on our system.