The NcdfUtilities package: Difference between revisions

(Created page with "The NcdfUtilities package contains Fortran modules that you can use to write data to and read data from netCDF files. This package is contained within GEOS-Chem (in the <tt>N...")
 
 
(43 intermediate revisions by 2 users not shown)


== List of modules ==
NcdfUtilities contains the Fortran source code files listed below.  The same files are used both within GEOS-Chem and in the standalone distribution.
{| border=1 cellspacing=0 cellpadding=5
|-bgcolor="#CCCCCC"
!width="220px"|Module
!width="780px"|Description
|-valign="top"
|<tt>charpak_mod.F</tt>
|Contains routines from the CHARPAK string and character manipulation package.
|-valign="top"
|<tt>julday_mod.F</tt>
|Contains routines used to convert from month/day/year to Astronomical Julian Date and back again.
|-valign="top"
|<tt>m_netcdf_io_checks.F90</tt>
|Contains routines to check if a netCDF file contains a specified variable.
   
|-valign="top"
|<tt>m_netcdf_io_close.F90</tt>
|Contains routines to close a netCDF file.
|-valign="top"
|<tt>m_netcdf_io_create.F90</tt>
|Contains routines for creating and synchronizing netCDF files.
|-valign="top"
|<tt>m_netcdf_io_define.F90</tt>
|Contains netCDF utility routines to define dimensions, variables, and attributes.
|-valign="top"
|<tt>m_netcdf_io_get_dimlen.F90</tt>
|Contains routines to obtain the length of a given dimension.
|-valign="top"
|<tt>m_netcdf_io_handle_err.F90</tt>
|Contains routines to handle error messages.
|-valign="top"
|<tt>m_netcdf_io_open.F90</tt>
|Contains routines to open a netCDF file.
|-valign="top"
|<tt>m_netcdf_io_readattr.F90</tt>
|Contains netCDF utility routines to read both netCDF global attributes and variable attributes.
|-valign="top"
|<tt>m_netcdf_io_read.F90</tt>
|Contains routines to read variables from a netCDF file.
|-valign="top"
|<tt>m_netcdf_io_write.F90</tt>
|Contains routines to write variables into a netCDF file.
|-valign="top"
|<tt>ncdf_mod.F90</tt>
|Contains routines to read data from and write data to a netCDF file.  These routines are convenience wrappers for the routines in the <tt>m_netcdf*.F90</tt> modules listed above.
|}
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 19:30, 8 March 2017 (UTC)


== NcdfUtilities within GEOS-Chem ==


=== Setting environment variables for GEOS-Chem===
The NcdfUtilities code is used by GEOS-Chem "Classic" simulations to perform netCDF file I/O.  The source code modules listed above are contained in the <tt>NcdfUtil</tt> folder of the GEOS-Chem source code directory.  They are compiled along with the rest of GEOS-Chem.


NOTE: When using the NcdfUtilities within GEOS-Chem, you must make sure to set [[Installing_libraries_for_GEOS-Chem#Setting_environment_variables_for_GEOS-Chem|the proper environment variables in your .bashrc or .cshrc]].


--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 19:27, 8 March 2017 (UTC)


== NcdfUtilities as a standalone distribution ==


The following sections describe how you can download and run the NcdfUtilities as a standalone package that you can incorporate into your own Fortran programs.
=== Setting environment variables for the standalone NcdfUtilities distribution ===


The NcdfUtilities library requires that you set the environment variables listed below in your system startup file (e.g. <tt>.bashrc</tt> or <tt>.cshrc</tt>).

<span style="color:red">'''''NOTE: The environment variable names for the standalone NcdfUtilities distribution are different from the ones you need to set if you use NcdfUtilities within GEOS-Chem.'''''</span>


ALSO NOTE: If you load a netCDF library into your Unix environment with the <tt>module</tt> command, then very often the root path to the netCDF library will be automatically set for you. Then you can use this to define the variables listed below.  Ask your IT staff for more information.

{| border=1 cellspacing=0 cellpadding=5
|-bgcolor="#CCCCCC"
!width="175px"|Variable
!width="825px"|Description

|-valign="top"
|<tt>NETCDF_BIN</tt>
|The <tt>bin/</tt> folder of the netCDF installation, where utilities such as <tt>nc-config</tt>, <tt>ncdump</tt>, etc. are stored.

|-valign="top"
|<tt>NETCDF_INCLUDE</tt>
|The <tt>include/</tt> folder of the netCDF installation, where the <tt>netcdf.inc</tt> and <tt>netcdf_mod.F90</tt> are found.

|-valign="top"
|<tt>NETCDF_LIB</tt>
|The <tt>lib/</tt> or <tt>lib64/</tt> folder of the netCDF installation, where the netCDF library files (e.g. <tt>libnetcdf.a</tt>) are found.

|}


NOTE: In netCDF-4.2 and higher versions, the netCDF Fortran libraries are built from a separate distribution.  If, on your system, the netCDF-Fortran libraries have been installed into a different folder than the rest of the netCDF libraries, you will also need to set these environment variables in your system startup file:

{| border=1 cellspacing=0 cellpadding=5
|-bgcolor="#CCCCCC"
!width="175px"|Variable
!width="825px"|Description

|-valign="top"
|<tt>NETCDF_FORTRAN_BIN</tt>
|The <tt>bin/</tt> folder of the netCDF-Fortran installation, where utilities such as <tt>nf-config</tt> are stored.

|-valign="top"
|<tt>NETCDF_FORTRAN_INCLUDE</tt>
|The <tt>include/</tt> folder of the netCDF-Fortran installation, where the <tt>netcdf.inc</tt> and <tt>netcdf_mod.F90</tt> are found.

|-valign="top"
|<tt>NETCDF_FORTRAN_LIB</tt>
|The <tt>lib/</tt> or <tt>lib64/</tt> folder of the netCDF-Fortran installation, where the netCDF-Fortran library files (e.g. <tt>libnetcdff.a</tt>) are found.

|}
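
As a rough illustration of the settings described in the tables above, a <tt>.bashrc</tt> fragment might look like the sketch below. The prefix variables <tt>NETCDF_HOME</tt> and <tt>NETCDF_FORTRAN_HOME</tt> and the path <tt>/opt/netcdf</tt> are placeholders chosen for this example and are not defined by NcdfUtilities; substitute the paths used on your own system.

```shell
# Hypothetical install prefix; replace with the real path on your system.
NETCDF_HOME="${NETCDF_HOME:-/opt/netcdf}"

export NETCDF_BIN="$NETCDF_HOME/bin"
export NETCDF_INCLUDE="$NETCDF_HOME/include"
export NETCDF_LIB="$NETCDF_HOME/lib"    # may be lib64 on some systems

# Only needed when netCDF-Fortran was installed under a separate prefix.
# NETCDF_FORTRAN_HOME is a placeholder name used for this sketch.
if [ -n "${NETCDF_FORTRAN_HOME:-}" ]; then
    export NETCDF_FORTRAN_BIN="$NETCDF_FORTRAN_HOME/bin"
    export NETCDF_FORTRAN_INCLUDE="$NETCDF_FORTRAN_HOME/include"
    export NETCDF_FORTRAN_LIB="$NETCDF_FORTRAN_HOME/lib"
fi
```

Users of <tt>.cshrc</tt> would write the same assignments with <tt>setenv</tt> instead of <tt>export</tt>.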


--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 18:43, 8 March 2017 (UTC)


=== Downloading the standalone NcdfUtilities distribution ===

You can download a copy of the NcdfUtilities standalone package with the Git source code management system.  The master NcdfUtilities code repository is hosted on Bitbucket.org.  To download the code, type:

 git clone <nowiki>https://bitbucket.org/gcst/ncdfutilities.git</nowiki> NcdfUtil

This will download the NcdfUtilities into a folder named <tt>NcdfUtil</tt> in your disk space.

--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 18:56, 8 March 2017 (UTC)

=== Directory structure of the standalone NcdfUtilities distribution ===
       
 
If you download the NcdfUtilities as a standalone package, the root-level directory will contain the following sub-directories:
 
{| border=1 cellspacing=0 cellpadding=5
|-bgcolor="#CCCCCC"
!width="175px"|Sub-directory
!width="825px"|Description

|-valign="top"
|<tt>Code/</tt> ||The Fortran source code files (<tt>*.F</tt>, <tt>*.F90</tt>) reside here.

|-valign="top"
|<tt>bin/</tt>  ||The <tt>TestNcdfUtilities.x</tt> executable will be created here.

|-valign="top"
|<tt>doc/</tt>  ||The NcdfUtilities documentation will be created here.

|-valign="top"
|<tt>lib/</tt>  ||The NcdfUtilities library file (<tt>libNcUtils.a</tt>) will be created here.

|-valign="top"
|<tt>mod/</tt>  ||Compiled module files (<tt>*.mod</tt>) will be created here.

|-valign="top"
|<tt>perl/</tt> ||Several Perl scripts that can be useful in creating netCDF files are contained here.

|}

=== System requirements for using NcdfUtilities ===

#<p>In order to use NcdfUtilities, you will first have to [[Installing_libraries_for_GEOS-Chem#Check_to_see_if_netCDF_is_already_installed_on_your_systemc|check to see if the netCDF library is installed on your system]].  You may find that there are several netCDF library versions to select from.  Or you can ask your IT staff to build you a version.</p>
#<p>In order to build the reference documents (described below), you must have the LaTeX utilities (i.e. latex, dvips, dvipdf) installed on your system.</p>


=== Compiling the standalone NcdfUtilities distribution ===

The <tt>NcdfUtilities/Code</tt> directory contains the Fortran source code modules as well as two Makefiles (named <tt>Makefile</tt> and <tt>Makefile_header.mk</tt>).

The file <tt>Makefile_header.mk</tt> is a sub-makefile which is used to define the compilation options for different compilers.  At present, the [[Intel Fortran Compiler]], [[GNU Fortran compiler]], and [[PGI Fortran compiler]] are supported.

Once you have set the proper environment variables for your system (as described above), you are ready to build the library.  Make sure you are in the <tt>Code/</tt> subdirectory and type:

 make lib

This should start building the source code and create a library file named <tt>libNcUtils.a</tt> in the <tt>lib/</tt> subdirectory.

If you would like to build the NcdfUtilities for an HPC environment, then type:

 make lib HPC=yes

This will compile the code using the <tt>mpif90</tt> compiler wrapper rather than the underlying compilers themselves, which activates the various MPI settings.

--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 18:55, 8 March 2017 (UTC)
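
Because the build depends on the environment variables being set correctly, it can help to verify them before running <tt>make lib</tt>. The function below is only a sketch of such a check and is not part of the NcdfUtilities distribution; the name <tt>check_netcdf_env</tt> is made up for this example.

```shell
# Report every required variable that is unset or does not name an
# existing directory. Returns 0 when all three are usable, 1 otherwise.
# (check_netcdf_env is a hypothetical helper, not shipped with the package.)
check_netcdf_env() {
    rc=0
    for var in NETCDF_BIN NETCDF_INCLUDE NETCDF_LIB; do
        # Look up the value of the variable whose name is stored in $var.
        eval "dir=\${$var:-}"
        if [ -z "$dir" ]; then
            echo "$var is not set"
            rc=1
        elif [ ! -d "$dir" ]; then
            echo "$var points to a missing directory: $dir"
            rc=1
        fi
    done
    return "$rc"
}
```

With this in place, a command such as <tt>check_netcdf_env && make lib</tt> would start the build only when all three folders exist.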


=== Ensuring that the NcdfUtilities code was correctly compiled ===

Once the <tt>libNcUtils.a</tt> file has been created in the <tt>lib/</tt> subdirectory, you can test whether the library was built correctly and can link to the netCDF library.  Type:

 make check

This will create an executable file named <tt>TestNcdfUtilities.x</tt> in the <tt>bin/</tt> subdirectory, and will also execute the file.

If you would like to compile and run the test for HPC environments, then type:

 make check HPC=yes

which will use the <tt>mpif90</tt> compiler wrapper.

The output of <tt>TestNcdfUtilities.x</tt> should look similar to this:

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %%%  Testing libNcdfUtilities.a  %%%
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 === Begin netCDF file creation test ===
 Writing time  (dim    ) to netCDF file
 Writing lev  (dim    ) to netCDF file
 Writing lat  (dim    ) to netCDF file
 Writing lon  (dim    ) to netCDF file
 Writing cdim1 (dim    ) to netCDF file
 Writing cdim2 (dim    ) to netCDF file
 Testing re-opening of define mode
 Writing lon  (1D array) to netCDF file
 Writing lat  (1D array) to netCDF file
 Writing lev  (1D array) to netCDF file
 Writing time  (1D array) to netCDF file
 Writing PS    (3D array) to netCDF file
 Writing T    (4D array) to netCDF file
 Writing DESC  (2D char ) to netCDF file
 === End netCDF file creation test ===
 === Begin netCDF file reading test ===
 Reading lon  (dim  )  back from netCDF file...........PASSED
 Reading lat  (dim  )  back from netCDF file...........PASSED
 Reading lev  (dim  )  back from netCDF file...........PASSED
 Reading time  (dim  )  back from netCDF file...........PASSED
 Reading cdim1 (dim  )  back from netCDF file...........PASSED
 Reading cdim2 (dim  )  back from netCDF file...........PASSED
 Reading lon  (array)  back from netCDF file...........PASSED
 Reading lat  (array)  back from netCDF file...........PASSED
 Reading lev  (array)  back from netCDF file...........PASSED
 Reading time  (array)  back from netCDF file...........PASSED
 Reading PS            back from netCDF file...........PASSED
 Reading PS:units      back from netCDF file...........PASSED
 Reading PS:long_name  back from netCDF file...........PASSED
 Reading PS:_FillValue  back from netCDF file...........PASSED
 Reading PS:valid_range back from netCDF file...........PASSED
 Reading T              back from netCDF file...........PASSED
 Reading T:units        back from netCDF file...........PASSED
 Reading T:long_name    back from netCDF file...........PASSED
 Reading T:_FillValue  back from netCDF file...........PASSED
 Reading T:valid_range  back from netCDF file...........PASSED
 Reading DESC          back from netCDF file...........PASSED
 Reading DESC:units    back from netCDF file...........PASSED
 Reading DESC:long_name back from netCDF file...........PASSED
 Reading title          back from netCDF file...........PASSED
 Reading start_date    back from netCDF file...........PASSED
 Reading start_time    back from netCDF file...........PASSED
 === End of netCDF file read test! ===

If all of the tests return with "PASSED" then the <tt>libNcUtils.a</tt> file was created correctly.
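
If you would rather not inspect the test output by eye, the "every line says PASSED" rule can be checked with a small script. The sketch below assumes that the output of <tt>make check</tt> has been saved to a log file (the file name <tt>check.log</tt> is only an example) and that each read test prints a line starting with "Reading"; the function name <tt>all_tests_passed</tt> is hypothetical.

```shell
# Succeed only if the log contains at least one "Reading ..." line
# and every such line ends in PASSED.
# (all_tests_passed is a hypothetical helper written for this example.)
all_tests_passed() {
    log="$1"
    [ -r "$log" ] || return 1
    # grep -c exits non-zero when the count is 0, so guard with "|| true".
    total=$(grep -c '^ *Reading ' "$log" || true)
    passed=$(grep -c '^ *Reading .*PASSED *$' "$log" || true)
    [ "$total" -gt 0 ] && [ "$total" -eq "$passed" ]
}
```

For example, after <tt>make check > check.log 2>&1</tt>, the command <tt>all_tests_passed check.log && echo "library OK"</tt> would print the message only when no read test failed.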


==== Setting up automatic checks for several netCDF library installations ====

The [[GEOS-Chem Support Team]] has created a set of scripts to check several netCDF configurations by using the <tt>TestNcdfUtilities.x</tt> program.  For more information, please see the README file at our [https://bitbucket.org/gcst/ncdfunittests NcdfUnitTest code repository on Bitbucket].

--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 19:00, 8 March 2017 (UTC)

=== Generating reference documentation ===

The NcdfUtilities Fortran source code and Makefiles use the ProTeX automatic documentation system.  This enables you to create reference documents in <tt>*.pdf</tt> and <tt>*.ps</tt> format from the comments in the subroutine headers.

To build the reference documents, make sure you are in the <tt>doc/</tt> subdirectory, then type:


   make doc

This will create the following documents in the <tt>doc/</tt> subdirectory:

   NcdfUtilities.pdf




The reference documents contain a description of each subroutine and function, the variables that are passed to it as input and output arguments, and the revision history.  The Makefile reference document displays the full text of the Makefiles.  These documents will come in handy if you need to modify or update the Fortran code or Makefiles.

If you wish to remove the NcdfUtilities reference documentation files, then make sure you are in the <tt>doc/</tt> subdirectory and type:

   make clean


--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 18:45, 8 March 2017 (UTC)


=== Cleaning up ===

To remove all of the <tt>*.o</tt>, <tt>*.mod</tt>, and executable files in the <tt>Code/</tt> subdirectory only, type:

   make clean

However, if you also wish to remove the contents of the <tt>bin/</tt> and <tt>lib/</tt> subdirectories (as well as the <tt>*.ps</tt>, <tt>*.pdf</tt>, and <tt>*.txt</tt> files from the <tt>doc/</tt> subdirectory), then type:

   make realclean


--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 18:46, 8 March 2017 (UTC)
== Previous issues that have since been resolved ==
=== Routine DO_ERR_OUT now returns a non-zero error code ===
<span style="color:green">'''''This update was included in [[GEOS-Chem v11-02#v11-02a|v11-02a]] and approved on 12 May 2017.'''''</span>
'''''Andy Jacobson (NOAA) wrote:'''''
<blockquote>GEOS-Chem shouldn’t exit with status 0 when something goes wrong, but the current <tt>NcdfUtil/m_do_err_out.F90</tt> does just that.  May I suggest that the existing</blockquote>
        if (err_do_stop) then
          stop 'Code stopped from Do_Err_Out.'
        end if
<blockquote>(which does return 0 to my shell) be replaced.  Consider this a needed bandaid for the interim.  Also, I believe that the <code>STOP</code> statement behavior is system- and compiler-dependent, so maybe it’s just our ifort that returns 0 as is.</blockquote>
'''''[[User:bmy|Bob Yantosca]] wrote:'''''
<blockquote>Thanks for letting us know about the netCDF exit issue.  That was in a part of GEOS-Chem that we originally inherited from NASA.  I don’t know if you have a very new compiler version, but it could be that the default behavior of STOP was changed recently w/r/t older compiler versions.
I've fixed it in both the [[#NcdfUtilities as a standalone distribution|standalone NcdfUtilities]] and also in the GEOS-Chem code with this check:</blockquote>
        if (err_do_stop) then
    <span style="color:red">!-------------------------------------------------------------------
    ! Prior to 3/7/17:
    ! Call the EXIT function with a non-zero error code (bmy, 3/7/17)
    !      stop "Code stopped from Do_Err_Out."
    !-------------------------------------------------------------------</span>
          <span style="color:green">WRITE( 6, 100 )
    100  FORMAT( 'Code stopped from DO_ERR_OUT (in module m_do_err_out.F90)' )
          CALL EXIT( 999 )</span>
        end if 
<blockquote>This will for sure return a non-zero error code, which your run script can trap.</blockquote>
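
To illustrate what "trapping" the error code in a run script could look like, here is a minimal sketch. The function <tt>run_model</tt> is a stand-in for the real executable and <tt>run_and_trap</tt> is a hypothetical wrapper; neither is part of GEOS-Chem or NcdfUtilities. Note that Unix shells normally report only the low 8 bits of an exit status, so the value seen by the script may differ from the 999 passed to <tt>EXIT</tt>; testing for "non-zero" is the safer check.

```shell
# Stand-in for the real model executable (hypothetical). It prints the
# message from DO_ERR_OUT and fails with a non-zero status.
run_model() {
    echo "Code stopped from DO_ERR_OUT (in module m_do_err_out.F90)"
    return 3
}

# Run the given command, report a non-zero exit status on stderr,
# and pass that status back to the caller.
run_and_trap() {
    rc=0
    "$@" || rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "run failed with exit status $rc" >&2
    fi
    return "$rc"
}
```

A run script could then use <tt>run_and_trap ./geos || exit 1</tt> (with the executable name adjusted to your setup) so that later steps are skipped when the model stops with an error.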
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 19:16, 8 March 2017 (UTC)
=== Enable compression in netCDF-4 output files ===
<span style="color:green">'''''This update was included in [[GEOS-Chem v11-02#v11-02a|v11-02a]] and approved on 12 May 2017.'''''</span>
'''''[[User:Chris Holmes|Chris Holmes]] wrote:'''''
<blockquote>I noticed that netCDF-4 files created with GEOS-Chem were not using compression, which is one of the major benefits of netCDF-4. I don’t know if that was intentional, but I added this feature.
Along the way I found that the files called netCDF-4 in the HEMCO comments were actually netCDF-3 files with 64bit support (i.e. large file support), so I made the files real netCDF-4 classic model then added compression support.
I have sent a patch to the [[GCST]] (applied to v11-01-public-release) that enables these changes.  NetCDF3 users should see no effect.
With the lowest level of compression enabled, the restart files are about half of their previous size, so a big benefit. </blockquote>
      Enable compression for netCDF-4 files
     
      NetCDF-4 files created by HEMCO now have lossless compression enabled.
      Uses lowest compression level (deflate_level=1).
      Informal testing and netCDF discussion forums suggest that higher compression
      provides little additional benefit, but slower file writing.
   
      Restart files are about 50% smaller.
      Write time increases about 1 second out of about 5 seconds total.
   
      Non-fatal errors are displayed if compression doesn't work.
      No error is displayed for netCDF-3 files that don't support compression.
The [[GCST]] has added an extra check on top of Chris Holmes' update in order to prevent errors if the netCDF library cannot support file compression. 
'''''[[User:Bmy|Bob Yantosca]] replied:'''''
<blockquote>Some netCDF-4 library installations might not have been built with compression enabled.  We now first check the include file <tt>netcdf.inc</tt> (which is in the netCDF include folder) to see if the function <tt>nf_def_var_deflate</tt> is defined.  If it is, then we set a C-preprocessor switch named <tt>NC_HAS_COMPRESSION</tt>, which will activate the code to compress the netCDF output files.  Otherwise the compression code is left disabled.  This workaround was necessary in order to avoid compile-time errors.
   
Tests with the <tt>geosfp_4x5_standard</tt> simulation show a decrease in file size from approx. 200 MB (uncompressed) to 120 MB (compressed).
   
We also display a message at the top of the log file indicating if this netCDF library build supports file compression.</blockquote>
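
The check described above can be pictured with the following sketch. It is not the code used in the GEOS-Chem Makefiles; the function name <tt>compression_flag</tt> and the <tt>-D</tt> form of the switch are assumptions made for illustration. It looks for <tt>nf_def_var_deflate</tt> in <tt>netcdf.inc</tt> and prints a preprocessor flag only when the name is found.

```shell
# Print a preprocessor flag if netcdf.inc mentions nf_def_var_deflate,
# and print nothing otherwise. $1 is the netCDF include folder.
# (compression_flag is a hypothetical helper; the real check may differ.)
compression_flag() {
    inc="$1/netcdf.inc"
    if [ -r "$inc" ] && grep -qi 'nf_def_var_deflate' "$inc"; then
        echo "-DNC_HAS_COMPRESSION"
    fi
}
```

For example, <tt>compression_flag "$NETCDF_INCLUDE"</tt> would print the flag for a library built with compression support and print nothing for one without it.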
--[[User:Bmy|Bob Yantosca]] ([[User talk:Bmy|talk]]) 21:44, 1 March 2017 (UTC)
=== Improve write speed of netCDF output files ===
<span style="color:green">'''''This update was included in [[GEOS-Chem v11-01#v11-01 public release|GEOS-Chem v11-01 public release]]'''''</span>
'''''[[User:Chris Holmes|Chris Holmes]] wrote:'''''
<blockquote>I have found that [[GEOS-Chem v11-01]] requires a very long time to write netCDF restart files at the end of a simulation. On my system it takes over 8 minutes to write a 170MB restart file. I’m doing 1-hour simulations for development and everything else in the simulation (chemistry, transport, emissions, etc.) requires just 2 minutes.
For comparison the trac_avg file, which is bpch format, requires just 1-3 seconds to write 111MB. Clearly the long write times are specific to the netCDF output, not my hardware speed, since the bpch output is fast. I’ve looked through the netCDF output subroutines, but there are so many layers involving multiple libraries that I can’t get an overall view of which steps might be the bottleneck.


[This update] resolves the slow restart write times by minimizing the number of times that an open netcdf file must switch between define and data modes. Only a few lines of code were changed. Most of the changes you will see in the patch are simple indentation changes. I verified that the restart files were bitwise identical before and after my changes.
With these changes the write time for the restart file dropped from 6-8 minutes to 4 seconds on my system. 100X faster!
   
I don’t exactly know why switching between define and data modes is so slow, but the netcdf library documentation explains that metadata is not written to disk until define mode ends. On our system we have good sustained write speeds for large files, but relatively slow write speeds for many tiny files. I think that’s why we see a big benefit on our system.</blockquote>


--[[User:Melissa Payer|Melissa Sulprizio]] ([[User talk:Melissa Payer|talk]]) 15:03, 23 January 2017 (UTC)

Latest revision as of 20:10, 22 May 2017

The NcdfUtilities package contains Fortran modules that you can use to write data to and read data from netCDF files. This package is contained within GEOS-Chem (in the NcdfUtil/ folder), but may also be downloaded as a separate standalone package.

List of modules

NcdfUtilities contains the Fortran source code files listed below. The same files are used both within GEOS-Chem and in the standalone distribution.

Module Description
charpak_mod.F Contains routines from the CHARPAK string and character manipulation package.
julday_mod.F Contains routines used to convert from month/day/year to Astronomical Julian Date and back again.
m_netcdf_io_checks.F90 Contains routines to check if a netCDF file contains a specified variable.
m_netcdf_io_close.F90 Contains routines to close a netCDF file.
m_netcdf_io_create.F90 Contains routines for creating and synchronizing netCDF files.
m_netcdf_io_define.F90 Contains netCDF utility routines to define dimensions, variables, and attributes.
m_netcdf_io_get_dimlen.F90 Contains routines to obtain the length of a given dimension.
m_netcdf_io_handle_err.F90 Contains routines to handle error messages.
m_netcdf_io_open.F90 Contains routines to open a netCDF file.
m_netcdf_io_readattr.F90 Contains netCDF utility routines to read both netCDF global attributes and variable attributes.
m_netcdf_io_read.F90 Contains routines to read variables from a netCDF file.
m_netcdf_io_write.F90 Contains routines to write variables into a netCDF file.
ncdf_mod.F90 Contains routines to read data from and write data to a netCDF file. These routines are convenience wrappers for the routines in the m_netcdf*.F90 modules listed above.

--Bob Yantosca (talk) 19:30, 8 March 2017 (UTC)

NcdfUtilities within GEOS-Chem

The NcdfUtilities code is used by GEOS-Chem "Classic" simulations to perform netCDF file I/O. The source code modules listed above are contained in the NcdfUtil folder of the GEOS-Chem source code directory. They are compiled along with the rest of GEOS-Chem.

NOTE: When using the NcdfUtilities within GEOS-Chem, you must make sure to set the proper environment variables in your .bashrc or .cshrc.

--Bob Yantosca (talk) 19:27, 8 March 2017 (UTC)

NcdfUtilities as a standalone distribution

The following sections describe how you can download and run the NcdfUtilities as a standalone package that you can incorporate into your own Fortran programs.

Setting environment variables for the standalone NcdfUtilities distribution

The NcdfUtilities library requires that you set the environment variables listed below in your system startup file (e.g. .bashrc or .cshrc).

NOTE: The environment variable names for the standalone NcdfUtilities distribution are different from the ones you need to set if you use NcdfUtilities within GEOS-Chem.

ALSO NOTE: If you load a netCDF library into your Unix environment with the module command, then very often the root path to the netCDF library will be automatically set for you. Then you can use this to define the variables listed below. Ask your IT staff for more information.

Variable Description
NETCDF_BIN The bin/ folder of the netCDF installation, where utilities such as nc-config, ncdump, etc. are stored.
NETCDF_INCLUDE The include/ folder of the netCDF installation, where the netcdf.inc and netcdf_mod.F90 are found.
NETCDF_LIB The lib/ or lib64/ folder of the netCDF installation, where the netCDF library files (e.g. libnetcdf.a) are found.

NOTE: In netCDF-4.2 and higher versions, the netCDF Fortran libraries are built from a separate distribution. If, on your system, the netCDF-Fortran libraries have been installed into a different folder than the rest of the netCDF libraries, you will also need to set these environment variables in your system startup file:

Variable Description
NETCDF_FORTRAN_BIN The bin/ folder of the netCDF-Fortran installation, where utilities such as nf-config are stored.
NETCDF_FORTRAN_INCLUDE The include/ folder of the netCDF-Fortran installation, where the netcdf.inc and netcdf_mod.F90 are found.
NETCDF_FORTRAN_LIB The lib/ or lib64/ folder of the netCDF-Fortran installation, where the netCDF-Fortran library files (e.g. libnetcdff.a) are found.

--Bob Yantosca (talk) 18:43, 8 March 2017 (UTC)

Downloading the standalone NcdfUtilities distribution

You can download a copy of the NcdfUtilities standalone package with the Git source code management system. The master NcdfUtilities code repository is hosted on Bitbucket.org. To download the code, type:

git clone https://bitbucket.org/gcst/ncdfutilities.git NcdfUtil

This will download NcdfUtilities into a folder named NcdfUtil in your current directory.

--Bob Yantosca (talk) 18:56, 8 March 2017 (UTC)

Directory structure of the standalone NcdfUtilities distribution

If you download the NcdfUtilities as a standalone package, the root-level directory will contain the following sub-directories:

Sub-directory Description
Code/ The Fortran source code files (*.F *.F90) reside here.
bin/ The TestNcdfUtilities.x executable will be created here.
doc/ The NcdfUtilities documentation will be created here.
lib/ The NcdfUtilities library file (libNcUtils.a) will be created here.
mod/ Compiled module files (*.mod) will be created here.
perl/ Several perl scripts that can be useful in creating netCDF files are contained here.

System requirements for using NcdfUtilities

  1. In order to use NcdfUtilities, you will first have to check to see if the netCDF library is installed on your system. You may find that there are several netCDF library versions to select from. Or you can ask your IT staff to build you a version.

  2. In order to build the reference documents (described below), you must have the LaTeX utilities (i.e. latex, dvips, dvipdf) installed on your system.
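
One quick way to check both requirements is to see which of the relevant tools are on your search path. The sketch below assumes that finding nc-config and nf-config on the PATH is a reasonable sign that netCDF and netCDF-Fortran are installed; the helper name check_tool is illustrative only.

```shell
# Report whether a given tool can be found on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
    else
        echo "missing: $1"
    fi
}

# netCDF utilities first, then the LaTeX tools used for the reference documents
for tool in nc-config nf-config latex dvips dvipdf; do
    check_tool "$tool"
done
```

A tool reported as missing may still be installed in a location that is not on your PATH (for example, one that only becomes visible after a module is loaded).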

Compiling the standalone NcdfUtilities distribution

The NcdfUtilities/Code directory contains the Fortran source code modules as well as two Makefiles (named Makefile and Makefile_header.mk).

The file Makefile_header.mk is a sub-makefile which is used to define the compilation options for different compilers. At present, the Intel Fortran Compiler, GNU Fortran compiler, and PGI Fortran compiler are supported.

Once you have set the proper environment variables for your system (as described above), you are ready to build the library. Make sure you are in the Code/ subdirectory and type:

make lib

This should start building the source code and create a library file named libNcUtils.a in the lib/ subdirectory.

If you would like to build NcdfUtilities for an HPC environment, then type:

make lib HPC=yes

This will compile the code using the mpif90 compiler wrapper rather than the underlying compilers themselves. This will activate the various MPI settings.
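
Because the build depends on the environment variables described earlier, it can help to confirm that they are set before running make. The following is a sketch only: the function name missing_netcdf_vars is hypothetical, and it checks just the three main variables, not the optional NETCDF_FORTRAN_* ones.

```shell
# Print the name of each required variable that is unset or empty.
missing_netcdf_vars() {
    for name in NETCDF_BIN NETCDF_INCLUDE NETCDF_LIB; do
        # Indirect lookup of the variable whose name is stored in $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
        fi
    done
}

missing=$(missing_netcdf_vars)
if [ -n "$missing" ]; then
    echo "Set these variables before building:"
    echo "$missing"
else
    echo "Environment looks complete; run 'make lib' from the Code/ subdirectory."
fi
```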

--Bob Yantosca (talk) 18:55, 8 March 2017 (UTC)

Ensuring that the NcdfUtilities code was correctly compiled

Once the libNcUtils.a file has been created in the lib/ subdirectory, you can test whether the library was built correctly and can link to the netCDF library. Type:

make check

This will create an executable file named TestNcdfUtilities.x in the bin/ subdirectory, and will then run it.

If you would like to compile and run the test for HPC environments, then type:

make check HPC=yes

which will use the mpif90 compiler wrapper.

The output of TestNcdfUtilities.x should look similar to this:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%  Testing libNcdfUtilities.a  %%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
=== Begin netCDF file creation test ===
Writing time  (dim     ) to netCDF file
Writing lev   (dim     ) to netCDF file
Writing lat   (dim     ) to netCDF file
Writing lon   (dim     ) to netCDF file
Writing cdim1 (dim     ) to netCDF file
Writing cdim2 (dim     ) to netCDF file
Testing re-opening of define mode
Writing lon   (1D array) to netCDF file
Writing lat   (1D array) to netCDF file
Writing lev   (1D array) to netCDF file
Writing time  (1D array) to netCDF file
Writing PS    (3D array) to netCDF file
Writing T     (4D array) to netCDF file
Writing DESC  (2D char ) to netCDF file
=== End netCDF file creation test ===
=== Begin netCDF file reading test ===
Reading lon   (dim  )  back from netCDF file...........PASSED
Reading lat   (dim  )  back from netCDF file...........PASSED
Reading lev   (dim  )  back from netCDF file...........PASSED
Reading time  (dim  )  back from netCDF file...........PASSED
Reading cdim1 (dim  )  back from netCDF file...........PASSED
Reading cdim2 (dim  )  back from netCDF file...........PASSED
Reading lon   (array)  back from netCDF file...........PASSED
Reading lat   (array)  back from netCDF file...........PASSED
Reading lev   (array)  back from netCDF file...........PASSED
Reading time  (array)  back from netCDF file...........PASSED
Reading PS             back from netCDF file...........PASSED
Reading PS:units       back from netCDF file...........PASSED
Reading PS:long_name   back from netCDF file...........PASSED
Reading PS:_FillValue  back from netCDF file...........PASSED
Reading PS:valid_range back from netCDF file...........PASSED
Reading T              back from netCDF file...........PASSED
Reading T:units        back from netCDF file...........PASSED
Reading T:long_name    back from netCDF file...........PASSED
Reading T:_FillValue   back from netCDF file...........PASSED
Reading T:valid_range  back from netCDF file...........PASSED
Reading DESC           back from netCDF file...........PASSED
Reading DESC:units     back from netCDF file...........PASSED
Reading DESC:long_name back from netCDF file...........PASSED
Reading title          back from netCDF file...........PASSED
Reading start_date     back from netCDF file...........PASSED
Reading start_time     back from netCDF file...........PASSED
=== End of netCDF file read test! ===

If every test reports "PASSED", then the libNcUtils.a file was created correctly.
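
If you run the check from a script, you may prefer not to scan the output by eye. The sketch below reads the test output on standard input and succeeds only if every "Reading ... back from netCDF file" line ends in PASSED. It assumes that a failing test would print something other than PASSED at the end of its line, which is not confirmed here; the function name all_tests_passed is illustrative.

```shell
# Succeed only if there is at least one "Reading" line
# and every such line ends in PASSED.
all_tests_passed() {
    awk '
        /^Reading .* back from netCDF file/ {
            total++
            if ($0 !~ /PASSED[ \t]*$/) failed++
        }
        END { exit !(total > 0 && failed == 0) }
    '
}

# Demonstration on two sample lines in the format shown above
printf '%s\n' \
    'Reading lon   (dim  )  back from netCDF file...........PASSED' \
    'Reading T              back from netCDF file...........PASSED' |
    all_tests_passed && echo "all read tests passed"
```

With the real program, the same function could be placed at the end of a pipeline, e.g. make check | all_tests_passed.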

Setting up automatic checks for several netCDF library installations

The GEOS-Chem Support Team has created a set of scripts to check several netCDF configurations by using the TestNcdfUtilities.x program. For more information, please see the README file at our NcdfUnitTest code repository on Bitbucket.

--Bob Yantosca (talk) 19:00, 8 March 2017 (UTC)

Generating reference documentation

The NcdfUtilities Fortran source code and Makefiles use the ProTeX automatic documentation system. This enables you to create reference documents in *.pdf and *.ps format from the comments in the subroutine headers.

To build the reference documents, make sure you are in the doc/ subdirectory, then type:

  make doc

This will create the following documents in the doc/ subdirectory:

  NcdfUtilities.pdf
  NcdfUtilities.ps
  NcdfUtilities.tex
      -- Reference document for the NcdfUtilities Fortran code in *.pdf, *.ps, and LaTeX formats

  NcdfUtilities_Makefiles.pdf
  NcdfUtilities_Makefiles.ps
  NcdfUtilities_Makefiles.tex
      -- Reference document for the NcdfUtilities Makefiles in *.pdf, *.ps, and LaTeX formats

The reference documents contain a description of each subroutine and function, the variables that are passed to it as input & output arguments, and the revision history. The Makefile reference document displays the full text of the Makefiles. These documents will come in handy if you need to modify or update the Fortran code or Makefiles.

If you wish to remove the NcdfUtilities reference documentation files, then make sure you are in the doc/ subdirectory and type:

  make clean

--Bob Yantosca (talk) 18:45, 8 March 2017 (UTC)

Cleaning up

To remove all of the *.o, *.mod and executable files in the Code/ subdirectory only, type:

  make clean

However, if you wish to also remove the contents of the bin/ and lib/ subdirectories (as well as removing the *.ps, *.pdf, and *.txt files from the doc/ subdirectory), then type:

  make realclean

--Bob Yantosca (talk) 18:46, 8 March 2017 (UTC)

Previous issues that have since been resolved

Routine DO_ERR_OUT now returns a non-zero error code

This update was included in v11-02a and approved on 12 May 2017.

Andy Jacobson (NOAA) wrote:

GEOS-Chem shouldn’t exit with status 0 when something goes wrong, but the current NcdfUtil/m_do_err_out.F90 does just that. May I suggest that the existing

       if (err_do_stop) then
          stop 'Code stopped from Do_Err_Out.'
       end if

(which does return 0 to my shell) be replaced. Consider this a needed bandaid for the interim. Also, I believe that the STOP statement behavior is system- and compiler-dependent, so maybe it’s just our ifort that returns 0 as is.

Bob Yantosca wrote:

Thanks for letting us know about the netCDF exit issue. That was in a part of GEOS-Chem that we originally inherited from NASA. I don’t know if you have a very new compiler version, but it could be that the default behavior of STOP was changed recently w/r/t older compiler versions. I've fixed it in both the standalone NcdfUtilities and also in the GEOS-Chem code with this check:

       if (err_do_stop) then
   !-------------------------------------------------------------------
   ! Prior to 3/7/17:
   ! Call the EXIT function with a non-zero error code (bmy, 3/7/17) 
   !       stop "Code stopped from Do_Err_Out."
   !-------------------------------------------------------------------
         WRITE( 6, 100 )
   100   FORMAT( 'Code stopped from DO_ERR_OUT (in module m_do_err_out.F90)' )
         CALL EXIT( 999 )
       end if   

This will for sure return a non-zero error code, which your run script can trap.
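
A run script can trap the error along the following lines. This is a sketch under stated assumptions: the wrapper name run_and_report is hypothetical, and sh -c 'exit 3' stands in for the model executable so that the example runs anywhere. On POSIX systems the shell sees only the low 8 bits of the exit status, so the value passed to EXIT will typically not appear unchanged, but any non-zero value is enough for this check.

```shell
# Run a command, report a non-zero exit status on stderr,
# and pass that status back to the caller.
run_and_report() {
    status=0
    "$@" || status=$?
    if [ "$status" -ne 0 ]; then
        echo "run failed with exit status $status" >&2
    fi
    return "$status"
}

# Stand-in for the real executable: a command that exits with status 3
if run_and_report sh -c 'exit 3'; then
    echo "run succeeded"
else
    echo "run failed; stopping the script here"
fi
```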

--Bob Yantosca (talk) 19:16, 8 March 2017 (UTC)

Enable compression in netCDF-4 output files

This update was included in v11-02a and approved on 12 May 2017.

Chris Holmes wrote:

I noticed that netCDF-4 files created with GEOS-Chem were not using compression, which is one of the major benefits of netCDF-4. I don’t know if that was intentional, but I added this feature.

Along the way I found that the files called netCDF-4 in the HEMCO comments were actually netCDF-3 files with 64bit support (i.e. large file support), so I made the files real netCDF-4 classic model then added compression support.

I have sent a patch to the GCST (applied to v11-01-public-release) that enables these changes. NetCDF3 users should see no effect.

With the lowest level of compression enabled, the restart files are about half of their previous size, so a big benefit.

     Enable compression for netCDF-4 files
     
     NetCDF-4 files created by HEMCO now have lossless compression enabled.
     Uses lowest compression level (deflate_level=1).
     Informal testing and netCDF discussion forums suggest that higher compression
     provides little additional benefit, but slower file writing.
   
     Restart files are about 50% smaller.
     Write time increases about 1 second out of about 5 seconds total.
   
     Non-fatal errors are displayed if compression doesn't work.
     No error is displayed for netCDF-3 files that don't support compression.

The GCST has added an extra check on top of Chris Holmes' update in order to prevent errors if the netCDF library cannot support file compression.

Bob Yantosca replied:

Some netCDF-4 library installations might not have been built with compression enabled. We now first check the include file netcdf.inc (which is in the netCDF include folder) to see if the function nf_def_var_deflate is defined. If it is, then we set a C-preprocessor switch named NC_HAS_COMPRESSION, which will activate the code to compress the netCDF output files. Otherwise the compression code is left disabled. This workaround was necessary in order to avoid compile-time errors.

Tests with the geosfp_4x5_standard simulation show a decrease in file size from approx. 200 MB (uncompressed) to 120 MB (compressed).

We also display a message at the top of the log file indicating if this netCDF library build supports file compression.
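
The check described above can be approximated from the command line. The sketch below simply searches netcdf.inc for the string nf_def_var_deflate; it is not the code used in the GEOS-Chem or NcdfUtilities Makefiles, and the function name netcdf_has_compression and the fallback path /opt/netcdf/include are hypothetical.

```shell
# Succeed if the given netcdf.inc file mentions nf_def_var_deflate.
netcdf_has_compression() {
    grep -qi 'nf_def_var_deflate' "$1" 2>/dev/null
}

# Use NETCDF_INCLUDE if it is set, otherwise a placeholder path
inc_file="${NETCDF_INCLUDE:-/opt/netcdf/include}/netcdf.inc"
if netcdf_has_compression "$inc_file"; then
    echo "compression supported: NC_HAS_COMPRESSION can be defined"
else
    echo "compression not detected in $inc_file"
fi
```

If netCDF-Fortran is installed separately on your system, netcdf.inc will be in the netCDF-Fortran include folder, so point the check at that folder instead.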

--Bob Yantosca (talk) 21:44, 1 March 2017 (UTC)

Improve write speed of netCDF output files

This update was included in the GEOS-Chem v11-01 public release.

Chris Holmes wrote:

I have found that GEOS-Chem v11-01 requires a very long time to write netCDF restart files at the end of a simulation. On my system it takes over 8 minutes to write a 170MB restart file. I’m doing 1-hour simulations for development and everything else in the simulation (chemistry, transport, emissions, etc.) requires just 2 minutes.

For comparison the trac_avg file, which is bpch format, requires just 1-3 seconds to write 111MB. Clearly the long write times are specific to the netCDF output, not my hardware speed, since the bpch output is fast. I’ve looked through the netCDF output subroutines, but there are so many layers involving multiple libraries that I can’t get an overall view of which steps might be the bottleneck.

[This update] resolves the slow restart write times by minimizing the number of times that an open netcdf file must switch between define and data modes. Only a few lines of code were changed. Most of the changes you will see in the patch are simple indentation changes. I verified that the restart files were bitwise identical before and after my changes.

With these changes the write time for the restart file dropped from 6-8 minutes to 4 seconds on my system. 100X faster!

I don’t exactly know why switching between define and data modes is so slow, but the netcdf library documentation explains that metadata is not written to disk until define mode ends. On our system we have good sustained write speeds for large files, but relatively slow write speeds for many tiny files. I think that’s why we see a big benefit on our system.

--Melissa Sulprizio (talk) 15:03, 23 January 2017 (UTC)