Installing libraries for GEOS-Chem

On this page we provide instructions on how to install netCDF-4 and related libraries for GEOS-Chem.

A brief introduction to netCDF

GEOS-Chem reads and writes data using the netCDF file format. NetCDF is a self-describing file format that can store data fields as well as the relevant "metadata", or information about the contents of the file. Types of metadata include descriptive names, units, horizontal and vertical coordinates, file creation date/time, file history, etc.

The netCDF frequently asked questions (FAQ) guide gives this short overview of netCDF:

NetCDF (network Common Data Form) is a set of interfaces for array-oriented data access and a freely distributed collection of data access libraries for C, Fortran, C++, Java, and other languages. The netCDF libraries support a machine-independent format for representing scientific data. Together, the interfaces, libraries, and format support the creation, access, and sharing of scientific data.

NetCDF data is:

  • Self-Describing. A netCDF file includes information about the data it contains.
  • Portable. A netCDF file can be accessed by computers with different ways of storing integers, characters, and floating-point numbers.
  • Scalable. A small subset of a large dataset may be accessed efficiently.
  • Appendable. Data may be appended to a properly structured netCDF file without copying the dataset or redefining its structure.
  • Sharable. One writer and multiple readers may simultaneously access the same netCDF file.
  • Archivable. Access to all earlier forms of netCDF data will be supported by current and future versions of the software.

Two major versions of netCDF are in common use today:

  1. netCDF-3, aka "netCDF classic".
  2. netCDF-4

The major difference between the two versions is that netCDF-4 relies on the HDF5 library "under the hood" whereas netCDF-3 does not. For this reason, netCDF-4 can be used to store more data per file than netCDF-3.
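
One practical consequence of the HDF5 dependency is that the two formats can usually be told apart from the first bytes of a file. As a minimal sketch (the function name nc_kind is hypothetical and is not part of netCDF or GEOS-Chem), the shell function below inspects the leading four bytes: classic files begin with the characters CDF followed by a version byte, whereas netCDF-4 files normally begin with the HDF5 signature. It assumes that the HDF5 signature sits at the very start of the file, which is the usual case but is not guaranteed for every HDF5 file.

```shell
# nc_kind FILE: guess the netCDF flavor of FILE from its first four bytes.
# Hypothetical helper, for illustration only.
nc_kind() {
    # Hex-encode the first four bytes, dropping offsets and whitespace.
    nc_magic=$(head -c 4 "$1" | od -An -tx1 | tr -d '[:space:]')
    case "$nc_magic" in
        43444601) echo "netCDF-3 classic" ;;
        43444602) echo "netCDF-3 64-bit offset" ;;
        43444605) echo "CDF-5 (64-bit data)" ;;
        89484446) echo "netCDF-4 (HDF5)" ;;
        *)        echo "unknown" ;;
    esac
}

# Example call:
# nc_kind my_file.nc
```

If the netCDF utilities are installed, the command ncdump -k followed by the file name reports the format kind more reliably than this byte check.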

A netCDF installation contains library files (ending in .a), which hold compiled utility routines meant to be called from programs written in C or Fortran. In netCDF-4.1.3 and prior versions, the C-language library file (libnetcdf.a) and the Fortran-language library file (libnetcdff.a) were always installed into the same folder by default. But starting with netCDF-4.2, the netCDF Fortran libraries must be built from a separate distribution package. Because of this configuration, you might find that the libnetcdff.a (Fortran) and libnetcdf.a (C) library files are stored in separate folders on your system. Ask your IT staff for more information about how netCDF is installed on your system. See the section "netCDF 4.2 and later versions require a separate netCDF-Fortran installation" below for more information.
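
To see whether the C and Fortran libraries ended up in the same folder, you can search a few candidate directories for them. The sketch below is only illustrative: the function name find_netcdf_libs is hypothetical, and the folders in the example call are assumptions that depend on your system. The pattern also matches shared libraries (e.g. .so files), since some installations provide those in addition to, or instead of, the static .a files.

```shell
# find_netcdf_libs DIR...: list the netCDF C library files (libnetcdf.*)
# and Fortran library files (libnetcdff.*) found in the given folders.
find_netcdf_libs() {
    for nc_dir in "$@"; do
        [ -d "$nc_dir" ] || continue
        for nc_lib in "$nc_dir"/libnetcdf.* "$nc_dir"/libnetcdff.*; do
            # An unmatched pattern stays literal, so keep only real files.
            [ -e "$nc_lib" ] && echo "$nc_lib"
        done
    done
    return 0
}

# Example call with commonly used (assumed) locations:
# find_netcdf_libs /usr/lib /usr/lib64 /usr/local/lib
```

If the two library names show up under different folders, then netCDF-Fortran was most likely installed as a separate package.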

Check to see if netCDF is already installed on your system

If you are going to use GEOS-Chem on a shared computer system, chances are that your IT staff will have already installed one or more netCDF library versions that you can use. Depending on your system's setup, there are several ways that you can tell your computational environment where to find the netCDF library files, as described below.

Using modules

Many high-performance computing (HPC) clusters use the Lmod module software. Lmod allows you to load different compilers and libraries with simple module load commands. For example, on the Harvard Odyssey cluster, compiler and netCDF libraries are initialized with commands such as these:

module purge
module load gcc/8.2.0-fasrc01
module load openmpi/3.1.1-fasrc01
module load netcdf/4.1.3-fasrc02

The first line removes all pre-loaded modules. The second line loads the GNU C and Fortran compilers (version 8.2.0). The third and fourth lines load openmpi 3.1.1 (which netCDF depends on) and netCDF 4.1.3 itself. You can add these module load statements to your shell startup files (e.g. .bashrc or .bash_aliases).
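
If your startup file is also read on machines that do not have Lmod, it can help to guard the module statements. A minimal sketch, assuming a Bourne-style shell such as bash; the function name load_geoschem_modules is hypothetical, and the module names are the Odyssey examples from above, which will differ on other clusters:

```shell
# load_geoschem_modules: load the compiler and netCDF modules, but only on
# systems where the `module` command is available.
load_geoschem_modules() {
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found; skipping module loads" >&2
        return 1
    fi
    module purge
    module load gcc/8.2.0-fasrc01
    module load openmpi/3.1.1-fasrc01
    module load netcdf/4.1.3-fasrc02
}

# In .bashrc you might then call:
# load_geoschem_modules
```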

As a convenience, Lmod will also export the relevant folder paths to your Unix environment. For example, issuing the above module statements on the Harvard Odyssey cluster will export the following environment variables:

$GCC_HOME        # Home folder for gcc 8.2.0
$GCC_INCLUDE     # Folder where include files of gcc 8.2.0 are stored
$GCC_LIB         # Folder where library files of gcc 8.2.0 are stored
$MPI_HOME        # Home folder for openmpi 3.1.1
$MPI_INCLUDE     # Folder where include files (e.g. mpi.h) of openmpi 3.1.1 are stored
$MPI_LIB         # Folder where library files (e.g. libmpi*.a) of openmpi 3.1.1 are stored
$NETCDF_HOME     # Home folder for netcdf-4.1.3
$NETCDF_INCLUDE  # Folder where include files (e.g. netcdf.h, netcdf.inc) are stored
$NETCDF_LIB      # Folder where library files (e.g. libnetcdf.a, libnetcdff.a) for netCDF 4.1.3 are stored

You can then use these environment variables to tell GEOS-Chem where it can find the netCDF libraries on your system. See our Setting Unix environment variables for GEOS-Chem wiki page for more information. NOTE: The names of these environment variables may be different on your system (ask your IT staff for more information).
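
Before building GEOS-Chem, it can be worth confirming that these variables are set and point to real folders. The following is a sketch only: check_dir_var is a hypothetical helper, and the variable names in the example are the ones from the Odyssey cluster, which may be named differently on your system.

```shell
# check_dir_var NAME: succeed if the environment variable NAME is set and
# names an existing folder; otherwise print a message and fail.
# Pass only trusted variable names, because the lookup uses eval.
check_dir_var() {
    # Indirect lookup of the variable whose name is given in $1.
    eval "dir_value=\${$1:-}"
    if [ -z "$dir_value" ]; then
        echo "$1 is not set" >&2
        return 1
    elif [ ! -d "$dir_value" ]; then
        echo "$1 points to a missing folder: $dir_value" >&2
        return 1
    fi
    echo "$1=$dir_value"
}

# Example: report on the three netCDF variables.
# for v in NETCDF_HOME NETCDF_INCLUDE NETCDF_LIB; do check_dir_var "$v"; done
```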

Lmod makes it very easy to switch between different compilers and libraries. To load the netCDF libraries that were built with the Intel Fortran Compiler, all one has to do is to use a different set of module load statements, such as:

module purge
module load intel/17.0.4-fasrc01
module load openmpi/2.1.0-fasrc02
module load netcdf/4.3.2-fasrc05
module load netcdf-fortran/4.4.0-fasrc03

NOTE: For an explanation of why netcdf-fortran is loaded as a separate module, please see the section "netCDF 4.2 and later versions require a separate netCDF-Fortran installation" below.

One downside of using Lmod is that you are locked into using only those compiler and software versions that have already been installed on your system by your IT staff. For example, an update to the computer model that you are using might also require updating to a new compiler version that is not yet available on your system. In this case, you will need to request that your IT staff install the new compiler version for you (and wait for them to do it). But in general, Lmod succeeds in ensuring that only well-tested compiler/software combinations are available to users.

--Bob Yantosca (talk) 19:58, 9 January 2019 (UTC)

Manual library installation

If your computer system does not use Lmod, then the netCDF libraries may have already been installed by your IT staff in one of the usual Unix folder locations (such as /usr/lib or /usr/local/lib). If this is the case, ask your IT staff where these libraries reside.

Once you know the location of the compiler and netCDF libraries, you can set the proper environment variables for GEOS-Chem.
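
Many netCDF installations also ship the nc-config and nf-config utilities, which report where the C and Fortran libraries were installed. As a hedged sketch (the function name netcdf_prefix is hypothetical, and the utilities may be absent from your PATH), you could query them like this:

```shell
# netcdf_prefix TOOL: print the installation prefix reported by TOOL
# (nc-config for the C library, nf-config for the Fortran library).
netcdf_prefix() {
    if command -v "$1" >/dev/null 2>&1; then
        "$1" --prefix
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}

# Example: the two prefixes differ when netCDF-Fortran was installed separately.
# netcdf_prefix nc-config
# netcdf_prefix nf-config
```

The library and include folders are normally the lib and include subfolders of the reported prefix, but check this on your own system before relying on it.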

--Bob Yantosca (talk) 19:08, 9 January 2019 (UTC)

Library installation on the cloud

If you are using GEOS-Chem on the Amazon Web Services cloud computing platform, then the netCDF libraries will already be installed for you, either as part of the Amazon Machine Image (AMI) or software container (e.g. Docker or Singularity) that you used to initialize your computational environment. The proper Unix environment variables will also be defined.

For more information, please see our comprehensive cloud computing tutorial: cloud.geos-chem.org

--Bob Yantosca (talk) 19:43, 9 January 2019 (UTC)

netCDF 4.2 and later versions require a separate netCDF-Fortran installation

In our section on the Lmod module system above, we used the following example commands to load libraries that are compatible with GNU Fortran 8.2.0:

module purge
module load gcc/8.2.0-fasrc01
module load openmpi/3.1.1-fasrc01
module load netcdf/4.1.3-fasrc02

But later on in that same section, we listed a different set of module load commands to load libraries that are compatible with Intel Fortran Compiler 17.0.4:

module purge
module load intel/17.0.4-fasrc01
module load openmpi/2.1.0-fasrc02
module load netcdf/4.3.2-fasrc05
module load netcdf-fortran/4.4.0-fasrc03

You might have noticed that we loaded netcdf-fortran as a separate module for Intel Fortran but not for GNU Fortran. What is the reason for this?

As it turns out, in all netCDF versions up to 4.1.3, the library files for the C-language interface (libnetcdf.a) and the Fortran-language interface (libnetcdff.a) were always stored in the same folder. But in netCDF 4.2.0 (circa 2012) and later versions, the Fortran-language interface to netCDF was moved to a completely separate distribution, with its own version numbering system. Therefore, if you are using netCDF 4.2.0 or later, you have to install netCDF-Fortran as a completely separate library.

If your computer system uses the Lmod module software, then loading the netcdf-fortran module will also export the following environment variables to your Unix environment:

NETCDF_FORTRAN_HOME     # Root folder of the netCDF Fortran-language interface
NETCDF_FORTRAN_INCLUDE  # Folder where netCDF-Fortran include files (e.g. netcdf.inc) are stored
NETCDF_FORTRAN_LIB      # Folder where netCDF-Fortran library files (e.g. libnetcdff.a) are stored.

These are analogous to NETCDF_HOME, NETCDF_INCLUDE, and NETCDF_LIB as mentioned above.

Long story short:

  1. If you are using netCDF-4.2.0 or later (which includes the most recent versions of netCDF), look for a separate netCDF-Fortran installation.
  2. If you are using netCDF-4.1.3 or earlier, then there is no separate netCDF-Fortran installation.

GEOS-Chem is designed to work with any version of netCDF, regardless of whether the netCDF-Fortran installation is separate or not.
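
The rule of thumb above can be written as a small version check. This is a sketch only: the function name needs_separate_fortran is hypothetical, and it assumes a version string of the form major.minor or major.minor.patch for the netCDF C library.

```shell
# needs_separate_fortran VERSION: succeed if the netCDF C library VERSION
# (e.g. 4.3.2) is 4.2 or later, i.e. netCDF-Fortran is a separate package.
needs_separate_fortran() {
    nc_major=${1%%.*}          # text before the first dot
    nc_rest=${1#*.}            # text after the first dot
    nc_minor=${nc_rest%%.*}    # text before the next dot, if any
    [ "$nc_major" -gt 4 ] ||
        { [ "$nc_major" -eq 4 ] && [ "$nc_minor" -ge 2 ]; }
}

if needs_separate_fortran 4.3.2; then
    echo "4.3.2: look for a separate netCDF-Fortran installation"
fi
```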

--Bob Yantosca (talk) 19:39, 9 January 2019 (UTC)

If you do not have netCDF on your system, use Spack to install it

The GEOS-Chem-Libraries installer contains library versions that might not be compatible with the most recent Linux/Ubuntu/Fedora operating systems and gcc/gfortran and icc/ifort compilers. In particular, several users have noticed that the build fails for gcc/gfortran compiler versions 4.8 and higher.

If you do not already have a pre-built netCDF library on your system, we recommend using the Spack package manager to install the required libraries. Spack should be able to install netCDF and all required libraries for a variety of compiler/platform combinations (including Linux and MacOS).

Downloading Spack

Clone the Spack repository, which is hosted on GitHub, to your disk space:

git clone https://github.com/spack/spack.git
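
Some commands later on this page call spack without the ./ prefix, which assumes that Spack is on your PATH. One way to arrange this is to source the setup-env.sh script that ships in the share/spack folder of the Spack repository. A minimal sketch, assuming the bash shell; the function name bootstrap_spack and the default location $HOME/spack are assumptions made for illustration:

```shell
# bootstrap_spack [DIR]: clone Spack into DIR if it is not there yet, then
# source its shell setup script so that `spack` can be called from anywhere.
bootstrap_spack() {
    spack_root=${1:-$HOME/spack}
    if [ ! -d "$spack_root" ]; then
        git clone https://github.com/spack/spack.git "$spack_root" || return 1
    fi
    . "$spack_root/share/spack/setup-env.sh"
}

# Example (run once per shell session, or from .bashrc):
# bootstrap_spack "$HOME/spack"
```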

Using Spack to install netCDF libraries

Before you execute Spack, make sure that you specify the compiler that you wish to use. From within the bin folder of the Spack repository that you just downloaded, you can type:

 ./spack compiler find

to make sure that Spack has found the compiler that you want to use. For more information, please see the Compiler configuration section of the Spack manual.

Installing netCDF: our recommended configurations

Once you have made sure that Spack has found the compiler, you can proceed to installing the libraries. Here are the commands that you need:

Configuration #1: Install netCDF for use with both GEOS-Chem "Classic" and GCHP (also includes the MPI library). THIS IS OUR RECOMMENDED CONFIGURATION.

cd spack/bin
./spack install netcdf-fortran

Configuration #2: Install netCDF for use with GEOS-Chem "Classic" only (i.e. will NOT install the MPI library).

cd spack/bin
./spack install netcdf-fortran ^netcdf~mpi ^hdf5~mpi

These commands tell Spack to download and install the netCDF Fortran-language library along with all of its dependent libraries (such as the netCDF C-language library, the HDF-5 library, an MPI library, etc.). By default, Spack picks the most recent library versions that are available, but you can modify this behavior with the commands described below. The installation can take about 30-60 minutes, depending on the options that you specify.

If you think you will be using GEOS-Chem in both its "Classic" mode and its high-performance mode (aka GCHP), we recommend that you install netCDF with MPI (Configuration #1). If you are only going to use GEOS-Chem in "Classic" mode, you can omit installing MPI (Configuration #2).
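
The choice between the two configurations can be captured in a small helper that prints the matching Spack spec. This is only a sketch: the function name netcdf_spec and its yes/no argument are hypothetical, while the specs themselves are the ones listed above.

```shell
# netcdf_spec WITH_MPI: print the Spack spec for Configuration #1 (yes)
# or Configuration #2 (no).
netcdf_spec() {
    case "$1" in
        yes) echo "netcdf-fortran" ;;
        no)  echo "netcdf-fortran ^netcdf~mpi ^hdf5~mpi" ;;
        *)   echo "usage: netcdf_spec yes|no" >&2; return 1 ;;
    esac
}

# Example: install for GEOS-Chem "Classic" only. The command substitution is
# left unquoted on purpose so that the spec splits into separate arguments.
# ./spack install $(netcdf_spec no)
```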

NOTE: For more information on why the netCDF C-language and Fortran-language libraries are installed separately, please see the section "netCDF 4.2 and later versions require a separate netCDF-Fortran installation" above.

Commands for customizing the initialization process

Using Spack with the default options should be sufficient for most GEOS-Chem applications. But Spack also lets you customize certain aspects of the installation process. The table below gives some common examples:

Tell Spack to build libraries without an optional dependency (e.g. build netCDF and its HDF5 dependency without MPI):

cd spack/bin
./spack install netcdf-fortran ^netcdf~mpi ^hdf5~mpi

Tell Spack to install specific library versions instead of the most recent versions (the ^ prefix marks netcdf as a dependency of netcdf-fortran):

cd spack/bin
./spack install netcdf-fortran@4.4.0 ^netcdf@4.6

Tell Spack to install libraries using a specific compiler version:

cd spack/bin
./spack install netcdf-fortran %gcc@8.2.0
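
These options can be combined in a single spec, and the spack spec command shows how Spack would resolve a spec without installing anything. A hedged sketch, assuming that spack is on your PATH; the wrapper name preview_spec is hypothetical, and the combined spec in the example is only an illustration:

```shell
# preview_spec SPEC...: show what Spack would build for SPEC, without
# installing anything.
preview_spec() {
    if ! command -v spack >/dev/null 2>&1; then
        echo "spack not found on PATH" >&2
        return 1
    fi
    spack spec "$@"
}

# Example: a pinned version, a specific compiler, and no MPI.
# preview_spec netcdf-fortran@4.4.0 %gcc@8.2.0 ^netcdf~mpi ^hdf5~mpi
```

Reviewing the resolved versions and compiler first can save a long build that turns out to use the wrong options.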

For more information about customization, please see the Spack beginner's tutorial:

Pointing GEOS-Chem environment variables to the Spack library paths

To find the root paths where Spack has installed these libraries, you can use these commands:

spack find --paths netcdf
spack find --paths netcdf-fortran

But you can also use the spack location command as shown below to automatically insert the paths to the netCDF and netCDF-Fortran libraries into one of your Unix environment startup scripts (such as .bashrc or .bash_aliases):

# Environment variables for the netCDF C-language interface
export NETCDF_HOME=$(spack location -i netcdf)
export GC_BIN=$NETCDF_HOME/bin
export GC_INCLUDE=$NETCDF_HOME/include
export GC_LIB=$NETCDF_HOME/lib

# Environment variables for the netCDF Fortran-language interface
export NETCDF_FORTRAN_HOME=$(spack location -i netcdf-fortran)
export GC_F_BIN=$NETCDF_FORTRAN_HOME/bin
export GC_F_INCLUDE=$NETCDF_FORTRAN_HOME/include
export GC_F_LIB=$NETCDF_FORTRAN_HOME/lib

Please see our Setting Unix environment variables for GEOS-Chem wiki page for more information about other environment variables that you may need to define.

For advanced users: You can also use Spack to create module files that can be used with the Lmod module manager, which is used on many HPC cluster systems. For more information, please see the Module Files tutorial of the Spack manual.

For more information

For complete instructions on using Spack, please see the Spack manual.

If you need to manage a lot of separate software environments, then you can use Spack to create packages, so that you can easily switch between them. Please see this tutorial for more information:

Here is a useful tutorial about using Spack to install libraries for High-Performance Computing applications:

For more information about using Spack with Docker and Singularity software containers, please see this tutorial:

If you encounter an error while using Spack, it could be due to an incompatibility with your particular compiler/platform combination. We encourage you to report all issues to the Spack developers by opening a ticket on the Spack issue tracker:

--Bob Yantosca (talk) 16:34, 9 January 2019 (UTC)