Setting Unix environment variables for GEOS-Chem
Revision as of 16:15, 17 June 2019

This page explains how to set the Unix environment variables that are needed to compile GEOS-Chem.

Environment variables that specify compiler names

GEOS-Chem currently supports the Intel Fortran compiler (ifort) and the GNU Fortran compiler (gfortran). You must set the following variables in your Unix environment to tell GEOS-Chem which compiler type you are using.

  FC  : Name of the Fortran compiler
  CC  : Name of the C compiler. Not needed for GEOS-Chem "Classic", but needed for GCHP.
  CXX : Name of the C++ compiler. Not needed for GEOS-Chem "Classic", but needed for GCHP.

On many systems (such as the Harvard Odyssey cluster), FC, CC, and CXX will be set automatically for you when you load a software module into your Unix environment with the module load command. The easiest way to check if these variables have been automatically set for you is to print them to the screen. Type at the Unix prompt:

echo $FC
echo $CC
echo $CXX

On the other hand, if FC, CC, and CXX are all undefined, then you will have to manually set them in your startup script, as described in the table below:

If your shell is /bin/bash and your compiler is ifort, add to .bashrc:

   export FC=ifort
   export CC=icc
   export CXX=icpc

If your shell is /bin/bash and your compiler is gfortran, add to .bashrc:

   export FC=gfortran
   export CC=gcc
   export CXX=g++

If your shell is /bin/csh or /bin/tcsh and your compiler is ifort, add to .cshrc:

   setenv FC ifort
   setenv CC icc
   setenv CXX icpc

If your shell is /bin/csh or /bin/tcsh and your compiler is gfortran, add to .cshrc:

   setenv FC gfortran
   setenv CC gcc
   setenv CXX g++

Then make sure to type:

source ~/.bashrc   # if you are using bash
source ~/.cshrc    # if you are using csh or tcsh

to apply the changes.
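As a sketch, the settings above can also be condensed into a single startup-file fragment that defines the compiler variables only when a software module has not already set them. The gfortran/gcc/g++ defaults here are just one example; substitute the compilers used at your site.

```shell
# Sketch for .bashrc: set compiler variables only if they are not
# already defined (e.g. by "module load"). Defaults are examples only.
: "${FC:=gfortran}"
: "${CC:=gcc}"
: "${CXX:=g++}"
export FC CC CXX
echo "FC=$FC CC=$CC CXX=$CXX"
```

This pattern avoids clobbering values that a module manager has already exported.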

--Bob Yantosca (talk) 18:47, 9 January 2019 (UTC)

Environment variables that specify library paths

GEOS-Chem uses the netCDF library for file I/O. You should first check to see if there is a pre-built netCDF library installation on your system already. If not, then you (or your IT staff) can install the netCDF libraries with the Spack package manager.

After you have verified that a netCDF library installation exists on your system, the next step is to tell GEOS-Chem where to find the relevant library, include, and executable files. Otherwise you will get errors during the compilation process. The easiest way to do this is to set environment variables in your setup files:

  1. .bashrc or .bash_aliases : if you use the Bourne-Again shell (bash)
  2. .cshrc                   : if you use the C-shell (csh) or T-shell (tcsh)

There are three environment variables that you need to set:

  GC_BIN     : Points to the bin/ subfolder of the root netCDF path. This is where the nc-config and nf-config files are located.
  GC_INCLUDE : Points to the include/ subfolder of the root netCDF path. This is where netCDF include files (*.h, *.inc) and compiled module files (*.mod) for the netCDF (and HDF5) libraries are located.
  GC_LIB     : Points to the lib/ subfolder of the root netCDF path. (On some systems this may be named lib64/ instead.) This is where the netCDF library files (*.a) are located.


If (and only if) netCDF-Fortran is installed as a separate library on your system, you will also need to set these variables:

  GC_F_BIN     : Points to the bin/ subfolder of the root netCDF-Fortran path. This is where the nf-config file is located.
  GC_F_INCLUDE : Points to the include/ subfolder of the root netCDF-Fortran path. This is where netCDF include files (*.h) and compiled module files (*.mod) are located.
  GC_F_LIB     : Points to the lib/ subfolder of the root netCDF-Fortran path. (On some systems this may be named lib64/ instead.) This is where the netCDF library files (*.a) are located.

The best way to define these variables is to add them to one of your system startup files.

If you are using an HPC cluster with the Lmod module manager, then environment variables that point to the root library folders (e.g. NETCDF_HOME, NETCDF_FORTRAN_HOME) will already be loaded into your Unix environment. We recommend using these variables to define the other environment variables as described in the following sections.

If you have any questions about where the library paths for netCDF are located on your system, ask your IT staff.
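If the nc-config utility is on your search path, you can also derive these variables from it instead of hard-coding paths. This is a sketch, not part of the GEOS-Chem distribution: the /usr/local fallback is hypothetical, and on some systems the library folder is lib64/ rather than lib/.

```shell
# Sketch: derive the GC_* variables from nc-config when it is available.
if command -v nc-config >/dev/null 2>&1; then
  NETCDF_HOME=$(nc-config --prefix)
else
  NETCDF_HOME=/usr/local          # hypothetical fallback; use your netCDF root
fi
export NETCDF_HOME
export GC_BIN=$NETCDF_HOME/bin
export GC_INCLUDE=$NETCDF_HOME/include
export GC_LIB=$NETCDF_HOME/lib    # may need to be lib64/ on some systems
echo "GC_LIB=$GC_LIB"
```

Deriving the paths this way keeps your startup file correct even if the netCDF module is upgraded and its installation prefix changes.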

With bash

If you use bash, then add the following lines to one of your Unix environment startup files (e.g. .bashrc, .bash_aliases, etc.):

# Tell GEOS-Chem where to find netCDF library files
export NETCDF_HOME=path to netCDF library files
export GC_BIN=$NETCDF_HOME/bin
export GC_INCLUDE=$NETCDF_INCLUDE
export GC_LIB=$NETCDF_LIB

# NOTE: If netCDF-Fortran was loaded as a separate module, then
# also define these variables.  (Otherwise comment these out.)
export NETCDF_FORTRAN_HOME=path to netCDF-Fortran library files
export GC_F_BIN=$NETCDF_FORTRAN_HOME/bin
export GC_F_INCLUDE=$NETCDF_FORTRAN_INCLUDE
export GC_F_LIB=$NETCDF_FORTRAN_LIB

Then to accept the changes, type at the Unix prompt: source ~/.bashrc (or source ~/.bash_aliases, etc.)

With csh or tcsh

If you use C-shell (csh) or T-shell (tcsh), then add the following lines to your .cshrc file:

# Tell GEOS-Chem where to find netCDF library files
setenv NETCDF_HOME         path to netCDF library files
setenv GC_BIN              ${NETCDF_HOME}/bin
setenv GC_INCLUDE          ${NETCDF_INCLUDE}
setenv GC_LIB              ${NETCDF_LIB}

# NOTE: If netCDF-Fortran was loaded as a separate module, then
# also define these variables.  (Otherwise comment these out.)
setenv NETCDF_FORTRAN_HOME path to netCDF-Fortran library files
setenv GC_F_BIN            ${NETCDF_FORTRAN_HOME}/bin
setenv GC_F_INCLUDE        ${NETCDF_FORTRAN_INCLUDE}
setenv GC_F_LIB            ${NETCDF_FORTRAN_LIB}

Then to accept the changes, type source ~/.cshrc at the Unix prompt.

--Bob Yantosca (talk) 16:10, 10 January 2019 (UTC)

For libraries installed with the Spack package manager

If you installed the netCDF libraries with the Spack package manager, then please see this wiki post which describes how you can define the environment variables to point to the proper library paths.

--Bob Yantosca (talk) 15:10, 17 June 2019 (UTC)

Environment variables for OpenMP parallelization

GEOS-Chem "Classic" uses OpenMP parallelization, which is an implementation of shared-memory parallelization (as opposed to distributed-memory parallelization, such as MPI). Two Unix environment variables control the OpenMP parallelization settings, as defined below.

OMP_NUM_THREADS

The OMP_NUM_THREADS environment variable sets the number of computational cores (aka threads) that you would like GEOS-Chem to use. We recommend that you set OMP_NUM_THREADS in your .bashrc (or .cshrc file) as well as in each GEOS-Chem run script that you use.

The following commands will request that GEOS-Chem use 8 cores by default:

If you use bash, use this command:

   export OMP_NUM_THREADS=8

If you use csh or tcsh, use this command:

   setenv OMP_NUM_THREADS 8

You can of course change the number of cores from 8 to however many you would like your GEOS-Chem simulation to use. One caveat: OpenMP-parallelized programs cannot execute across more than one computational node of a multi-node system. Most modern computational nodes contain between 16 and 64 cores, so your GEOS-Chem "Classic" simulations will not be able to take advantage of more cores than are available on a single node. (We recommend that you consider using GCHP for more computationally-intensive simulations.)

If your system uses the SLURM batch scheduler, then you can use the SLURM_CPUS_PER_TASK environment variable in your GEOS-Chem job script, so that GEOS-Chem uses the same number of cores that you requested from SLURM.

#!/bin/bash

#SBATCH -c 24
#SBATCH -N 1
#SBATCH -t 0-12:00
#SBATCH -p MY_QUEUE_NAME
#SBATCH --mem=15000

# Apply your environment settings to the computational queue
source ~/.bashrc
 
# Set the proper # of threads for OpenMP
# SLURM_CPUS_PER_TASK ensures this matches the number you set with -c above
#
# So in this example, we requested that SLURM make 24 cores available,
# and GEOS-Chem will use all of these 24 cores.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

... etc ...

IMPORTANT! If you forget to define OMP_NUM_THREADS in your Unix environment and/or run scripts, then GEOS-Chem will execute using only one core. This can cause GEOS-Chem to run much more slowly than intended.
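One defensive pattern (a sketch, not part of the GEOS-Chem distribution) is to let your run script fall back to a safe default, so a run never silently inherits an unset value:

```shell
# Sketch for a run script: prefer the SLURM allocation if present, then
# any value already in the environment, then a safe default of 1 thread.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-${OMP_NUM_THREADS:-1}}
echo "Running GEOS-Chem with $OMP_NUM_THREADS OpenMP thread(s)"
```

With this guard, a misconfigured job still runs correctly (if slowly) instead of failing, and the echoed value makes the problem visible in the job log.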

OMP_STACKSIZE

In order to use GEOS-Chem "Classic" with OpenMP parallelization, you must request the maximum amount of stack memory in your Unix environment. (The stack memory is where local automatic variables and temporary !$OMP PRIVATE variables will be created.) Add the following lines to your system startup file and to your GEOS-Chem run scripts:

If you use bash, add this to your .bashrc file:

   ulimit -s unlimited
   export OMP_STACKSIZE=500m

If you use csh or tcsh, add this to your .cshrc file:

   limit stacksize unlimited
   setenv OMP_STACKSIZE 500m

The ulimit -s unlimited (for bash) or limit stacksize unlimited (for csh or tcsh) command tells the Unix shell to use the maximum amount of stack memory available.

The OMP_STACKSIZE environment variable must also be set to a large value. In this example, we nominally request 500 MB of stack memory per thread. The value 500m is a good round number that is larger than what most GEOS-Chem simulations will need, but you can change it if you wish.
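In a bash run script, the two settings can be applied and echoed together. This is a sketch; on some sites the administrator caps the hard stack limit, in which case the ulimit call may be silently ignored.

```shell
# Sketch: request maximum stack memory and a large per-thread stack size.
ulimit -s unlimited 2>/dev/null || true   # some sites cap the hard limit
export OMP_STACKSIZE=500m
echo "stacksize limit: $(ulimit -s); OMP_STACKSIZE=$OMP_STACKSIZE"
```

Echoing both values at the top of a run script makes it easy to confirm the settings from the job log after a crash.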

--Bob Yantosca (talk) 15:45, 17 June 2019 (UTC)

Errors caused by incorrect environment variable settings

If you encounter any of the GEOS-Chem errors listed below, please double-check your environment variable settings, as described in the preceding sections.

  1. f77: command not found
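The f77 error typically means that FC was never defined, so the build system fell back to a default compiler name that does not exist on your system. A hypothetical pre-compile check (the advice string is just an example) could look like:

```shell
# Sketch: flag an unset FC before compiling.
if [ -z "$FC" ]; then
  msg="FC is unset -- e.g. run 'export FC=gfortran' before compiling"
else
  msg="FC=$FC"
fi
echo "$msg"
```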

--Bob Yantosca (talk) 15:12, 17 June 2019 (UTC)