Programming techniques for HPC environments
On this page we discuss various programming techniques that we have used to prepare GEOS-Chem to run in High-Performance Computing (HPC) environments. Please also see our GEOS-Chem HP wiki page.
- 1 Replacing common blocks with Fortran modules
- 2 Using derived-type objects to pass data between modules
- 3 Replacing binary file I/O with netCDF
- 4 Restricting screen and log file output to the root CPU
- 5 Using findFreeLUN to assign logical unit numbers for file I/O
- 6 Error handling and traceback
- 7 The DEVEL C-preprocessor switch
- 8 The EXTERNAL_GRID and EXTERNAL_FORCING C-preprocessor switches
Replacing common blocks with Fortran modules
The following changes were made to GEOS-Chem v9-01-03 and higher versions:
- COMMON blocks containing global data were completely removed. Global data arrays (i.e. those with lon, lat, level dimensions) were either converted to allocatable arrays and placed in Fortran-90 modules, or were converted to fields of derived-type objects.
- Include statements of the form #include "CMN_SIZE" were replaced with Fortran-90 USE statements, such as USE CMN_SIZE_MOD (etc.), everywhere throughout GEOS-Chem.
Removing COMMON blocks from GEOS-Chem facilitates running in HPC environments using ESMF and MPI parallelization. The problem is that COMMON blocks are static storage: you cannot change the size of arrays in COMMON blocks once they are declared. As such, it is difficult to distribute elements of arrays in COMMON blocks to CPUs on multiple nodes. Instead, we are now able to use ALLOCATABLE arrays or POINTER arrays, which can be sized at run time instead of at compile time.
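The difference can be illustrated with a minimal sketch. The module, array, and routine names below (Demo_Arrays_Mod, T_Field, Init_Demo_Arrays, Cleanup_Demo_Arrays) are hypothetical and are not taken from GEOS-Chem; the sketch only assumes a global array that lives in a module and is sized when the program runs.

```fortran
! Illustrative sketch only: Demo_Arrays_Mod, T_Field, and the routine
! names are hypothetical and are not part of GEOS-Chem.
MODULE Demo_Arrays_Mod

  IMPLICIT NONE
  PRIVATE

  PUBLIC :: Init_Demo_Arrays
  PUBLIC :: Cleanup_Demo_Arrays

  ! Double-precision kind used by this demo
  INTEGER, PARAMETER :: fp = KIND( 1.0d0 )

  ! Global data array (lon, lat, level); its size is not fixed at compile time
  REAL(fp), PUBLIC, ALLOCATABLE :: T_Field(:,:,:)

CONTAINS

  SUBROUTINE Init_Demo_Arrays( NX, NY, NZ, RC )
    INTEGER, INTENT(IN)  :: NX, NY, NZ   ! Dimensions chosen at run time
    INTEGER, INTENT(OUT) :: RC           ! 0 = success, nonzero = failure

    ALLOCATE( T_Field( NX, NY, NZ ), STAT=RC )
    IF ( RC == 0 ) T_Field = 0.0_fp
  END SUBROUTINE Init_Demo_Arrays

  SUBROUTINE Cleanup_Demo_Arrays
    IF ( ALLOCATED( T_Field ) ) DEALLOCATE( T_Field )
  END SUBROUTINE Cleanup_Demo_Arrays

END MODULE Demo_Arrays_Mod

PROGRAM Demo_Arrays
  USE Demo_Arrays_Mod
  IMPLICIT NONE
  INTEGER :: RC

  ! Each process could request only the sub-domain that it owns
  CALL Init_Demo_Arrays( 4, 3, 2, RC )
  IF ( RC /= 0 ) STOP 'Allocation failed'

  PRINT*, 'Elements allocated: ', SIZE( T_Field )
  CALL Cleanup_Demo_Arrays
END PROGRAM Demo_Arrays
```

With a COMMON block, the dimensions 4, 3, 2 would have to be compile-time constants shared by every process. Here each process can pass in the dimensions of whatever sub-domain it has been given.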
Using derived-type objects to pass data between modules
Since GEOS-Chem v9-01-03, we have rewritten most of GEOS-Chem's subroutines and functions to accept derived-type objects passed as arguments. A derived-type object is a data structure that can hold several individual variables (think of an object as a "bucket of variables").
We use the following objects to pass data between subroutines and functions:
|Object|Access|Description|
|Input_Opt|Read-only|Contains inputs for GEOS-Chem as read from the input file.|
|State_Met|Read-only|Contains meteorological fields and other relevant input data.|
|State_Chm|Read-write|Contains species concentrations and related information, including the GEOS-Chem species database.|
With this approach, passing additional variables to subroutines and functions is as simple as adding the variable to one of the above objects. (Think of dropping a new "variable" into the "bucket", and then passing the bucket around.)
For more information, please see our Derived type objects used by GEOS-Chem wiki page.
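As a rough illustration of the "bucket" idea, the sketch below passes two small derived-type objects to a subroutine. The types DemoOpt and DemoChm and the routine Scale_Conc are hypothetical stand-ins; they are far simpler than the actual GEOS-Chem objects and are not meant to reflect their real fields.

```fortran
! Illustrative sketch only: DemoOpt, DemoChm, and Scale_Conc are
! hypothetical stand-ins and are not the actual GEOS-Chem objects.
MODULE Demo_State_Mod
  IMPLICIT NONE
  PRIVATE

  INTEGER, PARAMETER, PUBLIC :: fp = KIND( 1.0d0 )

  ! Read-only "bucket" of options
  TYPE, PUBLIC :: DemoOpt
     LOGICAL  :: Do_Scale     = .FALSE.
     REAL(fp) :: Scale_Factor = 1.0_fp
  END TYPE DemoOpt

  ! Read-write "bucket" of concentrations
  TYPE, PUBLIC :: DemoChm
     REAL(fp), ALLOCATABLE :: Conc(:)
  END TYPE DemoChm

END MODULE Demo_State_Mod

MODULE Demo_Work_Mod
  USE Demo_State_Mod, ONLY : DemoOpt, DemoChm
  IMPLICIT NONE
  PRIVATE
  PUBLIC :: Scale_Conc

CONTAINS

  SUBROUTINE Scale_Conc( Opt, Chm )
    TYPE(DemoOpt), INTENT(IN)    :: Opt   ! Options (read-only)
    TYPE(DemoChm), INTENT(INOUT) :: Chm   ! Concentrations (read-write)

    IF ( Opt%Do_Scale ) Chm%Conc = Chm%Conc * Opt%Scale_Factor
  END SUBROUTINE Scale_Conc

END MODULE Demo_Work_Mod

PROGRAM Demo_Objects
  USE Demo_State_Mod, ONLY : DemoOpt, DemoChm, fp
  USE Demo_Work_Mod,  ONLY : Scale_Conc
  IMPLICIT NONE

  TYPE(DemoOpt) :: Opt
  TYPE(DemoChm) :: Chm

  ! Fill the buckets
  Opt%Do_Scale     = .TRUE.
  Opt%Scale_Factor = 2.0_fp
  ALLOCATE( Chm%Conc(3) )
  Chm%Conc = 1.0_fp

  ! Pass the buckets around
  CALL Scale_Conc( Opt, Chm )
  PRINT*, 'Scaled concentrations: ', Chm%Conc

  DEALLOCATE( Chm%Conc )
END PROGRAM Demo_Objects
```

If Scale_Conc later needs another option, a new field can be added to DemoOpt without touching the argument list of Scale_Conc or of any routine that calls it.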
Replacing binary file I/O with netCDF
Since GEOS-Chem v9-01-03, we have started replacing unformatted binary I/O (i.e. the "binary punch file format") with netCDF I/O. This is a necessary step towards running GEOS-Chem in high-performance computing (HPC) environments.
The HEMCO emissions component, which was introduced in GEOS-Chem v10-01, now reads emissions and related data from COARDS-compliant netCDF files. This allowed us to remove much of the legacy emissions code from GEOS-Chem. We also converted our existing emissions data from binary punch format to netCDF format for HEMCO.
As of this writing (prior to the GEOS-Chem v11-01 release), we are working towards replacing the existing GEOS-Chem diagnostics (which archive data to binary punch format) with netCDF diagnostic output. We expect this to be completed by GEOS-Chem v11-02.
For more information about netCDF, please see our Preparing data files for use with HEMCO wiki page.
Restricting screen and log file output to the root CPU
NOTE: This feature was implemented into GEOS-Chem v9-01-03, release date 14 Sep 2012. We continue to add this feature to new GEOS-Chem routines.
You will see several statements such as WRITE( 6, 100 ) or PRINT* placed throughout the GEOS-Chem source code. These statements print text to the Unix stdout stream. (Stdout is Unix-speak for the text that gets printed to your screen.) As described in Chapter 6.2.1 of the GEOS-Chem Users' Guide, you can redirect the stdout stream to a log file with a command such as:
geos > log &
This command is known as a redirect.
GEOS-Chem currently uses OpenMP, which parallelizes individual DO loops. Because GEOS-Chem's
GEOS-Chem will connect to the NASA GEOS-5 GCM via an interface that utilizes the Earth System Modeling Framework (ESMF) library. ESMF employs Message Passing Interface (MPI) parallelization to run the combined GEOS-Chem/GEOS-5 GCM on hundreds of CPUs. Each individual CPU will execute its own GEOS-Chem simulation for a small sub-domain of the world (i.e. a single vertical column or group of several adjacent vertical columns). The sum total of all of these individual simulations will comprise the global GEOS-Chem/GEOS-5 GCM simulation.
When using MPI parallelization, all GEOS-Chem processes (including writing text to stdout) will occur on each of the CPUs on the computational cluster. In order to avoid writing the same text messages over and over to stdout, we must take some extra precautions. We have chosen to restrict printing informational text messages (other than error messages) to the root CPU. However, we must also allow these text messages to print when GEOS-Chem is used in the traditional manner.
Starting in GEOS-Chem v9-01-03, you will see a new argument, am_I_Root, passed to many of GEOS-Chem's key subroutines. Right now we have focused on adding am_I_Root to routines that are part of the Chemistry Component. (In the future we will extend this to all GEOS-Chem subroutines.) We use am_I_Root to wrap existing WRITE and PRINT* statements in IF blocks.
If you looked at subroutine READ_INPUT_FILE (in module input_mod.F) from GEOS-Chem v9-01-02 or previous versions, you would have seen these statements:
SUBROUTINE READ_INPUT_FILE
. . .
WRITE( 6, '(a  )' ) REPEAT( '=', 79 )
WRITE( 6, '(a,/)' ) 'G E O S - C H E M   U S E R   I N P U T'
WRITE( 6, 100   ) TRIM( FILENAME )
100 FORMAT( 'READ_INPUT_FILE: Reading ', a )
But if you look at the same subroutine in GEOS-Chem v9-01-03 and higher versions, you will see this code:
SUBROUTINE READ_INPUT_FILE( am_I_Root, Input_Opt, RC )
. . .
!
! !INPUT PARAMETERS:
!
LOGICAL, INTENT(IN) :: am_I_Root   ! Is this the root CPU?
. . .
IF ( am_I_Root ) THEN
   WRITE( 6, '(a  )' ) REPEAT( '=', 79 )
   WRITE( 6, '(a,/)' ) 'G E O S - C H E M   U S E R   I N P U T'
   WRITE( 6, 100   ) TRIM( FILENAME )
100 FORMAT( 'READ_INPUT_FILE: Reading ', a )
ENDIF
The am_I_Root argument sets apart all WRITE and PRINT* statements. If you are running a "traditional" GEOS-Chem simulation (i.e. without connecting to the GEOS-5 GCM), then am_I_Root is set to .TRUE. in the driver program main.F
! When connecting G-C to an external GCM, we need to only write
! to stdout if we are on the root CPU. Otherwise this will slow
! down the code. This is why we introduced the am_I_Root logical
! variable.
!
! However, if we are using the "traditional" G-C, then we don't
! need to restrict I/O to the root CPU. Therefore, we can just
! set am_I_Root = .true. here and then have it propagate down to
! all of the lower-level routines. The main.F routine is not
! called when connecting G-C to an external GCM.
! (mlong, bmy, 7/30/12)
LOGICAL, PARAMETER :: am_I_Root = .TRUE.
. . .
!=================================================================
! ***** I N I T I A L I Z A T I O N *****
!=================================================================

! Read input file and call init routines from other modules
CALL READ_INPUT_FILE( am_I_Root, Input_Opt, RC )
. . .
and is then passed to READ_INPUT_FILE and other lower-level GEOS-Chem subroutines.
For combined GEOS-Chem/GEOS-5 GCM simulations, the am_I_Root argument is set by the library function MAPL_Am_I_Root. This function returns .TRUE. if the current processor is the root processor, or .FALSE. otherwise. Any WRITE statements within an IF ( am_I_Root ) block (such as the above example from subroutine READ_INPUT_FILE) will only execute on the root CPU. This prints informational messages only once instead of hundreds or thousands of times.
--Bob Y. 12:51, 10 December 2012 (EST)
Using findFreeLUN to assign logical unit numbers for file I/O
NOTE: This feature was implemented into GEOS-Chem v9-01-03, release date 14 Sep 2012.
Prior to GEOS-Chem v9-01-03, all logical unit numbers (LUNs) used for Fortran file I/O were pre-defined as PARAMETERs in module GeosUtil/file_mod.F, as shown below:
!
! !DEFINED PARAMETERS:
!
! Logical file unit numbers for ...
INTEGER, PUBLIC, PARAMETER :: IU_RST     = 1   ! Tracer restart file
INTEGER, PUBLIC, PARAMETER :: IU_CHEMDAT = 7   ! "chem.dat"
INTEGER, PUBLIC, PARAMETER :: IU_FASTJ   = 8   ! FAST-J input files
INTEGER, PUBLIC, PARAMETER :: IU_GEOS    = 10  ! "input.geos"
INTEGER, PUBLIC, PARAMETER :: IU_BPCH    = 11  ! "ctm.bpch"
INTEGER, PUBLIC, PARAMETER :: IU_ND20    = 12  ! "rate.YYYYMMDD"
INTEGER, PUBLIC, PARAMETER :: IU_ND48    = 13  ! ND48 output
INTEGER, PUBLIC, PARAMETER :: IU_ND49    = 14  ! "tsYYYYMMDD.bpch"
INTEGER, PUBLIC, PARAMETER :: IU_ND50    = 15  ! "ts24h.bpch"
INTEGER, PUBLIC, PARAMETER :: IU_ND51    = 16  ! "ts10_12am.bpch" etc.
INTEGER, PUBLIC, PARAMETER :: IU_ND51b   = 23  ! for ND51b diagnostic
INTEGER, PUBLIC, PARAMETER :: IU_ND52    = 17  ! ND52 output (NRT only)
INTEGER, PUBLIC, PARAMETER :: IU_PLANE   = 18  ! "plane.log"
INTEGER, PUBLIC, PARAMETER :: IU_BC      = 19  ! TPCORE BC files
INTEGER, PUBLIC, PARAMETER :: IU_BC_NA   = 20  ! TPCORE BC files: NA grid
INTEGER, PUBLIC, PARAMETER :: IU_BC_EU   = 21  ! TPCORE BC files: EU grid
INTEGER, PUBLIC, PARAMETER :: IU_BC_CH   = 22  ! TPCORE BC files: CH grid
INTEGER, PUBLIC, PARAMETER :: IU_FILE    = 65  ! Generic file
INTEGER, PUBLIC, PARAMETER :: IU_TP      = 69  ! "YYYYMMDD.tropp.*"
INTEGER, PUBLIC, PARAMETER :: IU_PH      = 70  ! "YYYYMMDD.phis.*"
INTEGER, PUBLIC, PARAMETER :: IU_I6      = 71  ! "YYYYMMDD.i6.*"
INTEGER, PUBLIC, PARAMETER :: IU_A6      = 72  ! "YYYYMMDD.a6.*"
INTEGER, PUBLIC, PARAMETER :: IU_A3      = 73  ! "YYYYMMDD.a3.*"
INTEGER, PUBLIC, PARAMETER :: IU_A1      = 74  ! "YYYYMMDD.a1.*"
INTEGER, PUBLIC, PARAMETER :: IU_GWET    = 75  ! "YYYYMMDD.gwet.*"
INTEGER, PUBLIC, PARAMETER :: IU_XT      = 76  ! "YYYYMMDD.xtra.*"
INTEGER, PUBLIC, PARAMETER :: IU_CN      = 77  ! "YYYYMMDD.cn.*"
INTEGER, PUBLIC, PARAMETER :: IU_SMV2LOG = 93  ! "smv2.log"
INTEGER, PUBLIC, PARAMETER :: IU_DEBUG   = 98  ! Reserved for debugging
INTEGER, PUBLIC, PARAMETER :: IU_OAP     = 99  ! soaprod.YYYYMMDDhh
We assigned a unique LUN value to each different type of GEOS-Chem input or output file. This ensured that data would get written to the proper file.
When GEOS-Chem connects to the GEOS-5 GCM, however, we can no longer rely on these pre-defined LUNs. The GEOS-5 GCM assigns LUNs to files based on availability. The GEOS-5 GCM will check for the next free LUN, and then use that to open a file for input or output.
For better compatibility with the GEOS-5 GCM, we have removed the pre-defined LUNs from file_mod.F. File LUNs are now made local to routines or modules instead of being centrally located within GeosUtil/file_mod.F. Eric Nielsen (from GSFC) has created a new function findFreeLUN (contained within module GeosUtil/inquireMod.F90) that can be used to search for LUNs that are not already in use. You will see the following calls to findFreeLUN wherever GEOS-Chem reads a non-netCDF file from disk:
USE inquireMod, ONLY : findFreeLUN
. . .
! LUN is now declared locally and not in file_mod.F
INTEGER :: IU_FILE
. . .
! Look for a free file LUN
IU_FILE = findFreeLUN()

! Open file
OPEN( IU_FILE, FILE=TRIM( FILENAME ) ... )
. . .
! Close the file
CLOSE( IU_FILE )
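To give a feel for how a free-LUN search can work, here is a minimal sketch built on the standard INQUIRE statement. It is not the findFreeLUN implementation from inquireMod.F90, whose internals are not shown on this page; the function name Demo_Find_Free_Lun and the search range of 20 to 199 are assumptions made for illustration.

```fortran
! Illustrative sketch only: this is NOT the findFreeLUN function from
! GeosUtil/inquireMod.F90. The name Demo_Find_Free_Lun and the search
! range (20 to 199) are assumptions made for this example.
MODULE Demo_Lun_Mod
  IMPLICIT NONE
  PRIVATE
  PUBLIC :: Demo_Find_Free_Lun

CONTAINS

  FUNCTION Demo_Find_Free_Lun() RESULT( Lun )
    INTEGER :: Lun                    ! Free unit number, or -1 if none found
    INTEGER :: N
    LOGICAL :: Unit_Exists, Is_Open

    Lun = -1
    DO N = 20, 199
       ! Ask the runtime whether unit N is valid and whether it is in use
       INQUIRE( UNIT=N, EXIST=Unit_Exists, OPENED=Is_Open )
       IF ( Unit_Exists .AND. ( .NOT. Is_Open ) ) THEN
          Lun = N
          EXIT
       ENDIF
    ENDDO
  END FUNCTION Demo_Find_Free_Lun

END MODULE Demo_Lun_Mod

PROGRAM Demo_Lun
  USE Demo_Lun_Mod, ONLY : Demo_Find_Free_Lun
  IMPLICIT NONE
  INTEGER :: Lun_A, Lun_B

  ! First file: take the first free unit and open a scratch file on it
  Lun_A = Demo_Find_Free_Lun()
  IF ( Lun_A < 0 ) STOP 'No free unit found'
  OPEN( Lun_A, STATUS='SCRATCH' )

  ! Second file: Lun_A is now in use, so a different unit is returned
  Lun_B = Demo_Find_Free_Lun()
  IF ( Lun_B < 0 ) STOP 'No free unit found'
  OPEN( Lun_B, STATUS='SCRATCH' )

  PRINT*, 'Units in use: ', Lun_A, Lun_B

  CLOSE( Lun_A )
  CLOSE( Lun_B )
END PROGRAM Demo_Lun
```

Compilers that support Fortran 2008 also offer the NEWUNIT= specifier of the OPEN statement, which lets the runtime choose an unused unit number directly. A helper function like the one sketched here is mainly useful where that feature cannot be relied upon.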
Unlike all of the other LUNs, IU_BPCH is referred to from several different routines within the source code. Therefore, we had to leave IU_BPCH in file_mod.F. But we have converted it from a PARAMETER to a regular variable:
Old:
INTEGER, PUBLIC, PARAMETER :: IU_BPCH = 11 ! "ctm.bpch"
New:
INTEGER, PUBLIC :: IU_BPCH
We now use the findFreeLUN function to initialize IU_BPCH.
NOTE: The LUN IU_BPCH only gets used in those sections of GEOS-Chem that will not be called by the GEOS-5 GCM.
--Bob Y. 12:51, 10 December 2012 (EST)
Error handling and traceback
NOTE: As of v11-01, GEOS-Chem's error trapping is not completely implemented. We will complete this in a future version.
As mentioned in this wiki post, we now use derived-type objects in order to pass data between GEOS-Chem routines. A typical GEOS-Chem subroutine will now take the following arguments:
!
! !INTERFACE:
!
SUBROUTINE MY_GEOS_CHEM_SUB( am_I_Root, Input_Opt, State_Met, State_Chm, RC )
!
! !USES:
!
USE GIGC_ErrCode_Mod
USE GIGC_Input_Opt_Mod, ONLY : OptInput
USE GIGC_State_Chm_Mod, ONLY : ChmState
USE GIGC_State_Met_Mod, ONLY : MetState
!
! !INPUT PARAMETERS:
!
LOGICAL,        INTENT(IN)    :: am_I_Root   ! Are we on the root CPU?
TYPE(OptInput), INTENT(IN)    :: Input_Opt   ! Input Options object
TYPE(MetState), INTENT(IN)    :: State_Met   ! Meteorology State object
!
! !INPUT/OUTPUT PARAMETERS:
!
TYPE(ChmState), INTENT(INOUT) :: State_Chm   ! Chemistry State object
!
! !OUTPUT PARAMETERS:
!
INTEGER,        INTENT(OUT)   :: RC          ! Success or failure?
The RC (return code) argument will be set to one of the PARAMETER values defined in module GIGC_ErrCode_Mod:
!
! !DEFINED PARAMETERS:
!
INTEGER, PUBLIC, PARAMETER :: GIGC_SUCCESS =  0   ! Routine returns success
INTEGER, PUBLIC, PARAMETER :: GIGC_FAILURE = -1   ! Routine returns failure
If the subroutine finishes normally, then we assign:
RC = GIGC_SUCCESS
and then exit normally. On the other hand, if the subroutine dies with a catastrophic error, we assign:
RC = GIGC_FAILURE
RETURN
This causes the subroutine to stop executing and return control to the calling routine. The calling routine then tests the return code with an IF statement to determine whether the subroutine finished normally:
IF ( RC /= GIGC_SUCCESS ) RETURN
If the subroutine finished normally, then execution is allowed to proceed. Otherwise, GEOS-Chem program flow shall exit the calling routine and return to the subroutine one level higher (which shall return to the subroutine one level higher than that, etc). In this way, GEOS-Chem shall propagate the error from the location where it occurred all the way back up to the main "driver" routine, which shall display an error message and shut down the simulation gracefully.
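The traceback scheme described above can be sketched with a small three-level call chain. Everything in this sketch is hypothetical: the routines Low_Level and Mid_Level are invented for illustration, and DEMO_SUCCESS and DEMO_FAILURE are local stand-ins that mirror GIGC_SUCCESS and GIGC_FAILURE.

```fortran
! Illustrative sketch only: Low_Level and Mid_Level are hypothetical
! routines. DEMO_SUCCESS and DEMO_FAILURE are local stand-ins that
! mirror GIGC_SUCCESS and GIGC_FAILURE.
MODULE Demo_Err_Mod
  IMPLICIT NONE
  PRIVATE
  PUBLIC :: Low_Level, Mid_Level

  INTEGER, PUBLIC, PARAMETER :: DEMO_SUCCESS =  0
  INTEGER, PUBLIC, PARAMETER :: DEMO_FAILURE = -1

CONTAINS

  SUBROUTINE Low_Level( Value, RC )
    REAL,    INTENT(IN)  :: Value
    INTEGER, INTENT(OUT) :: RC

    ! Assume success until an error is found
    RC = DEMO_SUCCESS

    ! A negative input is treated as a fatal error in this demo
    IF ( Value < 0.0 ) THEN
       RC = DEMO_FAILURE
       RETURN
    ENDIF
  END SUBROUTINE Low_Level

  SUBROUTINE Mid_Level( Value, RC )
    REAL,    INTENT(IN)  :: Value
    INTEGER, INTENT(OUT) :: RC

    CALL Low_Level( Value, RC )

    ! Pass the error up one level instead of continuing
    IF ( RC /= DEMO_SUCCESS ) RETURN

    ! ... further work would go here, reached only on success ...
  END SUBROUTINE Mid_Level

END MODULE Demo_Err_Mod

PROGRAM Demo_Err
  USE Demo_Err_Mod
  IMPLICIT NONE
  INTEGER :: RC

  ! Normal case: the return code stays at DEMO_SUCCESS
  CALL Mid_Level( 1.0, RC )
  PRINT*, 'Valid input,   RC = ', RC

  ! Error case: the failure code travels back up to the driver
  CALL Mid_Level( -1.0, RC )
  PRINT*, 'Invalid input, RC = ', RC

  IF ( RC /= DEMO_SUCCESS ) THEN
     PRINT*, 'Driver: error trapped; a real driver would shut down here.'
  ENDIF
END PROGRAM Demo_Err
```

Note that the test compares RC against the success code. The parameter itself is an integer, so it cannot be used directly as a logical condition.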
As of this writing (Jan 2013), we have added the RC argument to many GEOS-Chem subroutines but we have not fully implemented the error trapping. The work is ongoing.
--Bob Y. 11:16, 15 January 2013 (EST)
The DEVEL C-preprocessor switch
NOTE: We made heavy use of the DEVEL switch when we were modifying GEOS-Chem to accept derived-type objects as input. As of this writing (v11-01), most of the #if defined( DEVEL ) blocks have been removed from GEOS-Chem. But we continue to use this technique when we need to preserve both old code and new code in the same subroutine for testing.
GEOS-Chem HPC development resembles a highway construction project. Consider a new bridge that is being constructed alongside an existing bridge. In order to prevent major traffic disruptions, vehicles will continue to travel across the old bridge while the new bridge is being built. At the end of the construction project, traffic is finally rerouted over the new bridge, and the old bridge is taken down.
In much the same way, we are adding new sections of source code to GEOS-Chem that will allow it to connect to the NASA GEOS-5/GCM. In order to prevent disruptions to the normal GEOS-Chem workflow, we have segregated these sections of new code from existing GEOS-Chem code with C-preprocessor switches. This allows us to activate the new code for testing, while leaving the existing code untouched.
If you look through the GEOS-Chem source code routines, you will see a number of #if defined( DEVEL ) ... #endif blocks. DEVEL stands for "Development code". Source code located within these #if blocks will not execute unless you activate the DEVEL switch at compile time. You can safely ignore these #if blocks for the time being. But be aware that the new code within these blocks will eventually replace the existing GEOS-Chem source code.
Using DEVEL to test HPC updates
NOTE: The STT field was removed from GEOS-Chem v11-01 and higher versions. Also, the State_Chm object is now located in state_chm_mod.F90. While the code is different in the most recent GEOS-Chem versions, the methodology described below is still valid.
We typically use the DEVEL switch to add new sections of code into existing GEOS-Chem subroutines. We frequently use this method to introduce new derived-type objects into GEOS-Chem. Each derived-type object is a "bucket" of variables that may hold one or more scalar or array fields. We use objects to pass data between subroutines for better compatibility with the Earth System Modeling Framework, which controls the flow of information between components of the NASA GEOS-5/GCM.
In the example below, we use DEVEL to pass tracer concentration information in/out of the subroutine with a derived-type object named State_Chm instead of using the STT tracer array.
#if defined( DEVEL )
  ! New code: Pass State_Chm via the argument list
  SUBROUTINE MY_SUB( State_Chm, ... )
#else
  ! Old code: Keep the existing argument list
  SUBROUTINE MY_SUB( ... )
#endif

#if defined( DEVEL )
  ! New code: Get the derived type for State_Chm
  USE GIGC_State_Chm_Mod, ONLY : ChmState
#else
  ! Old code: Get STT directly from TRACER_MOD with a USE statement
  USE TRACER_MOD, ONLY : STT
#endif
  . . .
#if defined( DEVEL )
  ! New code: Declare State_Chm as an input/output argument
  TYPE(ChmState), INTENT(INOUT) :: State_Chm

  ! New code: Declare STT a local pointer variable
  REAL*8, POINTER :: STT(:,:,:,:)
#endif
  . . .
  !%%% START OF SUBROUTINE %%%
#if defined( DEVEL )
  ! New code: let STT point to the State_Chm%TRACERS field.
  ! This allows you to keep all of the other instances
  ! of STT in the existing code intact without having to
  ! modify them
  STT => State_Chm%TRACERS
#endif
  . . .
  !%%% END OF SUBROUTINE %%%
#if defined( DEVEL )
  ! New code: Nullify the STT pointer so that we no longer
  ! point to State_Chm before leaving the subroutine
  NULLIFY( STT )
#endif
  END SUBROUTINE MY_SUB
The default behavior will be to accept the old code and ignore the new code. But if we compile GEOS-Chem with the DEVEL=yes option, the new code will be activated, and the old code will be ignored. Having both instruction sets in the same subroutine allows us to debug the model to make sure that the new code is functioning as expected.
We recognize that it is burdensome to keep both new and old code in the subroutine indefinitely. Once the new code has been validated, we shall remove the remaining sections of old code, as well as any remaining #if defined( DEVEL ) blocks.
--Bob Y. 11:38, 18 April 2013 (EDT)
Update December 2012
In GEOS-Chem v9-02d and higher versions, we have standardized a significant amount of code that had previously been set apart in #if defined( DEVEL ) blocks. The old code and DEVEL blocks have been removed from these routines.
We are currently using #if defined( DEVEL ) blocks to replace the existing CSPEC arrays with fields from the Chemistry State object (named State_Chm).
--Bob Y. 12:35, 14 December 2012 (EST)
Update April 2013
In our development branch, we have integrated many of the DEVEL blocks into the mainline code. Many module arrays (i.e. met fields, STT tracer array, etc) are now replaced with derived type objects.
--Bob Y. 11:39, 18 April 2013 (EDT)
The EXTERNAL_GRID and EXTERNAL_FORCING C-preprocessor switches
In addition to the DEVEL C-preprocessor switch, we have also introduced two additional C-preprocessor switches named EXTERNAL_GRID and EXTERNAL_FORCING. These are intended to be set whenever GEOS-Chem needs to do something special for connecting to an external GCM (such as the NASA GEOS-5 GCM).
In many cases, EXTERNAL_GRID and EXTERNAL_FORCING can be synonyms for DEVEL. In many locations in GEOS-Chem you will see C-preprocessor blocks with all three switches, such as this one:
#if defined( DEVEL ) || defined( EXTERNAL_GRID ) || defined( EXTERNAL_FORCING )
  ! Add this error trap to prevent out of bounds error
  ! but we should benchmark first before adding to
  ! the std G-C code (bmy, 8/2/12)
  IF ( JLOOP == 0 ) CYCLE
#endif
However, there are other instances where we will use EXTERNAL_GRID and EXTERNAL_FORCING without using DEVEL. The DEVEL switch allows testing of grid-independent modifications in the standard GEOS-Chem. Therefore, if we need to make modifications that will only get activated when we are connecting GEOS-Chem to an external GCM, we should exclude these from #if defined( DEVEL ) blocks.
For example, this #if block in Headers/comode_loop_mod.F allows us to use the same setting as the standard GEOS-Chem when DEVEL=yes, which facilitates debugging and comparison to the "traditional" GEOS-Chem. But if we are connecting to an external GCM, we will use a different setting.
#if defined( EXTERNAL_GRID ) || defined( EXTERNAL_FORCING )
  !-----------------------------------------------------------------
  ! %%%%% CONNECTING TO GEOS-5 GCM via ESMF INTERFACE %%%%%
  !
  ! KBLOOP is the # of boxes that SMVGEAR will process per CPU.
  ! Set KBLOOP=1 for connecting to an external GCM
  !
  ! %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  ! %%% NOTE: If you are using GEOS-Chem without ESMF, but with  %%%
  ! %%% the DEVEL=yes option (i.e. to test grid-independent      %%%
  ! %%% updates w/r/t a standard G-C simulation), then you must  %%%
  ! %%% make sure that KBLOOP is set to the same value in both   %%%
  ! %%% simulations.                                             %%%
  ! %%%                                                          %%%
  ! %%% The absolute and relative errors (which determine if the %%%
  ! %%% chemistry has converged to a solution) are computed over %%%
  ! %%% all KBLOOP boxes at once. Using different KBLOOP values  %%%
  ! %%% in different simulations will cause slightly different   %%%
  ! %%% results in chemical concentrations (even after only one  %%%
  ! %%% timestep).                                               %%%
  ! %%%                                                          %%%
  ! %%% To this end, we now only set KBLOOP=1 if we are          %%%
  ! %%% connecting GEOS-Chem to an external GCM (i.e. if the Cpp %%%
  ! %%% switches EXTERNAL_GRID or EXTERNAL_FORCING are set in    %%%
  ! %%% define.h).                                               %%%
  ! %%%                                                          %%%
  ! %%% -- Bob Yantosca (14 Aug 2012)                            %%%
  ! %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  !-----------------------------------------------------------------
  INTEGER, PARAMETER :: KBLOOP = 1
#else
  !-----------------------------------------------------------------
  ! %%%%% TRADITIONAL GEOS-Chem %%%%%
  !
  ! KBLOOP is the # of boxes that SMVGEAR will process per CPU.
  ! For "traditional" G-C simulations, leave KBLOOP = 24
  !-----------------------------------------------------------------
  INTEGER, PARAMETER :: KBLOOP = 24
#endif
Also note that we shall endeavor to denote in the comments which section of code is for connecting to the external GCM and which section of code is meant for the traditional GEOS-Chem (i.e. w/o the ESMF interface).
--Bob Y. 13:47, 10 December 2012 (EST)