Programming techniques for HPC environments

On this page we discuss various programming techniques that we have used to prepare GEOS-Chem to run in High-Performance Computing (HPC) environments. Please also see our GEOS-Chem HP wiki page.

Replacing common blocks with Fortran modules

The following changes were made to GEOS-Chem v9-01-03 and higher versions:

  1. COMMON blocks containing global data were completely removed. Global data arrays (i.e. those with lon, lat, level dimensions) were either converted to allocatable arrays and placed in Fortran-90 modules, or were converted to fields of derived-type objects.
  2. Include statements of the form #include "CMN_SIZE" were replaced throughout GEOS-Chem with Fortran-90 USE statements, such as USE CMN_SIZE_MOD.

Removing the COMMON blocks from GEOS-Chem facilitates running in HPC environments using ESMF and MPI parallelization. The problem is that COMMON blocks are static storage. You cannot change the size of arrays in COMMON blocks once they are declared. As such, it is difficult to distribute elements of arrays in COMMON blocks to CPUs on multiple nodes. Instead, we are now able to use ALLOCATABLE arrays or POINTER arrays, which can be sized at run-time instead of at compile time.
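
As a minimal sketch of this pattern, a module can hold an ALLOCATABLE array whose size is chosen at run time, for example to match the sub-domain assigned to one process. The module name, routine names, array name, and dimensions below are hypothetical and are not taken from the GEOS-Chem source code.

```fortran
! Hypothetical module, for illustration only (not GEOS-Chem source code)
MODULE Demo_Met_Mod

  IMPLICIT NONE
  PRIVATE
  PUBLIC :: Init_Demo_Met, Cleanup_Demo_Met

  ! In a COMMON block this array would need a fixed, compile-time size.
  ! As a module variable it can be allocated at run time.
  REAL, PUBLIC, ALLOCATABLE :: T(:,:,:)   ! (lon, lat, level)

CONTAINS

  SUBROUTINE Init_Demo_Met( NX, NY, NZ, RC )
    INTEGER, INTENT(IN)  :: NX, NY, NZ   ! Local grid dimensions
    INTEGER, INTENT(OUT) :: RC           ! 0 = allocation succeeded

    ALLOCATE( T( NX, NY, NZ ), STAT=RC )
    IF ( RC == 0 ) T = 0.0
  END SUBROUTINE Init_Demo_Met

  SUBROUTINE Cleanup_Demo_Met
    IF ( ALLOCATED( T ) ) DEALLOCATE( T )
  END SUBROUTINE Cleanup_Demo_Met

END MODULE Demo_Met_Mod

PROGRAM Demo
  USE Demo_Met_Mod
  IMPLICIT NONE
  INTEGER :: RC

  ! Each process could pass the size of its own sub-domain here
  CALL Init_Demo_Met( 4, 3, 2, RC )
  PRINT *, 'Allocated T with shape:', SHAPE( T )
  CALL Cleanup_Demo_Met
END PROGRAM Demo
```

Any routine that needs the array gains access with a USE statement, much as USE CMN_SIZE_MOD replaces the old #include line.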

--Bob Yantosca (talk) 21:27, 5 December 2016 (UTC)

Using derived-type objects to pass data between modules

Since GEOS-Chem v9-01-03, we have rewritten most of GEOS-Chem's subroutines and functions to accept derived-type objects passed as arguments. A derived-type object is a data structure that can hold several individual variables (think of an object as a "bucket of variables").

We use the following objects to pass data between subroutines and functions:

Object | Type | Description
Input_Opt | Read-only | Contains inputs for GEOS-Chem as read from the input.geos file. This includes the switches that toggle various GEOS-Chem options and operations on or off.
State_Met | Read-only | Contains meteorological fields and other relevant input data.
State_Chm | Read-write | Contains species concentrations and related information, including the GEOS-Chem species database.

With this approach, passing additional variables to subroutines and functions is as simple as adding the variable to one of the above objects. (Think of dropping a new "variable" into the "bucket", and then passing the bucket around.)
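
The "bucket" idea can be sketched as follows. The type DemoOpt, its fields, and the routine Do_Something are hypothetical stand-ins chosen for illustration; they are not the actual definitions of Input_Opt or of any GEOS-Chem routine.

```fortran
MODULE Demo_State_Mod
  IMPLICIT NONE

  ! Hypothetical, simplified stand-in for an object such as Input_Opt
  TYPE :: DemoOpt
     LOGICAL :: LCHEM     = .TRUE.   ! Example on/off switch
     INTEGER :: N_SPECIES = 0        ! A field added later on
  END TYPE DemoOpt

CONTAINS

  ! The argument list names only the object, not its individual fields
  SUBROUTINE Do_Something( Opt )
    TYPE(DemoOpt), INTENT(IN) :: Opt   ! Read-only "bucket of variables"

    IF ( Opt%LCHEM ) THEN
       PRINT *, 'Chemistry is on; number of species =', Opt%N_SPECIES
    ENDIF
  END SUBROUTINE Do_Something

END MODULE Demo_State_Mod

PROGRAM Demo
  USE Demo_State_Mod
  IMPLICIT NONE
  TYPE(DemoOpt) :: Opt

  Opt%N_SPECIES = 5
  CALL Do_Something( Opt )
END PROGRAM Demo
```

Adding N_SPECIES to the type did not require changing the argument list of Do_Something, which is the main convenience of this approach.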

For more information, please see our Derived type objects used by GEOS-Chem wiki page.

--Bob Yantosca (talk) 17:19, 2 December 2016 (UTC)

Replacing binary file I/O with netCDF

Since GEOS-Chem v9-01-03, we have started replacing unformatted binary I/O (i.e. the "binary punch file format") with netCDF I/O. This is a necessary step towards running GEOS-Chem in high-performance computing (HPC) environments.

The HEMCO emissions component, which was introduced in GEOS-Chem v10-01, now reads emissions and related data from COARDS-compliant netCDF files. This allowed us to remove much of the legacy emissions code from GEOS-Chem. We also converted our existing emissions data from binary punch format to netCDF format for HEMCO.

As of this writing (prior to the GEOS-Chem v11-01 release), we are working towards replacing the existing GEOS-Chem diagnostics (which archive data to binary punch format) with netCDF diagnostic output. We expect this to be completed by GEOS-Chem v11-02.

For more information about netCDF, please see our Preparing data files for use with HEMCO wiki page.

--Bob Yantosca (talk) 17:26, 2 December 2016 (UTC)

Restricting screen and log file output to the root CPU

NOTE: This feature was implemented in GEOS-Chem v9-01-03, release date 14 Sep 2012. We continue to add this feature to new GEOS-Chem routines.

You will see several statements such as WRITE( 6, 100 ) or PRINT* placed throughout the GEOS-Chem source code. These statements will print text to the Unix stdout stream. (Stdout is Unix-speak for the text that gets printed to your screen.) As described in Chapter 6.2.1 of the GEOS-Chem Users' Guide, you can redirect the stdout stream to a log file with a command such as:

geos > log &

This command is known as a redirect.

GEOS-Chem currently uses OpenMP, which parallelizes individual DO loops. Because GEOS-Chem's WRITE and PRINT statements mostly occur outside of parallelized DO loops, this means that only one CPU (the "root CPU") is trying to write text to stdout.

GEOS-Chem will connect to the NASA GEOS-5 GCM via an interface that utilizes the Earth System Modeling Framework (ESMF) library. ESMF employs Message Passing Interface (MPI) parallelization to run the combined GEOS-Chem/GEOS-5 GCM on hundreds of CPUs. Each individual CPU will execute its own GEOS-Chem simulation for a small sub-domain of the world (i.e. a single vertical column or group of several adjacent vertical columns). The sum total of all of these individual simulations will comprise the global GEOS-Chem/GEOS-5 GCM simulation.

When using MPI parallelization, all GEOS-Chem operations (including writing text to stdout) will occur on each of the CPUs on the computational cluster. To avoid writing the same text messages over and over to stdout, we must take some extra precautions. We have chosen to restrict printing informational text messages (other than error messages) to the root CPU. However, we must also allow these text messages to print when GEOS-Chem is used in the traditional manner.

Starting in GEOS-Chem v9-01-03, you will see a new argument, am_I_Root, passed to many of GEOS-Chem's key subroutines. Right now we have focused on adding am_I_Root to routines that are part of the Chemistry Component. (In the future we will extend this to all GEOS-Chem subroutines.) We use am_I_Root to wrap existing WRITE and PRINT statements.

If you looked at subroutine READ_INPUT_FILE (in input_mod.F) from GEOS-Chem v9-01-02 or previous versions, you would have seen these WRITE statements:


      . . .

      WRITE( 6, '(a  )' ) REPEAT( '=', 79 )
      WRITE( 6, '(a,/)' ) 'G E O S - C H E M   U S E R   I N P U T'
      WRITE( 6, 100   ) TRIM( FILENAME )
 100  FORMAT( 'READ_INPUT_FILE: Reading ', a )

But if you look at the same subroutine in GEOS-Chem v9-01-03 and higher versions, you will see this code:

      SUBROUTINE READ_INPUT_FILE( am_I_Root, Input_Opt, RC )

     . . .
      LOGICAL, INTENT(IN) :: am_I_Root   ! Is this the root CPU?
      . . .

      IF ( am_I_Root ) THEN
         WRITE( 6, '(a  )' ) REPEAT( '=', 79 )
         WRITE( 6, '(a,/)' ) 'G E O S - C H E M   U S E R   I N P U T'
         WRITE( 6, 100   ) TRIM( FILENAME )
 100     FORMAT( 'READ_INPUT_FILE: Reading ', a )
      ENDIF

The am_I_Root argument sets apart all WRITE and PRINT* statements. If you are running a "traditional" GEOS-Chem simulation (i.e. without connecting to the GEOS-5 GCM), then am_I_Root is set to .TRUE. in the driver program main.F:

     ! When connecting G-C to an external GCM, we need to only write 
     ! to stdout if we are on the root CPU.  Otherwise this will slow
     ! down the code.  This is why we introduced the am_I_Root logical
     ! variable.
     ! However, if we are using the "traditional" G-C, then we don't
     ! need to restrict I/O to the root CPU.  Therefore, we can just
     ! set am_I_Root = .true. here and then have it propagate down to
     ! all of the lower-level routines.  The main.F routine is not
     ! called when connecting G-C to an external GCM. 
     ! (mlong, bmy, 7/30/12)
     LOGICAL, PARAMETER       :: am_I_Root = .TRUE. 

     . . .

     !            ***** I N I T I A L I Z A T I O N *****

     ! Read input file and call init routines from other modules
     CALL READ_INPUT_FILE( am_I_Root, Input_Opt, RC ) 

     . . .      

and is then passed to READ_INPUT_FILE and other lower-level GEOS-Chem subroutines.

For combined GEOS-Chem/GEOS-5 GCM simulations, the value of am_I_Root is obtained from the library function MAPL_Am_I_Root. This function returns .TRUE. if the current processor is the root processor, or .FALSE. otherwise. Any WRITE statements within an IF ( am_I_Root ) block (such as the above example from subroutine READ_INPUT_FILE) will only execute on the root CPU. Informational messages are therefore printed only once instead of hundreds or thousands of times.
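
The effect can be illustrated without MPI by looping over a few pretend process ranks. This is only a sketch: the rank loop, the choice of rank 0 as the root, and the routine Say_Hello are assumptions made for illustration, and no MPI or MAPL calls are used.

```fortran
PROGRAM Demo_Root_Output
  IMPLICIT NONE
  INTEGER :: rank, nPrinted
  LOGICAL :: am_I_Root

  nPrinted = 0

  ! Pretend that we are 4 processes with ranks 0..3 (no MPI is used here)
  DO rank = 0, 3
     am_I_Root = ( rank == 0 )     ! Assumption: rank 0 is the root
     CALL Say_Hello( am_I_Root, nPrinted )
  ENDDO

  PRINT *, 'Number of messages printed:', nPrinted

CONTAINS

  SUBROUTINE Say_Hello( am_I_Root, nPrinted )
    LOGICAL, INTENT(IN)    :: am_I_Root   ! Is this the root CPU?
    INTEGER, INTENT(INOUT) :: nPrinted    ! Running count of messages

    ! Only the root writes the informational message
    IF ( am_I_Root ) THEN
       WRITE( 6, '(a)' ) 'Reading input file'
       nPrinted = nPrinted + 1
    ENDIF
  END SUBROUTINE Say_Hello

END PROGRAM Demo_Root_Output
```

All four pretend processes call Say_Hello, but only the one flagged as root writes to stdout.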

--Bob Y. 12:51, 10 December 2012 (EST)

Using findFreeLUN to assign logical unit numbers for file I/O

NOTE: This feature was implemented in GEOS-Chem v9-01-03, release date 14 Sep 2012.

Prior to GEOS-Chem v9-01-03, all logical unit numbers (LUNs) used for Fortran file I/O were pre-defined as PARAMETERs in module GeosUtil/file_mod.F, as shown below:

      ! Logical file unit numbers for ...
      INTEGER, PUBLIC, PARAMETER :: IU_RST     = 1   ! Tracer restart file
      INTEGER, PUBLIC, PARAMETER :: IU_CHEMDAT = 7   ! "chem.dat" 
      INTEGER, PUBLIC, PARAMETER :: IU_FASTJ   = 8   ! FAST-J input files
      INTEGER, PUBLIC, PARAMETER :: IU_GEOS    = 10  ! "input.geos" 
      INTEGER, PUBLIC, PARAMETER :: IU_BPCH    = 11  ! "ctm.bpch" 
      INTEGER, PUBLIC, PARAMETER :: IU_ND20    = 12  ! "rate.YYYYMMDD"   
      INTEGER, PUBLIC, PARAMETER :: IU_ND48    = 13  ! ND48 output
      INTEGER, PUBLIC, PARAMETER :: IU_ND49    = 14  ! "tsYYYYMMDD.bpch" 
      INTEGER, PUBLIC, PARAMETER :: IU_ND50    = 15  ! "ts24h.bpch"
      INTEGER, PUBLIC, PARAMETER :: IU_ND51    = 16  ! "ts10_12am.bpch" etc.
      INTEGER, PUBLIC, PARAMETER :: IU_ND51b   = 23  ! for ND51b diagnostic
      INTEGER, PUBLIC, PARAMETER :: IU_ND52    = 17  ! ND52 output (NRT only)
      INTEGER, PUBLIC, PARAMETER :: IU_PLANE   = 18  ! "plane.log"
      INTEGER, PUBLIC, PARAMETER :: IU_BC      = 19  ! TPCORE BC files
      INTEGER, PUBLIC, PARAMETER :: IU_BC_NA   = 20  ! TPCORE BC files: NA grid
      INTEGER, PUBLIC, PARAMETER :: IU_BC_EU   = 21  ! TPCORE BC files: EU grid
      INTEGER, PUBLIC, PARAMETER :: IU_BC_CH   = 22  ! TPCORE BC files: CH grid
      INTEGER, PUBLIC, PARAMETER :: IU_FILE    = 65  ! Generic file
      INTEGER, PUBLIC, PARAMETER :: IU_TP      = 69  ! "YYYYMMDD.tropp.*"
      INTEGER, PUBLIC, PARAMETER :: IU_PH      = 70  ! "YYYYMMDD.phis.*"
      INTEGER, PUBLIC, PARAMETER :: IU_I6      = 71  ! "YYYYMMDD.i6.*"
      INTEGER, PUBLIC, PARAMETER :: IU_A6      = 72  ! "YYYYMMDD.a6.*"
      INTEGER, PUBLIC, PARAMETER :: IU_A3      = 73  ! "YYYYMMDD.a3.*"
      INTEGER, PUBLIC, PARAMETER :: IU_A1      = 74  ! "YYYYMMDD.a1.*"
      INTEGER, PUBLIC, PARAMETER :: IU_GWET    = 75  ! "YYYYMMDD.gwet.*"
      INTEGER, PUBLIC, PARAMETER :: IU_XT      = 76  ! "YYYYMMDD.xtra.*"
      INTEGER, PUBLIC, PARAMETER :: IU_CN      = 77  ! "*"
      INTEGER, PUBLIC, PARAMETER :: IU_SMV2LOG = 93  ! "smv2.log"
      INTEGER, PUBLIC, PARAMETER :: IU_DEBUG   = 98  ! Reserved for debugging
      INTEGER, PUBLIC, PARAMETER :: IU_OAP     = 99  ! soaprod.YYYYMMDDhh 

We assigned a unique LUN value to each different type of GEOS-Chem input or output file. This ensured that data would get written to the proper file.

When GEOS-Chem connects to the GEOS-5 GCM, however, we can no longer rely on these pre-defined LUNs. The GEOS-5 GCM assigns LUNs to files based on availability: it checks for the next free LUN, and then uses that to open a file for input or output.

For better compatibility with the GEOS-5 GCM, we have removed the pre-defined LUNs from file_mod.F. File LUNs are now made local to routines or modules instead of being centrally located within GeosUtil/file_mod.F. Eric Nielsen (from GSFC) has created a new function findFreeLUN (contained within module GeosUtil/inquireMod.F90) that can be used to search for LUNs that are not already in use. You will see the following calls to findFreeLUN wherever GEOS-Chem reads a non-netCDF file from disk:

      USE inquireMod, ONLY : findFreeLUN

      . . .

      ! LUN is now declared locally and not in file_mod.F
      INTEGER :: IU_FILE   

      . . .

      ! Look for a free file LUN
      IU_FILE = findFreeLUN()
      ! Open file

      . . .

      ! Close the file
      CLOSE( IU_FILE )
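
For readers curious how such a search can work, here is one possible approach using the standard INQUIRE statement. This is a sketch under stated assumptions and is not the actual findFreeLUN implementation from inquireMod.F90: the function name Demo_Find_Free_Lun, the search range 10 to 99, and the -1 return value are all hypothetical.

```fortran
MODULE Demo_Lun_Mod
  IMPLICIT NONE
CONTAINS

  ! Hypothetical sketch: return the first unit in 10..99 that exists
  ! and is not already open, or -1 if no such unit is found.
  FUNCTION Demo_Find_Free_Lun() RESULT( lun )
    INTEGER :: lun
    INTEGER :: i
    LOGICAL :: exists, is_open

    lun = -1
    DO i = 10, 99
       INQUIRE( UNIT=i, EXIST=exists, OPENED=is_open )
       IF ( exists .AND. .NOT. is_open ) THEN
          lun = i
          EXIT
       ENDIF
    ENDDO
  END FUNCTION Demo_Find_Free_Lun

END MODULE Demo_Lun_Mod

PROGRAM Demo
  USE Demo_Lun_Mod
  IMPLICIT NONE
  INTEGER :: lun1, lun2

  ! Find a free unit and attach it to a scratch file
  lun1 = Demo_Find_Free_Lun()
  OPEN( UNIT=lun1, STATUS='SCRATCH' )

  ! The first unit is now in use, so the next search must skip it
  lun2 = Demo_Find_Free_Lun()
  PRINT *, 'First free LUN :', lun1
  PRINT *, 'Second free LUN:', lun2

  CLOSE( lun1 )
END PROGRAM Demo
```

Compilers that support Fortran 2008 also offer the NEWUNIT= specifier of the OPEN statement, which asks the runtime to choose an unused unit number.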

Unlike all of the other LUNs, IU_BPCH is referred to from several different routines within the source code. Therefore, IU_BPCH is the only LUN that we had to leave in GeosUtil/file_mod.F. But we have converted it from a PARAMETER to a regular variable:

      INTEGER, PUBLIC :: IU_BPCH   ! "ctm.bpch"


We now use the findFreeLUN function to initialize IU_BPCH in GeosCore/input_mod.F.

NOTE: The LUN IU_BPCH only gets used in those sections of GEOS-Chem that will not be called by the GEOS-5 GCM.

--Bob Y. 12:51, 10 December 2012 (EST)

Error handling and traceback

NOTE: As of v11-01, GEOS-Chem's error trapping is not completely implemented. We will complete this in a future version.

As mentioned above, we now use derived-type objects to pass data between GEOS-Chem routines. A typical GEOS-Chem subroutine will now take the following arguments:

      SUBROUTINE MY_GEOS_CHEM_SUB( am_I_Root, Input_Opt, State_Met, State_Chm, RC  )
! !USES:
      USE GIGC_ErrCode_Mod
      USE GIGC_Input_Opt_Mod, ONLY : OptInput
      USE GIGC_State_Chm_Mod, ONLY : ChmState
      USE GIGC_State_Met_Mod, ONLY : MetState
      LOGICAL,        INTENT(IN)    :: am_I_Root   ! Are we on the root CPU?
      TYPE(OptInput), INTENT(IN)    :: Input_Opt   ! Input Options object
      TYPE(MetState), INTENT(IN)    :: State_Met   ! Meteorology State object
      TYPE(ChmState), INTENT(INOUT) :: State_Chm   ! Chemistry State object
      INTEGER,        INTENT(OUT)   :: RC          ! Success or failure?

The RC (return code) argument will be set to one of the PARAMETER values contained in module file Headers/gigc_errcode_mod.F90:

  INTEGER, PUBLIC, PARAMETER :: GIGC_SUCCESS =  0   ! Routine returns success
  INTEGER, PUBLIC, PARAMETER :: GIGC_FAILURE = -1   ! Routine returns failure

If the subroutine finishes normally, then we assign RC = GIGC_SUCCESS and then exit normally. On the other hand, if the subroutine dies with a catastrophic error, we assign RC = GIGC_FAILURE.

The subroutine then halts and control returns to the calling routine. In the calling routine, an IF statement tests RC to determine whether the subroutine finished normally.

If the subroutine finished normally, then execution is allowed to proceed. Otherwise, GEOS-Chem program flow shall exit the calling routine and return to the subroutine one level higher (which shall return to the subroutine one level higher than that, etc). In this way, GEOS-Chem shall propagate the error from the location where it occurred all the way back up to the main "driver" routine, which shall display an error message and shut down the simulation gracefully.
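
The propagation described above might look like the following sketch. Only the names GIGC_SUCCESS and GIGC_FAILURE come from this page; the routines Low_Level and Mid_Level, the negative-value error condition, and the messages are hypothetical.

```fortran
MODULE Demo_Err_Mod
  IMPLICIT NONE
  INTEGER, PARAMETER :: GIGC_SUCCESS =  0   ! Routine returns success
  INTEGER, PARAMETER :: GIGC_FAILURE = -1   ! Routine returns failure
CONTAINS

  SUBROUTINE Low_Level( value, RC )
    REAL,    INTENT(IN)  :: value
    INTEGER, INTENT(OUT) :: RC

    ! Hypothetical error condition: negative input is fatal
    IF ( value < 0.0 ) THEN
       RC = GIGC_FAILURE
       RETURN
    ENDIF

    RC = GIGC_SUCCESS
  END SUBROUTINE Low_Level

  SUBROUTINE Mid_Level( value, RC )
    REAL,    INTENT(IN)  :: value
    INTEGER, INTENT(OUT) :: RC

    CALL Low_Level( value, RC )
    IF ( RC /= GIGC_SUCCESS ) RETURN   ! Pass the failure up one level

    RC = GIGC_SUCCESS
  END SUBROUTINE Mid_Level

END MODULE Demo_Err_Mod

PROGRAM Demo
  USE Demo_Err_Mod
  IMPLICIT NONE
  INTEGER :: RC

  ! The "driver" checks RC and shuts down if a lower level failed
  CALL Mid_Level( -1.0, RC )
  IF ( RC /= GIGC_SUCCESS ) THEN
     PRINT *, 'Error encountered; shutting down'
     STOP
  ENDIF

  PRINT *, 'Simulation finished normally'
END PROGRAM Demo
```

Each level only needs to test RC after every call and return immediately on failure, so the error reaches the driver without any routine halting the program on its own.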

As of this writing (Jan 2013), we have added the RC argument to many GEOS-Chem subroutines but we have not fully implemented the error trapping. The work is ongoing.

--Bob Y. 11:16, 15 January 2013 (EST)

The DEVEL C-preprocessor switch

NOTE: We made heavy use of the DEVEL switch when we were modifying GEOS-Chem to accept derived-type objects as input. As of this writing (v11-01), most of the #if defined( DEVEL ) blocks have been removed from GEOS-Chem. But we continue to use this technique when we need to preserve both old code and new code in the same subroutine for testing.

GEOS-Chem HPC development resembles a highway construction project. Consider a new bridge that is being constructed alongside an existing bridge. In order to prevent major traffic disruptions, vehicles will continue to travel across the old bridge while the new bridge is being built. At the end of the construction project, traffic is finally rerouted over the new bridge, and the old bridge is taken down.

In much the same way, we are adding new sections of source code to GEOS-Chem that will allow it to connect to the NASA GEOS-5/GCM. In order to prevent disruptions to the normal GEOS-Chem workflow, we have segregated these sections of new code from existing GEOS-Chem code with C-preprocessor switches. This allows us to activate the new code for testing, while leaving the existing code untouched.

[Image: GIGC Bridge.jpg]

If you look through the GEOS-Chem source code routines, you will see a number of #if defined( DEVEL ) ... #endif blocks. DEVEL stands for "Development code". Source code located within these #if blocks will not execute unless you activate the DEVEL switch at compile time. You can safely ignore these #if blocks for the time being. But be aware that the new code within these blocks will eventually replace the existing GEOS-Chem source code.

Using DEVEL to test HPC updates

NOTE: The STT field was removed from GEOS-Chem v11-01 and higher versions. Also, the State_Chm object is now located in state_chm_mod.F90. While the code is different in the most recent GEOS-Chem versions, the methodology described below is still valid.

We typically use the DEVEL switch to add new sections of code into existing GEOS-Chem subroutines. We frequently use this method to introduce new derived-type objects into GEOS-Chem. Each derived-type object is a "bucket" of variables that may hold one or more scalar or array fields. We use objects to pass data between subroutines for better compatibility with the Earth System Modeling Framework, which controls the flow of information between components of the NASA GEOS-5/GCM.

In the example below, we use DEVEL to pass tracer concentration information in/out of the subroutine with a derived type object named State_Chm instead of using the STT tracer array.

#if defined( DEVEL ) 
      SUBROUTINE MY_SUB( State_Chm, ... )             ! New code: Pass State_Chm via the argument list 
#else
      SUBROUTINE MY_SUB( ... )                        ! Old code: Keep the existing argument list
#endif

#if defined( DEVEL )
      USE GIGC_State_Chm_Mod, ONLY : ChmState         ! New code: Get the derived type for State_Chm
#else
      USE TRACER_MOD,         ONLY : STT              ! Old code: Get STT directly from TRACER_MOD with a USE statement
#endif

. . .
#if defined( DEVEL )
      TYPE(ChmState), INTENT(INOUT) :: State_Chm      ! New code: Declare State_Chm as an input/output argument
      REAL*8,         POINTER       :: STT(:,:,:,:)   ! New code: Declare STT as a local pointer variable
#endif
. . .

      !%%% START OF SUBROUTINE %%%
#if defined( DEVEL )
      STT => State_Chm%TRACERS                        ! New code: let STT point to the State_Chm%TRACERS field
#endif                                                ! This allows you to keep all of the other instances
                                                      ! of STT in the existing code intact without having to
                                                      ! modify them
. . .
      !%%% END OF SUBROUTINE %%%
#if defined( DEVEL )
      NULLIFY( STT )                                  ! New code: Nullify the STT pointer so that we no longer
#endif                                                ! point to State_Chm before leaving the subroutine


The default behavior will be to accept the old code and ignore the new code. But if we compile GEOS-Chem with the DEVEL=yes option, the new code will be activated, and the old code will be ignored. Having both instruction sets in the same subroutine allows us to debug the model to make sure that the new code is functioning as expected.
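
The fragments above can be assembled into a small, complete example. This is a simplified sketch and not GEOS-Chem code: the module Demo_Chm_Mod, the type DemoChmState, and the routine Double_Tracers are hypothetical, and the old module-level STT array is declared in the same module only to keep the example short.

```fortran
MODULE Demo_Chm_Mod
  USE ISO_FORTRAN_ENV, ONLY : REAL64
  IMPLICIT NONE

  ! Hypothetical, simplified stand-in for the ChmState type
  TYPE :: DemoChmState
     REAL(REAL64), POINTER :: TRACERS(:,:,:,:) => NULL()
  END TYPE DemoChmState

#if !defined( DEVEL )
  ! Old code: tracer array held as a module variable
  REAL(REAL64), ALLOCATABLE :: STT(:,:,:,:)
#endif

CONTAINS

#if defined( DEVEL )
  SUBROUTINE Double_Tracers( State_Chm )
    TYPE(DemoChmState), INTENT(INOUT) :: State_Chm
    REAL(REAL64), POINTER :: STT(:,:,:,:)

    ! New code: let STT point to the object field
    STT => State_Chm%TRACERS
#else
  SUBROUTINE Double_Tracers()
#endif

    ! Existing code: identical in both builds
    STT = 2.0_REAL64 * STT

#if defined( DEVEL )
    NULLIFY( STT )
#endif
  END SUBROUTINE Double_Tracers

END MODULE Demo_Chm_Mod

PROGRAM Demo
  USE Demo_Chm_Mod
  IMPLICIT NONE
#if defined( DEVEL )
  TYPE(DemoChmState) :: State_Chm

  ALLOCATE( State_Chm%TRACERS( 2, 2, 1, 1 ) )
  State_Chm%TRACERS = 1.0_REAL64
  CALL Double_Tracers( State_Chm )
  PRINT *, 'New code, max value:', MAXVAL( State_Chm%TRACERS )
  DEALLOCATE( State_Chm%TRACERS )
#else
  ALLOCATE( STT( 2, 2, 1, 1 ) )
  STT = 1.0_REAL64
  CALL Double_Tracers()
  PRINT *, 'Old code, max value:', MAXVAL( STT )
  DEALLOCATE( STT )
#endif
END PROGRAM Demo
```

The source must be run through the C preprocessor. With gfortran, for instance, the options -cpp -DDEVEL select the new code, while -cpp alone selects the old code. In both builds the line that uses STT is the same, which is the point of the pointer alias.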

We recognize that it is burdensome to keep both new and old code in the subroutine indefinitely. Once the new code has been validated, we shall remove the remaining sections of old code, as well as any remaining DEVEL switches.

--Bob Y. 11:38, 18 April 2013 (EDT)

Update December 2012

In GEOS-Chem v9-02d and higher versions, we have standardized a significant amount of code that had previously been set apart in #if defined( DEVEL ) blocks. The old code and DEVEL blocks have been removed from these routines.

We are currently using #if defined( DEVEL ) blocks to replace the existing STT and CSPEC arrays with fields from the Chemistry State object (named State_Chm).

--Bob Y. 12:35, 14 December 2012 (EST)

Update April 2013

In our development branch, we have integrated many of the DEVEL blocks into the mainline code. Many module arrays (i.e. met fields, STT tracer array, etc) are now replaced with derived type objects.

--Bob Y. 11:39, 18 April 2013 (EDT)

The EXTERNAL_GRID and EXTERNAL_FORCING C-preprocessor switches

In addition to the DEVEL C-preprocessor switch, we have introduced two more C-preprocessor switches named EXTERNAL_GRID and EXTERNAL_FORCING. These are intended to be set whenever GEOS-Chem needs to do something special for connecting to an external GCM (such as the NASA GEOS-5 GCM).

In many cases, EXTERNAL_GRID and EXTERNAL_FORCING can be synonyms for DEVEL. In many locations in GEOS-Chem you will see C-preprocessor blocks with all three switches, such as this one in setemis.F:

#if defined( DEVEL ) || defined( EXTERNAL_GRID ) || defined( EXTERNAL_FORCING )
                     ! Add this error trap to prevent out of bounds error
                     ! but we should benchmark first before adding to
                     ! the std G-C code (bmy, 8/2/12)
                     IF ( JLOOP == 0 ) CYCLE
#endif

However, there are other instances where we will use EXTERNAL_GRID and EXTERNAL_FORCING without using DEVEL. The DEVEL switch allows testing of grid-independent modifications in the standard GEOS-Chem. Therefore, if we need to make modifications that will only get activated when we are connecting GEOS-Chem to an external GCM, we should exclude these from #if defined( DEVEL ) blocks.

For example, this #if block in Headers/comode_loop_mod.F allows us to use the same setting as the standard GEOS-Chem when DEVEL=yes, which facilitates debugging and comparison to the "traditional" GEOS-Chem. But if we are connecting to an external GCM, we will use a different setting.

#if defined( EXTERNAL_GRID ) || defined( EXTERNAL_FORCING )
      !     %%%%% CONNECTING TO GEOS-5 GCM via ESMF INTERFACE %%%%%
      ! KBLOOP is the # of boxes that SMVGEAR will process per CPU.
      ! Set KBLOOP=1 for connecting to an external GCM
      ! %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
      ! %%% NOTE: If you are using GEOS-Chem without ESMF, but with  %%%
      ! %%% the DEVEL=yes option (i.e. to test grid-independent      %%%
      ! %%% updates w/r/t a standard G-C simulation), then you must  %%%
      ! %%% make sure that KBLOOP is set to the same value in both   %%%
      ! %%% simulations.                                             %%%
      ! %%%                                                          %%%
      ! %%% The absolute and relative errors (which determine if the %%% 
      ! %%% chemistry has converged to a solution) are computed over %%%  
      ! %%% all KBLOOP boxes at once.  Using different KBLOOP values %%%
      ! %%% in different simulations will cause slightly different   %%%
      ! %%% results in chemical concentrations (even after only one  %%%
      ! %%% timestep).                                               %%%
      ! %%%                                                          %%%
      ! %%% To this end, we now only set KBLOOP=1 if we are          %%%
      ! %%% connecting GEOS-Chem to an external GCM (i.e. if the Cpp %%%
      ! %%% switches EXTERNAL_GRID or EXTERNAL_FORCING are set in    %%%
      ! %%% define.h).                                               %%%
      ! %%%                                                          %%%
      ! %%%    -- Bob Yantosca (14 Aug 2012)                         %%%
      ! %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
#else
      !              %%%%% TRADITIONAL GEOS-Chem %%%%%
      ! KBLOOP is the # of boxes that SMVGEAR will process per CPU.
      ! For "traditional" G-C simulations, leave KBLOOP = 24
#endif

Also note that we shall endeavor to denote in the comments which section of code is for connecting to the external GCM and which section of code is meant for the traditional GEOS-Chem (i.e. w/o the ESMF interface).

--Bob Y. 13:47, 10 December 2012 (EST)