Difference between revisions of "GCHP v11-02"

From Geos-chem

Revision as of 21:30, 14 July 2017

Overview

GEOS-Chem v11-02 with the high performance option enabled (GCHP) features the same science as GEOS-Chem using the standard "classic" capability (GCC) but operates on a cubed-sphere grid and is parallelized using a message-passing interface (MPI) implementation. GCHP improves upon GCC by (1) enabling more accurate transport through elimination of the polar singularity inherent to lat-lon grids, and (2) scaling efficiently across multiple machines, which makes finer-resolution global simulations possible.

Running GEOS-Chem with the high performance option requires use of source code from two repositories maintained under version control. This page documents the mapping between the two repositories as well as the features and bug fixes added specifically to the HP-only repository called GCHP. Updates to the GCHP repository are maintained on this page separately from the primary GEOS-Chem v11-02 wiki page because they are primarily structural updates. Scientific updates from the community will continue to be implemented in the primary GEOS-Chem repository and documented on the GEOS-Chem v11-02 wiki page.

For clarity, we use a separate version numbering system for the HP-only repository, of the form vX.Y.Z. The first digit X changes with large-scale updates to the GCHP infrastructure. The second digit Y changes when GCHP is no longer backwards-compatible with the other GEOS-Chem source code repository. The third digit Z changes when there are important updates to the high performance infrastructure and backwards-compatibility with the other repository is maintained.

Version Mapping and Status

Currently, GCHP is not a git sub-module of the primary GEOS-Chem repository, so we rely on git tags to manually map between the two code bases. This will likely change in the future when every download of GEOS-Chem is expected to include the high performance capability. Until then, please refer to this page to ensure that you are using the correct repository versions to run your simulations. To check out a git tag, type:

git checkout tags/tag_name

The table below maps GEOS-Chem versions and status with the compatible versions of the high performance option (GCHP) and unit tester (UT). The initial version, v1.0.0, which is not compatible with v11-02, is included for a complete history. All versions appear as git tags within the master branches of the repositories linked to in the table heading.

GCHP | GEOS-Chem | UT | Download Status | Validation Status
v1.1.0 | v11-02b | v11-02b | Recommended | 1-month benchmark evaluation in progress (released 15 June 2017)
GCHP_v1.0.0 | GCHP_v1.0.0 | GCHP_v1.0.0 | previous version | Internal benchmark completed April 2017

New Features in this Version

HP v1.1.0

This version will be included in the 1-month benchmark of GEOS-Chem v11-02b. Please see the approval form for the 1-month benchmark simulation v11-02b with high performance option for complete information on validation of this version.

Feature | Submitted by | Type | Status
Features affecting the full-chemistry simulation:
Enable use of GFortran compiler with GCHP | Seb Eastham (Harvard) | Structural | Pending benchmark approval
Update handling of the species restart file: (1) allow omission of advected species from the restart file; (2) add option to skip use of a species restart file and initialize all species concentrations from the species database | Lizzie Lundgren (GCST) | Structural | " "
Enable read-in of day-of-week scale factors in ExtData | Seb Eastham (Harvard) | Structural | " "
Minor structural updates for compatibility with v11-02a modifications | Lizzie Lundgren (GCST) | Structural | " "
Correct handling of monthly climatology in December | Lizzie Lundgren (GCST) | Bug fix | " "
Enable starting simulation in early January | Seb Eastham (Harvard) | Bug fix | " "
Correct compile failure if using certain versions of gmake | Seb Eastham (Harvard) | Bug fix | " "

HP v1.0.0

GEOS-Chem with HP v1.0.0 is a preliminary version of the HP option in GEOS-Chem that is compatible with a version of the GEOS-Chem base code built off of v11-01. As an initial version of GCHP, it is included on this page for completeness even though it is not compatible with GEOS-Chem v11-02. Note that v1.0.0 did not undergo a formal benchmark review by the GEOS-Chem Steering Committee. Please see the GEOS-Chem v11-01 HP validation wiki page for information about the internal benchmark of this version performed by the GCHP Development Team.

Feature | Submitted by | Type | Status
Features affecting the full-chemistry simulation:
Initial version for validation of scientific fidelity | Mike Long (GCST), Seb Eastham (Harvard), Jiawei Zhuang (Harvard), Bob Yantosca (GCST), Lizzie Lundgren (GCST) | General | Internal benchmark complete (April 2017)

In the pipeline

Feature | Submitted by | Type | Status
On hold

Validation

In this section we provide information about the benchmarks and tests that we have done to validate the high performance option of GEOS-Chem v11-02.

1-month and 1-year benchmarks

For complete information about the benchmark simulations used to validate GEOS-Chem v11-02, please see our GEOS-Chem v11-02 benchmark history wiki page.

Unit tests

We currently do not perform unit tests on high performance GEOS-Chem.

Previous HP issues now resolved in GEOS-Chem v11-02

Day-of-week scale factors not read in by ExtData

This update was introduced in v1.1.0 and validated in the 1-month benchmark of GEOS-Chem v11-02b.

Seb Eastham (Harvard) implemented an update in MAPL to allow use of day-of-week scale factors with emissions. He wrote:

Data which varies by day-of-the-week can now be read in. Such data must be contained within a single file, and must have exactly 7 or 84 time entries. If 7 entries, these must correspond to each day of the week, starting with Sunday. If 84, this must be 12 7-day sets, one set for each month. Time information will be largely ignored. This read-in mode can be selected by specifying "D" for the "Cyclic" flag in ExtData.
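
The indexing rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MAPL/ExtData implementation: the function name `day_of_week_entry` is hypothetical, and for the 84-entry case the sketch assumes the twelve 7-day sets are stored in calendar order, January first.

```python
from datetime import date


def day_of_week_entry(day: date, n_entries: int) -> int:
    """Return the 0-based time index to read for `day` (hypothetical helper).

    Assumes entries run Sunday..Saturday and, for 84 entries, that the
    7-day sets are stored month by month starting with January.
    """
    if n_entries not in (7, 84):
        raise ValueError("day-of-week data must have exactly 7 or 84 time entries")
    # date.weekday() counts Monday as 0; shift so that Sunday is 0.
    sunday_first = (day.weekday() + 1) % 7
    if n_entries == 7:
        return sunday_first
    # 84 entries: skip the 7-day sets of the preceding months.
    return (day.month - 1) * 7 + sunday_first
```

For example, a Sunday in June maps to index 0 in a 7-entry file and, under the month-ordering assumption, to index 35 in an 84-entry file.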

--Lizzie Lundgren (talk) 20:26, 12 June 2017 (UTC)

GFortran compiler not compatible with GCHP

This update was introduced in v1.1.0 and validated in the 1-month benchmark of GEOS-Chem v11-02b.

Seb Eastham (Harvard) implemented updates to enable compilation of the high performance option of GEOS-Chem with GFortran. He wrote:

Modifications to makefiles required for the build process to recognize, and correctly handle, the use of GCC as the only compiler. This set up was tested with GCC 5.4.0. The code was successfully compiled and run with MVAPICH2, and included the bug fixes in the next few commits:
  • The species identification routine in gchp_utils fails if "EOF" is reported from read_one_line, but EOF is never set. This has been fixed.
  • GC_F_INCLUDE is now only added to the "INC_NETCDF" variable if it is defined. Previously it was added once, and then again if defined, resulting in it being added twice.
  • The State_Met field MOLENGTH was being populated even when the array had not been allocated. This behavior resulted in a segfault with GFortran.
  • The default LUN for log writing is 700 in the code, but can be overwritten from the input directory. However, if the user selects LUN 6, this can cause problems, as units 5 and 6 are reserved for STDIN and STDOUT. ifort seemed to be able to cope but GFortran would die with mysterious errors. To prevent this, the code now stops if the user tries to select either of these LUNs.
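
The last item above amounts to a small validation step. The Python sketch below illustrates that check only; the actual guard lives in the GCHP Fortran code, and the names `DEFAULT_LOG_LUN`, `RESERVED_LUNS` and `choose_log_lun` are hypothetical.

```python
DEFAULT_LOG_LUN = 700  # default logical unit number stated in the text
RESERVED_LUNS = {5: "STDIN", 6: "STDOUT"}  # units the text says are reserved


def choose_log_lun(requested=None):
    """Return the logical unit number to use for log output (hypothetical helper)."""
    if requested is None:
        # No override from the input directory: fall back to the default.
        return DEFAULT_LOG_LUN
    if requested in RESERVED_LUNS:
        # Stop early instead of failing later with an obscure runtime error.
        raise ValueError(
            f"LUN {requested} is reserved for {RESERVED_LUNS[requested]}"
        )
    return requested
```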

--Lizzie Lundgren (talk) 20:31, 12 June 2017 (UTC)

Restart file must contain all advected species

This update was introduced in v1.1.0 and validated in the 1-month benchmark of GEOS-Chem v11-02b.

Lizzie Lundgren (GCST) implemented two updates to allow more flexibility in the use of restart files.

  1. Remove the requirement of including all advected species in the restart file: MAPL now changes the internal state RESTART attribute per species from initial category MAPL_RestartOptional to new category MAPL_RestartBootstrap if species are bootstrapped (default values used) rather than initialized from the restart file. This enables a later step of over-writing only the bootstrapped species with background concentration values stored in the species database, ultimately allowing the user to omit species from the restart file.
  2. Remove the requirement of using a restart file: If the user enters '+none' for GIGCchem_INTERNAL_RESTART_FILE in the GCHP.rc config file, then the new RESTART category MAPL_RestartSkipInitial is set for all species during initialization of the internal state. This has the same effect as using the existing category MAPL_RestartSkip except that (1) background values from the species database are retrieved to overwrite default values set in MAPL, and (2) species are written to the output checkpoint.
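
The two updates above can be summarized as a decision made per species. The following Python sketch models that decision for illustration only; it is not MAPL code. The category names come from the text, while `Restart`, `initialize_species` and the dictionary arguments are hypothetical stand-ins for the internal state, the restart file contents and the species database.

```python
from enum import Enum


class Restart(Enum):
    OPTIONAL = "MAPL_RestartOptional"
    BOOTSTRAP = "MAPL_RestartBootstrap"
    SKIP_INITIAL = "MAPL_RestartSkipInitial"


def initialize_species(species, restart_file, restart_values, background):
    """Return (concentrations, categories) for the given species names.

    restart_file   -- file name, or '+none' to skip the restart file
    restart_values -- dict of species -> value read from the restart file
    background     -- dict of species -> background value (species database)
    """
    conc, category = {}, {}
    for name in species:
        if restart_file == "+none":
            # No restart file at all: every species starts from background.
            category[name] = Restart.SKIP_INITIAL
            conc[name] = background[name]
        elif name in restart_values:
            # Species present in the file keeps its initial category.
            category[name] = Restart.OPTIONAL
            conc[name] = restart_values[name]
        else:
            # Species missing from the file is bootstrapped, then its
            # default is overwritten with the background value.
            category[name] = Restart.BOOTSTRAP
            conc[name] = background[name]
    return conc, category
```

Only the bootstrapped species are overwritten in the mixed case, which is what allows a restart file to omit some advected species.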

--Lizzie Lundgren (talk) 20:53, 12 June 2017 (UTC)

Bug in handling of monthly climatology in December

This update was introduced in v1.1.0 and validated in the 1-year benchmark of GEOS-Chem HP v11-02b.

Lizzie Lundgren (GCST) discovered a MAPL bug where the right time bracket is not correctly handled for climatology files during December simulations. This bug does not affect GCHP simulations at other times of the year.

Special climatology file handling is needed for edge months of the year (December and January) due to the year change and month number wrap-around. Special handling exists if the current month is January to properly get the climatology for the left bracket (previous) month. Similar handling should occur if the current month is December to properly get the climatology for the right bracket (next) month. Instead, the right bracket special handling is applied only if the current time is January and also improperly sets the right bracket month to December. This results in treating December as any other month for calculation of the right time bracket, thereby iterating the month number by one and subsequently attempting to access data for month 13.
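
The month wrap-around described above can be illustrated with a short Python sketch. It covers only the calendar arithmetic for the two edge months; it does not reproduce the climatology-year bookkeeping in MAPL, and the function names `left_bracket` and `right_bracket` are hypothetical.

```python
def left_bracket(year: int, month: int):
    """Return (year, month) of the previous month, wrapping January to December."""
    if month == 1:
        return year - 1, 12
    return year, month - 1


def right_bracket(year: int, month: int):
    """Return (year, month) of the next month, wrapping December to January."""
    if month == 12:
        # Without this branch the month number would become 13.
        return year + 1, 1
    return year, month + 1
```

Treating December like any other month corresponds to dropping the `month == 12` branch, which is how a request for month 13 arises.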

The fix is to apply the right bracket special handling if the current month is December and to set variable imm to January. Below is the updated code in GCHP/Shared/MAPL_Base/MAPL_ExtDataGridCompMod.F90 subroutine UpdateBracketTime; the changed lines are the test imm == 12 and the assignment imm = 1:

          if (bSide == "R") then
             found=.false.
             newFile=.true.
             status = ESMF_SUCCESS
             If (MAPL_Am_I_Root().and.(Ext_Debug > 19)) Write(*,'(a,a,a,I5,x,2L1)') ' DEBUG: Sanity check on file ', trim(file_processed), ' with flags: ', status, status==ESMF_SUCCESS,found
             do while ((status==ESMF_SUCCESS).and.(.not.found))
                if (trim(cyclic)=='y') then
                   call ESMF_TimeGet(cTime,yy=iyr,mm=imm,dd=idd,h=ihr,m=imn,s=isc,__RC__)
                   if (imm == 12) then
                      cYear = iyr + 1
                      ! change year you will read from
                      iyr = climYear-1
                      call ESMF_TimeSet(readTime,yy=iyr,mm=imm,dd=idd,h=ihr,m=imn,s=isc,__RC__)
                      iyr = climYear
                      ! change month of file
                      imm = 1

--Lizzie Lundgren (talk) 15:58, 21 June 2017 (UTC)

Compile failure if using certain versions of gmake

This update was introduced in v1.1.0.

Seb Eastham (Harvard) reported a bug that causes compile failure with certain versions of gmake. There is a typo in Line 72 of GCHP/Shared/GFDL_fms/GNUmakefile where the SUFFIXES command is called as follows:

  SUFFIXES: += .inc. 

The SUFFIXES command automatically adds its arguments to the existing list and therefore the extra "+=" is unnecessary and may get misinterpreted or rejected by some versions of gmake. If that happens then gmake will print the following message immediately after starting the GFDL_fms build and then skip the GFDL_fms build entirely:

  GNUmakefile:72: *** empty variable name.  Stop.

Note that compilation does not fail immediately; it continues until a later step needs libGFDL_fms.a. As a result, the symptom of this problem would probably be an error like this:

  File not found: ….libGFDL_fms.a

The solution is to remove the "+=" from Line 72 of GCHP/Shared/GFDL_fms/GNUmakefile.

--Lizzie Lundgren (talk) 18:38, 12 July 2017 (UTC)

Simulation crashes if started prior to the first MMDD in climatology files

This update was introduced in v1.1.0 and validated in the 1-year benchmark of GEOS-Chem HP v11-02b.

Prior to GCHP v1.1.0 there was a bug where MAPL subroutine UpdateBracketTime would exit if the current datetime preceded the first datetime of a climatology dataset. This occurred if starting a simulation in January prior to January 15th which is the first date in the monthly XLAI 2008 files used as climatology. The bug resulted in run failure.

Seb Eastham (Harvard) corrected this issue by removing the forced exit and replacing it with returning ESMF_FAILURE if both the time precedes the earliest climatology file time and no file is found. For XLAI, a file is found and thus no failure is triggered. Updating bracket time to successfully look for the December file proceeds later in the routine as part of the existing handling of climatology edge months.
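
The corrected condition can be written as a single test. The Python sketch below models only that condition; `Status` is a stand-in for the ESMF return codes rather than the real constants, and `bracket_status` is a hypothetical name, not a MAPL routine.

```python
from datetime import datetime
from enum import Enum


class Status(Enum):
    SUCCESS = "success"  # stand-in for ESMF_SUCCESS
    FAILURE = "failure"  # stand-in for ESMF_FAILURE


def bracket_status(current: datetime, first_clim_time: datetime, file_found: bool):
    """Fail only if the time precedes the first climatology time AND no file is found.

    Other failure paths in the real routine are not modeled here.
    """
    if current < first_clim_time and not file_found:
        return Status.FAILURE
    return Status.SUCCESS
```

With the XLAI example from the text, a start on 1 January precedes the first entry on 15 January, but because a file is found the sketch returns success rather than stopping the run.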

--Lizzie Lundgren (talk) 20:25, 12 July 2017 (UTC)

Outstanding HP issues not yet resolved in GEOS-Chem v11-02

HEMCO restart variables are not read into ExtData

GCHP is not currently compatible with the HEMCO restart file and all HEMCO restart variables are set to /dev/null in ExtData.rc. This is an open issue which will be addressed in v11-02.

--Lizzie Lundgren (talk) 21:02, 13 June 2017 (UTC)

Time can only flow forward in MAPL

Time must be able to flow backwards in all components of GEOS-Chem in order for the adjoint to be compatible with GCHP. Seb Eastham (Harvard) writes:

To run GCHP backwards, I think there are two phases of work that will be needed.

The first is just reconditioning MAPL to be able to deal with backwards flow of time. This means disabling some of the hardcoded error handling, such as in the case Melissa pointed out where we ran afoul of the line "ASSERT_(HEARTBEAT_DT>=0" in MAPL_Cap.F90. Some more significant plumbing work will also be needed to deal with the way that modules are called. Routines in ESMF/MAPL run when "alarms" are triggered, meaning that the time has progressed past a certain point. My expectation is that the majority of the changes will just involve adding logic for a negative timestep which flips inequalities, i.e. "If time > alarm_time" becomes "If time < alarm_time". In these instances the absolute value of "time" doesn't matter, just how much has passed since the last alarm. Most of the rest of the code is pretty agnostic about the flow of time, caring only about what the current value is.

The second phase will be modifying the exceptions to this agnosticism, by far the biggest of which is ExtData. This is because a) the time dimensions in all input files are required to increase with index, which complicates the idea of just reversing everything, and b) ExtData must at all times know what file is coming next so that it can perform interpolation. Again, this doesn't seem too difficult to solve, but it will require either a creative solution or a lot of logical tests. A lesser but related case will be the modifications needed to the output module, History.
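
The inequality flip mentioned in the first phase can be sketched as follows. This is a minimal Python illustration of the idea only; it does not use the ESMF or MAPL alarm API, and `alarm_triggered` is a hypothetical name.

```python
from datetime import datetime, timedelta


def alarm_triggered(now: datetime, alarm_time: datetime, timestep: timedelta) -> bool:
    """Return True once `now` has moved past `alarm_time` in the direction of time flow."""
    zero = timedelta(0)
    if timestep > zero:
        # Forward run: the alarm rings after the clock passes it.
        return now > alarm_time
    if timestep < zero:
        # Backward run: the same test with the inequality flipped.
        return now < alarm_time
    raise ValueError("timestep must be non-zero")
```

The sign of the timestep is the only extra input needed, which matches the expectation in the quote that most code can stay agnostic about the direction of time.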

--Lizzie Lundgren (talk) 21:12, 13 June 2017 (UTC)

Timestep midpoint cos(SZA) is set to cos(SZA) at start of timestep

State_Met variables SUNCOS and SUNCOSmid are both set to internal state variable zenith which is extracted at the start of each chemistry timestep in MAPL. This causes out-of-the-box differences with GEOS-Chem "classic" (GCC). This is an open issue which will be addressed in v11-02 as part of our efforts to make GCHP and GCC out-of-the-box standard simulations more comparable.

GFortran + MVAPICH2 runs hang if using multiple nodes

While GCHP successfully compiles with GFortran starting in v1.1.0 (compatible with GEOS-Chem v11-02b), runs hang with an MPI_WAIT error if using cores across multiple nodes. This issue does not always occur; it has been observed on the Harvard Odyssey cluster with some builds, but there have also been successful GCHP simulations across multiple nodes when using a different set of GFortran and MVAPICH2 versions. The root cause is not yet known.

--Lizzie Lundgren (talk) 21:35, 15 June 2017 (UTC)

MPI_Wait error when running at coarse resolution with high-resolution input data

MAPL crashes when regridding wind vector data from very high resolutions (e.g. 0.25x0.3125 to C24) if running on a small number of cores (e.g. 6). The problem appears to be the production of invalid MPI instructions within the scatter and gather routines.

--Sebastian D. Eastham (talk) 16:01, 20 June 2017 (UTC)

GCHP is incompatible with GNU Compiler Collection (GCC) v6 due to use of ESMF v5

Yanko Davila (University of Colorado) reported the following problem encountered while compiling GCHP with GCC version 6.1.0. He wrote:

I’m trying to compile the latest version of GCHP as per the wiki and I’m finding something interesting with gfortran. On file ESMCI_WebServNetEsmfServer.C there is a problem with line 1798 with the equal sign. For some reason gcc version 6.1.0 complains about a wrong function overloading. The following change works for me.
@@ -1795,7 +1795,7 @@ void  ESMCI_WebServNetEsmfServer::copyFile(
        fstream fin(srcFilename, ios::in | ios::binary);
        fstream fout(destFilename, ios::out | ios::binary);

-       if ((fin == NULL)  ||  (fout == NULL))
+       if ((!fin.is_open())  ||  (!fout.is_open()))

Sebastian Eastham (Harvard) explains the fix as follows:

The GMAO MAPL infrastructure in GCHP uses ESMF v5, which unfortunately is known to be incompatible with GCC v6 specifically because GCC v6 does not support the if (NULL == file) construct (https://trac.macports.org/ticket/52427). However, ESMF v7 - which GMAO are currently in the process of testing in an unstable branch of MAPL - does support GCC v6.

Due to the dependency of GCHP on ESMF v5 at this time and the expected update to ESMF v7 in the near future, we are not applying this fix to the GCHP source code for distribution. However, if you encounter this issue you may modify your local code using this fix.

--Lizzie Lundgren (talk) 21:29, 14 July 2017 (UTC)