GCHP Run Configuration Files

Revision as of 16:14, 16 November 2020

Previous | Getting Started with GCHP | GCHP Main Page

  1. Hardware and Software Requirements
  2. Setting Up the GCHP Environment
  3. Downloading Source Code and Data Directories
  4. Compiling
  5. Obtaining a Run Directory
  6. Running GCHP: Basics
  7. Running GCHP: Configuration
  8. Output Data
  9. Developing GCHP
  10. Run Configuration Files


Overview

GCHP is controlled using a set of resource configuration files that are included in the GCHP run directory, most of which are denoted by suffix .rc. These files contain information required by GCHP including hardware allocation, initialization and runtime parameters, locations of necessary files, and where, what and how to read and write data to and from GCHP. Files include:

  1. GCHP.rc
  2. CAP.rc
  3. ExtData.rc
  4. input.geos
  5. HEMCO_Config.rc
  6. HEMCO_Diagn.rc
  7. input.nml
  8. HISTORY.rc

Several run-time settings must be set consistently across multiple files. Inconsistencies may result in your program crashing or yielding unexpected results. To avoid mistakes and make run configuration easier, we therefore include bash shell script runConfig.sh in run directories to update the most commonly used settings in the configuration files. Sourcing this script will update other config files to use values specified in runConfig.sh. This is done automatically at run start if using any of the example run scripts provided with GCHP. The updated settings will be printed to the GCHP log file by default.

Using runConfig.sh to configure common settings makes run configuration much simpler, but it also comes with peril. If you manually edit a config file setting that is also set in runConfig.sh, then the setting in runConfig.sh will be used. For example, most frequency and duration settings for diagnostics in HISTORY.rc are set in runConfig.sh. To change the frequency and duration of output files, you must edit runConfig.sh rather than HISTORY.rc. Please get very familiar with the options in runConfig.sh and be conscientious about not updating the same setting elsewhere.
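The override behavior described above can be pictured with a small Python sketch. It is not the actual runConfig.sh logic: the `KEY: value` line layout, the helper name `apply_settings`, and the sample values are assumptions made for illustration. It only shows why a manual edit to a centrally managed key is lost once the central values are reapplied.

```python
import re

def apply_settings(rc_text, settings):
    """Return rc_text with every managed 'KEY: value' line overwritten.

    Hypothetical helper for illustration; runConfig.sh may work differently.
    """
    lines = []
    for line in rc_text.splitlines():
        match = re.match(r"^(\s*)([\w.]+)(\s*:\s*)(.*)$", line)
        if match and match.group(2) in settings:
            indent, key, sep, _old_value = match.groups()
            line = f"{indent}{key}{sep}{settings[key]}"
        lines.append(line)
    return "\n".join(lines)

# A manual edit set the output frequency of collection 'center' to 3 hours ...
history_rc = "center.frequency: 030000\ncenter.mode: 'instantaneous'"
# ... but the centrally managed value is still 1 hour, so that value wins.
managed = {"center.frequency": "010000"}
updated = apply_settings(history_rc, managed)
print(updated)
```

Keys that are not centrally managed (here `center.mode`) pass through unchanged, which is why only the settings duplicated in runConfig.sh need this extra care.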

While much of the labor of updating the various configuration files has been eliminated by runConfig.sh, it is still worth understanding what is happening under the hood with the other files. This page details the settings within those other configuration files and what they are used for.

Configuration File Descriptions

The following table lists the core functions of each of the configuration files in the GCHP run directory. See the individual subsections on each file for additional information.

File Primary function in GCHP
input.geos Controls which aspects of the chemistry simulation are run. Note that input.geos does not currently control input and output information such as restart file, met-field paths, and diagnostics.
CAP.rc Controls the simulation timing parameters such as start date and duration.
HEMCO_Config.rc Contains emissions information used by HEMCO
ExtData.rc Specifies where all input files are located. All files requested by GCHP must be specified in this file. Default values can be used instead of file data by specifying the path as /dev/null.
GCHP.rc Controls all major aspects of the simulation. Grid resolution, core distribution, timesteps, and restart files are defined in this file.
HISTORY.rc Controls and directs output from GCHP.
HEMCO_Diagn.rc Contains HEMCO diagnostic information to map emissions listed in HISTORY.rc to HEMCO containers
input.nml Contains the configurable domain stack size.

CAP.rc

CAP.rc is the configuration file for the top-level "Cap" (in ESMF/MAPL lingo) component. It handles general runtime settings for GCHP including time and data parameters, performance profiling routines, and the system-wide timestep (heartbeat). Combined with file cap_restart, CAP.rc configures the exact dates for the next run of GCHP.

Parameter Description
ROOT_NAME Sets the child name (GCHP) that MAPL will use to initialize the ROOT component within CAP. It uses this name in all operations when querying and interacting with ROOT.
ROOT_CF Resource configuration file for the ROOT component (GCHP.rc) which stores run information such as timesteps.
HIST_CF MAPL's HISTORY component resource configuration file (HISTORY.rc) which stores output configuration information such as variable names and grid.
BEG_DATE Begin date for the range of the experiment, in format YYYYMMDD hhmmss.
Set BEG_DATE to your simulation start date.
END_DATE End date for the range of the experiment, in format YYYYMMDD hhmmss.
Ensure END_DATE is later than or equal to your simulation end date.
JOB_SGMT Duration of each individual run, in format YYYYMMDD hhmmss. Should be shorter than or equal to the span of BEG_DATE and END_DATE.
Set JOB_SGMT to your simulation duration.
DEBUG_LEVEL Similar to the HEMCO VERBOSE flag, the debug flag enables debug output in GCHP. Values range from 0 (no debug output; default) to 20 (maximum amount of debug output).

Setting this flag to 20 is especially helpful in identifying problems concerning reading input files in ExtData.
Set DEBUG_LEVEL to 20 if having problems reading input files with ExtData.

TILEPATH Path to tile files used in ExtData. Default is symbolic link TileFiles set when setting up your GCHP run directory.
HEARTBEAT_DT The 'ticking' rate of the ESMF/MAPL internal clock, in seconds. At every tick of the heartbeat, ESMF queries all components to see if anything needs to be calculated based upon the components' defined timesteps. Default is 1800.
Set HEARTBEAT_DT to the minimum timestep set in GCHP.rc.
MAPL_ENABLE_TIMERS Enables output of MAPL's run-profile timers. Default is YES.
MAPL_ENABLE_MEMUTILS Enables runtime output of the program's memory usage. Default is YES.
PRINTSPEC If enabled, runs the model until all components are initialized. Then, dumps all the names of the members of the internal import/export states and ends the model. Setting options include: 0 (default): Off, 1: Import & Export, 2: Import, and 3: Export.
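The relationship between BEG_DATE, END_DATE, and JOB_SGMT can be checked with a short Python sketch. The function names and sample values are hypothetical; the only thing taken from the table above is the YYYYMMDD hhmmss format and the rule that JOB_SGMT should not exceed the span from BEG_DATE to END_DATE.

```python
from datetime import datetime, timedelta

def parse_date(value):
    """Parse a 'YYYYMMDD hhmmss' date string."""
    return datetime.strptime(value, "%Y%m%d %H%M%S")

def parse_duration(value):
    """Split a 'YYYYMMDD hhmmss' duration into integer components."""
    ymd, hms = value.split()
    return {
        "years": int(ymd[0:4]), "months": int(ymd[4:6]), "days": int(ymd[6:8]),
        "hours": int(hms[0:2]), "minutes": int(hms[2:4]), "seconds": int(hms[4:6]),
    }

def add_duration(start, dur):
    """Add a calendar duration: years and months first, then the remainder.

    Simplification: raises ValueError if the shifted day does not exist
    (for example 31 January plus one month).
    """
    month_index = start.month - 1 + dur["months"]
    year = start.year + dur["years"] + month_index // 12
    month = month_index % 12 + 1
    shifted = start.replace(year=year, month=month)
    return shifted + timedelta(days=dur["days"], hours=dur["hours"],
                               minutes=dur["minutes"], seconds=dur["seconds"])

def segment_fits(beg_date, end_date, job_sgmt):
    """True if BEG_DATE plus JOB_SGMT does not run past END_DATE."""
    end_of_segment = add_duration(parse_date(beg_date), parse_duration(job_sgmt))
    return end_of_segment <= parse_date(end_date)

# One-month segment inside a one-month range: fits.
print(segment_fits("20160701 000000", "20160801 000000", "00000100 000000"))
# One-day segment inside a 12-hour range: does not fit.
print(segment_fits("20160701 000000", "20160701 120000", "00000001 000000"))
```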

ExtData.rc

ExtData.rc contains emissions information for use with MAPL and ESMF. For more information see the ExtData component page on the GEOS-5 wiki.

The information that ExtData.rc contains overlaps with GEOS-Chem configuration file HEMCO_Config.rc. However, during a GCHP run, emissions files are found using the paths in ExtData.rc rather than in HEMCO_Config.rc; the file paths in HEMCO_Config.rc are ignored. Unlike the HEMCO_Config.rc file, the ExtData.rc file also includes GMAO Meteorological data.

Note that meteorology data paths in input.geos are ignored by GCHP and are superseded by paths in ExtData.rc. See the GEOS-5 wiki for detailed information about the ExtData component. If you run into a problem with met fields when you later run GCHP, first check ExtData.rc to ensure you have the correct data paths.

ExtData.rc includes the following information in a space-delimited single row for each primary export (e.g. met field), 2D variable at the edge (e.g. VEGFRAC_*), and emission field.

Info Name Description
Export Name Name of imported met field (e.g. ALBD) or HEMCO emissions container name (e.g. GEIA_NH3_ANTH).
Units Unit string nested within single quotes. '1' indicates there is no unit conversion from the native units in the netCDF file.
Clim Enter Y if the file is a 12 month climatology, otherwise enter N. If you specify that it is a climatology, the data can be in either one file or 12 files, provided the files are templated appropriately with one per month.
Conservative Enter Y if the data should be regridded in a mass-conserving fashion through a tile file. F;{VALUE} can also be used for fractional regridding. Otherwise enter N to use non-conservative bilinear regridding.
Refresh Time Template Possible values include:
  • -: The field will be updated only once, the first time ExtData runs.
  • 0: Update the variable at every step. ExtData will do a linear interpolation to the current time using the available data.
  • %y4-%m2-%d2T%h2:%n2:00: Set the recurring time to update the file. The file will be updated when the evaluated template changes. For example, a template in the form %y4-%m2-%d2T12:00:00 will cause the variable to be updated at the start of a new day (i.e. when the clock hits 2007-08-02T00:00:00 it will update the variable, but the time it will use for reading and interpolation is 2007-08-02T12:00:00).
Offset Factor Factor the variable will be shifted by. Use none for no shifting.
Scale Factor Factor the variable will be scaled by. Use none for no scaling.
External File Variable The name of the variable in the netCDF data file, e.g. ALBEDO in met fields.
External File Template Path to the netCDF data file. If not using the data, specify /dev/null to reduce processing time. If there are no tokens in the template name, ExtData will assume that all the data is in one file. Note that if the data in the file is at a different resolution than the application grid, the underlying I/O library ExtData uses will regrid the data to the application grid.
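To make the row layout concrete, the following Python sketch splits one space-delimited ExtData.rc row into the nine fields listed in the table above. The sample row, the file path in it, and the names `ExtDataEntry` and `parse_extdata_row` are made up for illustration and are not taken from a real run directory. `shlex.split` is used so that a unit string in single quotes stays together even if it contains spaces.

```python
import shlex
from collections import namedtuple

# Field order follows the table above.
ExtDataEntry = namedtuple("ExtDataEntry", [
    "export_name", "units", "clim", "conservative", "refresh_template",
    "offset", "scale", "file_variable", "file_template",
])

def parse_extdata_row(row):
    """Split one ExtData.rc row into its nine fields (illustrative only)."""
    tokens = shlex.split(row)  # keeps 'kg m-2 s-1' as a single token
    if len(tokens) != len(ExtDataEntry._fields):
        raise ValueError(f"expected 9 fields, got {len(tokens)}: {row!r}")
    return ExtDataEntry(*tokens)

# Hypothetical rows; the paths are placeholders.
met_row = "ALBD '1' N Y 0 none none ALBEDO ./MetDir/%y4/%m2/example.%y4%m2%d2.nc4"
unused_row = "GEIA_NH3_ANTH 'kg m-2 s-1' N Y - none none NH3 /dev/null"

met = parse_extdata_row(met_row)
unused = parse_extdata_row(unused_row)
print(met.export_name, met.file_variable, met.refresh_template)
print(unused.units, unused.file_template == "/dev/null")
```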

GCHP.rc

Important Note: This section is not yet updated for GCHP 12.5.0. Several changes to this file occur in that version for compatibility with a newer version of MAPL.

GCHP.rc is the resource configuration file for the ROOT component within GCHP. It controls, at a minimum, global model resolution, grid type, parallel sub-domain size, component time steps, and job restart parameters.

Parameter Description
NX, NY Number of boxes in the two MPI sub-domain dimensions.
NX * NY must equal your number of CPUs; NX or NY must be a multiple of 6 (the number of faces); typically NY is set as a multiple of 6.
IM Number of grid cells on the side of a single cubed sphere face.
Get a rough idea of equivalent lat/lon resolution by dividing 90 by IM. Choose your timestep settings accordingly (see RUN_DT below).
JM Number of grid cells times 6, essentially defining the second dimension if all six faces are stacked in a 2-dimensional array.
Must be equal to IM*6.
LM Number of vertical grid cells. The default value is 72.
Must be equal to the vertical resolution of the offline meteorological fields since MAPL cannot regrid vertically.
GRIDNAME The default grid name is PE24x144-CF. The grid name includes how the pole is treated, the face side length, the face side length times six, and whether it is a Cubed Sphere Grid or Lat/Lon. The name PE24x144-CF indicates polar edge (PE), 24 cells along one face side, 144 for 24*6, and a cubed-sphere grid (CF). Many options here are defined in MAPL_Generic.
Must be consistent with IM and JM.
GEOChem_CTM A toggle that tells FVDycore that it is operating as a transport model rather than a prognostic model if set to 1.
AdvCore_Advection Default is 1.
DYCORE Should either be set to OFF (default) or ON. This value does nothing, but MAPL will crash if it is not declared.
HEARTBEAT_DT The timestep in seconds that the DYCORE Component should be called. Default is 600.
Must be a multiple of HEARTBEAT_DT in CAP.rc.
SOLAR_DT The timestep in seconds that the SOLAR Component should be called. Default is 600.
Must be a multiple of HEARTBEAT_DT in CAP.rc.
IRRAD_DT The timestep in seconds that the IRRAD Component should be called. ESMF checks this value during its timestep check. Default is 600.
Must be a multiple of HEARTBEAT_DT in CAP.rc.
RUN_DT The timestep in seconds at which the RUN Component should be called. ESMF checks this value during its timestep check. The recommended timesteps [s] per grid resolution (see IM) are: 1800 (4x5); 900 (2x2.5); 600 (1x1.25); 600 (1/2 degree); 300 (1/4 degree).
Must be a multiple of HEARTBEAT_DT in CAP.rc.
GIGCchem_DT The timestep in seconds that the GIGCchem Component should be called. ESMF checks this value during its timestep check. Default is 1200.
Must be a multiple of HEARTBEAT_DT in CAP.rc.
DYNAMICS_DT The timestep in seconds that the DYNAMICS Component should be called. ESMF checks this value during its timestep check. Default is 600.
Must be a multiple of HEARTBEAT_DT in CAP.rc.
SOLARAvrg, IRRADAvrg Default is 0.
GIGCchem_REFERENCE_TIME
PRINTRC Specifies which resource values to print. Options include 0: Non-Default Values, and 1: All Values. Default setting is 0.
PARALLEL_READFORCING Enables or disables parallel I/O processes when writing the restart files. Default value is 0 (disabled).
NUM_READERS, NUM_WRITERS Number of simultaneous readers and writers. Should divide evenly into NY. Default value is 1.
BKG_FREQUENCY Active observer when desired. Default value is 0.
GIGCchem_INTERNAL_RESTART_FILE The filename of the internal restart file to be written.
GIGCchem_INTERNAL_RESTART_TYPE The format of the internal restart file. Valid types include pbinary and pnc4.
GIGCchem_INTERNAL_CHECKPOINT_FILE The filename of the internal checkpoint file to be written.
GIGCchem_INTERNAL_CHECKPOINT_TYPE The format of the internal checkpoint file. Valid types include pbinary and pnc4.
GIGCchem_INTERNAL_HEADER Only needed when the file type is set to pbinary. Specifies if a binary file is self-describing.
DYN_INTERNAL_RESTART_FILE The filename of the DYNAMICS internal restart file to be written.
DYN_INTERNAL_RESTART_TYPE The format of the DYNAMICS internal restart file. Valid types include pbinary and pnc4.
DYN_INTERNAL_CHECKPOINT_FILE The filename of the DYNAMICS internal checkpoint file to be written.
DYN_INTERNAL_CHECKPOINT_TYPE The format of the DYNAMICS internal checkpoint file. Valid types include pbinary and pnc4.
DYN_INTERNAL_HEADER Only needed when the file type is set to pbinary. Specifies if a binary file is self-describing.
MAX_DIAG, MAX_TRCS, MAX_MEMB, MAX_FAMS, MAX_DEP, LINOZ_NFIELDS, LINOZ_NLAT, LINOZ_NLEVELS, LINOZ_NMONTHS, RUN_PHASES For reading the input.geos file. Default value is 80.
HEMCO_CONFIG Name of the HEMCO configuration file. Default is HEMCO_Config.rc.
STDOUT_LOGFILE Log filename template. Default is PET%%%%%.GEOSCHEMchem.log.
STDOUT_LOGLUN Logical unit number for stdout. Default value is 6.
INIT_ZERO Default is 0.
INIT_GCC Default is 0.
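Several of the constraints in the table above are simple arithmetic, so they can be checked before submitting a run. The Python sketch below is one way to do that. The function names and the dictionary of settings are hypothetical, and the grid name check assumes the default PE...-CF naming pattern described for GRIDNAME.

```python
DT_KEYS = ("HEARTBEAT_DT", "SOLAR_DT", "IRRAD_DT", "RUN_DT",
           "GIGCchem_DT", "DYNAMICS_DT")

def check_gchp_settings(settings, n_cores, cap_heartbeat_dt):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    nx, ny = settings["NX"], settings["NY"]
    im, jm = settings["IM"], settings["JM"]
    if nx * ny != n_cores:
        problems.append(f"NX*NY = {nx * ny}, but {n_cores} cores were requested")
    if nx % 6 != 0 and ny % 6 != 0:
        problems.append("neither NX nor NY is a multiple of 6")
    if jm != im * 6:
        problems.append(f"JM = {jm}, expected IM*6 = {im * 6}")
    expected_grid = f"PE{im}x{jm}-CF"  # assumes the default naming pattern
    if settings["GRIDNAME"] != expected_grid:
        problems.append(f"GRIDNAME {settings['GRIDNAME']} != {expected_grid}")
    for key in DT_KEYS:
        if key in settings and settings[key] % cap_heartbeat_dt != 0:
            problems.append(f"{key} is not a multiple of {cap_heartbeat_dt}")
    return problems

def approx_resolution_deg(im):
    """Rough lat/lon equivalent of a cubed-sphere side length (90 / IM)."""
    return 90.0 / im

# Hypothetical settings for a 6-core run on the default grid.
good = {"NX": 1, "NY": 6, "IM": 24, "JM": 144, "GRIDNAME": "PE24x144-CF",
        "RUN_DT": 600, "GIGCchem_DT": 1200, "DYNAMICS_DT": 600}
print(check_gchp_settings(good, n_cores=6, cap_heartbeat_dt=600))
print(approx_resolution_deg(good["IM"]))
```

With the settings shown, the check returns an empty list and the rough resolution is 3.75 degrees, which sits near the 4x5 row of the recommended RUN_DT values.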

HISTORY.rc

All output from GCHP is in netCDF format, with the contents of each output file controlled by HISTORY.rc. Information in HISTORY.rc is organized as follows:

Parameter Description
EXPID The beginning of where output files can be stored. Includes directory structure (OutputDir) and beginning of filename (GCHP).
EXPDSC The Export Description. If one does ncdump -h, the attribute "Title" will be the set value of EXPDSC.
CoresPerNode Number of CPUs per node for your simulation. Should match NX in GCHP.rc.
COLLECTIONS String names of all collections you wish to output (e.g. 'center', 'regrid'). GCHP outputs data as collections and each collection is output as a single file. You can create as many different collections as you want. For each collection, the following fields may be specified. Note that in some instances default values are used if a setting is not explicitly defined.
{COLLECTION}.template The output filename suffix. Including a date string, such as %y4%m2%d2, will insert the simulation start day (following GrADS conventions). This should also include the file extension (e.g. .nc4).
{COLLECTION}.format Character string defining the file format. Options include CFIO (default) for netCDF-4 or flat for binary files.
{COLLECTION}.resolution Defines the output resolution, IX IY (in lat-lon). Output can be at a different resolution than the model run. Defaults to the same as the model. This means if the model is running at cubed sphere, it will by default output a cubed-sphere grid unless specified otherwise.
{COLLECTION}.frequency The frequency at which values are archived for output in hhmmss format. For example, the default 010000 will output data every hour.
{COLLECTION}.mode String defining archived values. Options are either instantaneous or time-averaged data. Default is instantaneous.
{COLLECTION}.acc_interval Needed when mode is set to time-averaged. Defines the accumulation interval of the time average. Must be less than or equal to frequency.
NOTE: Does not appear in downloaded run directory HISTORY.rc.
{COLLECTION}.subset Restricts output to a subset of the domain. Values include lonMin, lonMax, latMin, and latMax.
NOTE: Does not appear in downloaded run directory HISTORY.rc.
{COLLECTION}.fields Paired character strings giving each diagnostic name and its associated gridded component to be written out.

Important note about output resolution: The native resolution of GCHP output is on a cubed-sphere grid. Output in the native resolution will have dimensions [Nx6NxKxT] where N is a cubed sphere side length, K is the number of vertical layers, and T is the number of time samples in the file (i.e. duration divided by frequency). Specifying a regular lat-lon resolution in HISTORY.rc (e.g. 144 91) causes GCHP to attempt to regrid the output variables from cubed sphere resolution to lat-lon, assuming a uniform cell size throughout the domain. In this example, a regular 2x2.5 output will be created, although it will be different from the standard GEOS-Chem 2x2.5 output grid (e.g. no half-polar grid cells).
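The native output dimensions follow directly from the grid and timing settings. The Python sketch below works them out. The function names are hypothetical, and it assumes that both the file duration and the output frequency are given as hhmmss strings.

```python
def hhmmss_to_seconds(value):
    """Convert an 'hhmmss' string such as '010000' to seconds."""
    hours, minutes, seconds = int(value[0:2]), int(value[2:4]), int(value[4:6])
    return hours * 3600 + minutes * 60 + seconds

def native_output_shape(n, n_levels, duration, frequency):
    """Return (N, 6N, K, T) for cubed-sphere output.

    T is the number of time samples: duration divided by frequency.
    """
    n_times = hhmmss_to_seconds(duration) // hhmmss_to_seconds(frequency)
    return (n, 6 * n, n_levels, n_times)

# c24 grid, 72 levels, one 24-hour file with hourly samples.
print(native_output_shape(24, 72, duration="240000", frequency="010000"))
```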

Important note about vertical dimensions: The vertical axis in the unprocessed GCHP output files is flipped relative to GEOS-Chem Classic output files. Level index 1 corresponds to the top of the atmosphere while the last index corresponds to the surface. Please keep this in mind when you perform post-processing and data visualization using GCHP output.
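In post-processing, putting a column back into surface-first order is a simple reversal along the level axis. The Python sketch below shows this for a single column held in a plain list. The function name and the sample values are hypothetical; with array libraries, the same reversal would be applied along the level dimension only.

```python
def flip_vertical(column):
    """Reverse a single vertical column so that index 0 is the surface."""
    return column[::-1]

# Hypothetical pressures [hPa] in GCHP order: first entry is the model top.
gchp_column = [0.1, 10.0, 500.0, 1000.0]
surface_first = flip_vertical(gchp_column)
print(surface_first)
```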

input.nml

Setting options in input.nml are listed below. input.nml controls specific aspects of the FV3 dycore. If this file is not present, you may have memory errors.

Parameter Description
print_memory_usage Supposed to toggle memory usage prints to log. However, in practice turning it on or off does not seem to have any effect. Memory usage is always printed.
domain_stack_size Domain stack size in bytes. We use a default of 7000000, but this may not be large enough if using very few cores with a high resolution run (e.g. 6 or 12 cores at c48). If the domain stack size is too small, you will get an mpp domain stack size overflow error in advection. If this happens, try increasing the domain stack size in input.nml.
