Running GCHP: Configuration
[[GEOS-Chem_HP|Return to GCHP Main Page]]
----
<span style="color:crimson;font-size:120%">'''The GCHP documentation has moved to https://gchp.readthedocs.io/.''' The GCHP documentation on http://wiki.seas.harvard.edu/ will stay online for several months, but it is outdated and no longer active!</span>
----
__FORCETOC__
'''''[[Running_GCHP:_Basics|Previous]] | [[GCHP_Output_Data| Next]] | [[Getting Started with GCHP]] | [[GCHP Main Page]]'''''
#[[GCHP_Hardware_and_Software_Requirements|Hardware and Software Requirements]]
#[[Setting_Up_the_GCHP_Environment|Setting Up the GCHP Environment]]
#[[Downloading_GCHP|Downloading Source Code and Data Directories]]
#[[Compiling_GCHP|Compiling]]
#[[Obtaining_a_GCHP_Run_Directory|Obtaining a Run Directory]]
#[[Running_GCHP:_Basics|Running GCHP: Basics]]
#<span style="color:blue">'''Running GCHP: Configuration'''</span>
#[[GCHP_Output_Data|Output Data]]
#[[Developing_GCHP|Developing GCHP]]
#[[GCHP_Run_Configuration_Files|Run Configuration Files]]
<br>
== Overview ==
  
All GCHP run directories have default simulation-specific run-time settings that are set when you create a run directory. You will likely want to change these settings. This page explains how to do so.
== Configuration files ==
GCHP is controlled using a set of configuration files that are included in the GCHP run directory. Files include:  
#[[GCHP_Run_Configuration_Files#CAP.rc|CAP.rc]]
#[[GCHP_Run_Configuration_Files#ExtData.rc|ExtData.rc]]
#[[GCHP_Run_Configuration_Files#GCHP.rc|GCHP.rc]]
#[[GCHP_Run_Configuration_Files#input.geos|input.geos]]
#[[GCHP_Run_Configuration_Files#HEMCO_Config.rc|HEMCO_Config.rc]]
#[[GCHP_Run_Configuration_Files#HEMCO_Diagn.rc|HEMCO_Diagn.rc]]
#[[GCHP_Run_Configuration_Files#input.nml|input.nml]]
#[[GCHP_Run_Configuration_Files#HISTORY.rc|HISTORY.rc]]
  
Several run-time settings must be set consistently across multiple files. Inconsistencies may cause your run to crash or yield unexpected results. To avoid mistakes and make run configuration easier, the bash shell script <tt>runConfig.sh</tt> is included in all run directories to set the most commonly changed config file settings from one location. Sourcing this script updates multiple config files to use the values specified in <tt>runConfig.sh</tt>.
Sourcing <tt>runConfig.sh</tt> is done automatically prior to running GCHP if you use any of the example run scripts, or you can do it at the command line. Information about which settings are changed, and in which files, is printed to standard output by the script. To source the script, type the following:
source runConfig.sh
You may also use it in silent mode if you wish to update files but not display settings on the screen:
source runConfig.sh --silent
  
While using <tt>runConfig.sh</tt> to configure common settings makes run configuration much simpler, it comes with a major caveat: if you manually edit a config file setting that is also set in <tt>runConfig.sh</tt>, your manual update will be overridden via string replacement. Get very familiar with the options in <tt>runConfig.sh</tt> and be careful not to update the same setting elsewhere.
You generally will not need to know more about the GCHP configuration files beyond what is listed on this page. However, for a comprehensive description of all configuration files used by GCHP see the last section of this user manual.
== Commonly Changed Run Options ==
=== Compute Configuration ===
==== Set Number of Nodes and Cores ====
To change the number of nodes and cores for your run you must update settings in two places: (1) <tt>runConfig.sh</tt>, and (2) your run script. The <tt>runConfig.sh</tt> file contains detailed instructions on how to set resource parameter options and what they mean. Look for the <tt>Compute Resources</tt> section in the script. Update your resource request in your run script to match the resources set in <tt>runConfig.sh</tt>.
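For example, if you configure <tt>runConfig.sh</tt> for 2 nodes and 48 total cores, the matching resource request in a SLURM run script might look like the following sketch. The directive values here are illustrative only, and partition names, node sizes, and wall time limits are cluster-specific:

```bash
#SBATCH -N 2         # number of nodes; must match the node count in runConfig.sh
#SBATCH -n 48        # total cores; must match total cores (NX*NY) in runConfig.sh
#SBATCH -t 0-02:00   # wall time request
```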
It is important to be smart about your resource allocation. To do this it is useful to understand how GCHP works with respect to distribution of nodes and cores across the grid. At least one unique core is assigned to each face on the cubed sphere, resulting in a constraint of at least six cores to run GCHP. The same number of cores must be assigned to each face, resulting in another constraint of total number of cores being a multiple of six. Communication between the cores occurs only during transport processes.  
While any number of cores is valid as long as it is a multiple of six (although there is an upper limit per resolution), you will typically start to see negative effects due to excessive communication if a core is handling less than around one hundred grid cells or a cluster of grid cells that are not approximately square. You can determine how many grid cells are handled per core by analyzing your grid resolution and resource allocation. For example, if running at C24 with six cores each face is handled by one core (6 faces / 6 cores) and contains 576 cells (24x24). Each core therefore processes 576 cells. Since each core handles one face, each core communicates with four other cores (four surrounding faces). Maximizing squareness of grid cells per core is done automatically within <tt>runConfig.sh</tt> if variable <tt>NXNY_AUTO</tt> is set to <tt>ON</tt>.
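The cells-per-core arithmetic above can be sketched as a quick standalone check (this is illustrative arithmetic, not part of GCHP; the function name is made up):

```python
def cells_per_core(cs_res, ncores):
    """Approximate number of grid cells each core handles on a cubed-sphere
    grid of side length cs_res (total cells = 6 * cs_res**2)."""
    if ncores % 6 != 0:
        raise ValueError("GCHP requires a core count that is a multiple of 6")
    return 6 * cs_res ** 2 // ncores

# C24 on 6 cores: one 24x24 face per core
print(cells_per_core(24, 6))     # 576
# C180 on 360 cores: still well above the ~100 cells-per-core guideline
print(cells_per_core(180, 360))  # 540
```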
Further discussion of domain decomposition can be found in the <tt>Domain Decomposition</tt> section of <tt>runConfig.sh</tt>.
==== Split a Simulation Into Multiple Jobs ====
There is an option to split up a single simulation into separate serial jobs. To use this option, do the following:
#Update <tt>runConfig.sh</tt> with your full simulation (all runs) start and end dates, and the duration per segment (single run). Also update the number of runs (<tt>NUM_RUNS</tt>) to reflect the total number of jobs that will be submitted. Carefully read the comments in <tt>runConfig.sh</tt> to ensure you understand how it works.
#Optionally turn on monthly diagnostics (<tt>Monthly_Diag</tt>). Only turn on monthly diagnostics if your run duration is one month.
#Use <tt>gchp.multirun.run</tt> as your run script, or adapt it if your cluster does not use SLURM. It is located in the <tt>runScriptSamples</tt> subdirectory of your run directory. As with the regular <tt>gchp.run</tt>, you will need to update the file with compute resources consistent with <tt>runConfig.sh</tt>. '''Note that you should not submit the run script directly.''' It is submitted automatically by the script described in the next step.
#Use <tt>gchp.multirun.sh</tt> to submit your job, or adapt it if your cluster does not use SLURM. It is also located in the <tt>runScriptSamples</tt> subdirectory of your run directory. For example, to submit your series of jobs, type: <code>./gchp.multirun.sh</code>
  
There is much documentation in the headers of both <tt>gchp.multirun.run</tt> and <tt>gchp.multirun.sh</tt> that is worth reading, although not strictly necessary to get the multi-run option working. If you have not done so already, it is worth trying a short multi-segmented run to demonstrate that the multi-run configuration and scripts work on your system. For example, you could do a 3-hour simulation with 1-hour duration and number of runs equal to 3.
The multi-run script assumes use of SLURM, and a separate SLURM log file is created for each run. There is also a log file called <code>multirun.log</code> with high-level information such as the start, end, duration, and job IDs for all jobs submitted. If a run fails then all scheduled jobs are cancelled and a message about this is sent to that log file. Inspect this log, your other log files, and output in the <tt>OutputDir/</tt> directory prior to using this option for longer runs.
==== Change Domains Stack Size ====
For runs at very high resolution or with a small number of processors you may run into a domains stack size error. This occurs when the domains stack size memory limit set at run-time is exceeded, and the error will be apparent from the message in your log file. If this occurs you can increase the domains stack size in file <tt>input.nml</tt>. The default is set to 20000000.
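For reference, the domains stack size lives in the FMS namelist within <tt>input.nml</tt>. The file looks approximately like the sketch below; check the file in your own run directory for the exact contents rather than copying this verbatim:

```
&fms_nml
  print_memory_usage = .false.
  domains_stack_size = 20000000
/
```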
=== Basic Run Settings ===
==== Set Cubed Sphere Grid Resolution ====
GCHP uses a cubed sphere grid rather than the traditional lat-lon grid used in GEOS-Chem Classic. While regular lat-lon grids are typically designated as ΔLat ⨉ ΔLon (e.g. 4⨉5), cubed sphere grids are designated by the side length of the cube. In GCHP we specify this as CX (e.g. C24 or C180). A simple rule of thumb for determining the roughly equivalent lat-lon resolution of a given cubed sphere resolution is to divide 90 by the side length. Using this rule you can quickly match C24 with about 4x5, C90 with 1 degree, C360 with quarter degree, and so on.
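The rule of thumb can be illustrated with standalone arithmetic (not part of GCHP; the function name is made up):

```python
def cs_to_degrees(cs_res):
    """Rough lat-lon equivalent (degrees) of cubed-sphere resolution C<N>:
    each cube face spans 90 degrees across cs_res cells."""
    return 90.0 / cs_res

print(cs_to_degrees(24))   # 3.75 -> comparable to the 4x5 grid
print(cs_to_degrees(90))   # 1.0  -> about 1 degree
print(cs_to_degrees(360))  # 0.25 -> about quarter degree
```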
To change your grid resolution in the run directory edit the <tt>CS_RES</tt> integer parameter in <tt>runConfig.sh</tt> section <tt>Internal Cubed Sphere Resolution</tt> to the cube side length you wish to use. To use a uniform global grid resolution make sure that <tt>STRETCH_GRID</tt> is set to <tt>OFF</tt>.
==== Set Stretch Grid Resolution ====
GCHP has the capability to run with a stretched grid, meaning one portion of the globe is stretched to fine resolution. Set the stretched grid parameters in <tt>runConfig.sh</tt> section <tt>Internal Cubed Sphere Resolution</tt>. See instructions in that section of the file.
==== Turn On/Off Model Components ====
You can toggle all primary GEOS-Chem components, including type of mixing, from within <tt>runConfig.sh</tt>. The settings in that file will update <tt>input.geos</tt> automatically. Look for section <tt>Turn Components On/Off, and other settings in input.geos</tt>. Settings in this section beyond the component on/off toggles include using CH4 emissions in UCX and initializing stratospheric H2O in UCX.
==== Change Model Timesteps ====
Model timesteps, both chemistry and dynamics, are configured within <tt>runConfig.sh</tt>. They are set to match GEOS-Chem Classic default values at low resolutions for comparison purposes but can be updated with caution. Timesteps are automatically reduced for high resolution runs. Read the documentation in the <tt>Timesteps</tt> section of <tt>runConfig.sh</tt> before changing them.
==== Set Simulation Start and End Dates ====
Set simulation start and end in <tt>runConfig.sh</tt> section <tt>Simulation Start, End, Duration, # runs</tt>. Read the comments in the file for a complete description of the options. Typically a "CAP" runtime error indicates a problem with start, end, and duration settings. If you encounter an error with the words "CAP" near it then double-check that these settings make sense.
=== Inputs ===
==== Change Initial Restart File ====
All GCHP run directories come with symbolic links to initial restart files for commonly used cubed sphere resolutions. The appropriate restart file is automatically chosen based on the cubed sphere resolution you set in <tt>runConfig.sh</tt>.
You may overwrite the default restart file with your own by specifying the restart filename in <tt>runConfig.sh</tt> section <tt>Initial Restart File</tt>. Beware that it is your responsibility to make sure it is the proper grid resolution.
Unlike GEOS-Chem Classic, HEMCO restart files are not used in GCHP. HEMCO restart variables may be included in the initial species restart file, or they may be excluded and HEMCO will start with default values. GCHP initial restart files that come with the run directories do not include HEMCO restart variables, but all output restart files do.
==== Turn On/Off Emissions Inventories ====
Because file I/O impacts GCHP performance it is a good idea to turn off file read of emissions that you do not need. You can turn emissions inventories on or off the same way you would in GEOS-Chem Classic, by setting the inventories to true or false at the top of configuration file <tt>HEMCO_Config.rc</tt>. All emissions that are turned off in this way will be ignored when GCHP uses <tt>ExtData.rc</tt> to read files, thereby speeding up the model.
For emissions that do not have an on/off toggle at the top of the file, you can prevent GCHP from reading them by commenting them out in <tt>HEMCO_Config.rc</tt>. No updates to <tt>ExtData.rc</tt> would be necessary. If you alternatively comment out the emissions in <tt>ExtData.rc</tt> but not <tt>HEMCO_Config.rc</tt> then GCHP will fail with an error when looking for the file information.
Another option to skip file read for certain files is to replace the file path in <tt>ExtData.rc</tt> with <tt>/dev/null</tt>. However, if you want to turn these inputs back on at a later time you should preserve the original path by commenting out the original line.  
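A sketch of what this might look like in <tt>ExtData.rc</tt> is below. The entry name is purely illustrative, and the remaining columns of the entry (elided here) stay unchanged:

```
# Original path kept commented out so it is easy to restore later:
#MY_INV_CO ... ./MY_inventory/CO_emissions.nc
MY_INV_CO  ... /dev/null
```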
==== Add New Emissions Files ====
There are two steps for adding new emissions inventories to GCHP:
#Add the inventory information to <tt>HEMCO_Config.rc</tt>.
#Add the inventory information to <tt>ExtData.rc</tt>.
  
To add information to <tt>HEMCO_Config.rc</tt>, follow the same rules as you would for [[The_HEMCO_User%27s_Guide|adding a new emission inventory to GEOS-Chem Classic]]. Note that not all information in <tt>HEMCO_Config.rc</tt> is used by GCHP. This is because HEMCO is only used by GCHP to handle emissions after they are read, e.g. scaling and applying hierarchy. All functions related to HEMCO file read are skipped. This means that you could put garbage for the file path and units in <tt>HEMCO_Config.rc</tt> without running into problems with GCHP, as long as the syntax is what HEMCO expects. However, we recommend that you fill in <tt>HEMCO_Config.rc</tt> in the same way you would for GEOS-Chem Classic for consistency and also to avoid potential format check errors.
Staying consistent with the information that you put into <tt>HEMCO_Config.rc</tt>, add the inventory information to <tt>ExtData.rc</tt> following the guidelines listed at the top of the file and using existing inventories as examples. You can ignore all entries in <tt>HEMCO_Config.rc</tt> that are copies of another entry, since putting these in <tt>ExtData.rc</tt> would result in reading the same variable in the same file twice. HEMCO interprets the copied variables, denoted by dashes in the <tt>HEMCO_Config.rc</tt> entry, separately from file read.
A few common errors encountered when adding new input emissions files to GCHP are:
#Your input file contains integer values. Beware that the MAPL I/O component in GCHP does not read or write integers. If your data contains integers then you should reprocess the file to contain floating point values instead.
#Your data latitude and longitude dimensions are in the wrong order. Lat must always come before lon in your input arrays, a requirement for both GCHP and GEOS-Chem Classic. For more information, see the [[Preparing_data_files_for_use_with_HEMCO#Ordering_of_the_data|Preparing Data Files for Use with HEMCO wiki page]].
#Your 3D input data are mapped to the wrong levels in GEOS-Chem (silent error). If you read in 3D data and assign the resulting import to a GEOS-Chem state variable such as State_Chm or State_Met, then you must flip the vertical axis during the assignment. See files <tt>Includes_Before_Run.H</tt> and setting State_Chm%Species in <tt>Chem_GridCompMod.F90</tt> for examples.
#You have a typo in either <tt>HEMCO_Config.rc</tt> or <tt>ExtData.rc</tt>. Errors in <tt>HEMCO_Config.rc</tt> typically result in the model crashing right away, while errors in <tt>ExtData.rc</tt> typically cause a problem later on during ExtData read. When encountering errors such as this, always try running with the MAPL debug flags on in <tt>runConfig.sh</tt> (maximizes output to <tt>gchp.log</tt>) and Warnings and Verbose set to 3 in <tt>HEMCO_Config.rc</tt> (maximizes output to <tt>HEMCO.log</tt>). Another useful strategy is to find config file entries for similar input files and compare them against the entry for your new file. Directly comparing the file metadata may also lead to insights into the problem.
  
=== Outputs ===
==== Output Diagnostics Data on a Lat-Lon Grid ====
See documentation in the <tt>HISTORY.rc</tt> config file for instructions on how to output diagnostic collection on lat-lon grids.
==== Output Restart Files at Regular or Irregular Frequency ====
The MAPL component in GCHP has the option to output restart files (also called checkpoint files) prior to run end. The frequency of restart file write may be at regular time intervals (regular frequency) or at specific programmed times (irregular frequency). These periodic output restart files contain the date and time in their filenames.
Enabling this feature is a good idea if you plan on doing a long simulation and you are not splitting your run into multiple jobs. If the run crashes unexpectedly then you can restart mid-run rather than start over from the beginning.
Update settings for checkpoint restart outputs in <tt>runConfig.sh</tt> section <tt>Output Restarts</tt>. Instructions for configuring both regular and irregular frequency restart files are included in the file.
==== Turn On/Off Diagnostics ====
To turn diagnostic collections on or off, comment out ("#") collection names in the "COLLECTIONS" list at the top of file <tt>HISTORY.rc</tt>. Collections cannot be turned on/off from <tt>runConfig.sh</tt>.
==== Set Diagnostic Frequency, Duration, and Mode ====
All diagnostic collections that come with the run directory have frequency, duration, and mode auto-set within <tt>runConfig.sh</tt>. The file contains a list of time-averaged collections and instantaneous collections, and allows setting a frequency and duration to apply to all collections listed for each. See section <tt>Output Diagnostics</tt> within <tt>runConfig.sh</tt>. To avoid auto-update of a certain collection, remove it from the list in <tt>runConfig.sh</tt>. If adding a new collection, you can add it to the file to enable auto-update of frequency, duration, and mode.
==== Add a New Diagnostics Collection ====
Adding a new diagnostics collection in GCHP is the same as for GEOS-Chem Classic netCDF diagnostics. You must add your collection to the collection list in <tt>HISTORY.rc</tt> and then define it further down in the file. Any 2D or 3D arrays that are stored within GEOS-Chem objects State_Met, State_Chm, or State_Diag may be included as fields in a collection. State_Met variables must be preceded by "Met_", State_Chm variables must be preceded by "Chem_", and State_Diag variables should not have a prefix. See the <tt>HISTORY.rc</tt> file for examples.
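As a rough sketch, a collection definition in <tt>HISTORY.rc</tt> has the following general shape. The collection name, field names, and attribute values here are illustrative; use the existing collections in your own <tt>HISTORY.rc</tt> as the authoritative template:

```
COLLECTIONS: 'MyCollection',
::
  MyCollection.template:   '%y4%m2%d2_%h2%n2z.nc4',
  MyCollection.format:     'CFIO',
  MyCollection.frequency:  010000,
  MyCollection.duration:   240000,
  MyCollection.mode:       'time-averaged',
  MyCollection.fields:     'SpeciesConc_O3', 'GCHPchem',
::
```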
Once implemented, you can either incorporate the new collection settings into <tt>runConfig.sh</tt> for auto-update, or you can manually configure all settings in <tt>HISTORY.rc</tt>. See the <tt>Output Diagnostics</tt> section of <tt>runConfig.sh</tt> for more information.
==== Generate Monthly Mean Diagnostics ====
There is an option to automatically generate monthly diagnostics by submitting month-long simulations as separate jobs. Splitting up the simulation into separate jobs is a requirement for monthly diagnostics because MAPL History requires a fixed number of hours for diagnostic frequency and file duration. The monthly mean diagnostic option automatically updates <tt>HISTORY.rc</tt> diagnostic settings each month to reflect the number of days in that month, taking leap years into account.
To use the monthly diagnostics option, first read and follow instructions for splitting a simulation into multiple jobs (see separate section on this page). Prior to submitting your run, enable monthly diagnostics in <tt>runConfig.sh</tt> by searching for variable "Monthly_Diag" and changing its value from 0 to 1. Be sure to always start your monthly diagnostic runs on the first day of the month.
=== Debugging ===
  
==== Enable Maximum Print Output ====
  
Besides compiling with <tt>CMAKE_BUILD_TYPE=Debug</tt>, there are a few settings you can configure to boost your chance of successful debugging. All of them involve sending additional print statements to the log files.
#Set <tt>Turn on debug printout?</tt> in <tt>input.geos</tt> to <tt>T</tt> to turn on extra GEOS-Chem print statements in the main log file.
#Set <tt>MAPL_EXTDATA_DEBUG_LEVEL</tt> in <tt>runConfig.sh</tt> to <tt>1</tt> to turn on extra MAPL print statements in ExtData, the component that handles input.
!width="800px"|Description
+
#Set the <tt>Verbose</tt> and <tt>Warnings</tt> settings in <tt>HEMCO_Config.rc</tt> to maximum values of 3 to send the maximum number of prints to <tt>HEMCO.log</tt>.
  
|-valign="top"
+
None of these options require recompiling. Be aware that all of them will slow down your simulation. Be sure to set them back to the default values after you are finished debugging.
|EXPID
|The beginning of the path where output files are stored. Includes the directory structure and the beginning of the filename.

|-valign="top"
|EXPDSC
|The export description. If one does ncdump -h, the attribute "Title" will be the value set for EXPDSC.

|-valign="top"
|CoresPerNode
|Should match NX in GCHP.rc.

|-valign="top"
|COLLECTIONS: 'center'
|This defines the collection 'center', which is described in further detail in the next several parameters. This collection defines the format of the output files, as well as what gets written to them.

|-valign="top"
|center.template
|This will go after the COLLECTIONS value in output filenames, prefixed by a dot. Should include the file extension. Follows GrADS conventions.

|-valign="top"
|center.archive
|It is unclear whether this parameter is still used.

|-valign="top"
|center.format
|Character string defining the file format. Options include "flat" (the default), "CFIO", or "CFIOasync".

|-valign="top"
|center.resolution
|Defines the output resolution, IX, IY, in lat-lon. Output can be at a different resolution than the model run. Defaults to the same resolution as the model. This means that if the model is running on the cubed sphere, it will by default output a cubed sphere grid unless specified otherwise.

|-valign="top"
|center.frequency
|The time frequency of output, HHMMSS. 010000 will output data every hour. Defaults to 060000.

|-valign="top"
|center.mode
|Character string defining what is written out; either "instantaneous" or "time-averaged" data.<br>'''NOTE: Does not appear in downloaded run directory HISTORY.rc'''

|-valign="top"
|center.acc_interval
|Needed when mode is set to "time-averaged". Defines the accumulation interval of the time average. Must be less than or equal to frequency.<br>'''NOTE: Does not appear in downloaded run directory HISTORY.rc'''

|-valign="top"
|center.subset
|Can be used to output only a subset of the data. Values include lonMin, lonMax, latMin, and latMax.<br>'''NOTE: Does not appear in downloaded run directory HISTORY.rc'''

|-valign="top"
|center.fields
|Paired character strings giving each diagnostic name and its associated gridded component to be written out.
|}

== fvcore_layout.rc ==

{| border=1 cellspacing=0 cellpadding=5
|-bgcolor="#CCCCCC"
!width="40px"|Parameter
!width="800px"|Description

|-valign="top"
|npx
|The default setting is 22.<br>'''Change this to match IM in GCHP.rc.'''

|-valign="top"
|npy
|The default setting is 22.<br>'''Change this to match IM in GCHP.rc.'''

|-valign="top"
|npz
|The default setting is 72.<br>'''Change this to match LM in GCHP.rc.'''

|-valign="top"
|dt
|The default setting is 1800 (4x5 resolution).<br>'''Change this to match HEARTBEAT_DT in CAP.rc (and possibly RUN_DT in GCHP.rc).'''

|-valign="top"
|n_sponge
|The default setting is -1.

|-valign="top"
|ADIABATIC
|The default setting is true.

|-valign="top"
|hydrostatic
|The default setting is true.

|-valign="top"
|nord
|The default setting is 0.

|-valign="top"
|d2_bg
|The default setting is 0.0075.

|-valign="top"
|d4_bg
|The default setting is 0.0.

|-valign="top"
|dddmp
|The default setting is 0.2.

|-valign="top"
|ksplit
|The default setting is 1.

|-valign="top"
|nsplit
|The default setting is 0.

|-valign="top"
|msplit
|The default setting is 0.

|-valign="top"
|hord_mt
|The default setting is 10.

|-valign="top"
|hord_vt
|The default setting is 10.

|-valign="top"
|hord_tm
|The default setting is 10.

|-valign="top"
|hord_dp
|The default setting is 13.

|-valign="top"
|hord_tr
|The default setting is 13.

|-valign="top"
|kord_tm
|The default setting is -9.

|-valign="top"
|kord_mt
|The default setting is 9.

|-valign="top"
|kord_wz
|The default setting is 9.

|-valign="top"
|kord_tr
|The default setting is 9.

|-valign="top"
|FV_OFF
|The default setting is false.

|-valign="top"
|fv_debug
|The default setting is false.

|-valign="top"
|inline_q
|The default setting is false.

|-valign="top"
|z_tracer
|The default setting is false.

|-valign="top"
|chk_mass
|The default setting is false.
|}

== ExtData.rc ==

ExtData.rc contains emissions information for use with MAPL and ESMF. The information it contains overlaps with that in the GEOS-Chem configuration file HEMCO_Config.rc. However, during a GCHP run, emissions files are located using the paths in ExtData.rc rather than those in HEMCO_Config.rc; the file paths in HEMCO_Config.rc are ignored. Unlike HEMCO_Config.rc, ExtData.rc also includes the GMAO meteorological data.

For each primary export (met-field), 2D variable at the edge (e.g. VEGFRAC), and emission, ExtData.rc includes the following information in a single space-delimited row.

{| border=1 cellspacing=0 cellpadding=5
|-bgcolor="#CCCCCC"
!width="175px"|Info Name
!width="800px"|Description

|-valign="top"
|Export Name
|Name of the emissions field in HEMCO, e.g. TROPP

|-valign="top"
|Units
|Unit string nested within single quotes. '1' indicates there is no unit conversion from the native units in the netCDF file.

|-valign="top"
|Dimension
|2D is xy and 3D is xyz.

|-valign="top"
|V Loc
|Vertical location. Possible values include C (center) and E (edge).

|-valign="top"
|Clim
|Logical flag indicating whether the data are climatological.

|-valign="top"
|Refresh Time Template
|Possible values include -, 0, and 0:0.

|-valign="top"
|Offset Factor
|Data offset factor, e.g. 0.0 for no offset.

|-valign="top"
|Scale Factor
|Data scale factor, e.g. 1.0 for no scaling.

|-valign="top"
|External File Variable
|Variable name in the netCDF data file, e.g. TROPPT in the met-fields

|-valign="top"
|External File Path
|Path to the netCDF data file. If not using the data, specify /dev/null to reduce processing time.
|}
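
As an illustration of the row format described above, a hypothetical entry might look as follows. The column order follows the table above; the export name, variable name, and file path are invented for illustration and do not correspond to an actual run directory entry, whose format may include additional columns.

```
TROPP 'hPa' xy C N - 0.0 1.0 TROPPT ./MetDir/GEOSFP.YYYYMMDD.A1.2x25.nc
```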

Information for masks and derived exports may also be included, using the format specified in the headers within the file.

--[[User:Lizzie Lundgren|Lizzie Lundgren]] ([[User talk:Lizzie Lundgren|talk]]) 22:31, 10 December 2015 (UTC)

= GCHP Restart Files =
'''This section is in development'''

= The ExtData Gridded Component =
'''This section is in development'''

= Offline Data Requirements =
'''This section is in development'''

= Working Meteorological Fields =
'''This section is in development'''

= Controlling Output with MAPL History =
'''This section is in development'''

= Changing Resolutions =
'''This section is in development'''

= Examples =

Before running GCHP, check that the following is true:
*Compilation was successful and you have a <tt>geos</tt> executable in your run directory
*<tt>ExtData.rc</tt> contains met-field paths corresponding to your intended input resolution
*You have all required symbolic links in your run directory: ChemDataDir, CodeDir, MainDataDir, MetDir, TileFiles, and your restart file
*All libraries and modules required by GCHP are loaded

===Quick start: 1-hr standard simulation===

The default GCHP run directory is set up for a 1-hour tropchem simulation at resolution C24, with 2x25 input met resolution and 4x5 output concentration resolution. C24 is approximately the cubed sphere equivalent of 4x5. If you followed the above instructions for setting up your run directory, you should have specified 2x2.5 input resolution for meteorology fields for your initial devkit run and edited <tt>ExtData.rc</tt> to include paths that reflect this resolution. For Odyssey users the <tt>ExtData.rc</tt> update occurred automatically, while for all non-Odyssey users this required manual editing.

For this quick test, you will need an environment with the following:

*6 CPUs (minimum - see model description)
*1 node
*At least 2500 MB of memory per CPU
*Your compiler, NetCDF, and MPI implementation loaded

Odyssey users should refer to the [[GEOS-Chem_HP_Dev_Kit#Harvard Odyssey Users|Harvard Odyssey Users Environment Setup]] section of this page for a refresher on how to set up your environment prior to running GCHP.

'''WARNING! It appears that GFED might not be handled correctly by GCHP. We recommend that users disable fire emissions (GFED) in HEMCO_Config.rc for now. You will also need to disable the BOND emissions dataset, as it interacts with the GFED dataset.'''

Once your run directory is all set up, start the simulation by typing the following:

<nowiki>mpirun -n 6 ./geos 2>&1 | tee 1hr_mapl.log</nowiki>

This command can be broken down as follows:

*<tt>mpirun</tt> executes an MPI-enabled executable and associates the necessary resources. This is a version-dependent executable; some MPI implementations use other commands, such as <tt>mpiexec</tt>.

*<tt>-n 6</tt> specifies how many individual CPU cores are requested for the run. The number given here should always be the total number of cores, regardless of how many nodes they are spread over, and must be a multiple of 6 (at least one core for each of the cubed sphere faces, and the same number for each face).

*<tt>./geos</tt> is the local GEOS-Chem executable, as with GEOS-Chem Classic.

*<tt>2>&1 | tee 1hr_mapl.log</tt> is a bash shell-specific means of collecting all MAPL output (standard and error) that is written to the screen into a file.

Note that the log file specified when invoking <tt>mpirun</tt> collects output created by MAPL. The more traditional GEOS-Chem log output is automatically sent to a file defined in configuration file <tt>GCHP.rc</tt>. By default, its format is <tt>PET%%%%%.GEOSCHEMchem.log</tt>, where <tt>%%%%%</tt> is replaced at run-time with a processor id (typically <tt>00000</tt>). <tt>PET</tt> stands for persistent execution thread. Unlike MAPL, which sends output to the log from ALL threads, GEOS-Chem only outputs from a single thread.

Once the simulation is complete, there should be two netCDF output files in the <tt>OutputDir</tt> sub-directory. To get started with manipulating output data, see [[GEOS-Chem_HP_Output_Data|GCHP output data]].

===Basic test cases===

The next step is to try running some things of your own! These cases will require minor modifications to various elements of GCHP's run directory, and should help to familiarize you with how exactly GCHP works. If you run into problems, please e-mail Lizzie Lundgren (elundgren@seas.harvard.edu).

====Basic case 1: Changing resolution and moving to multiple nodes====

Re-run the validation case at C48 (cube face side length N = 48 grid cells) resolution, with a shorter timestep (Δt = 600 s). This time, use a larger number of CPUs; say 12 cores, spread evenly across 2 nodes (C = 12, M = 2). Note that GCHP requires that cores are always distributed evenly across nodes. You will then need to change the following files to complete the change of resolution, timestep, and core layout:

{| class="wikitable"
|-
! scope="col"| File (.rc)
! scope="col"| Changes for grid resolution CN
! scope="col"| Changes for timestep Δt
! scope="col"| Changes for core layout CxM
|-
| GCHP
| IM = ''N''<br/>JM = ''6N''<br/>GRIDNAME=PE''N''x''6N''-CF
| HEARTBEAT_DT=''Δt''<br/>*_DT=''Δt''
| NX=''M''<br/>NY=''C/M''
|-
| CAP
|
| HEARTBEAT_DT=''Δt''
|
|-
| fvcore_layout
| npx=''N''<br/>npy=''N''
| dt=''Δt''
|
|-
| HISTORY
|
|
| CoresPerNode=''C/M''
|}

For the specific case of C48 with a timestep of 600 s, distributed as shown, using 12 cores, you should have:

{| class="wikitable"
|-
! scope="col"| File (.rc)
! scope="col"| Changes for grid resolution CN
! scope="col"| Changes for timestep Δt
! scope="col"| Changes for core layout CxM
|-
| GCHP
| IM = 48<br/>JM = 288<br/>GRIDNAME=PE48x288-CF
| HEARTBEAT_DT=600<br/>*_DT=600
| NX=2<br/>NY=6
|-
| CAP
|
| HEARTBEAT_DT=600
|
|-
| fvcore_layout
| npx=48<br/>npy=48
| dt=600
|
|-
| HISTORY
|
|
| CoresPerNode=6
|}

Finally, use mpirun as normal (mpirun -n 12 ./geos 2>&1 | tee log).

A note regarding NX and NY: NX and NY specify the domain decomposition; that is, how the surface of the cubed sphere will be split up between the cores. NX corresponds to the number of processors to use per N cells in the X direction, where N is the cube side length. NY corresponds to the number of processors per N cells in the Y direction, but must also include an additional factor of 6, corresponding to the number of cube faces. Therefore any multiple of 6 is a valid value for NY, and the only other rigid constraint is that NX*NY = NP, where NP is the total number of processors assigned to the job. However, if possible, specifying NX = NY/6 will provide an optimal distribution of cores, as it minimizes the amount of communication required. The number of cores requested should therefore ideally be 6*C*C, where C is an integer factor of N. For example, C=4 would give:

* NX = C = 4
* NY = 6*C = 24
* NP = 6*C*C = 96

This layout would be valid for any simulation where N is a multiple of 4. The absolute minimum case, C=1, provides NX=1, NY=6, and NP=6.
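
The layout arithmetic above can be checked with a quick shell calculation; the variable names here are just for illustration.

```shell
# Ideal core layout for C an integer factor of the cube side length N
C=4
NX=$C
NY=$((6 * C))
NP=$((NX * NY))
echo "NX=$NX NY=$NY NP=$NP"   # NX=4 NY=24 NP=96
```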

====Basic case 2: Running a coarse simulation with high-resolution met data====

So far, all the examples have been run using coarse (4x5) meteorological data, for the purposes of getting things started quickly. However, a major feature of GCHP is that it can read native-resolution meteorological data without regridding or preprocessing. To allow tests of this feature, a small archive of native-resolution meteorological data covering 2015-07-01 through 2015-07-10 can be found at

<nowiki>/n/seasasfs01/gcgrid/data/GEOS_0.25x0.3125.d</nowiki>

To use this data in place of the standard GEOS-Chem meteorological data, you need to make two changes. First, change the target of your <code>MetDir</code> link. To remove your existing link, type

<nowiki> unlink MetDir </nowiki>

Then, to establish a new link, type (on Odyssey)

<nowiki> ln -s /n/seasasfs01/gcgrid/data/GEOS_0.25x0.3125.d/GEOS_FP MetDir </nowiki>

This will establish a link to the native-resolution meteorological data. Now, open ExtData.rc and perform a find/replace, changing <code>2x25.nc</code> to <code>Native.nc</code> for all the meteorological data input files (collected at the top of ExtData.rc). Your GCHP run should now use the higher-resolution meteorological data. Note that this does come at a computational cost; however, it significantly reduces the artefacts associated with using coarse-resolution meteorological data on a foreign grid.
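
One possible way to do the find/replace from the command line, keeping a backup of the original file, is sketched below. The <code>-i.tmp</code> form of in-place editing works with both GNU and BSD sed.

```shell
# Back up ExtData.rc, then point all met-field entries at the native-resolution files
cp ExtData.rc ExtData.rc.bak
sed -i.tmp 's/2x25\.nc/Native.nc/g' ExtData.rc && rm -f ExtData.rc.tmp
```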

===Advanced test cases===

The following cases are deliberately light on setup information, to see how easy or difficult users find modifying the GCHP run and code directories to convince it to do what is needed. If you succeed in running any of these cases (or, possibly more importantly, if you find that you can't), please e-mail Lizzie Lundgren at elundgren@seas.harvard.edu with details. The more detail the better, but please include at least the following:
*The test case name (if applicable)
*The resolution(s) you ran it at
*Whether the run completed or not

====Advanced case 1: Run GCHP with a restart file====

Run GCHP once for at least ten days in any chemically-active configuration, generate a restart file, and run GCHP again from that restart file. To help you get started, Odyssey users can find some non-zero restart files at

<nowiki>/n/regal/jacob_lab/seastham/GCHP_Restarts</nowiki>

Copy one of the files to your run directory and change GCHP.rc to read

<nowiki>GIGCchem_INTERNAL_RESTART_FILE: +gcchem_internal_checkpoint_c24.nc
GIGCchem_INTERNAL_CHECKPOINT_FILE: gcchem_internal_checkpoint_c24.nc</nowiki>

The + means that any missing values will be ignored rather than causing the simulation to fail. Note that the restart file has no date or time markers and will be overwritten at the end of the run, so make sure to back your restart files up if you wish to reuse them!

====Advanced case 2: GCHP speedrun====

Initialize GCHP with a non-zero restart file. Run with 66 tracers but with no processes except advective transport. Reduce run time as much as possible by setting unnecessary filepaths in ExtData.rc to "/dev/null". This will result in them being set to default values without spending time reading files.

====Advanced case 3: Changing tracer count====

Add a new, passive tracer to GCHP. To do this, you will need to:

* Remove all tracers in <code>input.geos</code> and replace them with one tracer called PASV. Set the listed number of tracers to 1.
* Modify the file <code>Chem_Registry.rc</code> in <code>CodeDir/GCHP/Registry</code> so that all tracers (<code>TRC_XYZ</code>) are removed, and replace them with the entry <code>TRC_PASV</code>. NOTE: You must perform a clean compile after changing <code>Chem_Registry.rc</code> or the changes will not take effect!
* Disable chemistry, deposition, and emissions in <code>input.geos</code>.
* Remove all tracer outputs in <code>HISTORY.rc</code> and replace them with one output giving <code>TRC_PASV</code>.

If you find that you can run GCHP with these modifications, all that remains is to obtain a non-zero restart file. This will be available soon - if you reach this point, please contact [[User:Lizzie Lundgren|Lizzie Lundgren]].

[[GEOS-Chem_HP|Return to GCHP Main Page]]

Latest revision as of 15:41, 8 December 2020



Configuration files

GCHP is controlled using a set of configuration files that are included in the GCHP run directory. Files include:

  1. CAP.rc
  2. ExtData.rc
  3. GCHP.rc
  4. input.geos
  5. HEMCO_Config.rc
  6. HEMCO_Diagn.rc
  7. input.nml
  8. HISTORY.rc

Several run-time settings must be set consistently across multiple files. Inconsistencies may result in your program crashing or yielding unexpected results. To avoid mistakes and to make run configuration easier, the bash shell script runConfig.sh is included in all run directories to set the most commonly changed config file settings from one location. Sourcing this script updates multiple config files to use the values specified in the script.

Sourcing runConfig.sh is done automatically prior to running GCHP if you use any of the example run scripts, or you can do it at the command line. Information about what settings are changed, and in which files, is printed to the script's standard output. To source the script, type the following:

source runConfig.sh

You may also use it in silent mode if you wish to update files but not display settings on the screen:

source runConfig.sh --silent

While using runConfig.sh to configure common settings makes run configuration much simpler, it comes with a major caveat. If you manually edit a config file setting that is also set in runConfig.sh, then your manual update will be overridden via string replacement. Please get very familiar with the options in runConfig.sh and be conscientious about not updating the same setting elsewhere.

You generally will not need to know more about the GCHP configuration files beyond what is listed on this page. However, for a comprehensive description of all configuration files used by GCHP see the last section of this user manual.

Commonly Changed Run Options

Compute Configuration

Set Number of Nodes and Cores

To change the number of nodes and cores for your run you must update settings in two places: (1) runConfig.sh, and (2) your run script. The runConfig.sh file contains detailed instructions on how to set resource parameter options and what they mean. Look for the Compute Resources section in the script. Update your resource request in your run script to match the resources set in runConfig.sh.

It is important to be smart about your resource allocation. To do this it is useful to understand how GCHP works with respect to distribution of nodes and cores across the grid. At least one unique core is assigned to each face on the cubed sphere, resulting in a constraint of at least six cores to run GCHP. The same number of cores must be assigned to each face, resulting in another constraint of total number of cores being a multiple of six. Communication between the cores occurs only during transport processes.

While any number of cores is valid as long as it is a multiple of six (although there is an upper limit per resolution), you will typically start to see negative effects due to excessive communication if a core is handling less than around one hundred grid cells or a cluster of grid cells that are not approximately square. You can determine how many grid cells are handled per core by analyzing your grid resolution and resource allocation. For example, if running at C24 with six cores each face is handled by one core (6 faces / 6 cores) and contains 576 cells (24x24). Each core therefore processes 576 cells. Since each core handles one face, each core communicates with four other cores (four surrounding faces). Maximizing squareness of grid cells per core is done automatically within runConfig.sh if variable NXNY_AUTO is set to ON.

Further discussion about domain decomposition is in runConfig.sh section Domain Decomposition.
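
The cells-per-core arithmetic described above can be checked with a quick shell calculation; the variable names here are illustrative.

```shell
# Grid cells handled per core: total cells (6 faces of N x N) divided by the core count
N=24        # cube side length (C24)
NCORES=6    # total cores (must be a multiple of 6)
echo $(( 6 * N * N / NCORES ))   # prints 576 for C24 on 6 cores
```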

Split a Simulation Into Multiple Jobs

There is an option to split up a single simulation into separate serial jobs. To use this option, do the following:

  1. Update runConfig.sh with your full simulation (all runs) start and end dates, and the duration per segment (single run). Also update the number-of-runs option to reflect the total number of jobs that will be submitted (NUM_RUNS). Carefully read the comments in runConfig.sh to ensure you understand how it works.
  2. Optionally turn on monthly diagnostic (Monthly_Diag). Only turn on monthly diagnostics if your run duration is monthly.
  3. Use gchp.multirun.run as your run script, or adapt it if your cluster does not use SLURM. It is located in the runScriptSamples subdirectory of your run directory. As with the regular gchp.run, you will need to update the file with compute resources consistent with runConfig.sh. Note that you should not submit the run script directly. It will be done automatically by the file described in the next step.
  4. Use gchp.multirun.sh to submit your job, or adapt it if your cluster does not use SLURM. It is located in the runScriptSamples subdirectory of your run directory. For example, to submit your series of jobs, type: ./gchp.multirun.sh

There is much documentation in the headers of both gchp.multirun.run and gchp.multirun.sh that is worth reading and getting familiar with, although not entirely necessary to get the multi-run option working. If you have not done so already, it is worth trying out a simple multi-segmented run of short duration to demonstrate that the multi-segmented run configuration and scripts work on your system. For example, you could do a 3-hour simulation with 1-hour duration and number of runs equal to 3.
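
For the simple test suggested above, the relevant runConfig.sh settings might look like the sketch below. NUM_RUNS and Monthly_Diag are the variable names mentioned on this page; the date and duration variable names and formats are assumptions and may differ in your version of the script.

```shell
# Illustrative only: a 3-hour simulation split into three 1-hour runs
Start_Time="20190701 000000"    # assumed variable name and format
End_Time="20190701 030000"      # assumed variable name and format
Duration="00000000 010000"      # assumed variable name and format
NUM_RUNS=3                      # number of jobs to submit
Monthly_Diag=0                  # monthly diagnostics off (runs are not month-long)
```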

The multi-run script assumes use of SLURM, and a separate SLURM log file is created for each run. There is also a log file called multirun.log containing high-level information such as the start, end, duration, and job ids for all jobs submitted. If a run fails, then all scheduled jobs are cancelled and a message about this is sent to that log file. Inspect this and your other log files, as well as the output in the OutputDir/ directory, before using this option for longer runs.

Change Domains Stack Size

For runs at very high resolution or with a small number of processors, you may run into a domains stack size error. This is caused by exceeding the domains stack size memory limit set at run-time; the error will be apparent from the message in your log file. If this occurs, you can increase the domains stack size in file input.nml. The default is set to 20000000.
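
As a sketch, the relevant entry in input.nml is a Fortran namelist setting along these lines; the namelist group name shown here is an assumption, so check your own input.nml for the actual group containing this setting.

```
&fms_nml
  domains_stack_size = 20000000   ! increase this value if you hit a domains stack size error
/
```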

Basic Run Settings

Set Cubed Sphere Grid Resolution

GCHP uses a cubed sphere grid rather than the traditional lat-lon grid used in GEOS-Chem Classic. While regular lat-lon grids are typically designated as ΔLat ⨉ ΔLon (e.g. 4⨉5), cubed sphere grids are designated by the side length of the cube. In GCHP we specify this as CX (e.g. C24 or C180). A simple rule of thumb for determining the roughly equivalent lat-lon resolution of a given cubed sphere resolution is to divide 90 by the side length. Using this rule you can quickly match C24 with about 4x5, C90 with 1 degree, C360 with quarter degree, and so on.
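
A quick shell check of this rule of thumb:

```shell
# Approximate lat-lon grid spacing (degrees) for a given cube side length N
for N in 24 90 360; do
    awk -v n="$N" 'BEGIN { printf "C%d ~ %.2f degrees\n", n, 90 / n }'
done
```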

To change your grid resolution in the run directory edit the CS_RES integer parameter in runConfig.sh section Internal Cubed Sphere Resolution to the cube side length you wish to use. To use a uniform global grid resolution make sure that STRETCH_GRID is set to OFF.

Set Stretch Grid Resolution

GCHP has the capability to run with a stretched grid, meaning one portion of the globe is stretched to finer resolution. Set the stretched grid parameters in runConfig.sh section Internal Cubed Sphere Resolution. See the instructions in that section of the file.

Turn On/Off Model Components

You can toggle all primary GEOS-Chem components, including the type of mixing, from within runConfig.sh. The settings in that file will update input.geos automatically. Look for the section Turn Components On/Off. Other settings in this section, beyond the component on/off toggles, include using CH4 emissions in UCX and initializing stratospheric H2O in UCX.

Change Model Timesteps

Model timesteps, both chemistry and dynamic, are configured within runConfig.sh. They are set to match GEOS-Chem Classic default values for low resolutions for comparison purposes but can be updated, with caution. Timesteps are automatically reduced for high resolution runs. Read the documentation in runConfig.sh section Timesteps for setting them.

Set Simulation Start and End Dates

Set simulation start and end in runConfig.sh section Simulation Start, End, Duration, # runs. Read the comments in the file for a complete description of the options. Typically a "CAP" runtime error indicates a problem with start, end, and duration settings. If you encounter an error with the words "CAP" near it then double-check that these settings make sense.

Inputs

Change Initial Restart File

All GCHP run directories come with symbolic links to initial restart files for commonly used cubed sphere resolutions. The appropriate restart file is automatically chosen based on the cubed sphere resolution you set in runConfig.sh.

You may overwrite the default restart file with your own by specifying the restart filename in runConfig.sh section Initial Restart File. Beware that it is your responsibility to make sure it is the proper grid resolution.

Unlike GEOS-Chem Classic, HEMCO restart files are not used in GCHP. HEMCO restart variables may be included in the initial species restart file, or they may be excluded and HEMCO will start with default values. GCHP initial restart files that come with the run directories do not include HEMCO restart variables, but all output restart files do.

Turn On/Off Emissions Inventories

Because file I/O impacts GCHP performance it is a good idea to turn off file read of emissions that you do not need. You can turn emissions inventories on or off the same way you would in GEOS-Chem Classic, by setting the inventories to true or false at the top of configuration file HEMCO_Config.rc. All emissions that are turned off in this way will be ignored when GCHP uses ExtData.rc to read files, thereby speeding up the model.
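
As a sketch, the inventory on/off switches near the top of HEMCO_Config.rc look something like the following; the extension numbers, names, and species shown are examples only, and the exact layout may differ between versions.

```
# Illustrative extension switches (names, numbers, and species are examples)
# ExtNr ExtName     on/off  Species
0       Base        : on    *
    --> BOND        :       false
111     GFED        : off   CO
```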

For emissions that do not have an on/off toggle at the top of the file, you can prevent GCHP from reading them by commenting them out in HEMCO_Config.rc. No updates to ExtData.rc would be necessary. If you alternatively comment out the emissions in ExtData.rc but not HEMCO_Config.rc then GCHP will fail with an error when looking for the file information.

Another option to skip file read for certain files is to replace the file path in ExtData.rc with /dev/null. However, if you want to turn these inputs back on at a later time you should preserve the original path by commenting out the original line.

Add New Emissions Files

There are two steps for adding new emissions inventories to GCHP:

  1. Add the inventory information to HEMCO_Config.rc.
  2. Add the inventory information to ExtData.rc.

To add information to HEMCO_Config.rc, follow the same rules as you would for adding a new emission inventory to GEOS-Chem Classic. Note that not all information in HEMCO_Config.rc is used by GCHP. This is because HEMCO is only used by GCHP to handle emissions after they are read, e.g. scaling and applying hierarchy. All functions related to HEMCO file read are skipped. This means that you could put garbage for the file path and units in HEMCO_Config.rc without running into problems with GCHP, as long as the syntax is what HEMCO expects. However, we recommend that you fill in HEMCO_Config.rc in the same way you would for GEOS-Chem Classic for consistency and also to avoid potential format check errors.

Staying consistent with the information that you put into HEMCO_Config.rc, add the inventory information to ExtData.rc following the guidelines listed at the top of the file and using existing inventories as examples. You can ignore all entries in HEMCO_Config.rc that are copies of another entry, since putting these in ExtData.rc would result in reading the same variable in the same file twice. HEMCO interprets the copied variables, denoted by dashes in the HEMCO_Config.rc entry, separately from file read.

A few common errors encountered when adding new input emissions files to GCHP are:

  1. Your input file contains integer values. Beware that the MAPL I/O component in GCHP does not read or write integers. If your data contains integers then you should reprocess the file to contain floating point values instead.
  2. Your data latitude and longitude dimensions are in the wrong order. Lat must always come before lon in your inputs arrays, a requirement true for both GCHP and GEOS-Chem Classic. For more information about this, see the [Preparing_data_files_for_use_with_HEMCO#Ordering_of_the_data|Preparing Data Files for Use with HEMCO wiki page]].
  3. Your 3D input data are mapped to the wrong levels in GEOS-Chem (silent error). If you read in 3D data and assign the resulting import to a GEOS-Chem state variable such as State_Chm or State_Met, then you must flip the vertical axis during the assignment. See files Includes_Before_Run.H and setting State_Chm%Species in Chem_GridCompMod.F90 for examples.
  4. You have a typo in either HEMCO_Config.rc or ExtData.rc. Errors in HEMCO_Config.rc typically cause the model to crash right away, while errors in ExtData.rc typically cause a problem later on during ExtData read. When you encounter errors like these, try running with the MAPL debug flags enabled in runConfig.sh (maximizes output to gchp.log) and with Warnings and Verbose set to 3 in HEMCO_Config.rc (maximizes output to HEMCO.log). Another useful strategy is to find config file entries for similar input files and compare them against the entry for your new file. Directly comparing the file metadata may also lead to insights into the problem.

== Outputs ==

=== Output Diagnostics Data on a Lat-Lon Grid ===

See the documentation in the HISTORY.rc config file for instructions on how to output diagnostic collections on lat-lon grids.
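In outline, lat-lon output is configured by assigning a collection a grid label defined in the GRID_LABELS section of HISTORY.rc. The label names and grid parameters below follow the pattern shipped in recent run directories but may differ in your version; treat them as a sketch only.

```
GRID_LABELS: PE24x144-CF
             PC360x181-DC
::

# Hypothetical 1x1-degree lat-lon grid definition:
PC360x181-DC.GRID_TYPE: LatLon
PC360x181-DC.IM_WORLD:  360
PC360x181-DC.JM_WORLD:  181
PC360x181-DC.POLE:      PC
PC360x181-DC.DATELINE:  DC
PC360x181-DC.LM:        72

# Assign the label to a collection to regrid its output to lat-lon:
  SpeciesConc.grid_label: PC360x181-DC
```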

=== Output Restart Files at Regular or Irregular Frequency ===

The MAPL component in GCHP has the option to output restart files (also called checkpoint files) before the end of a run. Restart files may be written at regular time intervals (regular frequency) or at specific programmed times (irregular frequency). These periodic restart files contain the date and time in their filenames.

Enabling this feature is a good idea if you plan on doing a long simulation that is not split into multiple jobs: if the run crashes unexpectedly, you can restart mid-run rather than start over from the beginning.

Update the settings for checkpoint restart output in the Output Restarts section of runConfig.sh. Instructions for configuring both regular and irregular frequency restart files are included in the file.
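A minimal sketch of what the Output Restarts settings in runConfig.sh might look like is below. The variable names and the frequency format are assumptions based on recent versions of the file; check the comments in your own runConfig.sh for the exact names and format.

```shell
# Hypothetical excerpt from the "Output Restarts" section of runConfig.sh;
# variable names and frequency format are illustrative -- check your own file.
Midrun_Checkpoint=ON       # turn on periodic mid-run restart (checkpoint) output
Checkpoint_Freq="240000"   # e.g. write a checkpoint every 24 hours (HHmmSS; illustrative)
```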

=== Turn On/Off Diagnostics ===

To turn diagnostic collections on or off, comment out ("#") or uncomment collection names in the COLLECTIONS list at the top of file HISTORY.rc. Collections cannot be turned on/off from runConfig.sh.
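For example, a COLLECTIONS list with one collection disabled might look as follows; the collection names are examples and your HISTORY.rc may list different ones.

```
COLLECTIONS: 'SpeciesConc',
             #'AerosolMass',
             'StateMet',
::
```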

=== Set Diagnostic Frequency, Duration, and Mode ===

All diagnostic collections that come with the run directory have their frequency, duration, and mode auto-set within runConfig.sh. The file contains one list of time-averaged collections and one of instantaneous collections, and lets you set a frequency and duration that apply to all collections in each list. See the Output Diagnostics section within runConfig.sh. To prevent auto-update of a certain collection, remove it from the list in runConfig.sh. If you add a new collection, you can add it to the file to enable auto-update of frequency, duration, and mode.
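As a rough illustration, the per-list settings in runConfig.sh might resemble the snippet below. All variable names and time formats here are hypothetical; consult the comments in the Output Diagnostics section of your own runConfig.sh for the real ones.

```shell
# Hypothetical excerpt from the "Output Diagnostics" section of runConfig.sh;
# names and formats are illustrative only.
TimeAvg_Freq="010000"   # time-averaged collections: 1-hour averaging period (HHmmSS)
TimeAvg_Dur="240000"    # one output file per day
Inst_Freq="030000"      # instantaneous collections: sample every 3 hours
Inst_Dur="240000"
```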

=== Add a New Diagnostics Collection ===

Adding a new diagnostics collection in GCHP is the same as for GEOS-Chem Classic netCDF diagnostics. You must add your collection to the collection list in HISTORY.rc and then define it further down in the file. Any 2D or 3D array stored within the GEOS-Chem objects State_Met, State_Chm, or State_Diag may be included as a field in a collection. State_Met variables must be preceded by "Met_", State_Chm variables must be preceded by "Chem_", and State_Diag variables should not have a prefix. See the HISTORY.rc file for examples.
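A hypothetical collection definition in HISTORY.rc might look like the sketch below; the collection name, field names, and settings are examples following the prefix rules above, so pattern-match against an existing definition in your own file.

```
# Hypothetical collection definition (names and settings are examples):
  MyDiag.template:    '%y4%m2%d2_%h2%n2z.nc4',
  MyDiag.format:      'CFIO',
  MyDiag.frequency:   010000,
  MyDiag.duration:    240000,
  MyDiag.mode:        'time-averaged'
  MyDiag.fields:      'DryDepVel_O3      ', 'GCHPchem',
                      'Met_TS            ', 'GCHPchem',
                      'Chem_H2O2AfterChem', 'GCHPchem',
::
```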

Once implemented, you can either incorporate the new collection settings into runConfig.sh for auto-update, or you can manually configure all settings in HISTORY.rc. See the Output Diagnostics section of runConfig.sh for more information.

=== Generate Monthly Mean Diagnostics ===

There is an option to automatically generate monthly mean diagnostics by submitting month-long simulations as separate jobs. Splitting the simulation into separate jobs is a requirement for monthly diagnostics because MAPL History requires a fixed number of hours for diagnostic frequency and file duration. The monthly mean diagnostic option automatically updates the HISTORY.rc diagnostic settings each month to reflect the number of days in that month, taking leap years into account.

To use the monthly diagnostics option, first read and follow the instructions for splitting a simulation into multiple jobs (see the separate section on this page). Prior to submitting your run, enable monthly diagnostics in runConfig.sh by searching for variable "Monthly_Diag" and changing its value from 0 to 1. Always start your monthly diagnostic runs on the first day of the month.
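The change itself is a one-line edit in runConfig.sh; the variable name comes from the text above, and the comment is illustrative.

```shell
# In runConfig.sh: enable monthly mean diagnostics
Monthly_Diag=1   # 0 = off (default), 1 = on; start runs on the first day of a month
```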

== Debugging ==

=== Enable Maximum Print Output ===

Besides compiling with CMAKE_BUILD_TYPE=Debug, there are a few run-time settings you can configure to improve your chances of successful debugging. All of them send additional print statements to the log files.

  1. Set "Turn on debug printout?" in input.geos to T to turn on extra GEOS-Chem print statements in the main log file.
  2. Set MAPL_EXTDATA_DEBUG_LEVEL in runConfig.sh to 1 to turn on extra MAPL print statements in ExtData, the component that handles input.
  3. Set the Verbose and Warnings settings in HEMCO_Config.rc to their maximum values of 3 to send the maximum number of prints to HEMCO.log.

None of these options require recompiling. Be aware that all of them will slow down your simulation, so set them back to their default values after you are finished debugging.
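Settings 1 and 3 above are plain text edits; a sketch of what the edited lines might look like follows. Exact spacing and menu wording can vary between versions, so search for these settings in your own input.geos and HEMCO_Config.rc rather than copying verbatim.

```
# In input.geos (exact spacing varies by version):
Turn on debug printout? : T

# In the Settings section of HEMCO_Config.rc:
Verbose:                     3
Warnings:                    3
```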



'''''[[Running_GCHP:_Basics|Previous]] | [[GCHP_Output_Data|Next]] | [[Getting Started with GCHP]] | [[GCHP Main Page]]'''''