Difference between revisions of "Running GCHP: Configuration"

(Overview)
(21 intermediate revisions by the same user not shown)
Line 21: Line 21:
 
All sample run scripts include sourcing <tt>runConfig.sh</tt>. When <tt>runConfig.sh</tt> is sourced, it prints which settings are being changed, their new values, and the files being updated. This information is sent to the output log <tt>gchp.log</tt>.
 
  
You generally will not need to know more about the GCHP configuration files beyond what is listed on this page. However, for more detailed information about the configuration files used by GCHP see the last section of this user manual which includes a list and description of all contents as well as a more detailed display of what <tt>runConfig.sh</tt> is actually doing.  
+
You generally will not need to know more about the GCHP configuration files beyond what is listed on this page. However, for more detailed information about the configuration files used by GCHP see the last section of this user manual which includes a list and description of all contents as well as a more detailed display of what <tt>runConfig.sh</tt> is actually doing. Even better is to look at the configuration files, look at the source code, and if in doubt, contact the GEOS-Chem Support Team.  
  
If there is something you want to configure in your GCHP run that is not described on this page please contact the GEOS-Chem Support Team with feedback.
+
If there is something you want to configure in your GCHP run that is not described on this page, or if you see an error, please contact the GEOS-Chem Support Team with feedback. You can also sign up for your own wiki account and expand these sections with clarifying information that you think would help other users.
  
 
== Run Configuration Options ==
 
Line 31: Line 31:
 
==== Set Number of Nodes and Cores ====
 
  
To change the number of nodes and cores for your run you must update settings in two places: (1) <tt>runConfig.sh</tt>, and (2) your run script. The <tt>runConfig.sh</tt> file contains detailed instructions on how to set resource parameter options as show below.
+
To change the number of nodes and cores for your run you must update settings in two places: (1) <tt>runConfig.sh</tt>, and (2) your run script. The <tt>runConfig.sh</tt> file contains detailed instructions on how to set resource parameter options as shown below in an example using 96 cores. This example is for GCHP 12.5.0 and may be slightly different for earlier versions.
  
[[File:gchp Compute_resources.png|Compute resources section of runConfig.sh]]
+
#------------------------------------------------
 +
#    Compute Resources
 +
#------------------------------------------------
 +
# Set number of cores, number of nodes, and number of cores per node.
 +
# Total cores must be divisible by 6. Cores per node must equal number
 +
# of cores divided by number of nodes. Make sure you have these
 +
# resources available.
 +
TOTAL_CORES=6
 +
NUM_NODES=1
 +
NUM_CORES_PER_NODE=6
 +
 +
# Cores are distributed across each of the six cubed sphere faces using
 +
# configurable parameters NX and NY. Each face is divided into NX by NY/6
 +
# regions and each of those regions is processed by a single core
 +
# independent of which node it belongs to. Making NX by NY/6 as square
 +
# as possible reduces communication overhead in GCHP.
 +
#
 +
# Set NXNY_AUTO to either auto-calculate NX and NY (ON) (recommended)
 +
# or set them manually (OFF).
 +
NXNY_AUTO=ON
 +
 +
# Rules and tips for setting NX and NY manually (NXNY_AUTO=OFF):
 +
#  1. NY must be an integer and a multiple of 6  
 +
#  2. NX*NY must equal total number of cores (NUM_NODES*NUM_CORES_PER_NODE)
 +
#  3. Choose NX and NY to optimize NX x NY/6 squareness
 +
#        Good examples: (NX=4,NY=24)  -> 96  cores at 4x4
 +
#                        (NX=6,NY=24)  -> 144 cores at 6x4
 +
#        Bad examples:  (NX=8,NY=12)  -> 96  cores at 8x2
 +
#                        (NX=12,NY=12) -> 144 cores at 12x2
 +
#  4. Domain decomposition requires that CS_RES/NX >= 4 and CS_RES*6/NY >= 4,
 +
#      which puts an upper limit on total cores per grid resolution.
 +
#        c24: 216 cores  (NX=6,  NY=36 )
 +
#        c48: 864 cores  (NX=12, NY=72 )
 +
#        c90: 2904 cores  (NX=22, NY=132)
 +
#        c180: 12150 cores (NX=45, NY=270)
 +
#        c360: 48600 cores (NX=90, NY=540)
 +
#      Using fewer cores may still trigger a domain decomposition error, e.g.:
 +
#        c48: 768 cores  (NX=16, NY=48)  --> 48/16=3 will trigger FV3 error
 +
NX=1 # Ignore if NXNY_AUTO=ON
 +
NY=6 # Ignore if NXNY_AUTO=ON
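
The two rules stated in the comments above (total cores divisible by 6, and cores per node times nodes equal to total cores) can be checked before submitting a job. The following is only a minimal sketch: <code>check_resources</code> is a hypothetical helper written for this page and is not part of <tt>runConfig.sh</tt>.

```shell
# Hypothetical helper (not part of runConfig.sh): check the compute-resource
# settings against the rules quoted in the runConfig.sh comments.
# Arguments: TOTAL_CORES NUM_NODES NUM_CORES_PER_NODE
check_resources() (
    total=$1; nodes=$2; per_node=$3
    if [ $(( total % 6 )) -ne 0 ]; then
        echo "TOTAL_CORES must be divisible by 6"
        exit 1
    fi
    if [ $(( nodes * per_node )) -ne "$total" ]; then
        echo "NUM_NODES*NUM_CORES_PER_NODE must equal TOTAL_CORES"
        exit 1
    fi
    echo "ok"
)

check_resources 6 1 6    # the default settings shown above
```

The function body runs in a subshell, so a failed check returns a non-zero status without ending the calling shell.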
  
The sample SLURM run script will assign GCHP run resources based on settings in <tt>runConfig.sh</tt>. However, you must request the same number of nodes in your run script as in <tt>runConfig.sh</tt>. You may request additional cores and full memory per node to maximize available memory per core. For example, the below settings request 32 cores per node (entire nodes) and all memory per node. However, further down in the script only 16 cores per node are allocated for the GCHP run, consistent with the settings in the example of <tt>runConfig.sh</tt> above.
+
The sample SLURM run script will assign core resources based on settings in <tt>runConfig.sh</tt>. You can request additional cores in your run script to maximize memory available per core. However, you must request the same number of nodes in your run script as in <tt>runConfig.sh</tt>. For example:
  
[[File:gchp Slurm.png|Compute resources section of runConfig.sh]]
+
#SBATCH -n 144                                                                                     
 +
#SBATCH -N 4                                                                                   
 +
#SBATCH --exclusive                                                                               
 +
#SBATCH -t 0-5:00                                                                                 
 +
#SBATCH -p huce_intel                                                                             
 +
#SBATCH --mem=MaxMemPerNode                                                                       
 +
#SBATCH --mail-type=ALL
 +
 
 +
In this example 144 cores are requested across 4 nodes. The <code>--exclusive</code> option prevents other users from using cores on that node, thereby maximizing memory available per core. In this example, if there are 32 cores per node, requesting 144 cores total achieves the same thing as the exclusive option and therefore is redundant. With the presence of the exclusive option the number of requested cores could be lowered to match the number of cores used in GCHP, in this case 96. This would have the advantage of allowing the run to be picked up by a node with as few as 24 cores, if available. However, the <code>--exclusive</code> option is sometimes disabled on clusters and may not work on your system. In this case, it is best to reserve an entire node by specifying all cores on the node.
  
 
It is important to be smart about your resource allocation. To do this it is useful to understand how GCHP works with respect to distribution of nodes and cores across the grid. At least one unique core is assigned to each face on the cubed sphere, resulting in a constraint of at least six cores to run GCHP. The same number of cores must be assigned to each face, resulting in another constraint of total number of cores being a multiple of six. Communication between the cores occurs only during transport processes.  
 
Line 43: Line 90:
 
 
While any number of cores is valid as long as it is a multiple of six, you will typically start to see negative effects due to excessive communication if a core is handling less than around one hundred grid cells or a cluster of grid cells that are not approximately square. You can determine how many grid cells are handled per core by analyzing your grid resolution and resource allocation. For example, if running at C24 with six cores each face is handled by one core (6 faces / 6 cores) and contains 576 cells (24x24). Each core therefore processes 576 cells. Since each core handles one face, each core communicates with four other cores (four surrounding faces).
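
The grid-cells-per-core estimate described above is simple arithmetic: six faces of N x N cells divided by the total number of cores. As a rough sketch, with <code>cells_per_core</code> being a hypothetical helper name used only for illustration:

```shell
# Hypothetical helper: grid cells handled by each core.
# Arguments: cubed-sphere resolution (e.g. 24 for C24), total cores.
cells_per_core() (
    cs_res=$1; total_cores=$2
    echo $(( 6 * cs_res * cs_res / total_cores ))
)

cells_per_core 24 6     # prints 576, the C24 six-core case described above
cells_per_core 24 96    # prints 36, well under the roughly 100-cell guideline
```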
  
You can configure approximately how the cores are assigned to grid cell geometry by using the NX and NY configuration variables in GCHP as shown above. But what is this actually doing? Imagine lining up the six face grids adjacent to each other to get a single rectangular array. The rectangle will have N grid cells width (e.g. 24 if a C24 grid), and 6N grid cells height (since 6 faces). NX is the number of segments the width N is broken into for core distribution. NY is the number of segments the height 6N is broken into and must always be a multiple of six. NX * NY is always the total number of cores.  
+
You can configure approximately how the cores are assigned to grid cell geometry by using the NX and NY configuration variables in GCHP as shown above. Starting in 12.5.0 this can be done automatically for you. But what is this actually doing? Imagine lining up the six face grids adjacent to each other to get a single rectangular array. The rectangle will have N grid cells width (e.g. 24 if a C24 grid), and 6N grid cells height (since 6 faces). NX is the number of segments the width N is broken into for core distribution. NY is the number of segments the height 6N is broken into and must always be a multiple of six. NX * NY is always the total number of cores.  
  
 
 
For the case of a six core run, NX is equal to 1 and NY is equal to 6. This is because the entire N grid cells width is handled by 1 core (NX) and the 6N grid cells height is handled by 6 cores (NY), or one per face. If you instead wanted each face to be handled by four cores, and further constrain each core to handle one face quadrant, you would set NX equal to two and NY equal to twelve. A simple way of thinking about this is that core distribution across each face is with geometry NX x NY/6. In this last example that would be equivalent to 2x2.
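
One way to choose NX and NY so that the per-face layout NX x NY/6 is as square as possible is to search for the factor pair of the cores per face that is closest to square. The sketch below illustrates that idea only. It is not necessarily how <code>NXNY_AUTO</code> is implemented in <tt>runConfig.sh</tt>, the name <code>auto_nxny</code> is hypothetical, and it assumes the total number of cores is already a multiple of six.

```shell
# Hypothetical helper: pick NX and NY for a given total core count so that
# NX * NY equals the total and NX x NY/6 is as close to square as possible.
# Assumes the total is a multiple of 6.
auto_nxny() (
    total=$1
    per_face=$(( total / 6 ))
    ny_face=1
    i=1
    # Keep the largest divisor of per_face that does not exceed its square root.
    while [ $(( i * i )) -le "$per_face" ]; do
        if [ $(( per_face % i )) -eq 0 ]; then
            ny_face=$i
        fi
        i=$(( i + 1 ))
    done
    echo "NX=$(( per_face / ny_face )) NY=$(( 6 * ny_face ))"
)

auto_nxny 6      # prints NX=1 NY=6
auto_nxny 24     # prints NX=2 NY=12  (2x2 per face)
auto_nxny 96     # prints NX=4 NY=24  (4x4 per face)
auto_nxny 144    # prints NX=6 NY=24  (6x4 per face)
```

These results match the "good examples" listed in the <tt>runConfig.sh</tt> comments. The sketch does not check the separate domain decomposition limits on CS_RES/NX and CS_RES*6/NY.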
Line 53: Line 100:
 
#Update <tt>runConfig.sh</tt> with your full simulation (all runs) start and end dates, and the duration per segment (single run). Also update the number-of-runs option to reflect the total number of jobs that will be submitted. Carefully read these parts of <tt>runConfig.sh</tt> to ensure you understand how they work.
 
 
#Use <tt>gchp.multirun.run</tt> as your run script, or adapt it if your cluster does not use SLURM. It is located in the <tt>runScriptSamples</tt> subdirectory of your run directory. As with the regular <tt>gchp.run</tt>, you will need to update the file with compute resources consistent with <tt>runConfig.sh</tt>.  '''Note that you should not submit the run script directly.''' It will be done automatically by the file described in the next step.
 
#Use <tt>gchp.multirun.sh</tt> to submit your job, or adapt it if your cluster does not use SLURM. It is located in the <tt>runScriptSamples</tt> subdirectory of your run directory. For example, to submit your series of jobs, type:
+
#Use <tt>gchp.multirun.sh</tt> to submit your job, or adapt it if your cluster does not use SLURM. It is located in the <tt>runScriptSamples</tt> subdirectory of your run directory. For example, to submit your series of jobs, type: <code>./gchp.multirun.sh</code>
 +
 
 +
The settings for the multi-run option in <tt>runConfig.sh</tt> are number of consecutive runs, and whether to turn on monthly diagnostics. Only turn on monthly diagnostics if your run duration is monthly.
  
  ./gchp.multirun.sh
+
  #------------------------------------------------
 +
#    Multi-run option
 +
#------------------------------------------------       
 +
# The simplest run is a single segment. Set Num_Runs=1 and Monthly_Diag=0.
 +
#
 +
# In some cases it is advantageous to split up your simulation into
 +
# multiple runs, what we call the multi-run option. Use this option as follows:
 +
#  1. Set Num_Runs below to total # of consecutive runs
 +
#  2. Set Monthly_Diag=1 to output monthly diagnostics; else 0.
 +
#  3. Copy gchp.multirun.sh and gchp.multirun.run from runScriptSamples/
 +
#      to run directory
 +
#  4. Configure resources at the top of gchp.multirun.run (assumes SLURM).
 +
#      This is the run script used for each individual run in the sequence.
 +
#  5. Set duration above to the duration of each INDIVIDUAL run
 +
#  6. Set end date after start date to span ALL runs
 +
#  7. Execute shell script gchp.multirun.sh at the command line
 +
#        $ ./gchp.multirun.sh
 +
#
 +
# When using monthly diagnostics:
 +
#  - Run segment duration must be 1-month (00000100 000000)
 +
#  - Start date must be within the first 28 days of the month
 +
#  - There is no need to set diag frequency and duration in this file
 +
#    since they will be over-written for each run based on days in month
 +
#
 +
Num_Runs=1
 +
Monthly_Diag=0
  
There is much documentation in the headers of both <tt>gchp.multirun.run</tt> and <tt>gchp.multirun.sh</tt> that is worth reading and getting familiar with. If you have not done so already, it is worth trying out a simple multi-segmented run of short duration to demonstrate that the multi-segmented run configuration and scripts work on your system. For example, you could do a 3 hour simulation with 1 hour duration and number of runs equal to 3.  
+
There is much documentation in the headers of both <tt>gchp.multirun.run</tt> and <tt>gchp.multirun.sh</tt> that is worth reading and getting familiar with, although not entirely necessary to get the multi-run option working. If you have not done so already, it is worth trying out a simple multi-segmented run of short duration to demonstrate that the multi-segmented run configuration and scripts work on your system. For example, you could do a 3-hour simulation with 1-hour duration and number of runs equal to 3.  
  
Besides the regular GCHP log file, which will be appended to for each consecutive run, there will be a "multirun.log" file with high-level information such as the start, end, duration, and job ids for all jobs submitted. Inspect this and your other log files, as well as output in the <tt>OutputDir/</tt> directory prior to using for longer duration runs.
+
The multi-run script assumes use of SLURM, and a separate SLURM log file is created for each run. There is also a log file called <code>multirun.log</code> with high-level information such as the start, end, duration, and job IDs for all jobs submitted. If a run fails, all scheduled jobs are cancelled and a message about this is written to that log file. Inspect this and your other log files, as well as the output in the <tt>OutputDir/</tt> directory, before using the multi-run option for longer runs.
  
 
==== Change Domains Stack Size ====
 
Line 72: Line 146:
 
To change your grid resolution in the run directory edit the "CS_RES" integer parameter in <tt>runConfig.sh</tt> to the cube side-length you wish to use.
 
  
[[File:gchp Cs_res.png|Set cubed sphere resolution by specifying integer number of sides per face in runConfig.sh]]
+
#------------------------------------------------
 +
#  Internal Cubed Sphere Resolution
 +
#------------------------------------------------
 +
CS_RES=24 # 24 ~ 4x5, 48 ~ 2x2.5, 90 ~ 1x1.25, 180 ~ 1/2 deg, 360 ~ 1/4 deg
  
 
==== Turn On/Off Model Components ====
 
Line 78: Line 155:
 
You can toggle all primary GEOS-Chem components, including type of mixing, from within <tt>runConfig.sh</tt>. The settings in that file will update <tt>input.geos</tt> automatically.
 
  
[[File:gchp Components.png|Turn on and off components from runConfig.sh rather than input.geos]]
+
#------------------------------------------------
 +
#    Turn Components On/Off
 +
#------------------------------------------------
 +
# Automatically turns on/off GEOS-Chem components in input.geos.
 +
#
 +
# WARNING: these settings will override manual updates you make to input.geos!
 +
#
 +
Turn_on_Chemistry=T
 +
Turn_on_emissions=T
 +
Turn_on_Dry_Deposition=T
 +
Turn_on_Wet_Deposition=T
 +
Turn_on_Transport=T
 +
Turn_on_Cloud_Conv=T
 +
Turn_on_PBL_Mixing=T
 +
Turn_on_Non_Local_Mixing=T
  
 
==== Change Model Timesteps ====
 
  
Model timesteps, both chemistry and dynamic, are configured within <tt>runConfig.sh</tt>. They are set to match GEOS-Chem Classic default values for comparison purposes but can be updated, with caution. Read the documentation in <tt>runConfig.sh</tt> for setting them to be fully aware of recommended settings and their implications.
+
Model timesteps, both chemistry and dynamic, are configured within <tt>runConfig.sh</tt>. They are set to match GEOS-Chem Classic default values for comparison purposes but can be updated, with caution. Read the documentation in <tt>runConfig.sh</tt> for setting them to be fully aware of recommended settings. Changing to higher resolutions will automatically change the timestep based on the rules set in <tt>runConfig.sh</tt>.
  
[[File:gchp Timesteps.png|Read the notes on timesteps within runConfig.sh prior to changing]]
+
#------------------------------------------------
 +
#    Timesteps
 +
#------------------------------------------------
 +
# Optimal timesteps are dependent on grid resolution and are automatically
 +
# set based on the GCHP Working Group's recommendation below. To override
 +
# these settings, comment out the code and manually define the following
 +
# variables:
 +
#    ChemEmiss_Timestep_sec    : chemistry timestep interval [s]
 +
#    TransConv_Timestep_sec    : dynamic timestep interval [s]
 +
#    TransConv_Timestep_HHMMSS  : dynamic timestep interval as HHMMSS string
 +
#
 +
# WARNING: Settings in this file will override settings in input.geos!
 +
#
 +
# NOTE: Default timesteps for c24 and c48, the cubed-sphere rough equivalents
 +
# of 4x5 and 2x2.5, are the same as defaults timesteps in GEOS-Chem Classic
 +
#
 +
if [[ $CS_RES -lt 180 ]]; then
 +
    ChemEmiss_Timestep_sec=1200
 +
    TransConv_Timestep_sec=600
 +
    TransConv_Timestep_HHMMSS=001000
 +
else
 +
    ChemEmiss_Timestep_sec=600
 +
    TransConv_Timestep_sec=300
 +
    TransConv_Timestep_HHMMSS=000500
 +
fi
  
 
==== Set Simulation Start and End Dates ====
 
Line 90: Line 205:
 
Set simulation start and end in <tt>runConfig.sh</tt>.
 
  
[[File:gchp Start_end.png|Set simulation start, end, and duration within runConfig.sh rather than input.geos]]
+
#------------------------------------------------
 +
#    Simulation Start/End/Duration
 +
#------------------------------------------------
 +
# For single-segment runs, duration should be less than or equal to the
 +
# difference between start and end time. If end time is past start time
 +
# plus duration, the simulation will end at start time plus duration rather
 +
# than end time.
 +
#
 +
# Setting duration such that two or more durations can occur between start
 +
# and end will enable multi-segmented runs. At the end of each run the
 +
# end time is stored as the new start time in output file cap_restart.
 +
# Rerunning without removing or editing cap_restart will start at the
 +
# start time in cap_restart rather than the start time listed below.
 +
# Use this feature with the multi-segmented runs / monthly diagnostics
 +
# section below. See more information about this on the GCHP wiki.
 +
#
 +
Start_Time="20160101 000000"
 +
End_Time="20160101 030000"
 +
Duration="00000000 030000"
  
 
 
There is also a "Duration" field in the file which must be set to reflect how long your run will last. If your end date is earlier than your start date plus duration then your GCHP run will fail. If your end date is later than your start date plus duration then your job will not make it to your configured end date; it will end at start date plus duration. If your end date is multiple durations past your start date then subsequent job submissions will start where your last run ended, so long as you do not delete file <tt>cap_restart</tt>. That file contains a new start string that will always be used if the file is present. You can take advantage of this file for splitting up a long simulation into multiple jobs. See further down on this page for automation of this task built into the run directory.
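
Because <tt>cap_restart</tt> takes precedence over the configured start time whenever it is present, it can be useful to check which start time a run will actually use before submitting. The following is a sketch under assumptions: <code>effective_start</code> is a hypothetical helper, and it assumes <tt>cap_restart</tt> holds a single date and time string in the same format as <code>Start_Time</code>.

```shell
# Hypothetical helper: print the start time a run would use.
# Assumes cap_restart (if present) contains one "YYYYMMDD HHMMSS" string.
# Arguments: run directory, Start_Time as configured in runConfig.sh.
effective_start() (
    run_dir=$1; configured_start=$2
    if [ -f "$run_dir/cap_restart" ]; then
        # A previous run left cap_restart behind, so its time is used.
        cat "$run_dir/cap_restart"
    else
        echo "$configured_start"
    fi
)

effective_start . "20160101 000000"
```

If the printed time is not the one you intended, remove or edit <tt>cap_restart</tt> before rerunning.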
Line 110: Line 243:
 
All GCHP run directories come with symbolic links to initial restart files for commonly used cubed sphere resolutions. The appropriate restart file is automatically chosen based on the cubed sphere resolution you set in <tt>runConfig.sh</tt>. All of the restart files are simply GEOS-Chem Classic restart files regridded to the cubed sphere.  
 
  
[[File:gchp Restart.png|The restart filename is set automatically in runConfig.sh, but you may overwrite it manually]]
+
#------------------------------------------------
 +
#    Initial Restart File
 +
#------------------------------------------------
 +
# By default the linked restart files in the run directories will be
 +
# used. Please note that HEMCO restart variables are stored in the same
 +
# restart file as species concentrations. Initial restart files available
 +
# on gcgrid do not contain HEMCO variables which will have the same effect
 +
# as turning the HEMCO restart file option off in GC classic. However, all
 +
# output restart files will contain HEMCO restart variables for your next run.
 +
INITIAL_RESTART=initial_GEOSChem_rst.c${CS_RES}_TransportTracers.nc
 +
 +
# You can specify a custom initial restart file here to overwrite:
 +
# INITIAL_RESTART=your_restart_filename_here
  
You may over-write the default restart file with your own by specifying the restart filename in <tt>runConfig.sh</tt>. Beware that it is your responsibility to make sure it is the proper grid resolution. Publicly available tools for regridding are listed in the GCHP Output Files page of this user manual.
+
You may over-write the default restart file with your own by specifying the restart filename in <tt>runConfig.sh</tt>. Beware that it is your responsibility to make sure it is the proper grid resolution.
  
Unlike GEOS-Chem Classic, HEMCO restart files are not used in GCHP. HEMCO restart variables may be included in the initial species restart file, or they may be excluded and HEMCO will start with default values. GCHP initial restart files that come with the run directories do not include HEMCO restart variables unless "HEMCO" appears in the filename. This is only the case for the benchmark restart files used for the 1-year benchmark simulation that relies on a valid spin-up.
+
Unlike GEOS-Chem Classic, HEMCO restart files are not used in GCHP. HEMCO restart variables may be included in the initial species restart file, or they may be excluded and HEMCO will start with default values. GCHP initial restart files that come with the run directories do not include HEMCO restart variables, but all output restart files do.
  
 
==== Turn On/Off Emissions Inventories ====
 
Line 125: Line 270:
  
 
==== Add New Input Files ====
 
 +
 +
'''''New in GCHP 12.5.0:''''' '''Online ESMF regridding removes the need for tile files. All parts of this section related to tile files can be ignored if using 12.5.0.'''
  
 
There are three main requirements for adding new emissions inventories to GCHP:  
 
Line 153: Line 300:
 
==== Output Restart Files at Regular Frequency ====
 
  
The MAPL component in GCHP has the option to output restart files (also called checkpoint files) at regular intervals. Unlike the final restart file output at the end of a simulation, these regularly output restart files contain the date and time in their filename. Enabling this feature is a good idea if you plan on doing a long simulation and you are not splitting your run into multiple jobs. If the run crashes unexpectedly then you can restart mid-run rather than start over from the beginning. To set the checkpoint frequency, simply update the HHmmSS string for "Checkpoint_Freq" in <tt>runConfig.sh</tt>. Minutes and seconds must each be two digits but hours can be more than two.
+
The MAPL component in GCHP has the option to output restart files (also called checkpoint files) at regular intervals. Unlike the final restart file output at the end of a simulation, these regularly output restart files contain the date and time in their filename. Enabling this feature is a good idea if you plan on doing a long simulation and you are not splitting your run into multiple jobs. If the run crashes unexpectedly then you can restart mid-run rather than start over from the beginning. To set the checkpoint frequency, simply update the HHmmSS string for "Checkpoint_Freq" in <tt>runConfig.sh</tt>. Minutes and seconds must each be two digits but hours can be more than two. Each output checkpoint file will include the timestamp in the filename.
 +
 
 +
#------------------------------------------------
 +
#    Output Restart Files
 +
#------------------------------------------------
 +
# You can output restart files at regular intervals throughout your
 +
# simulation. These restarts are in addition to the end-of-run restart
 +
# which is always produced. To configure output restart file frequency,
 +
# set the variable below to a string of format HHmmSS. More than 2
 +
# digits for the hours string is permitted (e.g. 1680000 for 7 days).
 +
# Setting the frequency to 000000 will turn off this feature by setting
 +
# it to a very large number.
 +
Checkpoint_Freq="000000"
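
Since the hours field of the HHmmSS string may exceed two digits, a whole number of hours can be converted directly into a value for <code>Checkpoint_Freq</code>. This is an illustrative sketch only: <code>hours_to_hhmmss</code> is a hypothetical helper, and it expects a plain decimal number without leading zeros.

```shell
# Hypothetical helper: build an HHmmSS string from a whole number of hours.
# Hours are zero-padded to at least two digits; minutes and seconds are 00.
hours_to_hhmmss() (
    printf '%02d0000\n' "$1"
)

hours_to_hhmmss 6      # prints 060000
hours_to_hhmmss 168    # prints 1680000, the 7-day example from the comments
```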
  
 
==== Turn On/Off Diagnostics ====
 
Line 161: Line 320:
 
To turn collections on or off, comment ("#") collection names in the "COLLECTIONS" list at the top of file <tt>HISTORY.rc</tt>.  
 
  
[[File:gchp Collections.png]]
+
#===================================================================
 +
# Declare collection names and toggle on/off
 +
#===================================================================
 +
COLLECTIONS: #'AerosolMass'
 +
            #'Aerosols',
 +
            #'Budget',
 +
            #'CloudConvFlux',
 +
            #'ConcAfterChem',
 +
            #'DryDep',
 +
            'Emissions',
 +
            #'JValues',
 +
            #'LevelEdgeDiags',     
 +
            #'ProdLoss',
 +
            'SpeciesConc',
 +
            #'StateChm',
 +
            'StateMet_avg', 
 +
            'StateMet_inst', 
 +
            #'WetLossConv',
 +
            #'WetLossLS',
 +
::
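
Commenting out a collection can also be scripted rather than edited by hand. The sketch below is one possible approach, not a tool shipped with GCHP: <code>disable_collection</code> is a hypothetical helper that prints a modified copy of the file, and it only handles list entries that sit on their own line in the form shown above (optional leading whitespace, a quoted name, then a comma).

```shell
# Hypothetical helper: print a copy of a HISTORY.rc-style file with one
# collection commented out. Lines that are already commented are unchanged.
# Arguments: collection name, file path.
disable_collection() (
    name=$1; file=$2
    sed "s/^\([[:space:]]*\)'$name',/\1#'$name',/" "$file"
)

# Example usage (writes to a new file instead of editing in place):
#   disable_collection Emissions HISTORY.rc > HISTORY.rc.new
```

Writing to a new file keeps the original <tt>HISTORY.rc</tt> intact until you have inspected the result.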
  
 
Once a collection is turned on, you can comment diagnostics within it further down in the file by searching for the collection name with ".fields" suffix. Be aware that you cannot comment out the diagnostic that appears on the same line as the fields keyword. If you wish to suppress that specific diagnostic then move it to the next line and replace it with a diagnostic that you want to output.
 
  
[[File:gchp History.png|gchp History.png]]
#===================================================================
# State_Met array diagnostics - time-averaged
  StateMet_avg.template:     '%y4%m2%d2_%h2%n2z.nc4',
  StateMet_avg.format:        'CFIO',
  StateMet_avg.frequency:    010000
  StateMet_avg.duration:      010000
  StateMet_avg.mode:          'time-averaged'
  StateMet_avg.fields:        'Met_AD              ', 'GIGCchem',
                              #'Met_AIRDEN          ', 'GIGCchem',
                              #'Met_AIRVOL          ', 'GIGCchem',
                              #'Met_ALBD            ', 'GIGCchem',
                              'Met_AREAM2          ', 'GIGCchem',
                              #'Met_AVGW            ', 'GIGCchem',
                              'Met_BXHEIGHT        ', 'GIGCchem',
                              etc
  
 
==== Set Diagnostic Frequency, Duration, and Mode ====
 
  
All diagnostic collections that come with the run directory have frequency, duration, and mode defined within <tt>runConfig.sh</tt>. With the exception of SpeciesConc_inst and StateMet_inst, all collections are time-averaged (mode) with frequency and duration set to the simulation length you specified in <tt>CopyRunDirs.input</tt> when creating the run directory. Any of these defaults can be over-written by editing <tt>runConfig.sh</tt>. Be aware that manual updates of <tt>HISTORY.rc</tt> will be over-written by <tt>runConfig.sh</tt> settings.
+
'''WARNING: There is currently a bug in GCHP that prevents writing out more than one time per file. Duration in HISTORY.rc is ignored.'''
  
[[File:gchp Diag_freq_dur_mode.png|Set default diagnostic frequency, duration, and mode in runConfig.sh]]
  
[[File:gchp Diag_defaults.png|Default diagnostic collection settings can easily be changed in runConfig.sh]]
#------------------------------------------------
#    Diagnostics
#------------------------------------------------
# Frequency, duration, and mode used for all default HISTORY.rc diagnostic
# collections are set from within this file. These are defined as:
#
#  Frequency = frequency of diagnostic calculation (HHmmSS)
#  Duration  = frequency of diagnostic file  write (HHmmSS)
#  Mode      = computation of diagnostics (time-averaged or instantaneous)
#
# Edit the frequency, duration, and mode below to change global settings.
# See the list further below of what HISTORY.rc collections will be updated.
#
# NOTES:
#  1. Freq and duration hours may exceed 2 digits, e.g. 7440000 for 31 days
#  2. Freq and duration are ignored if Monthly_Diag is set to 1
#  3. If you do not want settings for certain collections set automatically
#    from this file, comment them out below.
#  4. If you add a collection to HISTORY.rc and want its settings
#    automatically updated from this file, add to the list below.
#  5. To turn off collections completely, comment them out in HISTORY.rc.
#
common_freq="010000"          # Ignore if using multi-run monthly diag option
common_dur="010000"          # Ignore if using multi-run monthly diag option
common_mode="'time-averaged'" # "'time-averaged'" and "'instantaneous'"

SpeciesConc_freq=${common_freq}
SpeciesConc_dur=${common_dur}
SpeciesConc_mode=${common_mode}
AerosolMass_freq=${common_freq}
AerosolMass_dur=${common_dur}
AerosolMass_mode=${common_mode}
Aerosols_freq=${common_freq}
Aerosols_dur=${common_dur}
Aerosols_mode=${common_mode}
Budget_freq=${common_freq}
etc
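
Note 1 in the comments above says that the hours field may exceed two digits, e.g. 7440000 for 31 days. The following is a minimal sketch of that conversion for whole days. The function name <tt>days_to_hhmmss</tt> is hypothetical and is not part of <tt>runConfig.sh</tt>.

```shell
# Hypothetical helper: convert a whole number of days into the HHmmSS
# string format used for the frequency and duration settings.
days_to_hhmmss() {
    # Hours = days * 24; minutes and seconds are always zero here.
    printf '%02d0000\n' $(( $1 * 24 ))
}

days_to_hhmmss 1    # one day
days_to_hhmmss 31   # a 31-day month
```

The result could then be assigned to <tt>common_freq</tt> and <tt>common_dur</tt>, keeping in mind that both are ignored when <tt>Monthly_Diag</tt> is set to 1.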
  
 
==== Add a New Diagnostics Collection ====
 
Line 189: Line 417:
 
==== Additional Diagnostic Collection Options ====
 
  
See file <tt>GCHP/Shared/MAPL_Base/TeX/HistoryIntro.tex</tt> for original MAPL documentation on MAPL History. Here is a brief overview of options that may be included for each collection that is taken from that document:
+
See file <tt>GCHP/Shared/MAPL_Base/TeX/HistoryIntro.tex</tt> for original MAPL documentation on MAPL History. Please note that we have not tested all of these functionalities and some of them do not seem to work in MAPL. Proceed with caution and let the GEOS-Chem Support Team know what you find. Here is a brief overview of options that may be included for each collection that is taken from that document:
  
 
'''template'''
 
Line 208: Line 436:
 
'''acc_interval'''  
 
 
Integer (HHHHMMSS) for the accumulation interval (≤ frequency) for time-averaged diagnostics. Default = <tt>frequency</tt>; ignored if <tt>mode</tt> is 'instantaneous'.
 +
 
'''ref_date'''     
 
 
Integer (YYYYMMDD) reference date for <tt>frequency</tt>; also the beginning date for the collection. Default is the Start date on the Clock.
Line 221: Line 450:
  
 
'''duration'''       
 
Integer (HHHHMMSS) for the duration of each file. Default = 00000000 (everything in one file).
+
Integer (HHHHMMSS) for the duration of each file. Default = 00000000 (everything in one file). '''''Duration is not currently functional in GCHP and will be ignored. Frequency is used instead for write frequency.'''''
  
 
'''resolution'''     
 
Line 265: Line 494:
 
Besides compiling with "make compile_debug", there are a few run settings you can configure to boost your chance of successful debugging. All of them involve sending additional print statements to the log files.  
 
 
#Change "ND70" in input.geos from 0 to 1 to turn on extra GEOS-Chem print statements in the main log file.  
 
#Set the "MAPL_DEBUG" variable in <tt>runConfig.sh</tt> to a number greater than 0 to turn on extra MAPL print statements in MAPL ExtData. This is useful if you are having a problem reading input files. The higher the number the more prints will be sent to the log (and the slower your run will be). Usually 20 is sufficient, although you can go higher. Please be sure to remember to set MAPL_DEBUG back to 0 when you are done so as not to severely slow down your runs!
+
#Set the "MAPL_DEBUG_LEVEL" variable in <tt>runConfig.sh</tt> to a number greater than 0 to turn on extra MAPL print statements in MAPL ExtData. This is useful if you are having a problem reading input files. The higher the number, the more prints will be sent to the log (and the slower your run will be). Usually 20 is sufficient, although you can go higher. Please be sure to set MAPL_DEBUG_LEVEL back to 0 when you are done so as not to severely slow down your runs!
 
#Set the "Verbose" and "Warnings" settings in <tt>HEMCO_Config.rc</tt> to maximum values of 3 to send the maximum number of prints to <tt>HEMCO.log</tt>.
 
 +
#Set the "MEMORY_DEBUG_LEVEL" option, new in 12.5.0, to 1 to turn on additional memory usage prints per timestep.
  
[[File:gchp Debug.png|MAPL debug setting in runConfig.sh]]
#------------------------------------------------
#    Debug Options
#------------------------------------------------
# Set MAPL debug flag to 0 for no extra MAPL debug log output, or 1 to
# print information to log. Using this flag is most helpful for debugging
# issues with file read (MAPL ExtData).
#
# Set memory debug flag to 0 to print memory only once per timestep. Set to
# 1 to enable memory prints at additional locations throughout the run.
#
# For GEOS-Chem debug prints, turn on ND70 in input.geos manually.
#
# WARNING: Turning on debug prints significantly slows down the model!
#
MAPL_DEBUG_LEVEL=0
MEMORY_DEBUG_LEVEL=0
  
 
None of these options require recompiling. Be aware that all of them will slow down your simulation. Be sure to set them back to the default values after you are finished debugging.
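
Because it is easy to forget to reset debug settings, a quick check before submitting a production run may help. The sketch below is not part of GCHP, and the function name <tt>debug_flags_on</tt> is hypothetical. It assumes the two debug variables appear in <tt>runConfig.sh</tt> at the start of a line, as in the excerpt above. It does not cover ND70 in <tt>input.geos</tt> or the HEMCO settings.

```shell
# Hypothetical helper: print any debug settings in a runConfig.sh-style
# file that are still set to a non-zero value. The exit status is 0 only
# if at least one such line is found.
debug_flags_on() {
    grep -E '^(MAPL_DEBUG_LEVEL|MEMORY_DEBUG_LEVEL)=[1-9]' "$1"
}

# Warn if the runConfig.sh in the current directory still has debug on.
if debug_flags_on runConfig.sh 2>/dev/null; then
    echo "WARNING: debug prints are still enabled" >&2
fi
```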
 

Revision as of 23:04, 15 August 2019


  1. Hardware and Software Requirements
  2. Downloading Source Code and Data Directories
  3. Obtaining a Run Directory
  4. Setting Up the GCHP Environment
  5. Compiling
  6. Running GCHP: Basics
  7. Running GCHP: Configuration
  8. Output Data
  9. Developing GCHP
  10. Run Configuration Files


Overview

All default GCHP run directories are set up to run at c24 resolution with 0.25x0.3125 GEOS-FP meteorology, 6 cores, and 1 node. This is the simplest possible run and a good test case for your initial setup. However, you will want to change these settings, and potentially several others, for your research runs. This page goes over how to do this.

GCHP has several configuration files, most of which end in suffix ".rc". Rather than update many files, some of which contain redundant information, we instead use the utility shell script runConfig.sh to set most options in a single location. Sourcing the file automatically updates other configuration files prior to the run and eliminates the need for remembering what to update and where. However, it is important to note that doing this will overwrite settings in other configuration files. You therefore should never manually update other configuration files unless you know the specific option is not available for setting in runConfig.sh.

All sample run scripts include sourcing runConfig.sh. When runConfig.sh is sourced it prints out information on what settings are being changed to what value and in what file. This information is sent to the output log gchp.log.

You generally will not need to know more about the GCHP configuration files beyond what is listed on this page. However, for more detailed information about the configuration files used by GCHP see the last section of this user manual which includes a list and description of all contents as well as a more detailed display of what runConfig.sh is actually doing. Even better is to look at the configuration files, look at the source code, and if in doubt, contact the GEOS-Chem Support Team.

If there is something you want to configure in your GCHP run that is not described on this page, or if you see an error, please contact the GEOS-Chem Support Team with feedback. You can also sign up for your own wiki account and expand these sections with clarifying information that you think would help other users.

Run Configuration Options

Compute Configuration

Set Number of Nodes and Cores

To change the number of nodes and cores for your run you must update settings in two places: (1) runConfig.sh, and (2) your run script. The runConfig.sh file contains detailed instructions on how to set resource parameter options, as shown below with the default settings of 6 cores on 1 node. This example is for GCHP 12.5.0 and may be slightly different for earlier versions.

#------------------------------------------------
#   Compute Resources
#------------------------------------------------
# Set number of cores, number of nodes, and number of cores per node.
# Total cores must be divisible by 6. Cores per node must equal number
# of cores divided by number of nodes. Make sure you have these
# resources available.
TOTAL_CORES=6
NUM_NODES=1
NUM_CORES_PER_NODE=6

# Cores are distributed across each of the six cubed sphere faces using
# configurable parameters NX and NY. Each face is divided into NX by NY/6
# regions and each of those regions is processed by a single core 
# independent of which node it belongs to. Making NX by NY/6 as square
# as possible reduces communication overhead in GCHP.
#
# Set NXNY_AUTO to either auto-calculate NX and NY (ON) (recommended)
# or set them manually (OFF).
NXNY_AUTO=ON

# Rules and tips for setting NX and NY manually (NXNY_AUTO=OFF):
#   1. NY must be an integer and a multiple of 6	  
#   2. NX*NY must equal total number of cores (NUM_NODES*NUM_CORES_PER_NODE)
#   3. Choose NX and NY to optimize NX x NY/6 squareness 
#         Good examples: (NX=4,NY=24)  -> 96  cores at 4x4
#                        (NX=6,NY=24)  -> 144 cores at 6x4
#         Bad examples:  (NX=8,NY=12)  -> 96  cores at 8x2
#                        (NX=12,NY=12) -> 144 cores at 12x2
#   4. Domain decomposition requires that CS_RES/NX >= 4 and CS_RES*6/NY >= 4,
#      which puts an upper limit on total cores per grid resolution.
#         c24: 216 cores   (NX=6,  NY=36 )
#         c48: 864 cores   (NX=12, NY=72 )
#         c90: 2904 cores  (NX=22, NY=132)
#        c180: 12150 cores (NX=45, NY=270)
#        c360: 48600 cores (NX=90, NY=540)
#      Using fewer cores may still trigger a domain decomposition error, e.g.:
#         c48: 768 cores   (NX=16, NY=48)  --> 48/16=3 will trigger FV3 error
NX=1 # Ignore if NXNY_AUTO=ON
NY=6 # Ignore if NXNY_AUTO=ON

The sample SLURM run script will assign core resources based on settings in runConfig.sh. You can request additional cores in your run script to maximize memory available per core. However, you must request the same number of nodes in your run script as in runConfig.sh. For example:

#SBATCH -n 144                                                                                       
#SBATCH -N 4                                                                                     
#SBATCH --exclusive                                                                                 
#SBATCH -t 0-5:00                                                                                   
#SBATCH -p huce_intel                                                                               
#SBATCH --mem=MaxMemPerNode                                                                         
#SBATCH --mail-type=ALL

In this example 144 cores are requested across 4 nodes. The --exclusive option prevents other users from using cores on that node, thereby maximizing memory available per core. In this example, if there are 36 cores per node, requesting 144 cores total achieves the same thing as the exclusive option and therefore is redundant. With the presence of the exclusive option the number of requested cores could be lowered to match the number of cores used in GCHP, in this case 96. This would have the advantage of allowing the run to be picked up by a node with as few as 24 cores, if available. However, the --exclusive option is sometimes disabled on clusters and may not work on your system. In this case, it is best to reserve an entire node by specifying all cores on the node.
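
Since the node count must match between the run script and runConfig.sh, a small consistency check can catch mismatches before submission. The following sketch is not part of GCHP. The function name slurm_nodes is hypothetical, and it only recognizes the short "#SBATCH -N" form shown above, not "--nodes=".

```shell
# Hypothetical helper: print the value of the first "#SBATCH -N" directive
# found in a run script. The match is case-sensitive, so "-n" (total
# cores) is not picked up.
slurm_nodes() {
    sed -n 's/^#SBATCH[[:space:]]\{1,\}-N[[:space:]]\{1,\}\([0-9]\{1,\}\).*/\1/p' "$1" | head -n 1
}

# Demonstration with a temporary file standing in for your run script.
demo=$(mktemp)
printf '%s\n' '#SBATCH -n 144' '#SBATCH -N 4' '#SBATCH --exclusive' > "$demo"

NUM_NODES=4   # the value you would have set in runConfig.sh
if [ "$(slurm_nodes "$demo")" = "$NUM_NODES" ]; then
    echo "node counts match"
else
    echo "node counts differ" >&2
fi
rm -f "$demo"
```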

It is important to be smart about your resource allocation. To do this it is useful to understand how GCHP works with respect to distribution of nodes and cores across the grid. At least one unique core is assigned to each face on the cubed sphere, resulting in a constraint of at least six cores to run GCHP. The same number of cores must be assigned to each face, resulting in another constraint of total number of cores being a multiple of six. Communication between the cores occurs only during transport processes.

While any number of cores is valid as long as it is a multiple of six, you will typically start to see negative effects due to excessive communication if a core is handling fewer than around one hundred grid cells, or a cluster of grid cells that is not approximately square. You can determine how many grid cells are handled per core by analyzing your grid resolution and resource allocation. For example, if running at C24 with six cores, each face is handled by one core (6 faces / 6 cores) and contains 576 cells (24x24). Each core therefore processes 576 cells. Since each core handles one face, each core communicates with four other cores (four surrounding faces).
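
The cells-per-core arithmetic can be sketched in a few lines of shell. The function name cells_per_core is hypothetical and not part of GCHP. It assumes cells are split evenly across cores and uses integer division.

```shell
# Hypothetical helper: number of grid cells handled by each core.
# Usage: cells_per_core CS_RES TOTAL_CORES
cells_per_core() {
    cpc_res=$1
    cpc_cores=$2
    # GCHP requires at least six cores and a multiple of six.
    if [ "$cpc_cores" -lt 6 ] || [ $(( cpc_cores % 6 )) -ne 0 ]; then
        echo "total cores must be a positive multiple of 6" >&2
        return 1
    fi
    # Six faces of CS_RES x CS_RES cells, divided among all cores.
    echo $(( 6 * cpc_res * cpc_res / cpc_cores ))
}

cells_per_core 24 6     # C24 on 6 cores
cells_per_core 24 96    # C24 on 96 cores
```

By the guideline above, a result well below one hundred suggests that the run may be using more cores than the resolution can use efficiently.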

You can configure approximately how the cores are assigned to grid cell geometry by using the NX and NY configuration variables in GCHP as shown above. Starting in 12.5.0 this can be done automatically for you. But what is this actually doing? Imagine lining up the six face grids adjacent to each other to get a single rectangular array. The rectangle is N grid cells wide (e.g. 24 for a C24 grid) and 6N grid cells high (since there are 6 faces). NX is the number of segments the width N is broken into for core distribution. NY is the number of segments the height 6N is broken into and must always be a multiple of six. NX * NY is always the total number of cores.

For the case of a six core run, NX is equal to 1 and NY is equal to 6. This is because the entire N grid cells width is handled by 1 core (NX) and the 6N grid cells height is handled by 6 cores (NY), or one per face. If you instead wanted each face to be handled by four cores, and further constrain each core to handle one face quadrant, you would set NX equal to two and NY equal to twelve. A simple way of thinking about this is that core distribution across each face is with geometry NX x NY/6. In this last example that would be equivalent to 2x2.
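
The rules for setting NX and NY manually can be checked before a run. The sketch below is not part of GCHP, and the function name check_nxny is hypothetical. It applies the constraints listed in runConfig.sh (NY a multiple of 6, NX*NY equal to the total cores, CS_RES/NX >= 4 and CS_RES*6/NY >= 4) and prints the per-face layout NX x NY/6.

```shell
# Hypothetical helper: validate a manual NX/NY choice.
# Usage: check_nxny CS_RES NX NY TOTAL_CORES
check_nxny() {
    cn_res=$1; cn_nx=$2; cn_ny=$3; cn_total=$4
    if [ $(( cn_ny % 6 )) -ne 0 ]; then
        echo "NY must be a multiple of 6" >&2
        return 1
    fi
    if [ $(( cn_nx * cn_ny )) -ne "$cn_total" ]; then
        echo "NX*NY must equal the total number of cores" >&2
        return 1
    fi
    # CS_RES/NX >= 4 and CS_RES*6/NY >= 4, written without division so
    # that integer arithmetic does not round the ratios.
    if [ "$cn_res" -lt $(( 4 * cn_nx )) ] || [ $(( 6 * cn_res )) -lt $(( 4 * cn_ny )) ]; then
        echo "too many cores for this resolution (domain decomposition)" >&2
        return 1
    fi
    echo "${cn_nx}x$(( cn_ny / 6 )) cores per face"
}

check_nxny 24 2 12 24   # four cores per face, one quadrant each
check_nxny 24 4 24 96   # the 96-core "good example" from runConfig.sh
```

This check only covers validity. It does not judge how close to square the NX x NY/6 layout is.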

Split a Simulation Into Multiple Jobs

There is an option to split up a single simulation into separate serial jobs. To use this option, do the following:

  1. Update runConfig.sh with your full simulation (all runs) start and end dates, and the duration per segment (single run). Also update the number of runs option to reflect the total number of jobs that will be submitted. Carefully read these parts of runConfig.sh to ensure you understand how it works.
  2. Use gchp.multirun.run as your run script, or adapt it if your cluster does not use SLURM. It is located in the runScriptSamples subdirectory of your run directory. As with the regular gchp.run, you will need to update the file with compute resources consistent with runConfig.sh. Note that you should not submit the run script directly. It will be done automatically by the file described in the next step.
  3. Use gchp.multirun.sh to submit your job, or adapt it if your cluster does not use SLURM. It is located in the runScriptSamples subdirectory of your run directory. For example, to submit your series of jobs, type: ./gchp.multirun.sh

The settings for the multi-run option in runConfig.sh are number of consecutive runs, and whether to turn on monthly diagnostics. Only turn on monthly diagnostics if your run duration is monthly.

#------------------------------------------------
#    Multi-run option
#------------------------------------------------        
# The simplest run is a single segment. Set Num_Runs=1 and Monthly_Diag=0.
#
# In some cases it is advantageous to split up your simulation into 
# multiple runs, what we call the multi-run option. Use this option as follows:
#   1. Set Num_Runs below to total # of consecutive runs
#   2. Set Monthly_Diag=1 to output monthly diagnostics; else 0.
#   3. Copy gchp.multirun.sh and gchp.multirun.run from runScriptSamples/
#      to run directory
#   4. Configure resources at the top of gchp.multirun.run (assumes SLURM).
#      This is the run script used for each individual run in the sequence.
#   5. Set duration above to the duration of each INDIVIDUAL run
#   6. Set end date after start date to span ALL runs
#   7. Execute shell script gchp.multirun.sh at the command line
#         $ ./gchp.multirun.sh
#
# When using monthly diagnostics:
#   - Run segment duration must be 1-month (00000100 000000)
#   - Start date must be within the first 28 days of the month
#   - There is no need to set diag frequency and duration in this file
#     since they will be over-written for each run based on days in month
#
Num_Runs=1
Monthly_Diag=0

There is much documentation in the headers of both gchp.multirun.run and gchp.multirun.sh that is worth reading and getting familiar with, although not entirely necessary to get the multi-run option working. If you have not done so already, it is worth trying out a simple multi-segmented run of short duration to demonstrate that the multi-segmented run configuration and scripts work on your system. For example, you could do a 3-hour simulation with 1-hour duration and number of runs equal to 3.
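
As an illustration of the short test described above, the relevant runConfig.sh settings might look like the following. The variable names come from the runConfig.sh excerpts on this page, but the specific values are only an assumed example and should be adapted to a period for which you have input data.

```shell
# Assumed example values for a 3-hour test split into three 1-hour runs.
Start_Time="20160101 000000"   # start of the first run
End_Time="20160101 030000"     # end of the last run (spans ALL runs)
Duration="00000000 010000"     # length of each INDIVIDUAL run
Num_Runs=3                     # number of consecutive runs
Monthly_Diag=0                 # run duration is not monthly
```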

The multi-run script assumes use of SLURM, and a separate SLURM log file is created for each run. There is also a log file called multirun.log with high-level information such as the start, end, duration, and job ids for all jobs submitted. If a run fails then all scheduled jobs are cancelled and a message about this is sent to that log file. Inspect this and your other log files, as well as output in the OutputDir/ directory, prior to using this option for longer duration runs.

Change Domains Stack Size

For runs at very high resolution or small number of processors you may run into a domains stack size error. This is caused by exceeding the domains stack size memory limit set at run-time and the error will be apparent from the message in your log file. If this occurs you can increase the domains stack size in file input.nml. The default is set to 20000000.

Basic Run Settings

Set Cubed Sphere Grid Resolution

GCHP uses a cubed sphere grid rather than the traditional lat-lon grid used in GEOS-Chem Classic. While regular lat-lon grids are typically designated as ΔLat ⨉ ΔLon (e.g. 4⨉5), cubed sphere grids are designated by the side-length of the cube. In GCHP we specify this as CX (e.g. C24 or C180). The simple rule of thumb for determining the roughly equivalent lat/lon resolution in degrees for a given cubed sphere resolution is to divide 90 by the side length. Using this rule you can quickly match C24 with 4x5, C90 with 1 degree, C360 with quarter degree, and so on.
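
The rule of thumb amounts to 90 divided by the cube side length, which reproduces the matches listed above. A minimal sketch follows. The function name approx_degrees is hypothetical, and awk is used only because the shell has no floating-point arithmetic.

```shell
# Hypothetical helper: approximate lat/lon resolution in degrees for a
# cubed-sphere side length, using the 90 / N rule of thumb.
approx_degrees() {
    awk -v n="$1" 'BEGIN { printf "%.2f\n", 90 / n }'
}

for res in 24 48 90 180 360; do
    echo "C$res ~ $(approx_degrees "$res") degrees"
done
```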

To change your grid resolution in the run directory edit the "CS_RES" integer parameter in runConfig.sh to the cube side-length you wish to use.

#------------------------------------------------
#   Internal Cubed Sphere Resolution
#------------------------------------------------
CS_RES=24 # 24 ~ 4x5, 48 ~ 2x2.5, 90 ~ 1x1.25, 180 ~ 1/2 deg, 360 ~ 1/4 deg

Turn On/Off Model Components

You can toggle all primary GEOS-Chem components, including type of mixing, from within runConfig.sh. The settings in that file will update input.geos automatically.

#------------------------------------------------
#    Turn Components On/Off
#------------------------------------------------
# Automatically turns on/off GEOS-Chem components in input.geos.
#
# WARNING: these settings will override manual updates you make to input.geos!
#
Turn_on_Chemistry=T
Turn_on_emissions=T
Turn_on_Dry_Deposition=T
Turn_on_Wet_Deposition=T
Turn_on_Transport=T
Turn_on_Cloud_Conv=T
Turn_on_PBL_Mixing=T
Turn_on_Non_Local_Mixing=T

Change Model Timesteps

Model timesteps, both chemistry and dynamic, are configured within runConfig.sh. They are set to match GEOS-Chem Classic default values for comparison purposes but can be updated, with caution. Read the documentation in runConfig.sh for setting them to be fully aware of recommended settings. Changing to higher resolutions will automatically change the timestep based on the rules set in runConfig.sh.

#------------------------------------------------
#    Timesteps
#------------------------------------------------
# Optimal timesteps are dependent on grid resolution and are automatically
# set based on the GCHP Working Group's recommendation below. To override
# these settings, comment out the code and manually define the following
# variables:
#    ChemEmiss_Timestep_sec     : chemistry timestep interval [s]
#    TransConv_Timestep_sec     : dynamic timestep interval [s]
#    TransConv_Timestep_HHMMSS  : dynamic timestep interval as HHMMSS string
#
# WARNING: Settings in this file will override settings in input.geos!
#
# NOTE: Default timesteps for c24 and c48, the cubed-sphere rough equivalents
# of 4x5 and 2x2.5, are the same as defaults timesteps in GEOS-Chem Classic
#
if [ "$CS_RES" -lt 180 ]; then
   ChemEmiss_Timestep_sec=1200
   TransConv_Timestep_sec=600
   TransConv_Timestep_HHMMSS=001000
else
   ChemEmiss_Timestep_sec=600
   TransConv_Timestep_sec=300
   TransConv_Timestep_HHMMSS=000500
fi

Set Simulation Start and End Dates

Set simulation start and end in runConfig.sh.

#------------------------------------------------
#    Simulation Start/End/Duration
#------------------------------------------------
# For single-segment runs, duration should be less than or equal to the
# difference between start and end time. If end time is past start time
# plus duration, the simulation will end at start time plus duration rather
# than end time.
#
# Setting duration such that two or more durations can occur between start
# and end will enable multi-segmented runs. At the end of each run the 
# end time is stored as the new start time in output file cap_restart.
# Rerunning without removing or editing cap_restart will start at the
# start time in cap_restart rather than the start time listed below. 
# Use this feature with the multi-segmented runs / monthly diagnostics
# section below. See more information about this on the GCHP wiki.
#
Start_Time="20160101 000000"
End_Time="20160101 030000"
Duration="00000000 030000"

There is also a "Duration" field in the file which must be set to reflect how long your run will last. If your end date is earlier than your start date plus duration then your GCHP run will fail. If your end date is later than your start date plus duration then your job will not make it to your configured end date; it will end at start date plus duration. If your end date is multiple durations past your start date then subsequent job submissions will start where your last run ended, so long as you do not delete file cap_restart. That file contains a new start string that will always be used if the file is present. You can take advantage of this file for splitting up a long simulation into multiple jobs. See further down on this page for automation of this task built into the run directory.
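
Because cap_restart silently takes precedence over the start time in runConfig.sh, it can help to print which start time a run will actually use. The sketch below is not part of GCHP, and the function name effective_start is hypothetical. It assumes cap_restart sits in the run directory and contains only the start string.

```shell
# Hypothetical helper: print the start time a run would use.
# Usage: effective_start RUN_DIR "START_TIME_FROM_RUNCONFIG"
effective_start() {
    if [ -f "$1/cap_restart" ]; then
        # A leftover cap_restart overrides the configured start time.
        cat "$1/cap_restart"
    else
        echo "$2"
    fi
}

effective_start . "20160101 000000"
```

If the printed value is not what you expect for a fresh run, remove or edit cap_restart before submitting.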

Typically a "CAP" error indicates a problem with start, end, and duration settings. If you encounter an error with the word "CAP" near it then double-check that these settings make sense.

Inputs

Change Input Meteorology Grid Resolution and/or Source

For versions 12.1.0 and later you can specify meteorology source when creating a run directory. The grid resolutions will automatically be set to 0.25x0.3125 for GEOS-FP and 0.5x0.625 for MERRA2. If you wish to change meteorology source using these versions then simply create a new run directory. To change the grid resolution of the meteorology, update all meteorology paths and filenames in ExtData.rc.

For versions prior to 12.1.0 the GCHP run directories are set by default to use 0.25x0.3125 GEOS-FP meteorology. To change to MERRA2, redefine the MetDir symbolic link to point to the MERRA2 directory. To change the grid resolution of input meteorology, update all meteorology paths and filenames in ExtData.rc.

When changing meteorology source and/or grid resolution, be sure that you have the data available for the time period you plan on simulating. In addition, note that meteorology listed in ExtData.rc includes both data for the time period you plan on running and constants files (2011 for GEOS-FP and 2015 for MERRA2). See the downloading GEOS-Chem data page for more information on meteorology sources available and how to download them.

Change Your Initial Restart File

All GCHP run directories come with symbolic links to initial restart files for commonly used cubed sphere resolutions. The appropriate restart file is automatically chosen based on the cubed sphere resolution you set in runConfig.sh. All of the restart files are simply GEOS-Chem Classic restart files regridded to the cubed sphere.

#------------------------------------------------
#    Initial Restart File
#------------------------------------------------
# By default the linked restart files in the run directories will be 
# used. Please note that HEMCO restart variables are stored in the same
# restart file as species concentrations. Initial restart files available 
# on gcgrid do not contain HEMCO variables which will have the same effect
# as turning the HEMCO restart file option off in GC classic. However, all 
# output restart files will contain HEMCO restart variables for your next run.
INITIAL_RESTART=initial_GEOSChem_rst.c${CS_RES}_TransportTracers.nc

# You can specify a custom initial restart file here to overwrite:
# INITIAL_RESTART=your_restart_filename_here

You may over-write the default restart file with your own by specifying the restart filename in runConfig.sh. Beware that it is your responsibility to make sure it is the proper grid resolution.
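
A quick way to confirm that a restart file exists for your chosen resolution is sketched below. The function name restart_name is hypothetical. The filename pattern is copied from the runConfig.sh excerpt above and applies to the TransportTracers run directory shown there, so other simulations will use a different name.

```shell
# Hypothetical helper: build the default restart filename for a given
# cubed-sphere resolution, following the pattern in the excerpt above.
restart_name() {
    echo "initial_GEOSChem_rst.c${1}_TransportTracers.nc"
}

CS_RES=24
INITIAL_RESTART=$(restart_name "$CS_RES")

# -e follows symbolic links, so a broken link is reported as missing.
if [ -e "$INITIAL_RESTART" ]; then
    echo "found $INITIAL_RESTART"
else
    echo "missing $INITIAL_RESTART" >&2
fi
```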

Unlike GEOS-Chem Classic, HEMCO restart files are not used in GCHP. HEMCO restart variables may be included in the initial species restart file, or they may be excluded and HEMCO will start with default values. GCHP initial restart files that come with the run directories do not include HEMCO restart variables, but all output restart files do.

Turn On/Off Emissions Inventories

Because file I/O impacts GCHP performance it is a good idea to turn off file reads of emissions that you do not need. You can turn emissions inventories on or off the same way you would in GEOS-Chem Classic, by setting the inventories to true or false at the top of configuration file HEMCO_Config.rc. All emissions that are turned off in this way will be ignored when GCHP uses ExtData.rc to read files, thereby speeding up the model.

For emissions that do not have an on/off toggle at the top of the file, you can prevent GCHP from reading them by commenting them out in HEMCO_Config.rc. No updates to ExtData.rc would be necessary. If you alternatively comment out the emissions in ExtData.rc but not HEMCO_Config.rc then GCHP will fail with an error when looking for the file information.

Another option to skip file read for certain files is to replace the file path in ExtData.rc with /dev/null. However, if you want to turn these inputs back on at a later time you should preserve the original path in a comment.

Add New Input Files

New in GCHP 12.5.0: Online ESMF regridding removes the need for tile files. All parts of this section related to tile files can be ignored if using 12.5.0.

There are three main requirements for adding new emissions inventories to GCHP:

  1. Add the inventory information to HEMCO_Config.rc. If you wish to add new inputs to the model that are not handled by HEMCO then you can skip this step.
  2. Add the inventory information to ExtData.rc.
  3. Have a tile file available that maps the inventory's lat/lon grid to the cubed sphere grid for the resolution you will use.

To add information to HEMCO_Config.rc, follow the same rules as you would for adding a new emission inventory to GEOS-Chem Classic. Note that not all information in HEMCO_Config.rc is used by GCHP. This is because HEMCO is only used by GCHP to handle emissions after they are read, e.g. scaling and applying hierarchy. All functions related to HEMCO file read are skipped. This means that you could put garbage for the file path and units in HEMCO_Config.rc without running into problems with GCHP. However, we recommend that you fill in HEMCO_Config.rc in the same way you would for GEOS-Chem Classic for consistency and also to avoid potential format check errors.

Staying consistent with the information that you put into HEMCO_Config.rc, add the inventory information to ExtData.rc following the guidelines listed at the top of the file and using existing inventories as examples. You can ignore all entries in HEMCO_Config.rc that are copies of another entry since putting these in ExtData.rc would result in reading the same variable in the same file twice. Doing so would be costly in GCHP because each file is opened and closed for each variable in the file. HEMCO interprets the copied variables, denoted by having dashes in the HEMCO_Config.rc entry, separate from file read.

At this point it is best to run a very short simulation with GCHP with MAPL debug prints on (see section on debugging below). If your file(s) need a new tile file then the model will crash. Tile files have already been created for many lat/lon grids and these are stored in ExtData/GCHP/TileFiles. The GCHP log file error will include the tile file name that GCHP expects to be available for regridding your new inventory. In that filename DC = dateline centered, PC = pole centered, DE = dateline edge, and PE = pole edge. UU is reserved for files on regional grids. Once you have this information you should be able to generate your own tile file by downloading the tempestremap and CSGrid repositories from GitHub and following these steps:

  • tempestremap: This tool will generate a netCDF tile file for mapping lat/lon coordinates to cubed sphere. This is a Fortran tool that should work in your existing GCHP environment. Simply do make clean and then make to build the tempestremap code. Then use runGlobal.sh or runRegional.template.sh to generate global or regional bound tile files. When using runGlobal.sh, you will need to specify whether your data is dateline-centered and/or pole-centered, as determined from the log file error message. We recommend generating a tile file for all supported cubed-sphere resolutions (nC = 24, 48, 90, 180, and 360). NOTE: There seems to be a 2 GB limit when creating tile files with tempestremap.
  • CSGrid: This tool will convert the netCDF file created by tempestremap to binary for compatibility with GCHP. CSGrid requires a Matlab license. If you have Matlab you can use exampleScripts/create_Tempest_TileFile_LL2CS.m to convert the netCDF output in tempestremap/TileFiles to binary. Send the resulting file to the GCST and they can add it to ExtData/GCHP/TileFiles.

Once read in by GCHP, your data will be stored as MAPL Import variables with the same names that appear in the first column of ExtData.rc. If your input files are handled by HEMCO then you do not need to do anything else to handle the MAPL Imports. However, if your new inputs are not handled by HEMCO then you will need to take the additional steps of adding source code to transfer your MAPL Imports to something that GEOS-Chem can understand. If you wish to assign a MAPL Import directly to a State_Met or other state field in GEOS-Chem, you can do this in GCHP file "Includes_Before_Run.H". The lines in that file are executed prior to every dynamic timestep in GCHP and currently contain the setting of all State_Met fields derived from MAPL Imports. For more advanced use cases, read through GCHP file Chem_GridCompMod.F90 for examples, specifically searching for calls to subroutine MAPL_GetPointer. Contact the GEOS-Chem Support Team for more information on how to use MAPL Imports within GEOS-Chem.

A few common errors encountered when adding new input files to GCHP are:

  1. Your input file contains integer values. Beware that the MAPL I/O component in GCHP does not read or write integers. If your data contains integers then you should reprocess the file to contain floating point values instead. If you try to input integers you will get an error such as this:
    >>Reading TESTDATA from ./MainDataDir/testfile.nc
    CFIO: Reading ./MainDataDir/testfile.nc at 19850101 000000
    CFIO_GetVar: error getting scale
    CFIO_CFIO_GetVar failed
    problem in ESMF_CFIOSdfVarRead
  2. Your data latitude and longitude dimensions are in the wrong order. Lat must always come before lon in your input arrays, a requirement true for both GCHP and GEOS-Chem Classic. For more information about this, see the "Ordering of the data" section of the Preparing Data Files for Use with HEMCO wiki page. The symptom of this error in GCHP is:
    CFIO: Reading {filename} at {YYYYMMDD} {HHmmSS}
    Error reading
    variable using NF90_GET_VAR -57
    NetCDF: Start+count exceeds dimension bound
  3. You do not have a tile file that regrids between your input file data resolution and the internal resolution of GCHP. This will result in an ExtData error in MAPL_HorzTransform.
  4. Your 3D input data are mapped to the wrong levels in GEOS-Chem (silent error). If you read in 3D data and assign the resulting import to a GEOS-Chem state variable such as State_Chm or State_Met, then you must flip the vertical axis during the assignment. See files Includes_Before_Run.H and setting State_Chm%Species in Chem_GridCompMod.F90 for examples.
  5. You have a typo in either HEMCO_Config.rc or ExtData.rc. Errors in HEMCO_Config.rc typically result in the model crashing right away. Errors in ExtData.rc typically result in a problem later on during ExtData read. Always try running with MAPL_DEBUG_LEVEL=20 in runConfig.sh (maximizes output to gchp.log) and Warnings and Verbose set to 3 in HEMCO_Config.rc (maximizes output to HEMCO.log) when encountering errors such as this. Another useful strategy is to find rc-file entries for similar input files and compare them against the entry for your new file. Directly comparing the file metadata may also lead to insights into the problem.
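
For the first error in the list above, one way to spot integer variables before running GCHP is to scan the file header. The sketch below is an illustration rather than part of GCHP: the function name find_integer_vars is hypothetical, and it assumes the header is in the CDL text form printed by the netCDF ncdump -h utility. It flags every integer-typed variable, including coordinate variables, which may or may not matter for your file.

```shell
#!/usr/bin/env bash
# Sketch: list variables declared with an integer type in a netCDF header.
# Reads CDL text (as printed by `ncdump -h`) on standard input and prints
# one "<type> <name>" pair per integer-typed variable.
find_integer_vars() {
    awk '
        /^variables:/ { in_vars = 1; next }
        # The variables section ends at "data:" or the closing brace.
        /^data:/ || substr($0, 1, 1) == "}" { in_vars = 0 }
        # Variable declarations look like: "<type> <name>(<dims>) ;"
        in_vars && $1 ~ /^(byte|ubyte|short|ushort|int|uint|int64|uint64|long)$/ {
            sub(/\(.*/, "", $2)   # strip the dimension list
            sub(/;$/, "", $2)     # scalar variables have no dimension list
            print $1, $2
        }
    '
}

# Typical use (requires the netCDF utilities to be installed):
#   ncdump -h ./MainDataDir/testfile.nc | find_integer_vars
```

If the function prints nothing, no integer-typed variables were found in the header. The same ncdump -h output also shows the dimension order of each variable, which helps with the second error in the list.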

Outputs

Output Restart Files at Regular Frequency

The MAPL component in GCHP has the option to output restart files (also called checkpoint files) at regular intervals. Unlike the final restart file output at the end of a simulation, these regularly output restart files contain the date and time in their filename. Enabling this feature is a good idea if you plan on doing a long simulation and you are not splitting your run into multiple jobs. If the run crashes unexpectedly then you can restart mid-run rather than start over from the beginning. To set the checkpoint frequency, simply update the HHmmSS string for "Checkpoint_Freq" in runConfig.sh. Minutes and seconds must each be two digits but hours can be more than two.

#------------------------------------------------
#    Output Restart Files
#------------------------------------------------
# You can output restart files at regular intervals throughout your
# simulation. These restarts are in addition to the end-of-run restart
# which is always produced. To configure output restart file frequency,
# set the variable below to a string of format HHmmSS. More than 2
# digits for the hours string is permitted (e.g. 1680000 for 7 days).
# Setting the frequency to 000000 will turn off this feature by setting
# it to a very large number.
Checkpoint_Freq="000000"
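
As a worked example of the HHmmSS format, the sketch below builds the string from a whole number of hours. The helper name hours_to_hhmmss is hypothetical and is not defined in runConfig.sh; it simply pads the hours to at least two digits and fixes minutes and seconds at 00.

```shell
#!/usr/bin/env bash
# Convert a whole number of hours to the HHmmSS string used by
# Checkpoint_Freq. Hours are zero-padded to at least two digits and may
# grow beyond two; minutes and seconds are fixed at 00 in this sketch.
hours_to_hhmmss() {
    printf '%02d0000\n' "$1"
}

days=7
Checkpoint_Freq="$(hours_to_hhmmss $((days * 24)))"
echo "Checkpoint_Freq=\"${Checkpoint_Freq}\""   # 7 days -> 1680000
```

Pass the hour count without leading zeros (for example 6, not 06), since printf may interpret a leading zero as octal.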

Turn On/Off Diagnostics

All GCHP run directories have four collections on by default: time-averaged species concentrations, instantaneous species concentrations, time-averaged meteorology, and instantaneous meteorology. All species are enabled while only a subset of meteorology variables are enabled. There are several other collections already implemented but they are off by default for the standard and benchmark simulations, and on by default for the RnPbBe simulation.

To turn collections on or off, uncomment or comment out ("#") the collection names in the "COLLECTIONS" list at the top of file HISTORY.rc.

#===================================================================
# Declare collection names and toggle on/off
#===================================================================
COLLECTIONS: #'AerosolMass'
            #'Aerosols',
            #'Budget',
            #'CloudConvFlux',
            #'ConcAfterChem',
            #'DryDep',
            'Emissions',
            #'JValues',
            #'LevelEdgeDiags',      
            #'ProdLoss',
            'SpeciesConc',
            #'StateChm',
            'StateMet_avg',  
            'StateMet_inst',  
            #'WetLossConv',
            #'WetLossLS',
::
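
If you toggle collections often, the edit can be scripted. The following is a minimal sketch, not a GCHP utility: the function names enable_collection and disable_collection are hypothetical, and they only handle entries that sit on their own line in the form 'Name', or #'Name', as in the list above. The entry that shares a line with the COLLECTIONS keyword is deliberately left alone.

```shell
#!/usr/bin/env bash
# Sketch: toggle a collection in the COLLECTIONS list of HISTORY.rc.
# Only matches lines consisting of optional indentation followed by
# 'Name', (enabled) or #'Name', (disabled).
enable_collection() {
    local rc_file="$1" name="$2"
    sed -E "s/^([[:space:]]*)#[[:space:]]*('${name}',)/\1\2/" "$rc_file" > "${rc_file}.tmp" &&
        cat "${rc_file}.tmp" > "$rc_file" &&
        rm -f "${rc_file}.tmp"
}

disable_collection() {
    local rc_file="$1" name="$2"
    sed -E "s/^([[:space:]]*)('${name}',)/\1#\2/" "$rc_file" > "${rc_file}.tmp" &&
        cat "${rc_file}.tmp" > "$rc_file" &&
        rm -f "${rc_file}.tmp"
}

# Example:
#   enable_collection  HISTORY.rc JValues
#   disable_collection HISTORY.rc StateMet_inst
```

Both functions are no-ops when the collection is already in the requested state or when the name is not found.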

Once a collection is turned on, you can comment out individual diagnostics within it further down in the file by searching for the collection name with the ".fields" suffix. Be aware that you cannot comment out the diagnostic that appears on the same line as the fields keyword. If you wish to suppress that specific diagnostic then move it to the next line and replace it with a diagnostic that you want to output.

#===================================================================
# State_Met array diagnostics - time-averaged
 StateMet_avg.template:      '%y4%m2%d2_%h2%n2z.nc4',
 StateMet_avg.format:        'CFIO',
 StateMet_avg.frequency:     010000
 StateMet_avg.duration:      010000
 StateMet_avg.mode:          'time-averaged'
 StateMet_avg.fields:        'Met_AD               ', 'GIGCchem',
                             #'Met_AIRDEN          ', 'GIGCchem',
                             #'Met_AIRVOL          ', 'GIGCchem',
                             #'Met_ALBD            ', 'GIGCchem',
                             'Met_AREAM2           ', 'GIGCchem',
                             #'Met_AVGW            ', 'GIGCchem',
                             'Met_BXHEIGHT         ', 'GIGCchem',
                             etc

Set Diagnostic Frequency, Duration, and Mode

WARNING: There is currently a bug in GCHP that prevents writing out more than one time per file. Duration in HISTORY.rc is ignored.

All diagnostic collections that come with the run directory have frequency, duration, and mode defined within runConfig.sh. With the exception of SpeciesConc_inst and StateMet_inst, all collections are time-averaged (mode) with frequency and duration set to the simulation length you specified in CopyRunDirs.input when creating the run directory. Any of these defaults can be overwritten by editing runConfig.sh. Be aware that manual updates of HISTORY.rc will be overwritten by runConfig.sh settings.

#------------------------------------------------
#    Diagnostics
#------------------------------------------------        
# Frequency, duration, and mode used for all default HISTORY.rc diagnostic
# collections are set from within this file. These are defined as:
#
#   Frequency = frequency of diagnostic calculation (HHmmSS)
#   Duration  = frequency of diagnostic file  write (HHmmSS)
#   Mode      = computation of diagnostics (time-averaged or instantaneous)
#
# Edit the frequency, duration, and mode below to change global settings.
# See the list further below of what HISTORY.rc collections will be updated.
# 
# NOTES: 
#  1. Freq and duration hours may exceed 2 digits, e.g. 7440000 for 31 days
#  2. Freq and duration are ignored if Monthly_Diag is set to 1
#  3. If you do not want settings for certain collections set automatically
#     from this file, comment them out below.
#  4. If you add a collection to HISTORY.rc and want its settings
#     automatically updated from this file, add to the list below.
#  5. To turn off collections completely, comment them out in HISTORY.rc.
#
common_freq="010000"          # Ignore if using multi-run monthly diag option
common_dur="010000"           # Ignore if using multi-run monthly diag option
common_mode="'time-averaged'" # "'time-averaged'" and "'instantaneous'"

SpeciesConc_freq=${common_freq}
SpeciesConc_dur=${common_dur}
SpeciesConc_mode=${common_mode}
AerosolMass_freq=${common_freq}
AerosolMass_dur=${common_dur}
AerosolMass_mode=${common_mode}
Aerosols_freq=${common_freq}
Aerosols_dur=${common_dur}
Aerosols_mode=${common_mode}
Budget_freq=${common_freq}
etc

Add a New Diagnostics Collection

Adding a new diagnostics collection in GCHP is the same as for GEOS-Chem Classic netCDF diagnostics. You must add your collection to the collection list in HISTORY.rc and then define it further down in the file. Any 2D or 3D arrays that are stored within State_Met, State_Chm, or State_Diag, and that are successfully incorporated into the GEOS-Chem Registry may be included as fields in a collection. State_Met variables must be preceded by "Met_", State_Chm variables must be preceded by "Chem_", and State_Diag variables should not have a prefix. See GeosCore/state_diag_mod.F90 for examples of how existing State_Diag arrays are implemented.

Once implemented, you can either incorporate the new collection settings into runConfig.sh for auto-update, or you can manually configure all settings in HISTORY.rc.

Generate Monthly Mean Diagnostics

There is an option to automatically generate monthly diagnostics by submitting month-long simulations as separate jobs. Splitting up the simulation into separate jobs is a requirement for monthly diagnostics because MAPL History requires a fixed number of hours set for diagnostic frequency and file duration. The monthly mean diagnostic option automatically updates HISTORY.rc diagnostic settings each month to reflect the number of days in that month taking into account leap years.

To use the monthly diagnostics option, first read and follow instructions for splitting a simulation into multiple jobs (see separate section on this page). Prior to submitting your run, enable monthly diagnostics in runConfig.sh by searching for variable "Monthly_Diag" and changing its value from 0 to 1. Be sure to always start your monthly diagnostic runs on the first day of the month.
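
To illustrate why the frequency and duration must change from month to month, the sketch below computes the HHmmSS value that spans one calendar month, including the leap-year rule. It is an illustration only and is not the code that runConfig.sh uses; the function names days_in_month and month_hhmmss are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compute the HHmmSS duration covering one calendar month,
# accounting for leap years (Gregorian rule).
days_in_month() {
    local year="$1" month="$2"
    case "$month" in
        1|3|5|7|8|10|12) echo 31 ;;
        4|6|9|11)        echo 30 ;;
        2)
            if (( (year % 4 == 0 && year % 100 != 0) || year % 400 == 0 )); then
                echo 29
            else
                echo 28
            fi ;;
        *) echo "invalid month: $month" >&2; return 1 ;;
    esac
}

month_hhmmss() {
    local days
    days="$(days_in_month "$1" "$2")" || return 1
    printf '%d0000\n' $(( days * 24 ))
}

month_hhmmss 2016 1   # 31 days -> 7440000
month_hhmmss 2016 2   # 29 days -> 6960000
```

Months must be given without a leading zero (2, not 02) in this sketch.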

Additional Diagnostic Collection Options

See file GCHP/Shared/MAPL_Base/TeX/HistoryIntro.tex for original MAPL documentation on MAPL History. Please note that we have not tested all of these functionalities and some of them seem not to work in MAPL. Proceed with caution and let the GEOS-Chem Support Team know what you find. Here is a brief overview, taken from that document, of options that may be included for each collection:

template Character string defining the time stamping template that is appended to collection to create a particular file name. The template uses GrADS conventions. The default value depends on the duration of the file.

descr Character string describing the collection. Defaults to 'expdsc'.

format Character string to select file format ("CFIO", "CFIOasync", "flat"). "CFIO" uses MAPL_CFIO and produces netCDF output. "CFIOasync" uses MAPL_CFIO but delegates the actual I/O to the MAPL_CFIOServer (see MAPL_CFIOServer documentation for details). Default = "flat".

frequency Integer (HHHHMMSS) for the frequency of time groups in the collection. Default = 060000.

mode Character string equal to 'instantaneous' or 'time-averaged'. Default = 'instantaneous'.

acc_interval Integer (HHHHMMSS) for the accumulation interval (less than or equal to frequency) for time-averaged diagnostics. Default = frequency; ignored if mode is 'instantaneous'.

ref_date Integer (YYYYMMDD) reference date for frequency; also the beginning date for the collection. Default is the Start date on the Clock.

ref_time Integer (HHMMSS) Same as ref_date.

end_date Integer (YYYYMMDD) ending date to stop diagnostic output. Default: no end

end_time Integer (HHMMSS) ending time to stop diagnostic output. Default: no end.

duration Integer (HHHHMMSS) for the duration of each file. Default = 00000000 (everything in one file). Duration is not currently functional in GCHP and will be ignored. Frequency is used instead for write frequency.

resolution Optional resolution (IM JM) for the output stream. Transforms between two regular LogRect grids in index space. Default is the native resolution.

xyoffset Optional Flag for output grid offset when interpolating. Must be between 0 and 3. (Cryptic Meaning: 0:DcPc, 1:DePc, 2:DcPe, 3:DePe). Ignored when resolution results in no interpolation (native). Default: 0 (DatelineCenterPoleCenter).

levels Optional list of output levels (Default is all levels on Native Grid). If vvars is not specified, these are layer indices. Otherwise see vvars, vunits, vscale.

vvars Optional field to use as the vertical coordinate and functional form of vertical interpolation. A second argument specifies the component the field comes from. Example 1: the entry 'log(PLE)','DYNAMICS' uses PLE from the FV3 advection component as the vertical coordinate and interpolates to levels linearly in its log. Example 2: 'THETA','DYN' a way of producing isentropic output. Only log(*), pow(*), and real number and straight linear interpolation are supported.

vunit Character string to use for units attribute of the vertical coordinate in file. The default is the MAPL_CFIO default. This affects only the name in the file. It does not do the conversion. See vscale.

vscale Optional Scaling to convert VVARS units to VUNIT units. Default: no conversion.

regrid_exch Name of the exchange grid that can be used for interpolation between two LogRect grids or from a tile grid to a LogRect grid. Default: no exchange grid interpolation. irregular grid.

regrid_name Name of the Log-Rect grid to interpolate to when going from a tile to Field to a gridded output. regrid_exch must be set, otherwise it is ignored.

conservative Set to a non-zero integer to turn on conservative regridding when going from a native cube-sphere grid to lat-lon output. Default: 0

deflate Set deflate level (0-9) of NETCDF output when format is CFIO or CFIOasync. Default: 0

subset Optional subset (lonMin lonMax latMin latMax) for the output when performing non-conservative cube-sphere to lat-lon regridding of the output.

chunksize Optional user specified chunking of NETCDF output when format is CFIO or CFIOasync, (Lon chunksize, Lat chunksize, Lev chunksize, Time chunksize)

Debugging

Enable Maximum Print Output

Besides compiling with "make compile_debug", there are a few run settings you can configure to boost your chance of successful debugging. All of them involve sending additional print statements to the log files.

  1. Change "ND70" in input.geos from 0 to 1 to turn on extra GEOS-Chem print statements in the main log file.
  2. Set the "MAPL_DEBUG_LEVEL" variable in runConfig.sh to a number greater than 0 to turn on extra MAPL print statements in MAPL ExtData. This is useful if you are having a problem reading input files. The higher the number the more prints will be sent to the log (and the slower your run will be). Usually 20 is sufficient, although you can go higher. Please be sure to remember to set MAPL_DEBUG_LEVEL back to 0 when you are done so as not to severely slow down your runs!
  3. Set the "Verbose" and "Warnings" settings in HEMCO_Config.rc to maximum values of 3 to send the maximum number of prints to HEMCO.log.
  4. Set the "MEMORY_DEBUG_LEVEL" option, new in 12.5.0, to 1 to turn on additional memory usage prints per timestep.
#------------------------------------------------
#    Debug Options
#------------------------------------------------
# Set MAPL debug flag to 0 for no extra MAPL debug log output, or 1 to
# print information to log. Using this flag is most helpful for debugging
# issues with file read (MAPL ExtData).
#
# Set memory debug flag to 0 to print memory only once per timestep. Set to
# 1 to enable memory prints at additional locations throughout the run.
#
# For GEOS-Chem debug prints, turn on ND70 in input.geos manually.       
#
# WARNING: Turning on debug prints significantly slows down the model!
#
MAPL_DEBUG_LEVEL=0
MEMORY_DEBUG_LEVEL=0

None of these options require recompiling. Be aware that all of them will slow down your simulation. Be sure to set them back to the default values after you are finished debugging.
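
Because these settings need to be switched on for debugging and back off afterwards, a small helper can reduce the chance of leaving them enabled. The sketch below is an assumption-laden illustration, not part of the GCHP run directory: the function name set_config_var is hypothetical, and it only rewrites lines that begin with NAME= with no leading whitespace, which matches the debug options block shown above. Values containing "/" or "&" are not supported.

```shell
#!/usr/bin/env bash
# Sketch: set a NAME=value assignment in a shell config file in place.
# Only rewrites lines that start with "NAME=" (no leading whitespace).
set_config_var() {
    local cfg_file="$1" name="$2" value="$3"
    sed -E "s/^${name}=.*/${name}=${value}/" "$cfg_file" > "${cfg_file}.tmp" &&
        cat "${cfg_file}.tmp" > "$cfg_file" &&
        rm -f "${cfg_file}.tmp"
}

# Turn MAPL debug prints on before a short test run:
#   set_config_var runConfig.sh MAPL_DEBUG_LEVEL 20
# Restore the default afterwards:
#   set_config_var runConfig.sh MAPL_DEBUG_LEVEL 0
```

After restoring the defaults, a quick grep for MAPL_DEBUG_LEVEL and MEMORY_DEBUG_LEVEL in runConfig.sh confirms that both are back to 0.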

Turn On/Off MAPL Timers and Memory Logging

Your GCHP log file will include timing and memory information by default, and this is usually a good thing. If for some reason you want to turn these features off you can do so in file CAP.rc. Search for "MAPL_ENABLE_TIMERS" and "MAPL_ENABLE_MEMUTILS" and simply change "YES" to "NO". Remember to turn them back on again if you later need to debug.

