Running GCHP: Configuration

Revision as of 22:45, 7 August 2018

Previous | Next | Getting Started with GCHP

  1. Hardware and Software Requirements
  2. Downloading Source Code
  3. Obtaining a Run Directory
  4. Setting Up the GCHP Environment
  5. Compiling
  6. Basic Example Run
  7. Run Configuration Files
  8. Advanced Run Examples
  9. Output Data
  10. Developing GCHP


Please note that this page is under construction, as the run setup process was greatly simplified between versions v11-02b and v11-02c. Please refer to the tutorial slides for GCHP v11-02c to see how to configure a GCHP run using the runConfig.sh bash script. Contact the GEOS-Chem Support Team with questions about advanced run setup.

Overview

All default GCHP run directories are set up to run at c24 resolution with 0.25x0.3125 GEOS-FP meteorology, 6 cores, and 1 node. This is the simplest possible run and a good test case for your initial setup. However, you will want to change these settings, and potentially several others, for your research runs. This page goes over how to do this.

Set Number of Nodes and Cores

To change the number of nodes and cores for your run you must update settings in two places: (1) runConfig.sh, and (2) your run script.
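
Below is a minimal sketch of the two updates for a 2-node, 48-core run. The runConfig.sh variable names shown here are illustrative and may differ in your version of the file; the SLURM directives assume a SLURM-based run script.

  # (1) runConfig.sh -- request 2 nodes with 24 cores each (names illustrative)
  NUM_NODES=2
  NUM_CORES_PER_NODE=24

  # (2) SLURM run script -- resources must match runConfig.sh
  #SBATCH -n 48      # total number of cores across all nodes
  #SBATCH -N 2       # number of nodes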

Set Cubed Sphere Grid Resolution

Changing your grid resolution involves simply changing the "CS_RES" parameter in runConfig.sh.
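
For example, to switch from the default c24 grid to c48, edit the following line in runConfig.sh (the value is the cube face side length):

  # runConfig.sh -- cubed sphere resolution (c48 shown; c24 is the default)
  CS_RES=48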

Change Input Meteorology Grid Resolution and/or Source

Changing input meteorology requires two updates: (1) redefine the MetDir symbolic link to point to the appropriate source directory, and (2) update all meteorology paths and filenames in ExtData.rc. Currently only GEOS-FP and MERRA2 meteorology are supported in GCHP. Be sure that you have data available at the grid resolution you wish to run at and for the time period you plan to simulate. See the primary GEOS-Chem wiki page for information on the available meteorology data.
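
A sketch of the two updates, assuming a hypothetical local data path; the directory layout and the ExtData.rc templates on your system may differ:

  # (1) Repoint the MetDir symbolic link in the run directory
  unlink MetDir
  ln -s /path/to/ExtData/GEOS_0.5x0.625/MERRA2 MetDir

  # (2) In ExtData.rc, update every meteorology entry so that the path,
  #     filename template, and grid resolution match the new source
  #     (e.g. replace GEOS-FP 0.25x0.3125 templates with MERRA2 0.5x0.625 ones)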

Change Your Initial Restart File

All GCHP run directories come with symbolic links to initial restart files for commonly used cubed sphere resolutions. The appropriate restart file is automatically chosen based on the cubed sphere resolution you set in runConfig.sh. All of the restart files are simply GEOS-Chem Classic restart files regridded to the cubed sphere.

Unlike in GEOS-Chem Classic, HEMCO restart files are not used in GCHP. HEMCO restart variables may be included in the initial species restart file, or they may be excluded, in which case HEMCO will start with default values. The GCHP initial restart files that come with the run directories do not include HEMCO restart variables unless "HEMCO" appears in the filename. This is only the case for the benchmark restart files used for the 1-year benchmark simulation, which relies on a valid spin-up.

You may overwrite the default restart file with your own by specifying the restart filename in runConfig.sh. Beware that it is your responsibility to make sure it is at the proper grid resolution. Publicly available tools for regridding are listed on the GCHP Output Files page of this user manual.
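
A minimal sketch of this override; the variable name is an assumption (check runConfig.sh for the exact name) and the filename is a placeholder for your own regridded restart file:

  # runConfig.sh -- point the run at a custom restart file (name illustrative)
  INITIAL_RESTART=my_custom_restart.c48.nc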

Output Restart Files at Regular Frequency

While most of the GCHP run-time options are set from runConfig.sh, the option for outputting restart files beyond the usual end-of-run restart file is not. This is simply because the default setting of every 30 days is usually adequate. To change this frequency, update the HHmmSS string for "RECORD_FREQUENCY" in file GCHP.rc. Minutes and seconds must each be two digits but hours can be more than two.
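
For example, to write an intermediate restart file once per week instead of every 30 days, set the frequency to 168 hours in GCHP.rc:

  # GCHP.rc -- checkpoint frequency in HHmmSS (hours may exceed two digits)
  RECORD_FREQUENCY: 1680000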

Turn On/Off Model Components

You can turn on or off all primary GEOS-Chem components, including the type of PBL mixing, from within runConfig.sh. The settings in that file will update input.geos automatically.
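
A sketch of what these switches look like; the variable names below are illustrative only, so check the component section of runConfig.sh for the exact names used in your version:

  # runConfig.sh -- component on/off switches (names illustrative)
  Turn_on_Chemistry=T
  Turn_on_emissions=T
  Turn_on_Transport=T
  Turn_on_Cloud_Conv=T
  Turn_on_Dry_Deposition=T
  Turn_on_Wet_Deposition=T
  Use_Non_Local_PBL=T      # F selects full PBL mixing instead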

Change Model Timesteps

Model timesteps, both chemistry and dynamic, are configured within runConfig.sh. They are set to match GEOS-Chem Classic default values for comparison purposes but can be updated with caution. Read the timestep documentation in runConfig.sh before changing them so that you are fully aware of the recommended settings and their implications.
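
A sketch of the timestep settings, using the GEOS-Chem Classic defaults for coarse resolutions; the variable names and units (seconds) are assumptions, so follow whatever convention your copy of runConfig.sh uses:

  # runConfig.sh -- model timesteps (names and units illustrative)
  TransConv_Timestep_sec=600     # dynamic (transport/convection) timestep
  ChemEmiss_Timestep_sec=1200    # chemistry/emissions timestep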

Set Simulation Start and End Dates

Set the simulation start and end dates in runConfig.sh. There is also a "DURATION" field in the file which must be set to reflect how long your run will last. If your end date is earlier than your start date plus duration, your GCHP run will fail. If your end date is later than your start date plus duration, your job will not make it to your configured end date; it will end at start date plus duration. If your end date is multiple durations past your start date, subsequent job submissions will start where your last run ended, so long as you do not delete the file cap_restart. That file contains a new start string that will always be used if the file is present. You can take advantage of this file to split a long simulation into multiple jobs. See further down on this page for the automation of this task that is built into the run directory.
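
A sketch of these settings for a one-month simulation; the variable names and the date/time formatting are illustrative, so follow the exact format shown in your copy of runConfig.sh:

  # runConfig.sh -- simulation time settings (names and format illustrative)
  Start_Time="20160701 000000"
  End_Time="20160801 000000"
  Duration="00000100 000000"     # 1 month per run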

Typically a "CAP" error indicates a problem with the start, end, and duration settings. If you encounter an error with "CAP" in the message, double-check that these settings make sense.

Turn On/Off Diagnostics

All GCHP run directories have four collections on by default: time-averaged species concentrations, instantaneous species concentrations, time-averaged meteorology, and instantaneous meteorology. All species are enabled, while only a subset of meteorology variables are enabled. There are several other collections already implemented, but they are off by default for the standard and benchmark simulations and on by default for the RnPbBe simulation.

To turn collections on or off, comment out ("#") collection names in the "COLLECTIONS" list at the top of HISTORY.rc. Once a collection is turned on, you can comment out individual diagnostics within it further down in the file by searching for the collection name with the ".fields" suffix. Be aware that you cannot comment out the diagnostic that appears on the same line as the fields keyword. If you wish to suppress that specific diagnostic, move it to the next line and replace it with a diagnostic that you do want to output.
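
An illustrative HISTORY.rc excerpt; the collection and species names are examples only, and the exact layout follows whatever is already in your file:

  COLLECTIONS: 'SpeciesConc_avg',
               #'SpeciesConc_inst',   <-- commented out: this collection is off
               'StateMet_avg',
  ::

    SpeciesConc_avg.fields: 'SPC_O3', 'GIGCchem',
                            #'SPC_NO', 'GIGCchem',   <-- this diagnostic is off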

Set Diagnostic Frequency, Duration, and Mode

All diagnostic collections that come with the run directory have frequency, duration, and mode defined within runConfig.sh. With the exception of SpeciesConc_inst and StateMet_inst, all collections are time-averaged (mode), with frequency and duration set to the simulation length you specified in CopyRunDirs.input when creating the run directory. Any of these defaults can be overwritten by editing runConfig.sh. Be aware that manual updates of HISTORY.rc will be overwritten by the runConfig.sh settings.
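
For reference, these are the per-collection attributes that runConfig.sh ultimately writes into HISTORY.rc (the collection name and values below are illustrative):

    SpeciesConc_avg.frequency: 010000          # hourly averaging interval (HHmmSS)
    SpeciesConc_avg.duration:  240000          # one output file per 24 hours
    SpeciesConc_avg.mode:      'time-averaged'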

Add a New Diagnostics Collection

Adding a new diagnostics collection in GCHP is the same as for GEOS-Chem Classic netCDF diagnostics. You must add your collection to the collection list in HISTORY.rc and then define it further down in the file. Any 2D or 3D arrays that are stored within State_Met, State_Chm, or State_Diag, and that are successfully incorporated into the GEOS-Chem Registry, may be included as fields in a collection. State_Met variables must be preceded by "met_", State_Chm variables must be preceded by "chm_", and State_Diag variables should not have a prefix. See GeosCore/state_diag_mod.F90 for examples of how existing State_Diag arrays are implemented.

Once implemented, you can either incorporate the new collection settings into runConfig.sh for auto-update, or you can manually configure all settings in HISTORY.rc.
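
A sketch of a new collection definition in HISTORY.rc, modeled on the existing collections; the collection name, field names, and attribute values here are examples only:

  # (1) Add the new collection name to the COLLECTIONS list at the top of
  #     HISTORY.rc, then (2) define it further down, for example:
    MyDiag_avg.template:  '%y4%m2%d2_%h2%n2z.nc4',
    MyDiag_avg.frequency: 010000,
    MyDiag_avg.duration:  240000,
    MyDiag_avg.mode:      'time-averaged',
    MyDiag_avg.fields:    'met_AIRDEN',  'GIGCchem',
                          'chm_pHCloud', 'GIGCchem',
  ::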

Split a Simulation Into Multiple Jobs

There is an option to split a single simulation into multiple jobs that run in sequence. To use this option, do the following:

  1. Update runConfig.sh with your full simulation (all runs) start and end dates, and the duration per segment (single run). Also update the number of runs option to reflect the total number of jobs that will be submitted. Carefully read these parts of runConfig.sh to ensure you understand how it works.
  2. Use gchp.multirun.run as your run script, or adapt it if your cluster does not use SLURM. As with the regular gchp.run, you will need to update the file with compute resources consistent with runConfig.sh and with your local bashrc. It is located in the runScriptSamples subdirectory of your run directory. Note that you should not submit this run script directly; submission is handled automatically by the script described in the next step.
  3. Use gchp.multirun.sh to submit your job, or adapt it if your cluster does not use SLURM. It is located in the runScriptSamples subdirectory of your run directory.

There is much documentation in the headers of both gchp.multirun.run and gchp.multirun.sh that is worth reading and becoming familiar with. If you have not done so already, it is worth trying out a simple multi-segmented run of short duration to demonstrate that the multi-segmented run configuration and scripts work on your system. For example, you could do a 3-hour simulation with a 1-hour duration and the number of runs set to 3, as sketched below.
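
A sketch of that test configuration in runConfig.sh; the variable names and date format are illustrative, so follow the exact names in your copy of the file:

  # runConfig.sh -- 3-hour simulation split into three 1-hour jobs (illustrative)
  Start_Time="20160701 000000"
  End_Time="20160701 030000"
  Duration="00000000 010000"
  Num_Runs=3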

Besides the regular GCHP log file, which will be appended to for each consecutive run, there will be a "multirun.log" file with high-level information such as the start, end, duration, and job IDs for all jobs submitted. Inspect this and your other log files, as well as the output in the OutputDir/ directory, prior to using the monthly diagnostics option.

Generate Monthly Mean Diagnostics

There is an option to automatically generate monthly diagnostics by submitting month-long simulations as separate jobs. Splitting up the simulation into separate jobs is a requirement for monthly diagnostics because MAPL History requires a fixed number of hours for the diagnostic frequency and file duration. The monthly mean diagnostic option automatically updates the HISTORY.rc diagnostic settings each month to reflect the number of days in that month, taking leap years into account.

To use the monthly diagnostics option, first read and follow the instructions for splitting a simulation into multiple jobs (see the section above). Prior to submitting your run, enable monthly diagnostics in gchp.multirun.run by searching for the variable "Monthly_Diag" and changing its value from 0 to 1. Be sure to read the documentation surrounding the monthly diagnostic option in that file so that you understand what you are doing and are meeting all of the requirements.
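
The change itself is a one-line edit in gchp.multirun.run:

  # gchp.multirun.run -- turn on automatic monthly mean diagnostics
  Monthly_Diag=1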

Debug Using Maximum Print Output

Besides compiling with "make compile_debug", there are a few run settings you can configure to boost your chances of successful debugging. All of them involve sending additional print statements to the log files; a snippet showing each setting follows the list below.

  1. You can change "ND70" in input.geos from 0 to 1 to turn on extra GEOS-Chem print statements in the main log file.
  2. You can set the "DEBUG" variable in runConfig.sh to a number greater than 0 to turn on extra MAPL print statements. The higher the number the more prints will be sent to the log (and the slower your run will be). Usually 20 is sufficient, although you can go higher.
  3. You can set the "Verbose" and "Warnings" settings in HEMCO_Config.rc to maximum values of 3 to send the maximum number of prints to HEMCO.log.
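
A sketch of all three settings in their respective files; the surrounding layout of each file is not shown, and the exact label text in input.geos may differ from what is written here:

  # input.geos -- turn on extra GEOS-Chem prints
  ND70: 1

  # runConfig.sh -- extra MAPL prints (0 = off; ~20 is usually sufficient)
  DEBUG=20

  # HEMCO_Config.rc -- maximum HEMCO verbosity written to HEMCO.log
  Verbose: 3
  Warnings: 3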

None of these options require recompiling. Be aware that all of them will slow down your simulation. Be sure to set them back to the default values after you are finished debugging.

Change Domains Stack Size

For runs at very high resolution or with a small number of processors, you may run into a domains stack size error. This is caused by exceeding the domains stack size memory limit set at run time, and it will be apparent in your log file. If this occurs, you can increase the domains stack size in file input.nml. The default is set to 20000000.
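
A sketch of the edit; the namelist group name below is an assumption, so keep whatever group the parameter already sits in within your input.nml:

  ! input.nml -- raise the domains stack size above the 20000000 default
  &fms_nml
     domains_stack_size = 40000000
  /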

Turn Off MAPL Timers and Memory Logging

Your GCHP log file includes timing and memory information by default, and this is usually a good thing. If for some reason you want to turn these features off, you can do so in file CAP.rc. Search for "MAPL_ENABLE_TIMERS" and "MAPL_ENABLE_MEMUTILS" and simply change "YES" to "NO".
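
The relevant lines in CAP.rc then look like this:

  # CAP.rc -- disable MAPL timer and memory reporting
  MAPL_ENABLE_TIMERS: NO
  MAPL_ENABLE_MEMUTILS: NO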

Feedback Welcome!

If there is something you want to do that is not readily explained on this page then we want to hear from you. Please contact the GCST or direct message Lizzie Lundgren on the GCHP Slack workspace with your feedback.


Previous | Next | GCHP Home