Running GCHP: Basics

From Geos-chem

Latest revision as of 15:40, 8 December 2020


The GCHP documentation has moved to https://gchp.readthedocs.io/. The GCHP documentation on http://wiki.seas.harvard.edu/ will stay online for several months, but it is outdated and no longer active!



  1. Hardware and Software Requirements
  2. Setting Up the GCHP Environment
  3. Downloading Source Code and Data Directories
  4. Compiling
  5. Obtaining a Run Directory
  6. Running GCHP: Basics
  7. Running GCHP: Configuration
  8. Output Data
  9. Developing GCHP
  10. Run Configuration Files


Overview

This page presents the basic information needed to run GCHP as well as how to verify a successful run and reuse a run directory. A pre-run checklist is included at the end to help prevent run errors. The GCHP "standard" simulation run directory is configured for a 1-hr simulation at c24 resolution and is a good first test case to check that GCHP runs on your system.

How to Run GCHP

You can run GCHP locally from within your run directory ("interactively") or by submitting your run to a job scheduler if one is available. Either way, it is useful to put run commands into a reusable script we call the run script. Executing the script will either run GCHP or submit a job that will run GCHP.

There is a symbolic link in the GCHP run directory called runScriptSamples that points to a directory in the source code containing example run scripts. Each file includes extra commands that make the run process easier and less prone to user error. These commands include:

  1. Source the environment file symbolic link gchp.env so that the run environment is consistent with the build
  2. Source the config file runConfig.sh to set the run-time configuration
  3. Delete any output files from a previous run that might interfere with the new run
  4. Send standard output to the run-time log file gchp.log
  5. Rename the output restart file to include "restart" and the datetime
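Step 5 can be sketched as a small POSIX shell function. This is a hypothetical helper (rename_gchp_restart is not part of GCHP) and is not the code used in the example run scripts. It assumes that GCHP writes its end-of-run checkpoint as gcchem_internal_checkpoint and that cap_restart holds the end date and time as YYYYMMDD HHMMSS.

```shell
#!/bin/sh
# Hypothetical helper: rename the checkpoint written at the end of a run so
# that its name contains "restart" and the run's end datetime.
# ASSUMPTIONS: the checkpoint is named gcchem_internal_checkpoint, and
# cap_restart contains "YYYYMMDD HHMMSS" on its first line.
rename_gchp_restart() {
    rr_dir=${1:-.}
    if [ ! -f "$rr_dir/gcchem_internal_checkpoint" ]; then
        echo "No gcchem_internal_checkpoint in $rr_dir; nothing to rename" >&2
        return 1
    fi
    if [ ! -f "$rr_dir/cap_restart" ]; then
        echo "No cap_restart in $rr_dir; cannot determine the datetime" >&2
        return 1
    fi
    # Join the date and time fields with "_", e.g. 20160701_010000
    rr_datetime=$(awk 'NR == 1 { print $1 "_" $2 }' "$rr_dir/cap_restart")
    rr_new="gcchem_internal_checkpoint.restart.${rr_datetime}.nc4"
    mv "$rr_dir/gcchem_internal_checkpoint" "$rr_dir/$rr_new"
    # Print the new name so the caller can log it
    echo "$rr_new"
}
```

When called from the run directory after GCHP exits, rename_gchp_restart prints the new file name.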

Run Interactively

Copy or adapt the example run script gchp.local.run to run GCHP locally on your machine. Before running, open your run script and set nCores to the number of processors you plan to use. Make sure this many processors are available locally; the number must be at least 6. Next, open runConfig.sh and set NUM_CORES, NUM_NODES, and NUM_CORES_PER_NODE to be consistent with your run script.
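The consistency requirement can be checked with a short POSIX shell function. This is a hypothetical helper (check_gchp_cores is not part of GCHP). It assumes the three runConfig.sh settings should satisfy NUM_CORES = NUM_NODES * NUM_CORES_PER_NODE and should also match nCores in the run script.

```shell
#!/bin/sh
# Hypothetical pre-run check comparing the run script's nCores with the
# runConfig.sh settings NUM_CORES, NUM_NODES and NUM_CORES_PER_NODE.
# ASSUMPTION: NUM_CORES = NUM_NODES * NUM_CORES_PER_NODE = nCores, and
# NUM_CORES >= 6.
# Usage: check_gchp_cores nCores NUM_CORES NUM_NODES NUM_CORES_PER_NODE
check_gchp_cores() {
    cc_ncores=$1 cc_num_cores=$2 cc_num_nodes=$3 cc_per_node=$4
    cc_product=$((cc_num_nodes * cc_per_node))
    if [ "$cc_num_cores" -lt 6 ]; then
        echo "NUM_CORES=$cc_num_cores: GCHP needs at least 6 cores" >&2
        return 1
    fi
    if [ "$cc_num_cores" -ne "$cc_product" ]; then
        echo "NUM_CORES=$cc_num_cores but NUM_NODES*NUM_CORES_PER_NODE=$cc_product" >&2
        return 1
    fi
    if [ "$cc_ncores" -ne "$cc_num_cores" ]; then
        echo "Run script nCores=$cc_ncores but runConfig.sh NUM_CORES=$cc_num_cores" >&2
        return 1
    fi
    echo "OK: $cc_num_cores cores on $cc_num_nodes node(s)"
}
```

For example, check_gchp_cores 6 6 1 6 checks a six-core, one-node setup. The values are typed in by hand rather than read by sourcing runConfig.sh, because sourcing that file is what applies the run-time configuration.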

To run, type the following at the command prompt:

 ./gchp.local.run

Standard output will be displayed on your screen in addition to being sent to log file gchp.log.

Run as a Batch Job

Batch job run scripts vary depending on which job scheduler you have available. Most of the example run scripts are for SLURM, and the most basic of these is gchp.run. You may copy any of the example run scripts to your run directory and adapt them to your system and preferences as needed.

At the top of every batch job script are configurable run settings. The most critical of these are the requested number of cores, number of nodes, time, and memory. Finding the optimal values for your run can take some trial and error. For a basic six-core standard simulation job on one node you should request at least ___ min and __ GB. Requesting more cores generally makes GCHP run faster.

To submit a batch job using SLURM:

 sbatch gchp.run

To submit a batch job using Grid Engine:

 qsub gchp.run

Standard output will be sent to log file gchp.log once the job is started unless you change that feature of the run script. Standard error will be sent to a file specific to your scheduler, e.g. slurm-jobid.out if using SLURM, unless you configure your run script to do otherwise.

If your computational cluster uses a different job scheduler, e.g. LSF or PBS, check with your IT staff or search the internet for how to configure and submit batch jobs. Documentation for each scheduler's batch job settings and their accepted formats is available online and is often accessible from the command line. For example, type man sbatch to scroll through the options for SLURM, including the various ways of specifying the number of cores, time, and memory requested.
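As a sketch of the SLURM and Grid Engine cases, the hypothetical function below (gchp_submit_cmd is not part of GCHP) prints the submission command for whichever scheduler client it finds on your PATH. It only prints the command and does not submit anything. PBS also provides a qsub command, so finding qsub does not by itself confirm that Grid Engine is in use.

```shell
#!/bin/sh
# Hypothetical helper: print the command that would submit a run script,
# choosing sbatch (SLURM) first, then qsub (Grid Engine).
# NOTE: PBS also ships a qsub command, so "qsub" is not proof of Grid Engine.
gchp_submit_cmd() {
    gs_script=${1:-gchp.run}
    if command -v sbatch >/dev/null 2>&1; then
        echo "sbatch $gs_script"
    elif command -v qsub >/dev/null 2>&1; then
        echo "qsub $gs_script"
    else
        echo "No sbatch or qsub found; ask your IT staff how to submit $gs_script" >&2
        return 1
    fi
}
```

For example, on a SLURM system gchp_submit_cmd gchp.run prints sbatch gchp.run, which you can then run yourself.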

Verify a Successful Run

There are several ways to verify that your run was successful.

  1. NetCDF files are present in the OutputDir subdirectory
  2. Standard output file gchp.log ends with Model Throughput timing information
  3. The job scheduler log does not contain any error messages
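The first two checks can be scripted. The hypothetical function below (verify_gchp_run is not part of GCHP) assumes diagnostics are written with a .nc or .nc4 extension. You still need to read the scheduler log (check 3) yourself.

```shell
#!/bin/sh
# Hypothetical post-run check for:
#   1. NetCDF files present in OutputDir
#   2. gchp.log containing "Model Throughput" timing information
# ASSUMPTION: diagnostic files end in .nc or .nc4.
verify_gchp_run() {
    vr_dir=${1:-.}
    vr_status=0
    # Count NetCDF files; $(( )) strips any whitespace printed by wc
    vr_nfiles=$(( $(find "$vr_dir/OutputDir" -type f \( -name '*.nc' -o -name '*.nc4' \) 2>/dev/null | wc -l) ))
    if [ "$vr_nfiles" -gt 0 ]; then
        echo "OK: $vr_nfiles NetCDF file(s) in OutputDir"
    else
        echo "FAIL: no NetCDF files in OutputDir" >&2
        vr_status=1
    fi
    if grep -q 'Model Throughput' "$vr_dir/gchp.log" 2>/dev/null; then
        echo "OK: gchp.log contains Model Throughput timing"
    else
        echo "FAIL: no Model Throughput timing in gchp.log" >&2
        vr_status=1
    fi
    return $vr_status
}
```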

If it looks like something went wrong, scan through the log files to determine where there may have been an error. Here are a few debugging tips:

  • Review all of your configuration files to ensure you have proper setup
  • MAPL_Cap errors typically indicate an error with your start time, end time, and/or duration set in runConfig.sh
  • MAPL_ExtData errors often indicate an error with your input files specified in either HEMCO_Config.rc or ExtData.rc
  • MAPL_HistoryGridComp errors are related to your configured output in HISTORY.rc
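The debugging tips can be turned into a rough log scan. This hypothetical helper (triage_gchp_logs is not part of GCHP) assumes that MAPL error lines contain the component name together with the word "error" or "fail" (case-insensitive). A match only points you to a configuration file to review. It is not a diagnosis.

```shell
#!/bin/sh
# Hypothetical triage helper applying the debugging tips above.
# ASSUMPTION: relevant error lines contain the component name and the
# word "error" or "fail" (matched case-insensitively).
# Usage: triage_gchp_logs logfile [logfile ...]
triage_gchp_logs() {
    tl_status=1
    for tl_comp in MAPL_Cap MAPL_ExtData MAPL_HistoryGridComp; do
        # Lines naming the component that also mention an error or failure
        if grep -h "$tl_comp" "$@" 2>/dev/null | grep -qiE 'error|fail'; then
            case $tl_comp in
                MAPL_Cap) tl_hint="start time, end time, and duration in runConfig.sh" ;;
                MAPL_ExtData) tl_hint="input files in HEMCO_Config.rc and ExtData.rc" ;;
                MAPL_HistoryGridComp) tl_hint="output configuration in HISTORY.rc" ;;
            esac
            echo "$tl_comp: check $tl_hint"
            tl_status=0
        fi
    done
    if [ "$tl_status" -ne 0 ]; then
        echo "No MAPL_Cap, MAPL_ExtData or MAPL_HistoryGridComp errors matched" >&2
    fi
    return $tl_status
}
```

For example, run triage_gchp_logs *.log from the run directory, adding your scheduler's output file if there is one.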

If you cannot figure out where the problem is, please do not hesitate to create a GCHPctm GitHub issue (https://github.com/geoschem/gchpctm/issues).

Reuse a Run Directory

Archive Run Output

Reusing a GCHP run directory risks losing your old work. To mitigate this, the run directory includes the utility shell script archiveRun.sh. This script archives output data and configuration files to a subdirectory that will not be deleted when you clean your run directory.

Archiving runs is useful for other reasons as well, including:

  • Save all settings and logs for later reference after a run crashes
  • Generate data from the same executable using different run-time settings for comparison, e.g. c48 versus c180
  • Perform short runs in quick succession for debugging

To archive a run, pass the archive script a descriptive subdirectory name where data will be archived. For example:

 ./archiveRun.sh 1mo_c24_24hrdiag

All files are archived to subfolders of the new directory, and the screen shows which files are copied and where they go. Diagnostic files in the OutputDir directory are moved rather than copied to avoid duplicating large files. You will be prompted at the command line to confirm before they are moved.

Clean the Run Directory

You should always clean your run directory before your next run. This avoids confusion about which output was generated when and with what settings. Under certain circumstances it also prevents your new run from crashing. GCHP will crash if:

  • Output file cap_restart is present and you did not change your start/end times
  • Your last run failed in such a way that the restart file was not renamed in the post-run commands in the run script

The example run scripts include extra commands that remove the two problematic files listed above. However, if you write your own run script and omit these commands, failing to clean the run directory before rerunning will cause problems.
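If you write your own run script, a minimal cleanup step might look like the following sketch. It assumes that the checkpoint left by a failed run is named gcchem_internal_checkpoint, which this page does not state. Renamed restart files are left alone. If you want to keep that checkpoint, archive the run first.

```shell
#!/bin/sh
# Hypothetical pre-run cleanup for a custom run script: remove the two
# files that can crash a rerun.
# ASSUMPTION: an un-renamed checkpoint from a failed run is named
# gcchem_internal_checkpoint; renamed restart files are not touched.
clean_before_rerun() {
    cr_dir=${1:-.}
    for cr_file in cap_restart gcchem_internal_checkpoint; do
        if [ -e "$cr_dir/$cr_file" ]; then
            rm -f "$cr_dir/$cr_file"
            echo "Removed $cr_file"
        fi
    done
}
```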

To make cleaning simple, the run directory includes the utility shell script cleanRunDir.sh. To clean the run directory, simply execute this script:

 ./cleanRunDir.sh

All GCHP output files, including diagnostic files in OutputDir, will then be deleted. Of the restart files, only those with names beginning with gcchem are deleted. This preserves the initial restart symbolic links that come with the run directory.

Pre-run Checklist

Prior to running GCHP, always run through the following checklist to ensure everything is set up properly.

  1. Your run directory contains the executable gchp.
  2. All symbolic links in your run directory are valid (no broken links)
  3. You have looked through and set all configurable settings in runConfig.sh (discussed in the next chapter)
  4. If running via a job scheduler: you have a run script and the resource allocation in runConfig.sh and your run script are consistent (# nodes and cores)
  5. If running interactively: the resource allocation in runConfig.sh is available locally
  6. If reusing a run directory (optional but recommended): you have archived your last run with ./archiveRun.sh if you want to keep it and you have deleted old output files with ./cleanRunDir.sh
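Checklist items 1 and 2 can be checked automatically. The hypothetical function below (check_gchp_rundir is not part of GCHP) looks only at the top level of the run directory. Items 3 to 6 still require reading runConfig.sh and your run script.

```shell
#!/bin/sh
# Hypothetical automation of checklist items 1 and 2: the gchp executable
# is present, and no top-level symbolic link in the run directory is broken.
check_gchp_rundir() {
    cg_dir=${1:-.}
    cg_status=0
    if [ -x "$cg_dir/gchp" ]; then
        echo "OK: executable gchp found"
    else
        echo "MISSING: executable gchp" >&2
        cg_status=1
    fi
    for cg_link in "$cg_dir"/*; do
        # -L: entry is a symlink; ! -e: its target does not exist
        if [ -L "$cg_link" ] && [ ! -e "$cg_link" ]; then
            echo "BROKEN LINK: $cg_link -> $(readlink "$cg_link")" >&2
            cg_status=1
        fi
    done
    return $cg_status
}
```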
