Running GCHP: Basics

From Geos-chem
Revision as of 15:25, 8 August 2018 by Lizzie Lundgren (Talk | contribs) (Overview)

Previous | Next | Getting Started with GCHP

  1. Hardware and Software Requirements
  2. Downloading Source Code
  3. Obtaining a Run Directory
  4. Setting Up the GCHP Environment
  5. Compiling
  6. Basic Example Run
  7. Run Configuration Files
  8. Advanced Run Examples
  9. Output Data
  10. Developing GCHP


Overview

The default GCHP run directories are configured for a 1-hr simulation at c24 resolution using 0.25x0.3125 GEOS-FP meteorology on six cores of a single node. This simple configuration is a good test case to check that GCHP runs on your system. This page presents the basic information needed to run GCHP using this configuration.

Pre-run Checklist

Prior to running GCHP, always run through the following checklist to ensure everything is set up properly:

  1. All symbolic links are present in your run directory and point to a valid path. These include TileFiles, MetDir, MainDataDir, ChemDataDir, CodeDir, and four restart files of the form initial_GEOSChem_rst.cxx_standard.nc.
  2. Your GCHP .bashrc is sourced if you plan on running interactively or recompiling. A quick way to check is to run ./build.sh help. If the MPI_ROOT variable, which is defined in your .bashrc, is not set, you will be prompted to source your .bashrc.
  3. Your run directory contains the executable geos.
  4. The input meteorology resolution in ExtData.rc matches the resolution of the data in MetDir, and both are what you intend. All met-field data filenames are listed at the start of ExtData.rc and include the grid resolution.
  5. The cap_restart file is deleted. You will get a segmentation fault if this output file is present after recompiling GCHP, as documented in the Troubleshooting GCHP wiki page.
  6. The runConfig.sh file has all run settings that you intend to use, including internal resolution (e.g. c24).
  7. The resource allocation in runConfig.sh and your run script (if using) are consistent.
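Some of the checklist items above can be automated. The following is a minimal sketch, not part of the GCHP distribution: the function name check_run_dir and its messages are hypothetical. It covers only items 1, 3, and 5 (symbolic links, the geos executable, and a leftover cap_restart file). The remaining items still need to be inspected by hand.

```shell
#!/bin/sh
# Hypothetical helper: check a GCHP run directory before launching.
# Usage: check_run_dir /path/to/run/directory   (defaults to the current directory)
check_run_dir() {
    dir=${1:-.}
    status=0

    # Item 1: every symbolic link must resolve to an existing target.
    for entry in "$dir"/*; do
        if [ -L "$entry" ] && [ ! -e "$entry" ]; then
            echo "Broken symbolic link: $entry"
            status=1
        fi
    done

    # Item 3: the compiled executable must be present and executable.
    if [ ! -x "$dir/geos" ]; then
        echo "Missing executable: $dir/geos"
        status=1
    fi

    # Item 5: a leftover cap_restart file must be deleted before the run.
    if [ -e "$dir/cap_restart" ]; then
        echo "Delete before running: $dir/cap_restart"
        status=1
    fi

    return "$status"
}
```

The function returns a nonzero status if any check fails, so it can gate a submission, for example check_run_dir . && sbatch gchp.run.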

Run Methods

You can run GCHP by executing the appropriate run command directly on the command line from within your run directory or by submitting your run as a batch job. The GCHP run directory is set up for running with 6 cores on 1 node using the MVAPICH2 implementation of MPI. Configuring GCHP to run on additional cores and multiple nodes, or with a different implementation of MPI, is covered in the advanced runs chapter later in this guide. This page gives an introduction to running GCHP using the default settings in the run directory. All commands provided are compatible with systems that use the SLURM workload manager. If you have a different system you may need to adjust the run command.

Running as a Batch Job

Two sample run scripts are included in the GCHP run directory for submitting your run as a scheduled job. Both run scripts send standard output to file GCHP.log by default.

  1. gchp_slurm.run is for SLURM workload management systems and is customized for use on the Harvard Odyssey compute cluster.
  2. gchp_gridengine.run is for Grid Engine systems and is customized for use on the "Glooscap" Compute Canada cluster.

The SLURM run script is also copied to the main run directory as gchp.run. If your system does not use SLURM, you can replace the contents of gchp.run with whatever works on your system to submit a GCHP job. For example, if your cluster uses Grid Engine (SGE), you can simply copy runScriptSamples/gchp_gridengine.run to gchp.run.

Make sure that you source your environment file, the same one used for compilation, within your run script prior to submitting your job. Running with libraries different than those you compiled with will cause the run to fail. Also make sure that you source runConfig.sh (after inspection) to set configuration parameters at run-time.

To submit your SLURM batch file, simply type:

 sbatch gchp.run

To submit the Grid Engine batch file, type:

 qsub gchp.run

If your computational cluster uses a different job scheduler (e.g. LSF or PBS), then check with your IT staff about how to submit batch jobs. We recommend using gchp_slurm.run as a template because it includes other features that are useful, such as automatic logging, running runConfig.sh to set input parameters, and sourcing the environment file.
Important: If using a run script, your resource allocation (e.g. number of nodes specified with SBATCH in SLURM) must be consistent with configuration file runConfig.sh.
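One way to guard against a mismatch is to read the requested core count back out of the run script before submitting. The sketch below is illustrative only: the function names sbatch_ntasks and allocation_matches are hypothetical, and it assumes the run script requests cores with a "#SBATCH -n N" or "#SBATCH --ntasks=N" line. The core count to compare against is passed in by hand, since this sketch makes no assumption about how runConfig.sh names its settings.

```shell
#!/bin/sh
# Hypothetical helper: print the task count requested in a SLURM run script.
# Recognizes "#SBATCH -n N" and "#SBATCH --ntasks=N"; the match is
# case-sensitive, so "#SBATCH -N 1" (node count) is ignored.
sbatch_ntasks() {
    sed -n \
        -e 's/^#SBATCH[[:space:]]\{1,\}-n[[:space:]]\{1,\}\([0-9]\{1,\}\).*/\1/p' \
        -e 's/^#SBATCH[[:space:]]\{1,\}--ntasks=\([0-9]\{1,\}\).*/\1/p' \
        "$1" | head -n 1
}

# Hypothetical helper: succeed only if run script $1 requests exactly $2 tasks.
allocation_matches() {
    [ "$(sbatch_ntasks "$1")" = "$2" ]
}
```

For the default configuration, allocation_matches gchp.run 6 should succeed. If it fails, update either the run script or runConfig.sh so that the two agree.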

Running Interactively on SLURM

Before running GCHP interactively, check that your environment is set up properly and that you have at least 6 cores available with 6 GB of memory per core. Then execute the following command from within your run directory:

 srun -n 6 --mpi=pmi2 ./geos 2>&1 | tee GCHP.log

This command can be broken down as follows:

Command                What it does
srun ... ./geos        Runs executable geos as a parallel job.
-n 6                   Specifies how many individual CPU cores are requested for the run. The number given here should always be the total number of cores, regardless of how many nodes they are spread over. The number of cores that you request must be a multiple of 6 (at least one core for each of the six cubed-sphere faces, and the same number of cores for each face).
--mpi=pmi2             Selects the PMI2 process-management interface, which is used here to launch the MVAPICH2 implementation of MPI.
2>&1 | tee GCHP.log    Specifies that all MAPL output, both standard output and standard error, be written to both the screen and to file GCHP.log.
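The multiple-of-6 rule for the core count is easy to check before launching. The following is a small sketch; the function name valid_core_count is hypothetical and is not part of GCHP.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if $1 is a positive multiple of 6,
# i.e. the same whole number of cores for each of the six cubed-sphere faces.
valid_core_count() {
    case $1 in
        # Reject empty input, leading zeros (including 0 itself), and non-digits.
        ''|0*|*[!0-9]*) return 1 ;;
    esac
    [ $(( $1 % 6 )) -eq 0 ]
}
```

It could be used as a guard, for example valid_core_count 6 && srun -n 6 --mpi=pmi2 ./geos 2>&1 | tee GCHP.log.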

The output log file GCHP.log is created by MAPL and does not include the usual log output you see with GEOS-Chem Classic. The traditional GEOS-Chem log output (e.g. from write statements in GeosCore files) is automatically sent to a file whose name is defined in configuration file GCHP.rc (more on that in the next chapter). By default, this log file is named PET0000.GEOSCHEMchem.log, where "PET0000" represents the first persistent execution thread. Unlike MAPL, which sends output to the log from ALL threads, GEOS-Chem only outputs from a single thread. This behavior is enforced using the AM_I_ROOT logical flag in conditionals throughout the GEOS-Chem source code.

Verifying a Successful Run

There are several ways to verify that your run was successful.

  1. NetCDF files are present in the OutputDir subdirectory.
  2. GCHP.log ends with timing information for the run.
  3. GCHP.log contains text with format "AGCM Date: YYYY/MM/DD Time: HH:mm:ss" for each timestep (e.g. 00:10, 00:20, 00:30, 00:40, 00:50, and 01:00 for a 1-hr run).
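The first and third checks above can be scripted. This is a rough sketch under stated assumptions: the function name verify_run is hypothetical, and output files are assumed to have a name containing ".nc" (for example .nc or .nc4). The second check is left to manual inspection because the exact format of the timing summary is not described here.

```shell
#!/bin/sh
# Hypothetical helper: basic post-run checks on a GCHP run directory.
# Usage: verify_run /path/to/run/directory   (defaults to the current directory)
verify_run() {
    dir=${1:-.}
    status=0

    # Check 1: at least one NetCDF file exists in OutputDir.
    if ! ls "$dir"/OutputDir/*.nc* >/dev/null 2>&1; then
        echo "No NetCDF files found in $dir/OutputDir"
        status=1
    fi

    # Check 3: count the per-timestep "AGCM Date:" lines in the MAPL log.
    # grep prints nothing if the log is missing, so fall back to 0.
    steps=$(grep -c 'AGCM Date:' "$dir/GCHP.log" 2>/dev/null)
    steps=${steps:-0}
    echo "Timesteps logged: $steps"
    if [ "$steps" -eq 0 ]; then
        status=1
    fi

    return "$status"
}
```

For the default 1-hr run, the reported count can be compared by eye against the six timesteps listed above.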

If it looks like something went wrong, check all log files as well as your scheduler output file (if one exists) to determine where there may have been an error. Beware that if you have a problem in one of your configuration (*.rc) files then you will likely see a MAPL error with traceback to the GCHP/Shared directory. Review all of your configuration files to ensure you have proper setup (more on how to do this in the next chapter).

There is a Troubleshooting GCHP wiki page for users to document problems encountered and their solutions. This is the first resource to consult when you run into a problem you cannot sort out. We encourage you to sign up for a wiki account and add your issue (whether you have a solution yet or not) and contribute to this group effort.

GCHP errors can be cryptic so please reach out to us if you hit a wall deciphering the problem. Be sure to include your log, configuration, .bashrc, and run script files in your message.

