Running GCHP: Basics

Latest revision as of 22:17, 15 August 2019

Previous | Next | Getting Started With GCHP | GCHP Main Page

  1. Hardware and Software Requirements
  2. Downloading Source Code and Data Directories
  3. Obtaining a Run Directory
  4. Setting Up the GCHP Environment
  5. Compiling
  6. Running GCHP: Basics
  7. Running GCHP: Configuration
  8. Output Data
  9. Developing GCHP
  10. Run Configuration Files


Overview

This page presents the basic information needed to run GCHP, as well as how to verify a successful run and reuse a run directory. The default GCHP run directories (except benchmark) are configured for a 1-hr simulation at c24 resolution using native resolution meteorology, six cores, and one node. This simple configuration is a good test case to check that GCHP runs on your system. Simulations typically require about 90 GB of memory for a full-chemistry run due to the memory needed for ESMF regridding. More advanced instructions for configuring your GCHP run with different settings are in the next chapter.

Pre-run Checklist

Prior to running GCHP, always run through the following checklist to ensure everything is set up properly.

  1. Your run directory contains the executable geos.
  2. All symbolic links in your run directory are valid (no broken links)
  3. You have looked through and set all configurable settings in runConfig.sh (discussed in the next chapter)
  4. You have a run script (see below for information about run scripts)
  5. If submitting your job, the resource allocation in runConfig.sh and your run script are consistent (# nodes and cores)
  6. If running interactively, the resource allocation in runConfig.sh is available locally
  7. If reusing a run directory, you have archived your last run or discarded it with 'make cleanup_output' (optional but recommended; discussed below)
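Several items on this checklist can be verified automatically. The sketch below is an illustrative helper (not part of GCHP); it assumes the default executable name geos and the cap_restart convention described later on this page:

```shell
#!/bin/bash
# Illustrative pre-run sanity check for a GCHP run directory.
# Not shipped with GCHP; adapt the checks to your own setup.

check_run_dir() {
    local rundir=${1:-.}
    local ok=0

    # Checklist item 1: the executable must be present
    if [ ! -x "$rundir/geos" ]; then
        echo "WARNING: executable 'geos' not found in $rundir"
        ok=1
    fi

    # Checklist item 2: no broken symbolic links (GNU find)
    local broken
    broken=$(find "$rundir" -maxdepth 1 -xtype l 2>/dev/null)
    if [ -n "$broken" ]; then
        echo "WARNING: broken symlinks found:"
        echo "$broken"
        ok=1
    fi

    # A leftover cap_restart will silently shift your start date
    if [ -f "$rundir/cap_restart" ]; then
        echo "WARNING: cap_restart exists; delete it unless continuing a run"
        ok=1
    fi

    if [ "$ok" -eq 0 ]; then
        echo "Run directory looks ready."
    else
        echo "Fix the warnings above before running."
    fi
}

# Usage (from the top of your run directory): check_run_dir .
```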

How to Run GCHP

You can run GCHP locally from within your run directory (interactively) or by submitting your run to your cluster's job scheduler. To make running GCHP simpler, the GCHP run directory contains a folder called runScriptSamples with example scripts for running GCHP. Each script includes additional steps to simplify the run process: sourcing your environment file so all libraries are loaded, sourcing configuration file runConfig.sh to set run-time options (more on this in the next chapter), and sending standard output to log file gchp.log.

Running Interactively

Use example run script gchp.local.run to run GCHP locally on your machine. Before running, check that you have at least six cores available. Then copy gchp.local.run to the main level of your run directory, make sure you have gone through the pre-run checklist, and type the following at the command prompt:

./gchp.local.run

If your run crashes during transport then you need additional memory. Either request an interactive session on your cluster with additional memory or consider running GCHP as a batch job by submitting your run to a job scheduler.
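On Linux you can quickly confirm the cores and memory available in your interactive session with generic system commands (not GCHP-specific):

```shell
# Check local resources before an interactive GCHP run (Linux)
nproc                              # available cores; the default setup needs 6
grep MemAvailable /proc/meminfo    # available memory
```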

Running as a Batch Job

The recommended example job script is gchp.run, which is written for SLURM on the Harvard University Odyssey cluster but may be adapted for other systems. You may also adapt the interactive run script gchp.local.run for your system. The "multirun" scripts are for submitting multiple consecutive jobs in a row, a useful feature for generating monthly diagnostics, and are more advanced. Read more about that option in the chapter on configuring a run.

Example job-submission run scripts send standard output to file gchp.log by default and require manually configuring your job-specific resources such as number of cores and nodes.

If using SLURM, submit your batch job with this command:

 sbatch gchp.run

Job submission is different for other systems. For example, to submit a Grid Engine batch file, type:

 qsub gchp.run

If your computational cluster uses a different job scheduler (e.g. LSF or PBS), check with your IT staff about how to submit batch jobs. Please also consider submitting your working run script for inclusion in the run script examples folder in future versions. This will make the workflow easier for you and potentially for other users.
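For reference, a SLURM job script in the spirit of gchp.run might look like the sketch below. The partition name, walltime, memory request, and environment file name are placeholders you must adapt to your own cluster:

```shell
#!/bin/bash
#SBATCH -n 6            # total cores; must match runConfig.sh
#SBATCH -N 1            # number of nodes; must match runConfig.sh
#SBATCH -t 0-02:00      # walltime (placeholder)
#SBATCH --mem=90G       # ~90 GB recommended for full-chemistry runs
#SBATCH -p my_partition # placeholder; site-specific

# Load libraries and run-time settings
source gchp.env         # your environment file (name may differ)
source runConfig.sh

# Remove cap_restart unless deliberately continuing a previous segment
rm -f cap_restart

# Launch GCHP, sending standard output to gchp.log
time mpirun -np 6 ./geos >> gchp.log 2>&1
```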

Verifying a Successful Run

There are several ways to verify that your run was successful.

  1. NetCDF files are present in the OutputDir subdirectory.
  2. gchp.log ends with timing information for the run.
  3. Your scheduler log (e.g. output from SLURM) does not contain any obvious errors.
  4. gchp.log contains text with the format "AGCM Date: YYYY/MM/DD Time: HH:mm:ss" for each timestep of your run.
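The first, second, and fourth checks can be scripted. The helper below is an illustrative sketch (not part of GCHP) assuming the default log name gchp.log and output directory OutputDir:

```shell
# Illustrative post-run check; not part of GCHP.
verify_run() {
    local log=${1:-gchp.log}

    # Check 1: diagnostic NetCDF files present in OutputDir?
    if ls OutputDir/*.nc4 >/dev/null 2>&1; then
        echo "OK: NetCDF output present in OutputDir"
    else
        echo "MISSING: no .nc4 files in OutputDir"
    fi

    # Checks 2 and 4: did every timestep get logged?
    if grep -q "AGCM Date" "$log" 2>/dev/null; then
        echo "OK: $(grep -c "AGCM Date" "$log") timesteps logged in $log"
    else
        echo "MISSING: no 'AGCM Date' lines in $log"
    fi
}

# Usage (from your run directory): verify_run gchp.log
```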

If it looks like something went wrong, scan through gchp.log (sometimes the error is near the top) as well as your scheduler output file (if one exists) to determine where the error occurred. Beware that a problem in one of your configuration files will likely produce a MAPL error with a traceback to the GCHP/Shared directory. Review all of your configuration files to ensure they are set up properly. Errors in "CAP" typically indicate a problem with your start time, end time, and/or duration set in runConfig.sh (more on this file in the next chapter). Errors in "ExtData" often indicate a problem with the input files specified in HEMCO_Config.rc or ExtData.rc. Errors in "HISTORY" are related to your configured output in HISTORY.rc.

GCHP errors can be cryptic. If you find yourself debugging within MAPL then you may be on the wrong track as most issues can be resolved by updating the run settings. If you cannot figure out where you are going wrong please create an issue on the GCHP GitHub issue tracker located at https://github.com/geoschem/gchp/issues.

Reusing a Run Directory

Archiving a Run

One of the benefits of GCHP relative to GEOS-Chem Classic is that you can reuse a run directory for different grid resolutions without recompiling. You can also copy your executable between run directories of different simulations as long as you are using the same code. However, reusing a run directory comes with the peril of losing your old work. To mitigate this issue there is a utility shell script, archiveRun.sh, that archives data output and configuration files to a subdirectory within your run directory. All you need to do is pass a subdirectory name of your choosing that does not already exist. Here is an example:

./archiveRun.sh c24_3hr

The following output is then printed to screen to show you exactly what is being archived and where:

Archiving files to directory c24_3hr
Moving files and directories...
  Warning: No files to move from Plots
  -> c24_3hr/diagnostics/GCHP.SpeciesConc.20160101_0030z.nc4
  -> c24_3hr/diagnostics/GCHP.SpeciesConc.20160101_0130z.nc4
  -> c24_3hr/diagnostics/GCHP.SpeciesConc.20160101_0230z.nc4
Copying files...
  -> c24_3hr/config/input.geos
  -> c24_3hr/config/CAP.rc
  -> c24_3hr/config/ExtData.rc
  -> c24_3hr/config/fvcore_layout.rc
  -> c24_3hr/config/GCHP.rc
  -> c24_3hr/config/HEMCO_Config.rc
  -> c24_3hr/config/HEMCO_Diagn.rc
  -> c24_3hr/config/HISTORY.rc
  -> c24_3hr/config/runConfig.sh
  -> c24_3hr/config/gchp.local.run
  -> c24_3hr/config/gchp.run
  -> c24_3hr/config/gchp.env
  -> c24_3hr/logs/compile.log
  -> c24_3hr/logs/gchp.log
  -> c24_3hr/logs/HEMCO.log
  -> c24_3hr/logs/mem_transportTracers_1mo.log
  -> c24_3hr/logs/PET00000.GEOSCHEMchem.log
  Warning: slurm-* not found
  -> c24_3hr/checkpoints/gcchem_internal_checkpoint.20160101_0000z.nc4
  -> c24_3hr/checkpoints/gcchem_internal_checkpoint.restart.20160101_030000.nc4
  -> c24_3hr/checkpoints/cap_restart
  -> c24_3hr/restart/initial_GEOSChem_rst.c24_TransportTracers.nc
Complete!

All files except output diagnostics data are copied so that you can still see them after archiving. This includes restart files, which remain in your run directory until you delete them. The diagnostic data, however, are moved rather than copied, leaving your OutputDir directory empty. In this particular example the run was done interactively, so no SLURM file was found, and it was a single-segment run (a single job), hence the warning about a missing multirun file. These warnings can be ignored. If you do a multi-run, which involves running multiple consecutive jobs, archiving will move data and copy other files from all runs to your archive directory. If you run as a batch job using SLURM, the SLURM output files will be archived to the logs subdirectory.

Since archiveRun.sh is a simple bash script, you may edit it to customize archiving to your own preferences.
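To give a sense of what such a customization might look like, here is a heavily simplified sketch of the copy/move pattern archiveRun.sh follows (the real script archives more file groups and prints each destination):

```shell
# Simplified sketch of archiveRun.sh-style archiving; illustrative only.
archive_run() {
    local dest=$1
    if [ -d "$dest" ]; then
        echo "Error: $dest already exists"
        return 1
    fi
    mkdir -p "$dest/config" "$dest/logs" "$dest/diagnostics"

    # Configuration files and logs are copied, so the run directory
    # keeps its working copies
    cp input.geos *.rc runConfig.sh "$dest/config/" 2>/dev/null
    cp *.log "$dest/logs/" 2>/dev/null

    # Diagnostic output is moved, leaving OutputDir empty
    mv OutputDir/*.nc4 "$dest/diagnostics/" 2>/dev/null
    echo "Complete!"
}
```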

Cleaning the Run Directory

If you have archived your last run, or simply do not want to keep it, you should then clean your run directory prior to your next run by doing "make cleanup_output". Here is an example of output printed when cleaning the run directory:

rm -f /n/home/gchp_RnPbBe/OutputDir/*.nc4
rm -f trac_avg.*
rm -f tracerinfo.dat
rm -f diaginfo.dat
rm -f cap_restart
rm -f gcchem*
rm -f *.rcx
rm -f *~
rm -f gchp.log
rm -f HEMCO.log
rm -f PET*.log
rm -f multirun.log
rm -f logfile.000000.out
rm -f slurm-*
rm -f 1
rm -f EGRESS

Rerunning Without Cleaning

You can reuse a run directory without cleaning it and without archiving your last run. Files will generally simply be replaced by files generated in the next run. This works with two exceptions.

cap_restart must be deleted

The output cap_restart file must be removed prior to subsequent runs if you are starting a run from scratch. The cap_restart file contains a date and time string marking the end of your last run. If the file is present, GCHP will attempt to start your next run at that date and time. This is useful for splitting a run into multiple jobs; unless you are doing this, you should always delete cap_restart before a new run. This deletion is included in all sample run scripts except the multi-run script, which has special handling of cap_restart to pick up where the last run left off. See the next chapter for more information on the multi-run option.
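In practice this is a single command before each fresh run:

```shell
# Starting a new run from scratch: remove the leftover end-time marker
rm -f cap_restart

# Splitting a run across jobs instead: keep the file; it records the
# date and time (e.g. "20160101 030000") where the next segment begins
cat cap_restart 2>/dev/null || echo "no cap_restart (fresh start)"
```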

gcchem_internal_checkpoint must be deleted or renamed

The GCHP output restart filename is configured in GCHP.rc. If a file with that name exists at the start of a run, GCHP will fail at the end of the run when it tries to overwrite the file. This is a quirk of the new version of MAPL introduced in GCHP 12.5.0. To get around it, all sample run scripts in the run directory's runScriptSamples folder rename the output checkpoint file to a name containing 'restart' and a timestamp. However, if your run fails and exits early, the original checkpoint file may still be present, since it is created at the start of the run and remains empty until the run completes successfully. Running make cleanup_output to clean your run directory prior to rerunning will prevent this issue, since it includes deletion of all files starting with "gcchem".
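Below is a sketch of the rename the sample run scripts perform. The exact logic in runScriptSamples may differ; the timestamp is taken from cap_restart, whose content is assumed here to have the form "YYYYMMDD hhmmss":

```shell
# Illustrative version of the checkpoint rename done by sample run scripts.
rename_checkpoint() {
    # Nothing to do if the run produced no checkpoint or no cap_restart
    [ -f cap_restart ] && [ -f gcchem_internal_checkpoint ] || return 0

    # Build e.g. 20160101_030000 from the end date/time in cap_restart
    local stamp
    stamp=$(tr ' ' '_' < cap_restart)
    mv gcchem_internal_checkpoint \
       "gcchem_internal_checkpoint.restart.${stamp}.nc4"
}
```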


Previous | Next | Getting Started With GCHP | GCHP Main Page