Running GCHP: Configuration

Previous | Next | Getting Started with GCHP

  1. Hardware and Software Requirements
  2. Downloading Source Code
  3. Obtaining a Run Directory
  4. Setting Up the GCHP Environment
  5. Compiling
  6. Basic Example Run
  7. Run Configuration Files
  8. Advanced Run Examples
  9. Output Data
  10. Developing GCHP


Please note that this page is under construction, as the run setup process has been greatly simplified between versions v11-02b and v11-02c. Please refer to the tutorial slides for GCHP v11-02c to see how to configure a GCHP run using the runConfig.sh bash script in v11-02c. Contact the GEOS-Chem Support Team with questions about advanced run setup.

Overview

The GCHP run directory is set up by default so that you can do a simple 1-hr standard simulation at c24 resolution using 2° x 2.5° input meteorology, 6 cores, and the MVAPICH2 implementation of MPI. Output is configured to include netCDF files at the internal c24 resolution as well as c24 data regridded to 4° x 5°. Configuring GCHP to run with different settings is a matter of editing the GCHP configuration files described in the previous section of this user guide. This page provides several examples of how to do this, including how to:

  • change meteorology lat/lon grid resolution
  • change internal cubed sphere grid resolution
  • increase the number of cores on one node
  • run GCHP on multiple nodes
  • use a non-default restart file
  • update the list of species

Please make sure you review and understand all topics in the earlier GCHP Basic Example Run page prior to proceeding. Run execution details such as the pre-run checklist and run directory tools to execute GCHP are not covered on this page. Since you will be editing configuration files, you may find it useful to put your run directory under version control or keep a clean copy of a run directory available for easy reference of default values.

Some of the cases on this page are deliberately light on setup information in order for us to see how easy or difficult users find modifying the GCHP run and code directories. If you succeed in running any of these cases (or, more importantly, if you find that you can't), please contact us with details. The more detail the better, but please include at least the following:

  1. A description of what you are trying to do
  2. All of your configuration (*.rc) and log (*.log) files
  3. What system you are running on
  4. Your .bashrc file
  5. A snapshot of your currently loaded libraries

Documentation of advanced GCHP usage is a work in progress. Please be patient with us as our documentation catches up with GCHP code development! We greatly encourage you to help us by signing up for your own GEOS-Chem wiki account and contributing your findings to the GCHP Encountered Issues page.

Case 1: Using multiple nodes

Run a simulation using C24 resolution and increase the number of CPUs to 12, spread evenly across 2 nodes (i.e. C = 12, M = 2). Note that GCHP requires that cores are always distributed evenly across nodes.

You will need to make the following updates to the core layout settings:

File          Changes for core layout CxM
GCHP.rc       NX = M
              NY = C/M
HISTORY.rc    CoresPerNode = C/M

For the specific case of C24 using 12 CPUs on 2 nodes, you should have:

File          Changes for core layout 12x2
GCHP.rc       NX = 2
              NY = 6
HISTORY.rc    CoresPerNode = 6
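
If you prefer to make these edits from the command line, the following is a minimal sketch using sed. It assumes the GCHP.rc and HISTORY.rc entries use the "KEY: value" style found in the default run directory; check the exact formatting of these lines in your own files before relying on the patterns.

# Hypothetical sed edits for the 12-core / 2-node layout (verify the key format first)
sed -i 's/^\( *NX:\).*/\1 2/'  GCHP.rc
sed -i 's/^\( *NY:\).*/\1 6/'  GCHP.rc
sed -i 's/^\( *CoresPerNode:\).*/\1 6/' HISTORY.rc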

Once you have made these changes, you can proceed to run the code. We have included instructions for both SLURM and GridEngine.

1. Running GCHP in an Odyssey interactive session (using SLURM with MVAPICH2 MPI)

You can run GCHP from within an interactive SLURM session by typing the following command:

srun -n 12 -N 2 --mpi=pmi2 ./geos 2>&1 | tee GCHP.log

IMPORTANT! You must specify the total number of CPUs (across all nodes) with -n. So the proper setting is -n 12 (= 6 CPUs/node * 2 nodes). An easy mistake is to specify -n 6 (i.e. the number of CPUs/node), but doing so will cause the run to terminate with an error.

2. Running GCHP in an Odyssey batch session (using SLURM with MVAPICH2 MPI)

You can also use the GCHP_slurm.run script to run a GCHP job in a computational queue. First make sure that your GCHP_slurm.run script contains the following lines of code:

#SBATCH -n 12
#SBATCH -N 2

... further down in the script ...
 
# Run GCHP. Make sure the # of cpus match those above!!
time -p srun -n $SLURM_NTASKS -N $SLURM_NNODES --mpi=pmi2 ./geos >> $log

And then type:

sbatch GCHP_slurm.run

SLURM will set the environment variable $SLURM_NTASKS to the value specified with #SBATCH -n (in this case, 12). SLURM will also set $SLURM_NNODES to the value specified with #SBATCH -N (in this case, 2). This allows you to set the total number of CPUs and the number of nodes in just one place (at the top of the script, in the #SBATCH section) and have those values propagate down to the srun command.
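
Putting it together, a minimal GCHP_slurm.run might look like the sketch below. The partition name, wall time, memory request, and environment file name are placeholders, not values from the standard run directory; replace them with the settings appropriate for your cluster.

#!/bin/bash
#SBATCH -n 12
#SBATCH -N 2
#SBATCH -t 0-01:00
#SBATCH -p YOUR_PARTITION
#SBATCH --mem-per-cpu=4000

# Load the same libraries used to compile GCHP (placeholder file name)
source ./YOUR_GCHP_ENV_FILE.env

log=GCHP.log

# Run GCHP. Make sure the # of cpus match those above!!
time -p srun -n $SLURM_NTASKS -N $SLURM_NNODES --mpi=pmi2 ./geos >> $log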

3. Running GCHP in a Glooscap interactive session (using GridEngine with OpenMPI)

Use mpirun to submit the job as normal:

mpirun -n 12 ./geos 2>&1 | tee GCHP.log

4. Running GCHP in a Glooscap batch session (using GridEngine with OpenMPI)

Make sure your GCHP_gridengine.run script contains these lines of code:

#$ -pe ompi* 12
 
... further down in the script ...

mpirun -n 12 ./geos 2>&1 | tee GCHP.log

Here, ompi* refers to the parallel environment that is set up specifically for runs using OpenMPI.

Then type:

qsub GCHP_gridengine.run
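
A minimal GCHP_gridengine.run following this pattern is sketched below. The job name and environment file name are placeholders; the parallel environment name (ompi*) is taken from the example above and may differ on your system.

#!/bin/bash
#$ -pe ompi* 12
#$ -cwd
#$ -N GCHP_test

# Load the same libraries used to compile GCHP (placeholder file name)
source ./YOUR_GCHP_ENV_FILE.env

mpirun -n 12 ./geos 2>&1 | tee GCHP.log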

Case 2: Changing internal resolution

NOTE: To do this example, you will need to obtain a GCHP restart file on the C48 grid.

Run a simulation using C48 resolution (cube face side length N = 48 grid cells, ~2° x 2.5°). Use the same number of CPUs as in the previous example, spread across two nodes (i.e. C = 12, M = 2).

You will need to make the following updates to change the grid resolution:

File              Changes for grid resolution CN
GCHP.rc           IM = N
                  JM = 6N
                  GRIDNAME = PENx6N-CF
fvcore_layout.rc  npx = N
                  npy = N

For the specific case of C48 using 12 CPUs on 2 nodes, you should have:

File              Changes for grid resolution C48
GCHP.rc           IM = 48
                  JM = 288
                  GRIDNAME = PE48x288-CF
fvcore_layout.rc  npx = 48
                  npy = 48
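
As in Case 1, these edits can be scripted. The sketch below assumes the same "KEY: value" entry style; verify the exact formatting in your GCHP.rc and fvcore_layout.rc before using it.

# Hypothetical sed edits for the C48 resolution change (verify the key format first)
sed -i 's/^\( *IM:\).*/\1 48/'  GCHP.rc
sed -i 's/^\( *JM:\).*/\1 288/' GCHP.rc
sed -i 's/^\( *GRIDNAME:\).*/\1 PE48x288-CF/' GCHP.rc
sed -i 's/^\( *npx:\).*/\1 48/' fvcore_layout.rc
sed -i 's/^\( *npy:\).*/\1 48/' fvcore_layout.rc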

A note regarding NX and NY: NX and NY specify the domain decomposition; that is, how the surface of the cubed sphere will be split up between the cores. NX corresponds to the number of processors to use per N cells in the X direction, where N is the cube side length. NY corresponds to the number of processors per N cells in the Y direction, but must also include an additional factor of 6, corresponding to the number of cube faces. Therefore any multiple of 6 is a valid value for NY, and the only other rigid constraint is that (NX*NY) = NP, where NP is the total number of processors assigned to the job. However, if possible, specifying NX = NY/6 will provide an optimal distribution of cores as it minimizes the amount of communication required. The number of cores requested should therefore ideally be 6*C*C, where C is an integer factor of N. For example, C=4 would give:

  • NX = C = 4
  • NY = 6*C = 24
  • NP = 6*C*C = 96

This layout would be valid for any simulation where N is a multiple of 4. The absolute minimum case, C=1, provides NX=1, NY=6 and NP=6.
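
To see which decompositions satisfy these constraints for a given core count, a small helper like the one below can be useful. It is only an illustration of the rules stated above (NX*NY = NP and NY a multiple of 6); the variable names are not taken from any GCHP script.

# List valid (NX, NY) pairs for a given total core count NP
NP=96
for NX in $(seq 1 $NP); do
  if [ $((NP % NX)) -eq 0 ]; then
    NY=$((NP / NX))
    if [ $((NY % 6)) -eq 0 ]; then
      echo "NX=$NX  NY=$NY  NP=$NP"
    fi
  fi
done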

You can follow the same procedure outlined in Case 1 above to submit the job, summarized below:

Type of session           Follow this procedure
SLURM interactive         Type:
                            srun -n 12 -N 2 ./geos 2>&1 | tee GCHP.log
SLURM batch               Make sure that GCHP_slurm.run contains:
                            #SBATCH -n 12
                            #SBATCH -N 2
                          Then type:
                            sbatch GCHP_slurm.run
Grid Engine interactive   Type:
                            mpirun -n 12 ./geos 2>&1 | tee GCHP.log
Grid Engine batch         Make sure that GCHP_gridengine.run contains:
                            #$ -pe ompi* 12
                          Then type:
                            qsub GCHP_gridengine.run

Case 3: Changing your restart file

Run GCHP once for at least ten days in any chemically-active configuration, generate a restart file, and run GCHP again from that restart file.

Copy a new restart file to your run directory and change the restart and checkpoint file entries in GCHP.rc. For example:

GIGCchem_INTERNAL_RESTART_FILE: +gcchem_internal_checkpoint_c24.nc
GIGCchem_INTERNAL_CHECKPOINT_FILE: gcchem_internal_checkpoint_c24.nc

The + means that any missing values will be ignored rather than causing the simulation to fail. Note that the restart file has no date or time markers and will be overwritten at the end of the run, so make sure to back your restart files up if you wish to reuse them!
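
For example, you could copy the checkpoint produced by a completed run to a backup file before reusing it as a restart (the file names follow the GCHP.rc entries above; the backup name is illustrative only):

       cp gcchem_internal_checkpoint_c24.nc gcchem_internal_checkpoint_c24.backup.nc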

NOTE: You do not need to change the name of the restart file in input.geos. That entry is ignored and the settings in GCHP.rc are used instead.

If you would like to try running GCHP at a different grid resolution, you will need to regrid your restart file to match. Seb Eastham (Harvard) created a Fortran tool to regrid restart files to cubed-sphere. See https://bitbucket.org/sdeastham/csregridtool for the source code and contact him with questions.
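
For example, you can obtain the tool with git (build and usage instructions are in the repository's own documentation):

       git clone https://bitbucket.org/sdeastham/csregridtool.git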

Case 4: Updating species list

For this exercise, we will add a new, passive tracer to GCHP. We will also generate a restart file for this species using pre-existing data. To do this, you will need to:

1. Make sure NCO is available on your system. To load on Odyssey, type

       module load nco

2. Remove all tracers in input.geos and replace them with one tracer called PASV. Set the listed number of species to 1

3. Using NCO, create a restart file for species PASV (called 'SPC_PASV'). We can 'spoof' this with the following steps

First, create a netCDF file with one tracer in it, preferably copied from another file. Here, we will use the existing restart file. Type:

       ncks -v SPC_O3 initial_GEOSChem_rst.c24_standard.nc Dummy.nc

Then, rename the variable from SPC_O3 to SPC_PASV

       ncrename -v SPC_O3,SPC_PASV Dummy.nc
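
To confirm that the rename worked, you can inspect the file header (ncdump is part of the standard netCDF utilities):

       ncdump -h Dummy.nc | grep SPC_PASV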

4. Disable chemistry, deposition and emissions in input.geos

5. Remove all tracer outputs in HISTORY.rc and replace them with one output giving SPC_PASV

Case 5: Running a high-resolution simulation

The earlier cases showed how to change the internal grid resolution and the core layout. We will build on that here by running at C180 resolution (N = 180, ~0.5° x 0.625°) with high-resolution input meteorology. In this example, we will use 32 CPUs on 2 nodes. We will also change the timestep to Δt = 300 s. See the GCHP FAQ for a table of cubed-sphere resolutions and recommended timestep settings.

NOTE: Cubed-sphere meteorological data products are not yet available from GMAO, but are expected in the future.

You will need to make the following updates to the resolution, timestep, and core layout.

File              Changes
GCHP.rc           Grid resolution CN:   IM = N, JM = 6N, GRIDNAME = PENx6N-CF
                  Timestep Δt:          HEARTBEAT_DT = Δt, GIGCchem_DT = 2*Δt, all other *_DT = Δt
                  Core layout CxM:      NX = M, NY = C/M
CAP.rc            Timestep Δt:          HEARTBEAT_DT = Δt
fvcore_layout.rc  Grid resolution CN:   npx = N, npy = N
                  Timestep Δt:          dt = Δt
HISTORY.rc        Core layout CxM:      CoresPerNode = C/M
input.geos        Timestep Δt:          Transport Timestep [min] = Δt/60
                                        Convect Timestep [min] = Δt/60
                                        Chemistry Timestep [min] = 2*Δt/60
                                        Emiss Timestep [min] = 2*Δt/60

For the specific case of C180 with a timestep of 300 s using 32 CPUs on 2 nodes, you should have:

File              Changes
GCHP.rc           Grid resolution C180:  IM = 180, JM = 1080, GRIDNAME = PE180x1080-CF
                  Timestep 300 s:        HEARTBEAT_DT = 300, GIGCchem_DT = 600, all other *_DT = 300
                  Core layout 32x2:      NX = 2, NY = 16
CAP.rc            Timestep 300 s:        HEARTBEAT_DT = 300
fvcore_layout.rc  Grid resolution C180:  npx = 180, npy = 180
                  Timestep 300 s:        dt = 300
HISTORY.rc        Core layout 32x2:      CoresPerNode = 16
input.geos        Timestep 300 s:        Transport Timestep [min] = 5
                                         Convect Timestep [min] = 5
                                         Chemistry Timestep [min] = 10
                                         Emiss Timestep [min] = 10

You must also change the "forecast time" of the variables SPHU2, TMPU2 and PS2 in ExtData.rc. The forecast time is the number (formatted as HHMMSS) specified immediately after the semicolon in each of the relevant ExtData.rc entries. For a timestep of 5 minutes, this would mean changing the number after the semicolon to be 000500 (0 hours, 5 minutes, 0 seconds). Not doing this will result in the simulation not conserving mass during transport!
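
To locate the entries that need this change, you can search ExtData.rc for the relevant variables, for example:

       grep -n -E 'SPHU2|TMPU2|PS2' ExtData.rc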

Submit the job with a batch script containing the following settings:

#SBATCH -n 32
#SBATCH -N 2

... further down in the script ...

# Run GCHP. Make sure the # of cpus match those above!!
time -p srun -n $SLURM_NTASKS -N $SLURM_NNODES --mpi=pmi2 ./geos >> $log


Previous | Next | GCHP Home