Running GCHP: Configuration

Revision as of 17:05, 20 April 2017

Go to GCHP Main Page

GCHP Run Directory

Download a GCHP run directory following these instructions on the GEOS-Chem HP Dev Kit wiki page. The run directory will include the following files:

Resource Configuration (.rc) Files:

  1. CAP.rc
  2. GCHP.rc
  3. HISTORY.rc
  4. ExtData.rc
  5. HEMCO_Config.rc
  6. fvcore_layout.rc

Other Configuration Files:

  1. input.geos
  2. ExtData.Discover

Restart files:

  1. geoschemchem_internal_rst.nc
  2. cap_restart

GEOS-Chem ASCII data files:

  1. chemga.dat
  2. dust.dat
  3. FJX_j2j.dat
  4. FJX_spec.dat
  5. globchem.dat
  6. jv_spec_aod.dat (used in v10-01 public?)
  7. jv_spec_mie.dat
  8. mglob.dat
  9. org.dat
  10. so4.dat
  11. soot.dat
  12. ssa.dat
  13. ssc.dat
  14. ratj.d (used in v10-01 public?)

Other files:

  1. input.nml
  2. ExtData

Resource Configuration Files

Figure 1. A basic hierarchy of Gridded Components in the GCHP.

GCHP is controlled using a set of Resource Configuration (RC) files.
These files contain information required by GCHP including hardware allocation, initialization and runtime parameters, locations of necessary files, and where, what and how to read and write data to and from GCHP.

  • CAP.rc
  • GCHP.rc
  • HISTORY.rc
  • fvcore_layout.rc
  • ExtData.rc

Within GCHP, GEOS-Chem still relies on settings in input.geos and HEMCO_Config.rc, but several of their entries are superseded by the *.rc files. These entries will eventually be highlighted in red below.

CAP.rc

CAP.rc is the configuration file for the top-level "Cap" (in ESMF lingo) component. It handles general runtime settings for GCHP including time and date parameters, performance profiling routines, and the system-wide timestep (heartbeat). Combined with cap_restart, this determines the exact dates for the next run of GCHP. Any additional parameters in CAP.rc that are not included in the below list are artifacts of previous versions of GCHP and may be ignored.

Parameter Description
MAPLROOT_COMPNAME The default setting is GCHP.
ROOT_NAME Sets the child name that MAPL will use to initialize the ROOT component within CAP. It uses this name in all operations when querying and interacting with ROOT. The default setting is GCHP.
ROOT_CF Resource configuration file for the ROOT component. The default setting is GCHP.rc.
HIST_CF MAPL's HISTORY component resource configuration file. The default setting is HISTORY.rc.
BEG_DATE Begin date for the range of the experiment, in format YYYYMMDD HHMMSS.
Set this to your simulation start date.
END_DATE End date for the range of the experiment, in format YYYYMMDD HHMMSS.
Set this to your simulation end date.
JOB_SGMT Duration of each individual run, in format YYYYMMDD HHMMSS. Should be shorter than or equal to the span of BEG_DATE and END_DATE.
Update this based on BEG_DATE and END_DATE for your simulation.
HEARTBEAT_DT The 'ticking' rate of the ESMF/MAPL internal clock, in seconds. At every tick of the heartbeat, ESMF queries all components to see if anything needs to be calculated based upon the components' defined timesteps. Default is 1800.
Set this to the minimum timestep ("*_DT") set in GCHP.rc
MAPL_ENABLE_TIMERS Enables output of MAPL's run-profile timers. Default is 'YES'.
MAPL_ENABLE_MEMUTILS Enables runtime output of the program's memory usage. Default is 'YES'.
PRINTSPEC If enabled, runs the model until all components are initialized. Then, dumps all the names of the members of the internal import/export states and ends the model. Setting options include: 0 (default): Off, 1: Import & Export, 2: Import, and 3: Export.
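As a sketch combining the parameters above, a CAP.rc for a one-week simulation might look like the following (the dates and segment length are illustrative, not defaults):

```
MAPLROOT_COMPNAME: GCHP
ROOT_NAME:         GCHP
ROOT_CF:           GCHP.rc
HIST_CF:           HISTORY.rc

BEG_DATE:     20130701 000000
END_DATE:     20130708 000000
JOB_SGMT:     00000007 000000
HEARTBEAT_DT: 1800

MAPL_ENABLE_TIMERS:   YES
MAPL_ENABLE_MEMUTILS: YES
PRINTSPEC: 0
```

Here JOB_SGMT spans the full week between BEG_DATE and END_DATE, so the job completes in a single segment.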

GCHP.rc

GCHP.rc is the resource configuration file for the ROOT component within GCHP. It controls, at a minimum, the global model resolution, grid type, parallel sub-domain sizes, component timesteps, and job restart parameters.

Parameter Description
NX This is the number of MPI sub-domains (processes) along the X-dimension.
NX * NY must equal the number of CPUs you are using.
NY This is the number of MPI sub-domains (processes) along the Y-dimension, counted across all six cube faces.
NY must be a multiple of 6 (since a cube has 6 sides).
IM The number of longitudinal (or X) grid-points in the global domain (depending upon grid type).
Update this based on your intended grid resolution using this general mapping guideline, and make sure it is consistent with timestep settings (see RUN_DT below):
24 ~ 4x5; 48 ~ 2x2.5; 96 ~ 1x1.25; 192 ~ 1/2 degree; 384 ~ 1/4 degree.
JM The number of latitudinal (or Y) grid-points in the global domain (depending upon grid type). For the cubed sphere, JM must meet the condition that JM = IM * 6 (since a cube has six sides).
Update this to equal the value IM*6.
LM The number of gridpoints in the vertical. Must be equal to the vertical resolution of the offline meteorological fields, since MAPL cannot currently regrid vertically. The default value is 72.
GRIDNAME This should be formatted very carefully to be consistent with the preceding settings. The first two characters determine how the pole is treated (default is PE). Next is the value of IM, an x, then the value of JM. This is followed by a -, and then a two-character code that indicates a Cubed Sphere grid (CF) or a Lat/Lon grid (). Many options here are defined in MAPL_Generic.
Update this to reflect your settings for IM and JM.
GEOSChem_CTM A toggle that tells FVDycore that it is operating as a transport model rather than a prognostic model if set to 1.
AdvCore_Advection Default is 1.
DYCORE Should either be set to OFF (default) or ON. This value does nothing, but MAPL will crash if it is not declared.
SOLAR_DT The timestep in seconds that the SOLAR Component should be called. Default is 1800.
Must be a multiple of HEARTBEAT_DT in CAP.rc
IRRAD_DT The timestep in seconds that the IRRAD Component should be called. ESMF checks this value during its timestep check. Default is 1800.
Must be a multiple of HEARTBEAT_DT in CAP.rc
RUN_DT The timestep in seconds that the RUN Component should be called. ESMF checks this value during its timestep check. The recommended timestep [s] per grid resolution (see IM) are: 1800 ~ 4x5; 900 ~ 2x2.5; 600 ~ 1x1.25; 600 ~ 1/2 degree; 300 ~ 1/4 degree.
Must be a multiple of HEARTBEAT_DT in CAP.rc
GCHPchem_DT The timestep in seconds that the GCHPchem Component should be called. ESMF checks this value during its timestep check.
Must be a multiple of HEARTBEAT_DT in CAP.rc
DYNAMICS_DT The timestep in seconds that the DYNAMICS Component should be called. ESMF checks this value during its timestep check.
Must be a multiple of HEARTBEAT_DT in CAP.rc
SOLARAvrg Default is 0.
IRRADAvrg Default is 0.
PRINTRC Specifies which resource values to print. Options include 0: Non-Default Values, and 1: All Values. Default setting is 0.
PARALLEL_READFORCING Enables or disables parallel I/O processes when writing the restart files. Default value is 0 (disabled).
NUM_READERS Number of simultaneous readers. Should divide evenly into NY. Default value is 1.
NUM_WRITERS Number of simultaneous writers. Should divide evenly into NY. Default value is 1.
BKG_FREQUENCY Active observer when desired. Default value is 0.
GIGCchem_INTERNAL_RESTART_FILE The filename of the Internal Restart File to be written.
GIGCchem_INTERNAL_RESTART_TYPE The format of the Internal Restart File. Valid types include pbinary and pnc4.
GIGCchem_INTERNAL_CHECKPOINT_FILE The filename of the Internal Checkpoint File to be written.
GIGCchem_INTERNAL_CHECKPOINT_TYPE The format of the Internal Checkpoint File. Valid types include pbinary and pnc4.
GIGCchem_INTERNAL_HEADER Only needed when the file type is set to pbinary. Specifies if a binary file is self-describing.
DYN_INTERNAL_RESTART_FILE The filename of the DYNAMICS Internal Restart File to be written.
DYN_INTERNAL_RESTART_TYPE The format of the DYNAMICS Internal Restart File. Valid types include pbinary and pnc4.
DYN_INTERNAL_CHECKPOINT_FILE The filename of the DYNAMICS Internal Checkpoint File to be written.
DYN_INTERNAL_CHECKPOINT_TYPE The format of the DYNAMICS Internal Checkpoint File. Valid types include pbinary and pnc4.
DYN_INTERNAL_HEADER Only needed when the file type is set to pbinary. Specifies if a binary file is self-describing.
MAX_DIAG For reading the input.geos file. Default value is 80.
MAX_TRCS For reading the input.geos file. Default value is 130.
MAX_MEMB For reading the input.geos file. Default value is 15.
MAX_FAMS For reading the input.geos file. Default value is 20.
MAX_DEP For reading the input.geos file. Default value is 130.
LINOZ_NFIELDS For reading the input.geos file. Default value is 7.
LINOZ_NLAT For reading the input.geos file. Default value is 18.
LINOZ_NLEVELS For reading the input.geos file. Default value is 25.
LINOZ_NMONTHS For reading the input.geos file. Default value is 12.
RUN_PHASES For reading the input.geos file. Default value is 1.
HEMCO_CONFIG HEMCO configuration file. Default is HEMCO_Config.rc. (supersedes what is in input.geos?)
STDOUT_LOGFILE Log filename template. Default is PET%%%%%.GEOSCHEMchem.log.
STDOUT_LOGLUN Logical unit number for stdout. Default value is 6.
INIT_ZERO Default is 0.
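The consistency rules scattered through the tables above (JM = IM * 6, NX * NY equal to the core count, NY a multiple of 6, all timesteps multiples of HEARTBEAT_DT) can be sanity-checked before submitting a job. The following sketch is illustrative and not part of GCHP itself:

```python
# Sanity-check the GCHP.rc / CAP.rc consistency rules described above.
# The rules come from this page; the function itself is illustrative.

def check_gchp_settings(im, jm, nx, ny, ncores, heartbeat_dt, timesteps):
    """Return a list of problems; an empty list means the settings look consistent."""
    problems = []
    if jm != im * 6:
        problems.append("JM must equal IM * 6 on the cubed sphere")
    if ny % 6 != 0:
        problems.append("NY must be a multiple of 6")
    if nx * ny != ncores:
        problems.append("NX * NY must equal the number of CPUs")
    for name, dt in timesteps.items():
        if dt % heartbeat_dt != 0:
            problems.append(name + " must be a multiple of HEARTBEAT_DT")
    return problems

# The default c24 run directory on 6 cores (NX=1, NY=6, 1800 s timesteps):
print(check_gchp_settings(24, 144, 1, 6, 6, 1800,
                          {"RUN_DT": 1800, "GCHPchem_DT": 1800}))  # -> []
```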

HISTORY.rc

Parameter Description
EXPID The prefix for output file paths. Includes the directory structure and the beginning of the filename.
EXPDSC The Export Description. If one does ncdump -h, the attribute "Title" will be the set value of EXPDSC.
CoresPerNode The number of cores per node used for the run.
COLLECTIONS: 'center' This defines the collection 'center', which is described in further detail in the next several parameters. This collection defines the format of the output files, as well as what gets written to them.
center.template This goes after the COLLECTIONS value in output file names, prefixed by a dot. Should include the file extension. Follows GrADS conventions.
center.archive Still used?
center.format Character string defining the file format. Options include "flat" (the default), "CFIO", or "CFIOasync".
center.resolution Defines the output resolution, IX, IY, in lat-lon. Output can be at a different resolution than the model run; defaults to the same as the model. This means that if the model is running on the cubed sphere, it will by default output on the cubed-sphere grid unless specified otherwise.
center.frequency The time frequency of output, in HHMMSS format. 010000 will output data every hour. Defaults to 060000.
center.mode Character string defining what is written out; either "instantaneous" or "time-averaged" data.
NOTE: Does not appear in downloaded run directory HISTORY.rc
center.acc_interval Needed when mode is set to "time-averaged". Defines the accumulation interval of the time average. Must be less than or equal to frequency.
NOTE: Does not appear in downloaded run directory HISTORY.rc
center.subset Can be used to output only a subset of the domain. Values include lonMin, lonMax, latMin, and latMax.
NOTE: Does not appear in downloaded run directory HISTORY.rc
center.fields Paired character strings, each giving a diagnostic name and its associated gridded component, to be written out.
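Putting these together, a 'center' collection might look like the following sketch. The field names, template, and exact punctuation are illustrative; check the HISTORY.rc in your downloaded run directory for the authoritative syntax:

```
EXPID:  OutputDir/GCHP
EXPDSC: GEOS-Chem_devel
CoresPerNode: 6

COLLECTIONS: 'center'
::

center.template:  '%y4%m2%d2_%h2%n2z.nc4',
center.format:    'CFIO',
center.frequency: 010000,
center.fields:    'TRC_O3', 'GCHPchem',
                  'TRC_NO', 'GCHPchem',
::
```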

fvcore_layout.rc

Parameter Description
npx The default setting is 22.
Change this to match IM in GCHP.rc.
npy The default setting is 22.
Change this to match IM in GCHP.rc.
npz The default setting is 72.
Change this to match LM in GCHP.rc.
dt The default setting is 1800 (4x5 resolution). Change this to match HEARTBEAT_DT in CAP.rc (or RUN_DT in GCHP.rc?)
n_sponge The default setting is -1.
ADIABATIC The default setting is true.
hydrostatic The default setting is true.
nord The default setting is 0.
d2_bg The default setting is 0.0075.
d4_bg The default setting is 0.0.
dddmp The default setting is 0.2.
ksplit The default setting is 1.
nsplit The default setting is 0.
msplit The default setting is 0.
hord_mt The default setting is 10.
hord_vt The default setting is 10.
hord_tm The default setting is 10.
hord_dp The default setting is 13.
hord_tr The default setting is 13.
kord_tm The default setting is -9.
kord_mt The default setting is 9.
kord_wz The default setting is 9.
kord_tr The default setting is 9.
FV_OFF The default setting is false.
fv_debug The default setting is false.
inline_q The default setting is false.
z_tracer The default setting is false.
chk_mass The default setting is false.

ExtData.rc

ExtData.rc contains emissions information for use with MAPL and ESMF. The information it contains overlaps with the GEOS-Chem configuration file HEMCO_Config.rc. However, during a GCHP run, emissions files are located using the paths in ExtData.rc rather than those in HEMCO_Config.rc; the file paths in HEMCO_Config.rc are ignored. Unlike HEMCO_Config.rc, ExtData.rc also includes the GMAO meteorological data.

For each primary export (MET-field), 2D variable at the edge (e.g. VEGFRAC), and emission, ExtData.rc includes the following information in a space-delimited single row.

Info Name Description
Export Name Name of emissions in HEMCO, e.g. TROPP
Units Unit string nested within single quotes. '1' indicates there is no unit conversion from the native units in the netcdf file.
Dimension 2D is xy and 3D is xyz.
V Loc Possible values include C and E.
Clim Logical flag
Refresh Time Template Possible values include -, 0, 0:0
Offset Factor Data offset factor, e.g. 0.0 for no offset.
Scale Factor Data scale factor, e.g. 1.0 for no scaling.
External File Variable Variable name in the netcdf data file, e.g. TROPPT in MET-fields
External File Path Path to the netcdf data file. If not using the data, specify /dev/null to reduce processing time.

Information for masks and derived exports may also be included using the format specified in the headers within the file. --Lizzie Lundgren (talk) 22:31, 10 December 2015 (UTC)
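As an example, a met-field entry following the column order above might look like this single row. The entry name, units, and file path template are illustrative, not copied from an actual ExtData.rc:

```
TROPP 'hPa' xy C N 0 0.0 1.0 TROPPT ./MetDir/%y4/%m2/GEOSFP.%y4%m2%d2.A1.2x25.nc
```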

GCHP Restart Files

This section is in development

The ExtData Gridded Component

This section is in development

Offline Data Requirements

This section is in development

Working Meteorological Fields

This section is in development

Controlling Output with MAPL History

This section is in development

Changing Resolutions

This section is in development

Examples

Before running GCHP, check that the following is true:

  • Compilation was successful and you have a geos executable in your run directory
  • ExtData.rc contains met-field paths corresponding to your intended input resolution
  • You have all required symbolic links in your run directory: ChemDataDir, CodeDir, MainDataDir, MetDir, TileFiles, and your restart file.
  • All libraries and modules required by GCHP are loaded

Quick start: 1-hr standard simulation

The default GCHP run directory is set for a 1-hour tropchem simulation at resolution c24, with 2x2.5 input met-field resolution and 4x5 output concentration resolution. c24 is approximately the cubed-sphere equivalent of 4x5. If you followed the above instructions for setting up your run directory, you should have specified 2x2.5 input resolution for meteorology fields for your initial devkit run and edited ExtData.rc to include paths that reflect this resolution. For Odyssey users, the ExtData.rc update occurred automatically, while for all non-Odyssey users this required manual editing.

For this quick test, you will need an environment with the following:

  • 6 CPUs (minimum - see model description)
  • 1 node
  • At least 2500 MB of memory per CPU
  • Your compiler, NetCDF and MPI implementation loaded

Odyssey users should refer to the Harvard Odyssey Users Environment Setup section of this page for a refresher on how to set up your environment prior to running GCHP.

WARNING! It appears that GFED might not be handled correctly by GCHP. We recommend that users disable fire emissions (GFED) in HEMCO_Config.rc for now. You will also need to disable the BOND emissions dataset as this interacts with the GFED dataset.

Once your run directory is all set up, start the simulation by typing the following:

mpirun -n 6 ./geos 2>&1 | tee 1hr_mapl.log

This command can be broken down as follows:

  • mpirun executes an MPI-enabled executable and allocates the necessary resources. The exact command is implementation-dependent; some MPI implementations use other commands, such as mpiexec.
  • -n 6 specifies how many individual CPU cores are requested for the run. The number given here should always be the total number of cores, regardless of how many nodes they are spread over, and must be a multiple of 6 (at least one for each of the cubed sphere faces, and the same number for each face).
  • ./geos is the local GEOS-Chem executable, as with GEOS-Chem Classic
  • 2>&1 | tee 1hr_mapl.log is a bash shell-specific means of collecting all MAPL output (standard and error) that is written to the screen into a file.

Note that the log file specified when invoking mpirun collects output created by MAPL. The more traditional GEOS-Chem log output is automatically sent to a file defined in the configuration file GCHP.rc. By default, its name follows the format PET%%%%%.GEOSCHEMchem.log, where %%%%% is replaced at run-time with a processor id (typically 00000). PET stands for persistent execution thread. Unlike MAPL, which sends output to the log from ALL threads, GEOS-Chem only outputs from a single thread.

Once the simulation is complete, there should be two netcdf output files in the OutputDir sub-directory. To get started with manipulating output data, see GCHP output data.

Basic test cases

The next step is to try running some things of your own! These cases will require minor modifications to various elements of GCHP’s run directory, and should help to familiarize you with how exactly GCHP works. If you run into problems, please e-mail Lizzie Lundgren (elundgren@seas.harvard.edu).

Basic case 1: Changing resolution and moving to multiple nodes

Re-run the validation case at C48 (cube face side length N = 48 grid cells) resolution, with a shorter timestep (Δt = 600 s). This time, use a larger number of CPUs; say 12 cores, spread evenly across 2 nodes (C = 12, M = 2). Note that GCHP requires that cores are always distributed evenly across nodes. You will then need to change the following files to complete the change of resolution, timestep and core layout:

  • GCHP.rc — grid resolution: IM = N, JM = 6N, GRIDNAME = PENx6N-CF; timestep: HEARTBEAT_DT = Δt, *_DT = Δt; core layout: NX = M, NY = C/M
  • CAP.rc — timestep: HEARTBEAT_DT = Δt
  • fvcore_layout.rc — grid resolution: npx = N, npy = N; timestep: dt = Δt
  • HISTORY.rc — core layout: CoresPerNode = C/M

For the specific case of C48 with a timestep of 600 s, distributed as shown, using 12 cores, you should have:

  • GCHP.rc — grid resolution: IM = 48, JM = 288, GRIDNAME = PE48x288-CF; timestep: HEARTBEAT_DT = 600, *_DT = 600; core layout: NX = 2, NY = 6
  • CAP.rc — timestep: HEARTBEAT_DT = 600
  • fvcore_layout.rc — grid resolution: npx = 48, npy = 48; timestep: dt = 600
  • HISTORY.rc — core layout: CoresPerNode = 6

Finally, use mpirun as normal (mpirun -n 12 ./geos 2>&1 | tee log).
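The mapping above can be sketched as a short helper. The dictionary keys are shorthand for the .rc entries listed in the tables, not literal file syntax:

```python
# Compute the .rc settings for a CN-resolution run on `cores` CPUs over `nodes` nodes,
# following the mapping table above. Illustrative only, not part of GCHP.

def resolution_layout(n, dt, cores, nodes):
    return {
        "IM": n,
        "JM": 6 * n,
        "GRIDNAME": "PE{}x{}-CF".format(n, 6 * n),
        "HEARTBEAT_DT": dt,
        "NX": nodes,
        "NY": cores // nodes,
        "npx": n,
        "npy": n,
        "dt": dt,
        "CoresPerNode": cores // nodes,
    }

# The C48 example: 600 s timestep, 12 cores over 2 nodes
print(resolution_layout(48, 600, 12, 2)["GRIDNAME"])  # -> PE48x288-CF
```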

A note regarding NX and NY: NX and NY specify the domain decomposition; that is, how the surface of the cubed sphere will be split up between the cores. NX corresponds to the number of processors to use per N cells in the X direction, where N is the cube side length. NY corresponds to the number of processors per N cells in the Y direction, but must also include an additional factor of 6, corresponding to the number of cube faces. Therefore any multiple of 6 is a valid value for NY, and the only other rigid constraint is that (NX*NY) = NP, where NP is the total number of processors assigned to the job. However, if possible, specifying NX = NY/6 will provide an optimal distribution of cores as it minimizes the amount of communication required. The number of cores requested should therefore ideally be 6*C*C, where C is an integer factor of N. For example, C=4 would give:

  • NX = C = 4
  • NY = 6*C = 24
  • NP = 6*C*C = 96

This layout would be valid for any simulation where N is a multiple of 4. The absolute minimum case, C=1, provides NX=1, NY=6 and NP=6.
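The guideline can be expressed directly:

```python
# Optimal square layout per the guideline above: NX = C, NY = 6*C, NP = 6*C*C.
# Illustrative helper; names are not part of GCHP.

def square_layout(c):
    nx = c
    ny = 6 * c
    return nx, ny, nx * ny  # (NX, NY, NP)

print(square_layout(4))  # -> (4, 24, 96)
print(square_layout(1))  # -> (1, 6, 6), the absolute minimum case
```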

Basic case 2: Running a coarse simulation with high-resolution met data

So far, all the examples have been run using coarse (4x5) meteorological data, for the purposes of getting started quickly. However, a major feature of GCHP is that it can read native-resolution meteorological data without regridding or preprocessing. To allow tests of this feature, a small set of native-resolution meteorological data covering 2015-07-01 through 2015-07-10 has been archived, and can be found at

/n/seasasfs01/gcgrid/data/GEOS_0.25x0.3125.d

To use this data in place of the standard GEOS-Chem meteorological data, you need to perform two changes. First, you need to change the target of your MetDir link. To remove your existing link, type

 unlink MetDir 

Then, to establish a new link, type (on Odyssey)

 ln -s /n/seasasfs01/gcgrid/data/GEOS_0.25x0.3125.d/GEOS_FP MetDir 

This will establish a link to the native resolution meteorological data. Now, open ExtData.rc and perform a find/replace, changing 2x25.nc to Native.nc for all the meteorological data input files (collected at the top of ExtData.rc). Your GCHP run should now use the higher-resolution meteorological data. Note that this comes at a computational cost; however, it significantly reduces the artefacts associated with using coarse-resolution meteorological data on a foreign grid.

Advanced test cases

The following cases are deliberately light on setup information, to see how easy or difficult users find modifying the GCHP run and code directories to convince it to do what is needed. If you succeed in running any of these cases (or, possibly more importantly, if you find that you can’t), please e-mail Lizzie Lundgren at elundgren@seas.harvard.edu with details. The more detail the better; but please include at least the following:

  • The test case name (if applicable)
  • The resolution(s) you ran it at
  • Whether the run completed or not

Advanced case 1: Run GCHP with a restart file

Run GCHP once for at least ten days in any chemically-active configuration, generate a restart file, and run GCHP again from that restart file. To help you get started, Odyssey users can find some non-zero restart files at

/n/regal/jacob_lab/seastham/GCHP_Restarts

Copy one of the files to your run directory and change GCHP.rc to read

GIGCchem_INTERNAL_RESTART_FILE: +gcchem_internal_checkpoint_c24.nc
GIGCchem_INTERNAL_CHECKPOINT_FILE: gcchem_internal_checkpoint_c24.nc

The + means that any missing values will be ignored rather than causing the simulation to fail. Note that the restart file has no date or time markers and will be overwritten at the end of the run, so make sure to back your restart files up if you wish to reuse them!

Advanced case 2: GCHP speedrun

Initialize GCHP with a non-zero restart file. Run with 66 tracers but with no processes except advective transport. Reduce run time as much as possible by setting unnecessary filepaths in ExtData.rc to “/dev/null”. This will result in them being set to default values without spending time reading files.

Advanced case 3: Changing tracer count

Add a new, passive tracer to GCHP. To do this, you will need to:

  • Remove all tracers in input.geos and replace them with one tracer called PASV. Set the listed number of tracers to 1
  • Modify the file Chem_Registry.rc in CodeDir/GCHP/Registry so that all tracers (TRC_XYZ) are removed, and replace them with the entry TRC_PASV. NOTE: You must perform a clean compile after changing Chem_Registry.rc or the changes will not take effect!
  • Disable chemistry, deposition and emissions in input.geos
  • Remove all tracer outputs in HISTORY.rc and replace them with one output giving TRC_PASV

If you find that you can run GCHP with these modifications, all that remains is to obtain a non-zero restart file. This will be available soon - if you reach this point, please contact Lizzie Lundgren.

Go to GCHP Main Page