GEOS-Chem in CESM

Revision as of 16:37, 17 July 2024 by Lizzie Lundgren (Talk | contribs) (Editing FC2000climo to use different year climatology as present day)


Overview

This page serves as a resource for using GEOS-Chem in the Community Earth System Model (CESM). It is meant to supplement, not replace, existing CESM documentation and therefore includes a guide to navigating CESM resources maintained by NCAR. The rest of the page focuses on aspects of the model specific to GEOS-Chem and is written primarily to help offline GEOS-Chem users get started. CESM users interested in using GEOS-Chem chemistry may also find it useful, but they should additionally look at the GEOS-Chem documentation listed on the main page of this wiki.

CESM Resources

Model documentation

The best way to successfully use GEOS-Chem chemistry in CESM is to become familiar with the CESM documentation and guides maintained by NCAR. The CESM2 Quickstart Guide contains an overview of the model and instructions for downloading, building, and running CESM2.

CESM2 is built upon a framework called Common Infrastructure for Modeling the Earth (CIME, pronounced "SEAM"), which handles configuring, compiling, testing, and running the model. Offline GEOS-Chem users can think of CIME as run directory management and testing, containing the equivalent of what is found in the 'run' and 'test' directories within GEOS-Chem. Read through the CIME documentation to become familiar with the concepts of CIME and to get detailed instructions for creating run directories, configuring a run, building the source code, and running the model.

The Community Atmosphere Model (CAM) is the atmospheric component of CESM and includes both GEOS-Chem and HEMCO as subcomponents. The CAM6.3 documentation builds upon information in the CESM Quickstart Guide and the CIME documentation. Read through it to learn about the atmospheric component's configuration options, data inputs, and model outputs.

CAM is connected to other components in CESM via a coupler called the Community Mediator for Earth Prediction Systems (CMEPS). You will likely not need to learn about the coupler, but a link to its documentation is provided below if you are interested.

Model websites

NCAR maintains several websites with up-to-date information about CESM. The CAM, HEMCO, and CESM with Chemistry pages are of particular interest to GEOS-Chem users.

Data visualization and analysis

NCAR provides Jupyter notebook examples of visualizing and analyzing CESM output data. The examples for atmospheric chemistry modeling with CAM-Chem are also relevant for modeling with GEOS-Chem.

Getting help

CESM help requests are handled on a searchable forum maintained by NCAR. Consider registering for a free account if you plan to use CESM.

Glossary

CAM = Community Atmosphere Model

case = unique set of compset and grid

case directory = user-created directory for a case where you can configure, build, and run CESM using CIME tools

CESM = Community Earth System Model

CIME = Common Infrastructure for Modeling the Earth

component = building block of CESM; the model is made up of seven geophysical components, one external system processing component, and one driver/coupler component

compset = named set of components with default component-specific configurations and namelist settings

SIMA = System for Integrated Modeling of the Atmosphere

One-time setup

Download CESM

NCAR docs: Downloading CESM

To use GEOS-Chem you should check out CESM tag cesm2_3_alpha17e or later. This version uses CAM tag cam6_3_160, which includes GEOS-Chem v14.1.2. The NCAR CESM documentation will guide you through downloading CESM and populating the externals.

"Externals" are source code repositories that are separate from the CESM source code repository that you download. This is similar to how the GC-Classic initial download is mostly an empty set of directories until you run 'git submodule update --init --recursive', which checks out the git submodules. CESM uses a package called manage_externals rather than git submodules and stores information about the externals in the local configuration file Externals.cfg (see here for example file). All externals are checked out with the command "./manage_externals/checkout_externals".

The checkout is recursive, meaning that any external with its own externals config file will also have the externals in that file checked out. This is how GEOS-Chem and HEMCO are downloaded. The CAM external includes the file Externals_CAM.cfg (example here), which specifies GEOS-Chem as well as the HEMCO interface (HEMCO_CESM). HEMCO is specified in the HEMCO_CESM externals config file Externals_HCO.cfg (example here).
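The download steps described above can be sketched as a short shell script. This is only a minimal sketch and not a replacement for the NCAR instructions: the repository URL and the directory name my_cesm_sandbox are assumptions that do not come from this page, and run is a hypothetical helper that prints each command and only executes it when EXECUTE=1 is set.

```shell
#!/bin/sh
# Sketch of the CESM download steps. Dry run by default: commands are only
# printed. Set EXECUTE=1 in the environment to actually run them.
CESM_DIR=${CESM_DIR:-my_cesm_sandbox}   # hypothetical directory name
CESM_TAG=${CESM_TAG:-cesm2_3_alpha17e}  # tag named on this page

# Hypothetical helper: print a command, and run it only when EXECUTE=1.
run() {
  echo "+ $*"
  if [ "${EXECUTE:-0}" = "1" ]; then
    "$@"
  fi
}

download_cesm() {
  # Assumed repository URL (ESCOMP organization on GitHub).
  run git clone https://github.com/ESCOMP/CESM.git "$CESM_DIR" &&
  run cd "$CESM_DIR" &&
  run git checkout "$CESM_TAG" &&
  # Recursive checkout: brings in CAM, which in turn brings in
  # GEOS-Chem and the HEMCO interface.
  run ./manage_externals/checkout_externals
}

download_plan=$(download_cesm)
echo "$download_plan"
```

Printing the plan first makes it easy to confirm the tag and target directory before anything is downloaded.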

Configure CIME

Scenario 1: NCAR Derecho HPC cluster

Configuring CIME is not necessary on the NCAR derecho cluster because the default CIME xml files that are included in the source code already include derecho and its libraries as an option. Simply pass derecho as the machine when creating a case as well as your NCAR project number. An example of how to do this is in the creating a case section of this wiki page.

Scenario 2: External HPC cluster

Prerequisites

You will need the paths to the following libraries (the directories containing their bin and lib or lib64 subdirectories) in order to point CIME to the right locations.

  • Intel or GNU compiler
  • netCDF, netCDF-Fortran
  • MPI
  • ESMF built with PIO support (built-in is OK) and netCDF IO support.

Configuring CIME

Relevant NCAR docs: Defining the machine - CIME documentation

Setup is much easier if previous users of your HPC cluster have configured a prior version of CESM.

CIME configuration has changed slightly in CESM2.3. In previous versions of CESM, the environment configuration specific to your HPC cluster was in one config_machines.xml file. In CESM2.3 and above, the same information is split into separate files. Perform the following steps to add your HPC cluster configuration to CIME, consulting a prior setup of CESM for the relevant entries.

All of the below configuration is under ccs_config/machines.

1. Edit config_machines.xml to add a new line within <NODENAME_REGEX> with <value MACH="name_of_your_cluster">regex pattern here</value> to match the hostname pattern of your HPC system (consult the hostname command).

2. Create a folder named name_of_your_cluster. It may be easier to copy the existing ubuntu-latest folder.

3. Edit config_machines.xml to insert the machine configuration (<machine> tag) from a previous version of CESM. This usually includes MPI configuration and OS information as well as compiler configuration. The actual entries can be very long.

4. Create a file named name_of_your_cluster.cmake under cmake_macros to include any necessary cmake macros to point the compiler to the correct ESMF, netCDF, MPI library location. Refer to userdefined.cmake for some sample definitions. This .cmake file will automatically be used based on the machine name.
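Before adding a pattern to <NODENAME_REGEX> in step 1, it can help to test it against your login node names. The sketch below does this with grep. The pattern and hostnames are hypothetical placeholders, and hostname_matches is an illustrative helper. CIME itself is written in Python, so it is assumed here that a simple pattern like this one behaves the same way in both regex dialects.

```shell
#!/bin/sh
# Hypothetical pattern and hostnames; replace them with values for your cluster.
pattern='^login[0-9]+\.mycluster\.example\.edu$'

# Return 0 when the hostname ($1) matches the extended regex ($2).
hostname_matches() {
  printf '%s\n' "$1" | grep -Eq -- "$2"
}

for host in login03.mycluster.example.edu compute17.mycluster.example.edu; do
  if hostname_matches "$host" "$pattern"; then
    echo "$host: match"
  else
    echo "$host: no match"
  fi
done

# On the cluster itself, test the real name with:
#   hostname_matches "$(hostname)" "$pattern" && echo "match"
```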

Quickstart basics

This guide assumes you have already read the CIME documentation. If you have not already done this then you should start there. Many details are skipped since they are already included in that guide.

Create case

A "case" in CESM is similar to a GEOS-Chem run directory. You navigate to the source code, create a case (which creates a local case directory), navigate to the case directory, and then run all of your configuration, build, and run commands there. Unlike in GEOS-Chem, a second case directory is also created elsewhere on your system. This directory contains a build directory ("bld") and a run directory ("run") and stores most of the build and run files, including output netCDF files. Keeping this second directory separate from your source code means that large build and output files do not use up the storage space where your source code and case control directory are kept. You can think of the first directory as the "case control" directory, where you change files and execute commands, and the second directory as the place where files are created and stored. Be aware that the second directory contains copies of configuration files; always configure your case from the control directory, not from these copies.

Each case is created from a CESM compset (a set configuration of model components) and a specified grid resolution. You can think of a compset as similar to a GEOS-Chem simulation type. As with GEOS-Chem Classic, you should create a new case if you change to a different grid resolution.

There are four CESM compsets that use GEOS-Chem version 14.1.2. In the future this number will be reduced to two per recommendation from NCAR. The four GEOS-Chem compsets currently used in CESM are:

  • FCHIST_GC
  • FCnudged_GC
  • FC2000climo_GC
  • FC2010climo_GC

Once the number of compsets is reduced to two you will still be able to run the original four types of simulations by editing default configuration settings. The FCnudged_GC simulation will be a modified version of FCHIST_GC, and FC2010climo_GC will be a modified version of FC2000climo_GC. For more information about this see the sections on this wiki page for running nudged and climatology runs.

Creating a case is done with the Python script create_newcase in the cime/scripts directory of the source code. You can navigate to cime/scripts and execute the script there. This will put the case control directory within cime/scripts.

Here is a generic template for using create_newcase, followed by an example for making a new case for FCHIST_GC on the NCAR derecho compute cluster.

 ./create_newcase --case {case_name}        --compset {compset_name} --res f19_f19_mg17 --run-unsupported  --project {project_id} --mach {machine}
 ./create_newcase --case case.FCHIST_GC_f19 --compset FCHIST_GC      --res f19_f19_mg17 --run-unsupported  --project UHAR0022     --mach derecho

The case name is whatever you would like it to be and will be the name of the case directories. Including compset name and grid in the case name is useful if you plan to have multiple cases for different compsets and grid resolutions. The project ID is only needed if you are running on derecho and should be a project you are a collaborator on. The machine name is defined in config_machines.xml.
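If you plan to create cases for several compsets, the naming convention used in the example above can be scripted. The sketch below builds a case name from the compset and grid alias and prints the matching create_newcase command for each of the four GEOS-Chem compsets without running it. The project ID MYPROJECT and the helper make_case_name are hypothetical.

```shell
#!/bin/sh
res=f19_f19_mg17
project=MYPROJECT   # hypothetical placeholder for your project ID
mach=derecho

# Build a case name like case.FCHIST_GC_f19 from a compset ($1) and a grid
# alias ($2). ${2%%_*} keeps the text before the first underscore.
make_case_name() {
  echo "case.${1}_${2%%_*}"
}

# Print (but do not run) one create_newcase command per GEOS-Chem compset.
for compset in FCHIST_GC FCnudged_GC FC2000climo_GC FC2010climo_GC; do
  case_name=$(make_case_name "$compset" "$res")
  echo "./create_newcase --case $case_name --compset $compset --res $res --run-unsupported --project $project --mach $mach"
done
```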

Setup case

Setting up a case is an intermediate step between creating a case and building the model. It is needed to create additional files and directories that incorporate settings you may change after creating the case. Use the xmlquery and xmlchange tools in the case control directory to review and change settings. Below are examples of viewing and changing run duration to 1 day and number of processors used to 72.

  ./xmlquery  STOP_N
  ./xmlchange STOP_N=1
  ./xmlquery  NTASKS
  ./xmlchange NTASKS=72

Once you are satisfied with your settings, use the case.setup script in your case control directory to set up the case.

  ./case.setup

You can also send the case setup output to both the terminal and a log file. This is useful if you want to review the details of case setup later on to better understand it and potentially troubleshoot.

  ./case.setup 2>&1 | tee case.setup.log

Build the model

Building CESM is done with the Python script case.build in the case directory. Always remember to set up your case prior to building. For reference, here are the basic commands to (1) build, and (2) clean your build.

 ./case.build
 ./case.build --clean

The build artifacts, including logs and install files, are stored in the second case directory in subdirectory bld. To find out where this is, check the XML variable EXEROOT using the CIME tool xmlquery.

 ./xmlquery EXEROOT

Each component in CESM has a different build log and logs from older builds are kept and not over-written. Each log contains the date and time of the build so that you can easily see which is the most recent. If a build is successful then the logs are compressed and given suffix .gz. If your build fails then a message will be printed to the screen with the path to the relevant log. This message will also be printed to CaseStatus for future reference.
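Because old logs are kept, it can be handy to jump straight to the newest log for the atmosphere component (CAM, which contains GEOS-Chem). The sketch below picks the most recently modified log and prints it whether or not it has been compressed. The file name pattern atm.bldlog.* is an assumption about how CIME names these logs, and both function names are hypothetical.

```shell
#!/bin/sh
# Print the most recently modified atmosphere build log in the bld
# directory ($1). Assumes logs are named atm.bldlog.<timestamp>[.gz].
# Prints nothing if no such log exists.
latest_atm_log() {
  ls -t "$1"/atm.bldlog.* 2>/dev/null | head -n 1
}

# Print a log ($1) whether or not it has been compressed to .gz.
show_log() {
  case "$1" in
    *.gz) gzip -dc "$1" ;;
    *)    cat "$1" ;;
  esac
}

# Example usage from the case control directory:
#   bld_dir=$(./xmlquery EXEROOT --value)
#   show_log "$(latest_atm_log "$bld_dir")" | grep -n -i "error"
```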

Any GEOS-Chem build errors will appear in the log whose name starts with atm. As in GEOS-Chem, the build error is typically self-explanatory from the build messages. However, if you later run into a cryptic run error such as a segmentation fault, you should rebuild the model using DEBUG=TRUE.

 ./xmlquery  DEBUG # check the current setting to ensure proper syntax when changing
 ./xmlchange DEBUG=TRUE

The DEBUG setting is stored in the XML file env_build.xml. Changing it requires cleaning your build before rebuilding. You do not need to remember this, because CIME has built-in reminders about cleaning your build based on the settings it detects as changed. In the case of DEBUG, if you try to run case.build again you will get a message like this:

Building case in directory /glade/u/home/elundgren/code.cam_158/cime/scripts/case.FCHIST_GC_14.1.2
sharedlib_only is False
model_only is False
File /glade/u/home/elundgren/code.cam_158/cime/scripts/case.FCHIST_GC_14.1.2/LockedFiles/env_build.xml has been modified
  found difference in DEBUG : case True locked False
Setting build complete to False
ERROR: 
ERROR env_build HAS CHANGED
  A manual clean of your obj directories is required
  You should execute the following:
    ./case.build --clean-all

Be sure to change DEBUG back to FALSE when you are done debugging. Otherwise the model will run very slowly.

Download input data

The input data you need depends on your case. When you build the model, CIME runs tools to assess what data you need and whether you already have it on your system. An input data list file is saved to the case control subdirectory Buildconf for each CESM component. GEOS-Chem inputs are listed in the file cam.input_data_list. If any files are missing, you can download them using the script check_input_data in the case control directory, passing it the download option.

 ./check_input_data --download
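To get a feel for what an input data list contains, the sketch below reads one and prints the entries that are not present on disk. It is illustrative only, since check_input_data is the supported way to do this. It assumes each line of the list has the form "name = /absolute/path/to/file", and list_missing_inputs is a hypothetical helper.

```shell
#!/bin/sh
# List files named in an input data list ($1) that are not present on disk.
# Assumes each line has the form "name = /absolute/path/to/file".
list_missing_inputs() {
  while IFS= read -r line; do
    path=${line#*= }                    # drop everything through the first "= "
    [ "$path" = "$line" ] && continue   # skip lines that contain no "= "
    [ -e "$path" ] || echo "$path"
  done < "$1"
}

# Example usage from the case control directory:
#   list_missing_inputs Buildconf/cam.input_data_list
```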

Data is expected to be at the path defined in XML variable DIN_LOC_ROOT. You can find what this path is with CIME tool xmlquery.

 ./xmlquery DIN_LOC_ROOT

You will likely only need to download input data if you are not working on the NCAR derecho cluster. The derecho cluster already has an inventory of GEOS-Chem input data, though you may need additional emission inventories or years if you plan to run with non-default emissions in GEOS-Chem or outside the typical date range.

See also the following docs pages for more about downloading input data:

Run the model

Use script case.submit to run the model. It will submit the run as a batch job if you have a job scheduler on your machine. Otherwise it will submit as an interactive run.

 ./case.submit

See the CIME documentation for running a case for additional information.

Output data and logs

CaseStatus

Every case control directory contains the log file CaseStatus. This log records activity within the case, including calls to xmlchange, case.setup, case.build, and case.submit, plus certain sub-calls within them such as case.run and st_archive, which archives output data. Start and end times are documented for each call, as well as whether it was successful. Use this log to check on run success.

CaseStatus tracks everything since the case's creation and thus will document multiple runs if done. Job scheduler IDs are also included to help track the timeline of changes per run. Below is an example of a CaseStatus log.

2024-04-23 09:37:17: xmlchange success <command> ./xmlchange STOP_N=1  </command>
---------------------------------------------------
2024-04-23 09:37:21: xmlchange success <command> ./xmlchange NTASKS=72  </command>
---------------------------------------------------
2024-04-23 09:38:41: case.setup starting
---------------------------------------------------
2024-04-23 09:38:43: case.setup success
---------------------------------------------------
2024-04-23 09:39:15: case.build starting
---------------------------------------------------
2024-04-23 09:53:18: case.build success
---------------------------------------------------
2024-04-23 09:53:47: case.submit starting 4233928.desched1
---------------------------------------------------
2024-04-23 09:53:47: case.submit success 4233928.desched1
---------------------------------------------------
2024-04-23 09:53:56: case.run starting 4233927.desched1
---------------------------------------------------
2024-04-23 09:54:03: model execution starting 4233927.desched1
---------------------------------------------------
2024-04-23 10:01:17: model execution success 4233927.desched1
---------------------------------------------------
2024-04-23 10:01:17: case.run success 4233927.desched1
---------------------------------------------------
2024-04-23 10:01:35: st_archive starting 4233928.desched1
---------------------------------------------------
2024-04-23 10:01:44: st_archive success 4233928.desched1
---------------------------------------------------
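Checking run success from CaseStatus can also be scripted. The sketch below relies only on the entry wording shown in the example above ("case.run starting", "case.run success"); how other outcomes are worded is not assumed, so anything other than a final success entry is treated as not successful. The function name is hypothetical.

```shell
#!/bin/sh
# Return 0 if the most recent "case.run" entry in a CaseStatus file ($1)
# reports success, and 1 otherwise (still running, failed, or never run).
last_run_succeeded() {
  last_line=$(grep -F 'case.run' "$1" | tail -n 1)
  case "$last_line" in
    *"case.run success"*) return 0 ;;
    *)                    return 1 ;;
  esac
}

# Example usage from the case control directory:
#   if last_run_succeeded CaseStatus; then echo "last run OK"; fi
```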

replay.sh

Bash script replay.sh in the case control directory is a companion to CaseStatus. It stores the shell commands of the activity in the case. Use this file to save and reproduce what you did. Here is an example of the replay.sh that was generated in the same case that generated CaseStatus above.

#!/bin/bash                                                                                                                                          

set -e

# Created 2024-04-23 09:34:51                                                                                                                        
CASEDIR="/glade/u/home/elundgren/code.cam_158/cime/scripts/case.FCHIST_GC_14.1.2"
/glade/u/home/elundgren/code.cam_158/cime/scripts/create_newcase --case case.FCHIST_GC_14.1.2 --compset FCHIST_GC --res f19_f19_mg17 --run-unsupported --project UHAR0022 --mach derecho
cd "${CASEDIR}"
./xmlchange STOP_N=1
./xmlchange NTASKS=72
./case.setup
./case.build
./case.submit
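One simple use of replay.sh is to review which settings were changed in a case. The sketch below prints only the xmlchange commands recorded in the file, assuming they appear one per line in the form shown above. The function name is hypothetical.

```shell
#!/bin/sh
# Print the xmlchange commands recorded in a replay.sh file ($1), one per line.
list_xml_changes() {
  grep -E '^\./xmlchange ' "$1"
}

# Example usage from the case control directory:
#   list_xml_changes replay.sh
```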

Tips and tricks

Running a nudged simulation

Editing FC2000climo_GC to use different year climatology as present day

There are two GEOS-Chem compsets that use climatology as present day: FC2000climo_GC and FC2010climo_GC. In the future this will be reduced to just FC2000climo_GC. Once you create a case with this compset you can change the year manually within the case directory by adding entries to the file user_nl_cam. This will update the namelist settings in the case directory file Buildconf/camconf/atm_in used for the run. The following settings are used for 2000 and 2010 on derecho:

2000:

solar_irrad_data_file='/glade/campaign/cesm/cesmdata/inputdata/atm/cam/solar/SolarForcing1995-2005avg_c160929.nc'
solar_data_ymd='20000101'
flbc_cycle_yr='2000'
flbc_file='/glade/campaign/cesm/cesmdata/inputdata/atm/waccm/lb/LBC_2000climo_CMIP6_0p5degLat_c180227.nc'

2010:

solar_irrad_data_file='/glade/campaign/cesm/cesmdata/inputdata/atm/cam/solar/SolarForcing2006-2014avg_c180917.nc'
solar_data_ymd='20100101'
flbc_cycle_yr='2010'
flbc_file='/glade/campaign/cesm/cesmdata/inputdata/atm/waccm/lb/LBC_2010climo_CMIP6_0p5degLat_c180227.nc'
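As a minimal sketch, the 2010 settings can be appended to the CAM user namelist file (named user_nl_cam in a CESM case control directory) with a here-document. The entries are copied verbatim from the list above and the paths are specific to derecho. NL_FILE is a hypothetical variable added so the target file can be overridden. After editing, check the regenerated Buildconf/camconf/atm_in to confirm that the values were picked up.

```shell
#!/bin/sh
# Append the 2010 climatology settings listed above to the CAM user namelist.
# Run from the case control directory. NL_FILE is a hypothetical override.
NL_FILE=${NL_FILE:-user_nl_cam}

cat >> "$NL_FILE" <<'EOF'
solar_irrad_data_file='/glade/campaign/cesm/cesmdata/inputdata/atm/cam/solar/SolarForcing2006-2014avg_c180917.nc'
solar_data_ymd='20100101'
flbc_cycle_yr='2010'
flbc_file='/glade/campaign/cesm/cesmdata/inputdata/atm/waccm/lb/LBC_2010climo_CMIP6_0p5degLat_c180227.nc'
EOF

# Show the solar and lower boundary condition entries for review.
grep -n -E '^(solar_|flbc_)' "$NL_FILE"
```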

Navigating case XML files

Many CESM settings are stored in XML files in the case control directory. Viewing and changing these settings can be tricky if you are not familiar with the CIME tools xmlquery and xmlchange. Both tools contain comments with instructions on their use, and we have also put together a guide. See Guide to CAM Namelists and Case Tools, created by GCST member Lizzie Lundgren.

When to create a new case

You always need to create a new case if you want a new combination of compset, grid resolution, and machine. In addition, certain settings are fixed when you create a case. These are stored in the XML file env_case.xml. If that file contains any setting you want to change, such as the CAM version used or the case directory name or location, then you should create a new case.

Configuring diagnostics

Reading the CESM documentation about diagnostics is essential because it is very different from GEOS-Chem. The section to read is the Model output section of the CAM 6.3 User Guide.

GEOS-Chem is implemented in CESM such that the HISTORY.rc file is used to specify GEOS-Chem State_Diag, State_Met, and State_Chm fields to output. However, it is important to understand that most of the file is ignored, output files are configured differently, and additional non-GEOS-Chem diagnostics are configured elsewhere but appear in the same file. When you create a case there will be a GEOS-Chem HISTORY.rc file copied to the case control directory. Read the comments in that file to learn about how to use it. You can also view the file on the GEOS-Chem GitHub but note that the version may not exactly match what you get when you create a case in CESM.

Debugging and getting help

For general CESM help please use the DiscussCESM forum linked to in the CESM Resources section of this page. For GEOS-Chem help please follow the support instructions provided on GEOS-Chem ReadTheDocs here. Depending on the issue, the CESM Support Team may refer you to the GEOS-Chem Support Team, or vice versa.

MUSICA and future work

