Minimum system requirements for GEOS-Chem


Previous | Next | Getting Started with GEOS-Chem

  1. Minimum system requirements
  2. Downloading source code
  3. Downloading data directories
  4. Creating run directories
  5. Configuring runs
  6. Compiling
  7. Running
  8. Output files
  9. Visualizing and processing output
  10. Coding and debugging
  11. Further reading


Hardware Recommendations

The information below will help you determine whether your system has sufficient resources to run GEOS-Chem simulations.

Memory and disk requirements

If you plan to run GEOS-Chem on a local computer system, please make sure that your system has sufficient memory and disk space:

Enough memory to run GEOS-Chem

For the 4° x 5° "standard" simulation:

  • At least 8 GB RAM

For the 2° x 2.5° "standard" simulation:

  • About 30 GB RAM

Chris Holmes reported that a 2° x 2.5° "standard" simulation required:

  • 20 GB memory (MaxRSS)
  • 26 GB virtual memory (MaxVMSize)

(A quick way to check your machine's total RAM against these figures is sketched after this table.)

Extra memory for special simulations

You may want to consider at least 30 GB RAM if you plan on running memory-intensive simulations, such as high-resolution nested-grid simulations.

Chris Holmes reported that a GEOS-FP 0.25° x 0.3125° NA tropchem nested simulation required:

  • 31 GB memory (MaxRSS)
  • 38 GB virtual memory (MaxVMSize)

Sufficient disk storage for met fields

See the Estimated Disk Space section below for the space required by each met field product and resolution.
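
If you are unsure whether your machine meets these memory figures, here is a minimal sketch (Python, assuming a Linux system where /proc/meminfo is available; the simulation labels and thresholds are copied from the table above and are only approximate):

    #!/usr/bin/env python3
    # Minimal sketch: compare this machine's total RAM against the rough
    # per-resolution requirements listed in the table above.  Assumes a
    # Linux system where /proc/meminfo is available.

    REQUIRED_GB = {
        "4x5 standard":        8,   # at least 8 GB RAM
        "2x2.5 standard":      30,  # about 30 GB RAM
        "0.25x0.3125 nested":  30,  # consider at least 30 GB (a NA nested run used ~31 GB MaxRSS)
    }

    def total_ram_gb():
        """Return total physical memory in GB, read from /proc/meminfo."""
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("MemTotal:"):
                    kb = int(line.split()[1])   # MemTotal is reported in kB
                    return kb / (1024 ** 2)
        raise RuntimeError("MemTotal not found in /proc/meminfo")

    if __name__ == "__main__":
        ram = total_ram_gb()
        print(f"Total RAM: {ram:.1f} GB")
        for sim, need in REQUIRED_GB.items():
            status = "OK" if ram >= need else "NOT enough"
            print(f"  {sim:>18}: needs ~{need} GB -> {status}")

On clusters that use the SLURM scheduler, you can obtain the MaxRSS and MaxVMSize values quoted above for a completed job with sacct -j <jobid> --format=JobID,MaxRSS,MaxVMSize (replace <jobid> with your job's ID); other schedulers report similar accounting fields.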

You may also run GEOS-Chem on the Amazon Web Services EC2 cloud computing platform. For more information, please see our cloud-computing tutorial: cloud.geos-chem.org

--Bob Yantosca (talk) 19:55, 10 January 2019 (UTC)

Computer architecture

Jun Wang wrote:

We have an opportunity to build a large HPC. Do you know what configuration works best for GEOS-Chem?

Bob Yantosca replied:

You won’t need the GPU-accelerated nodes for GEOS-Chem. GC or GCHP will probably not be able to take much advantage of GPUs because (1) the Nvidia CUDA instructions are not included in the Intel Fortran compiler, (2) writing GPU code is probably above the GCST’s skill level at this time, and (3) right now the only Fortran compiler that can use GPUs is the PGI compiler.
This is the output of the /proc/cpuinfo file on the computational nodes we use at Harvard (i.e. the Odyssey cluster). You can see how it compares with [your system].
    processor       : 0
    vendor_id       : GenuineIntel
    cpu family      : 6
    model           : 63
    model name      : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
    stepping        : 2
    cpu MHz         : 2494.085
    cache size      : 30720 KB
    physical id     : 0
    siblings        : 12
    core id         : 0
    cpu cores       : 12
    apicid          : 0
    initial apicid  : 0
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 15
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 
                      clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp
                      lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf
                      pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid
                      dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
                      lahf_lm abm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
                      fsgsbase bmi1 avx2 smep bmi2 erms invpcid
    bogomips        : 4988.17
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 46 bits physical, 48 bits virtual
    power management:
So our CPUs run at 2.5 GHz, and there are 24 CPUs/node. Each CPU has 30 MB of cache. I think there is 4.5 to 5 GB of memory per CPU available.
Also - if you are going to use the Intel Fortran Compiler, you will always get the best performance when using Intel CPUs. It is a known issue that the Intel Fortran Compiler does not optimize well on AMD CPUs. This was intentional. For more information, see this article.
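
If you would like to compare your own machine with the Odyssey node described above, here is a minimal sketch (Python, for illustration only and not part of GEOS-Chem; it assumes a Linux system where /proc/cpuinfo exists) that extracts the fields most relevant to GEOS-Chem performance:

    #!/usr/bin/env python3
    # Minimal sketch: summarize the CPU fields most relevant to GEOS-Chem
    # (model name, clock speed, cache size, core count, logical CPU count)
    # from /proc/cpuinfo on a Linux system.

    from collections import Counter

    def cpu_summary(path="/proc/cpuinfo"):
        counts = Counter()   # number of logical processors seen
        info = {}
        with open(path) as f:
            for line in f:
                if ":" not in line:
                    continue
                key, value = (s.strip() for s in line.split(":", 1))
                if key == "processor":
                    counts["logical CPUs"] += 1
                elif key in ("model name", "cpu MHz", "cache size", "cpu cores"):
                    info[key] = value   # last value wins; these repeat per core
        return dict(info, **counts)

    if __name__ == "__main__":
        for key, value in cpu_summary().items():
            print(f"{key:15}: {value}")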

GEOS-Chem v11-01 and newer versions are compatible with the GNU Fortran compiler, which yields nearly identical results (within the bounds of numerical noise) to simulations that use the Intel Fortran Compiler, at the cost of somewhat slower performance. GNU Fortran should also perform better on AMD CPUs than Intel Fortran does.

--Bob Yantosca (talk) 19:11, 25 April 2017 (UTC)

Network and disk

Hong Liao wrote:

We are setting up a new cluster and we have the option of installing InfiniBand network (using Mellanox hardware) and GPFS.

My questions are:

  1. Will GEOS-Chem compile and run all right on clusters based on InfiniBand and GPFS?
  2. Will GEOS-Chem benefit from InfiniBand?

Bob Yantosca replied:

I can say a couple of general things:

  1. On our Harvard cluster (Odyssey) we use InfiniBand to connect to a fast network disk (/n/regal). I don't know if that filesystem is GPFS.
  2. For GEOS-Chem "Classic" simulations, which use OpenMP parallelization, you can only use one node of the machine. So in that case you won't be doing node-to-node communication. The only I/O would be from the CPU to the disk, and I think InfiniBand can make a big difference there.
  3. For GCHP, which uses MPI parallelization, you would also have to be concerned with inter-node communication, as you will be using CPUs across more than one node.
  4. The disk where our met fields live at Harvard is on a Lustre file system. This is more efficient for applications like GC that read a large volume of data.

Judit Flo Gaya replied:

What Bob said is correct. I just want to point out that GPFS and Lustre are "similar" file systems in the sense that both are parallel file systems, but they work differently and are priced differently.

They are both a lot more cumbersome to configure, maintain, and debug than regular NFS, but when well implemented and tuned they provide a significant increase in the performance of reading and writing files.

--Bob Yantosca (talk) 20:38, 27 October 2017 (UTC)

Utilizing cloud-computing resources

Bob Yantosca wrote:

Dear GEOS-Chem Users!

We have some exciting news to share with you! Jiawei Zhuang (Harvard) has proven that GEOS-Chem “Classic” can run on cloud computing resources, such as the Amazon Elastic Compute Cloud (EC2) servers. This important new development will allow you to:

  • Set up GEOS-Chem simulations quickly without having to invest in a local computing infrastructure
  • Run GEOS-Chem simulations that are memory and/or data-intensive (e.g. 1/4 degree nested-grid simulations)
  • Purchase only the computational time and data storage that you need (and easily request more resources)
  • Easily share data with others through the “cloud”

Two important recent developments make GEOS-Chem cloud computing possible:

1. Compatibility with the GNU Fortran compiler

An important (but probably understated) development is that GEOS-Chem v11-01 and newer can now be compiled with the free and open-source GNU Fortran compiler (aka gfortran). GNU Fortran breaks GEOS-Chem’s dependence on proprietary compilers like Intel Fortran (aka ifort) and PGI Fortran (aka pgfortran), which can be prohibitively expensive to purchase.

2. Development of a Python-based visualization and regridding software for GEOS-Chem

We are developing a new visualization/regridding package for GEOS-Chem (called GCPy) that is based on the free and open source Python programming language. Our first use of GCPy was to create the plots for the GCHP benchmark simulations. For more information, please see our Python tools for use with GEOS-Chem wiki page.

Please see the following resources for more information about using GEOS-Chem on the Amazon EC2 compute platform:

  1. Zhuang, J., D.J. Jacob, J. Flo-Gaya, R.M. Yantosca, E.W. Lundgren, M.P. Sulprizio, and S.D. Eastham, Enabling immediate access to Earth science models through cloud computing: application to the GEOS-Chem model, Bull. Amer. Met. Soc., 2019, doi:10.1175/BAMS-D-18-0243.1 (PDF)

  2. cloud.geos-chem.org: Cloud-computing tutorial by Jiawei Zhuang.

  3. Using GEOS Chem on Amazon Web Service (AWS) cloud, by Jiawei Zhuang, presented at the IGC9 meeting (May 2019).

Software Requirements

Overview

Please see this list of required software packages on our GEOS-Chem basics page.

Supported Compilers

Please see our Guide to compilers for GEOS-Chem for detailed information about the compilers that you can use to build GEOS-Chem.

Required software libraries

Please see our Guide to netCDF in GEOS-Chem for information about netCDF and its dependent software libraries that you need to have on your system in order to use GEOS-Chem.

Parallelization

Please see our Parallelizing GEOS-Chem wiki page for more information about how GEOS-Chem is parallelized.

--Bob Yantosca (talk) 20:06, 17 June 2019 (UTC)

Estimated Disk Space

The following sections will help you assess how much disk space you need to run GEOS-Chem.

Emissions fields

Please see our HEMCO data directories wiki page for a list of disk space requirements for each of the emissions inventories read into GEOS-Chem by the HEMCO emissions component.

For GEOS-Chem v11-01, the HEMCO data directories total about 375 GB. Depending on the types of GEOS-Chem simulations you run, you may not need to download the entire set of emissions inventories.

--Bob Yantosca (talk) 21:47, 4 November 2016 (UTC)

Met fields

The amount of disk space that you will need depends on two things:

  1. Which type of met data you will use, and
  2. How many years of met data you will download

Typical disk space requirements are:

Met field | Resolution | File type | Size
MERRA-2 | 4° x 5° | COARDS-compliant netCDF (compressed) | ~ 30 GB/yr
MERRA-2 | 2° x 2.5° | COARDS-compliant netCDF (compressed) | ~ 110 GB/yr
MERRA-2 | 0.5° x 0.625° Asia ("AS") nested grid | COARDS-compliant netCDF (compressed) | ~ 115 GB/yr
MERRA-2 | 0.5° x 0.625° Europe ("EU") nested grid | COARDS-compliant netCDF (compressed) | ~ 58 GB/yr
MERRA-2 | 0.5° x 0.625° North America ("NA") nested grid | COARDS-compliant netCDF (compressed) | ~ 110 GB/yr
GEOS-FP | 4° x 5° | COARDS-compliant netCDF (compressed) | ~ 30 GB/yr
GEOS-FP | 2° x 2.5° | COARDS-compliant netCDF (compressed) | ~ 120 GB/yr
GEOS-FP | 0.25° x 0.3125° China ("CH") nested grid | COARDS-compliant netCDF (compressed) | ~ 175 GB/yr
GEOS-FP | 0.25° x 0.3125° Europe ("EU") nested grid | COARDS-compliant netCDF (compressed) | ~ 58 GB/yr
GEOS-FP | 0.25° x 0.3125° North America ("NA") nested grid | COARDS-compliant netCDF (compressed) | ~ 226 GB/yr
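
To budget disk space for a multi-year download, here is a minimal sketch (Python; the per-year values are copied from the table above, and the helper function itself is purely illustrative):

    #!/usr/bin/env python3
    # Minimal sketch: estimate met-field disk space from the approximate
    # per-year sizes in the table above.  Actual sizes vary slightly from
    # year to year.

    GB_PER_YEAR = {
        ("MERRA-2", "4x5"):            30,
        ("MERRA-2", "2x2.5"):          110,
        ("MERRA-2", "0.5x0.625 AS"):   115,
        ("MERRA-2", "0.5x0.625 EU"):   58,
        ("MERRA-2", "0.5x0.625 NA"):   110,
        ("GEOS-FP", "4x5"):            30,
        ("GEOS-FP", "2x2.5"):          120,
        ("GEOS-FP", "0.25x0.3125 CH"): 175,
        ("GEOS-FP", "0.25x0.3125 EU"): 58,
        ("GEOS-FP", "0.25x0.3125 NA"): 226,
    }

    def met_disk_gb(product, resolution, n_years):
        """Approximate disk space (GB) for n_years of a given met product."""
        return GB_PER_YEAR[(product, resolution)] * n_years

    if __name__ == "__main__":
        # e.g. 3 years of GEOS-FP North America nested-grid met data
        print(met_disk_gb("GEOS-FP", "0.25x0.3125 NA", 3), "GB")  # -> 678 GB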

--Bob Yantosca (talk) 18:31, 2 January 2019 (UTC)

Output generated by GEOS-Chem

For the full-chemistry simulations, we can look to the GEOS-Chem benchmarks as a rough upper limit of how much disk space is needed for diagnostic output. The GEOS-Chem 12.4.0 benchmark simulation generates approximately 760 MB/month of files (monthly-mean output). This includes netCDF-format diagnostic output and restart files only, but excludes binary punch output (which is being phased out).
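
For example, archiving benchmark-style monthly output for a 10-year full-chemistry simulation would require roughly 760 MB/month × 120 months ≈ 90 GB.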

This would be an upper limit because in the benchmark simulations, we archive the "kitchen sink"—all species concentrations, various aerosol diagnostics, convective fluxes, dry dep fluxes and velocities, J-values, various chemical and meteorological quantities, transport fluxes, wet deposition diagnostics, and emissions diagnostics. Most GEOS-Chem users would not need to archive this much output.

In GEOS-Chem 12.5.0 (ETA late summer 2019), we will introduce the capability of horizontal and vertical subsetting for diagnostics being archived to netCDF output. This will let you save only a sub-region of the globe, or a subset of vertical levels (or both) in case you do not wish to archive diagnostics for the entire globe. This can help to further reduce the amount of diagnostic output being sent to disk.

The GEOS-Chem specialty simulations—simulations for species with first-order loss by prescribed oxidant fields (i.e. Hg, CH4, CO2, CO)—will produce much less output than the benchmark simulations. This is because these simulations typically only have a few species.

Also note: Archiving hourly or daily timeseries output would require much more disk space than the monthly-mean output. The disk space actually used will depend on how many quantities are archived and what the archival frequency is.

--Bob Yantosca (talk) 19:20, 14 June 2019 (UTC)

Performance and scalability

Please see our Guide to GEOS-Chem performance for more information about GEOS-Chem's performance and scalability.

--Bob Yantosca (talk) 19:51, 10 January 2019 (UTC)



Previous | Next | Getting Started with GEOS-Chem