Minimum system requirements for GEOS-Chem


Several GEOS-Chem users have asked "What type of machine do I need to buy in order to run GEOS-Chem?" Here are our suggestions.

Hardware Recommendations

Here is some useful information that you can use to determine if your system has sufficient resources to run GEOS-Chem simulations.

Memory requirements

If you plan to run GEOS-Chem on a local computer system, please make sure that your system has sufficient memory and disk space:

Enough memory to run GEOS-Chem

For the 4° x 5° "standard" simulation:

  • At least 8 GB RAM

For the 2° x 2.5° "standard" simulation:

  • About 30 GB RAM

Chris Holmes reported that a 2° x 2.5° "standard" simulation required:

  • 20 GB memory (MaxRSS)
  • 26 GB virtual memory (MaxVMSize)

Extra memory for special simulations

You may want to consider at least 30 GB RAM if you plan on running memory-intensive simulations, such as high-resolution nested-grid simulations.

Chris Holmes reported that a GEOS-FP 0.25° x 0.3125° NA tropchem nested simulation required:

  • 31 GB memory (MaxRSS)
  • 38 GB virtual memory (MaxVMSize)

Sufficient disk storage for met fields

See the Estimated Disk Space section below for the storage required by each met field product.

You may also run GEOS-Chem on the Amazon Web Services EC2 cloud computing platform. For more information, please see our cloud-computing tutorial: cloud.geos-chem.org

--Bob Yantosca (talk) 19:55, 10 January 2019 (UTC)

Computer architecture

Jun Wang wrote:

We have an opportunity to build a large HPC cluster. Do you know what configuration works best for GEOS-Chem?

Bob Yantosca replied:

You won't need GPU-accelerated nodes for GEOS-Chem. GEOS-Chem and GCHP will probably not be able to take much advantage of GPUs because (1) the Nvidia CUDA instructions are not included in the Intel Fortran compiler; (2) writing GPU code is probably beyond the GCST's skill level at this time; and (3) right now the only Fortran compiler that can use GPUs is the PGI compiler.
Below is the output of the /proc/cpuinfo file on the computational nodes we use at Harvard (i.e. the Odyssey cluster). You can see how it compares with your system.
    processor       : 0
    vendor_id       : GenuineIntel
    cpu family      : 6
    model           : 63
    model name      : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
    stepping        : 2
    cpu MHz         : 2494.085
    cache size      : 30720 KB
    physical id     : 0
    siblings        : 12
    core id         : 0
    cpu cores       : 12
    apicid          : 0
    initial apicid  : 0
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 15
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 
                      clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp
                      lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf
                      pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid
                      dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
                      lahf_lm abm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
                      fsgsbase bmi1 avx2 smep bmi2 erms invpcid
    bogomips        : 4988.17
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 46 bits physical, 48 bits virtual
    power management:
So our CPUs run at 2.5 GHz, and there are 24 CPUs/node. Each CPU has 30 MB of cache. I think there is 4.5 to 5 GB of memory per CPU available.
Also, if you are going to use the Intel Fortran Compiler, you will always get the best performance on Intel CPUs. It is a known issue that the Intel Fortran Compiler does not optimize well on AMD CPUs; this was intentional on Intel's part. For more information, see this article.

GEOS-Chem v11-01 and newer versions are compatible with the GNU Fortran compiler, which yields nearly identical results (within the bounds of numerical noise) to simulations that use the Intel Fortran Compiler, at the cost of somewhat slower performance. GNU Fortran should also perform better than Intel Fortran on AMD CPUs.

--Bob Yantosca (talk) 19:11, 25 April 2017 (UTC)

Network and disk

Hong Liao wrote:

We are setting up a new cluster and we have the option of installing InfiniBand network (using Mellanox hardware) and GPFS.

My questions are:

  1. Will GEOS-Chem compile and run all right on clusters based on InfiniBand and GPFS?
  2. Will GEOS-Chem benefit from InfiniBand?

Bob Yantosca replied:

I can say a couple of general things:

  1. On our Harvard cluster (Odyssey) we use InfiniBand to connect to a fast network disk (/n/regal). I don't know if that file system is GPFS.
  2. GEOS-Chem "Classic" simulations, which use OpenMP parallelization, can only use a single node of the machine. In that case you won't be doing any node-to-node communication; the only I/O is between the CPUs and the disk, and I think InfiniBand can make a great difference there.
  3. For GCHP, which uses MPI parallelization, you would also have to be concerned with inter-node communication, as you will be using CPUs across more than one node.
  4. The disk where our met fields live at Harvard is on a Lustre file system. This is more efficient for applications like GEOS-Chem that read a large volume of data.

Judit Flo Gaya replied:

What Bob said is correct. I just want to point out that GPFS and Lustre are similar in the sense that both are parallel file systems, although they work differently and their costs differ.

They are both far more cumbersome to configure, maintain, and debug than regular NFS, but when well implemented and tuned they provide a significant increase in file read/write performance.

--Bob Yantosca (talk) 20:38, 27 October 2017 (UTC)

Utilizing cloud-computing resources

Bob Yantosca wrote:

Dear GEOS-Chem Users!

We have some exciting news to share with you! Jiawei Zhuang (Harvard) has proven that GEOS-Chem “Classic” can run on cloud computing resources, such as the Amazon Elastic Compute Cloud (EC2) servers. This important new development will allow you to:

  • Set up GEOS-Chem simulations quickly without having to invest in a local computing infrastructure
  • Run GEOS-Chem simulations that are memory and/or data-intensive (e.g. 1/4 degree nested-grid simulations)
  • Purchase only the computational time and data storage that you need (and easily request more resources)
  • Easily share data with others through the “cloud”

Two important recent developments make GEOS-Chem cloud computing possible:

1. Compatibility with the GNU Fortran compiler

An important (but probably understated) development is that GEOS-Chem v11-01 and newer can now be compiled with the free and open-source GNU Fortran compiler (aka gfortran). Bob Yantosca and Seb Eastham have removed and/or rewritten several sections of legacy code that GNU Fortran could not compile. Due to their diligence, GEOS-Chem v11-01 is now compatible with GNU Fortran v4, and GEOS-Chem v11-02 will be compatible with GNU Fortran v6 (the latest version).

GNU Fortran breaks GEOS-Chem's dependence on proprietary compilers like Intel Fortran (aka ifort) and PGI Fortran (aka pgfortran), which can be prohibitively expensive to purchase. The GNU Fortran (and C/C++) compilers come pre-installed on most Linux distributions today (and if not, they are easy to install). GNU Fortran can also produce well-optimized executables for many different types of CPUs (Intel, AMD, etc.).

To validate the performance of GEOS-Chem with GNU Fortran, we ran two 1-month benchmark simulations for v11-02, one with Intel Fortran v11.1 and another with GNU Fortran v6.2. We posted a summary of the results on the GEOS-Chem wiki.

As you can see from the wiki, the v11-02a benchmark using GNU Fortran gives essentially identical results (within the bounds of numerical noise) to the v11-02a benchmark using Intel Fortran. Several GEOS-Chem operations run somewhat slower, but we believe this might be improved with some streamlining of the code. We believe that a longer run time is an acceptable tradeoff for not having to purchase an expensive Intel Fortran or PGI Fortran license.

2. Development of Python-based visualization and regridding software for GEOS-Chem

We are developing a new visualization/regridding package for GEOS-Chem (called GCPy) that is based on the free and open source Python programming language. Our first use of GCPy was to create the plots for the GCHP benchmark simulations. While GCPy is currently not ready for public use (as of April 2017), we will work on improving its usability in the very near future.

Having an option like GCPy will finally let us reduce our dependence on IDL-based software (e.g. GAMAP). An IDL license is now very expensive to purchase and is out of reach for some GEOS-Chem user groups.

In addition to GCPy (which is still in development), there are other Python-based visualization packages for GEOS-Chem (developed by several members of the GEOS-Chem user community) that you can use right away. For more information, please see our Python code for GEOS-Chem wiki page.

Jiawei Zhuang has created a tutorial on how you can set up GEOS-Chem on the Amazon EC2 compute platform: http://cloud.geos-chem.org.

Software Requirements

Overview

Please see this list of required software packages on our GEOS-Chem basics page.

A few notes:

  1. The Linux flavor (RedHat, SuSE, Fedora, Ubuntu, etc.) does not matter; GEOS-Chem runs under any of them. GEOS-Chem also runs without problems on 64-bit architectures.
  2. GEOS-Chem is written in the Fortran-90 language. Fortran-90 is an extension of Fortran-77, which for many years was the standard programming language for scientific computing. GEOS-Chem takes advantage of several powerful features of Fortran-90, including dynamic memory allocation, modular program design, array operation syntax, and derived data types (see the short example after this list). Please view Appendix 7: GEOS-Chem Style Guide in the GEOS-Chem manual for more tips on how to write effective Fortran-90 code.
  3. We use the Git version control software to manage and track GEOS-Chem software updates. Git allows users at remote sites to easily download GEOS-Chem over the network. Git also enables users to keep track of their changes when developing the code and enables the creation of patches that would simplify the implementation of new developments in the standard version. For all these reasons, you must install Git so that you can download and manage your local GEOS-Chem source code.
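
To illustrate point 2, here is a short sketch showing the Fortran-90 features mentioned above: modules, derived types, dynamic memory allocation, and array operation syntax. This is hypothetical illustrative code, not actual GEOS-Chem source; the type and variable names are made up.

    ! Fortran-90 feature sketch (hypothetical, not from GEOS-Chem)
    MODULE Species_Mod
      IMPLICIT NONE

      ! Derived type: bundles related data into one object
      TYPE :: Species
         CHARACTER(LEN=31)   :: Name
         REAL*8, ALLOCATABLE :: Conc(:)    ! Allocated at run time
      END TYPE Species
    END MODULE Species_Mod

    PROGRAM F90_Sketch
      USE Species_Mod                      ! Modular program design
      IMPLICIT NONE
      TYPE(Species) :: O3

      O3%Name = 'O3'
      ALLOCATE( O3%Conc(72) )              ! Dynamic memory allocation
      O3%Conc = 0d0                        ! Array operation syntax:
      O3%Conc = O3%Conc + 40d-9            !  whole-array assignment and arithmetic

      PRINT*, TRIM( O3%Name ), MAXVAL( O3%Conc )
      DEALLOCATE( O3%Conc )
    END PROGRAM F90_Sketch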

--Bob Yantosca (talk) 19:17, 4 November 2016 (UTC)

Supported compilers

The following platforms and compilers are currently supported.

GNU Fortran compiler (our recommended open-source compiler for GEOS-Chem):

  • Linux, gfortran 8.x.x: Supported (GEOS-Chem v11-01 and later versions). Tested by GCST.
  • Linux, gfortran 7.x.x: Supported (GEOS-Chem v11-01 and later versions). Tested by GCST.
  • Linux, gfortran 6.x.x: Supported (GEOS-Chem v11-01 and later versions). Tested by GCST.
  • Linux, gfortran 5.x.x: Supported (GEOS-Chem v11-01 and later versions). Tested by GCST.
  • Linux, gfortran 4.8.2: Supported (GEOS-Chem v11-01 and later versions). Tested by GCST.
  • Linux, gfortran 4.4.7: Supported (GEOS-Chem v11-01 and later versions). Tested by GCST.

Intel Fortran compiler (our recommended proprietary compiler for GEOS-Chem):

  • Linux, ifort 17.0.4 and later versions: Supported. Tested by GCST.
    NOTE: GEOS-Chem v10-01 and prior versions will not compile with ifort 17 and higher.
  • Linux, ifort 15.0.0 and similar builds: Supported. Tested by GCST.
  • Linux, ifort 13.0.079 and similar builds: Supported. Tested by GCST.
  • Linux, ifort 12: Supported. Tested by GCST.
  • Linux, ifort 11.1.069 and similar builds: Supported. Tested by GCST.
    NOTE: This is an old version (2008). The GCST uses this version to benchmark GEOS-Chem, in order to ensure numerical compatibility between different benchmark versions.
  • Linux, ifort 10.1: Supported (but this is an old version by now). Tested by GCST.

--Bob Yantosca (talk) 19:48, 10 January 2019 (UTC)

Parallelization

OpenMP

You should be aware that because GEOS-Chem uses OpenMP parallelization, a simulation can only run on CPUs that share the same memory, i.e. on a single node. For example, if you have two PCs with 4 cores each, you can only run a given simulation on one PC at a time (4 cores), because OpenMP requires that all of the processors working on a job be able to see all of the memory on the machine. In that case you could run two jobs simultaneously on 4 cores each, but not a single job on 8 cores. See OpenMP.org for more information.

Our traditional configuration of GEOS-Chem, known as "GEOS-Chem Classic", cannot take advantage of MPI; it is parallelized with OpenMP directives, as illustrated in the sketch below.
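
Here is a minimal sketch of an OpenMP parallel loop in the style used throughout GEOS-Chem. This is hypothetical illustrative code, not actual GEOS-Chem source; the array name and loop body are made up.

    ! Minimal OpenMP sketch (hypothetical code, not from GEOS-Chem)
    ! Compile with, e.g.: gfortran -fopenmp openmp_sketch.f90
    PROGRAM OpenMP_Sketch
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 1000000
      REAL*8             :: Conc(N)
      INTEGER            :: I

      ! The loop iterations are divided among the available threads.
      ! All threads read and write the shared Conc array, which is why
      ! OpenMP requires that every CPU can see all of the memory
      ! (i.e. the job must stay within one shared-memory node).
      !$OMP PARALLEL DO DEFAULT( SHARED ) PRIVATE( I )
      DO I = 1, N
         Conc(I) = DBLE( I ) * 1d-6
      ENDDO
      !$OMP END PARALLEL DO

      PRINT*, 'Max value: ', MAXVAL( Conc )
    END PROGRAM OpenMP_Sketch

The number of threads is typically set with the OMP_NUM_THREADS environment variable before running the executable.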

MPI

MPI (Message Passing Interface) is required for passing data from one physical system to another. For example, running a GEOS-Chem simulation across several independent machines (or nodes of a cluster) requires MPI parallelization. OpenMP parallelization cannot be used in that situation, because it requires that all of the CPUs have access to all of the memory on the machine (a.k.a. shared-memory architecture).

Our high-performance version of GEOS-Chem (aka GCHP) can use the OpenMPI and MVAPICH2 implementations of the MPI parallelization library. A minimal MPI sketch follows.
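
For comparison with the OpenMP sketch above, here is a minimal MPI sketch in Fortran. This is hypothetical illustrative code, not GCHP source; it only shows that each MPI process has its own rank and memory space, with data exchanged via explicit messages.

    ! Minimal MPI sketch (hypothetical code, not from GCHP)
    ! Compile with an MPI wrapper, e.g.: mpifort mpi_sketch.f90
    ! Run with, e.g.: mpirun -np 4 ./a.out
    PROGRAM MPI_Sketch
      USE MPI
      IMPLICIT NONE
      INTEGER :: ierr, rank, nProcs

      ! Each process may live on a different node; unlike OpenMP,
      ! no shared memory is assumed
      CALL MPI_Init( ierr )
      CALL MPI_Comm_Rank( MPI_COMM_WORLD, rank,   ierr )
      CALL MPI_Comm_Size( MPI_COMM_WORLD, nProcs, ierr )

      PRINT*, 'Process ', rank, ' of ', nProcs
      CALL MPI_Finalize( ierr )
    END PROGRAM MPI_Sketch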

--Bob Yantosca (talk) 20:09, 4 November 2016 (UTC)

Performance and scalability

Please see our GEOS-Chem performance wiki page to view timing results from recent GEOS-Chem versions.

--Bob Yantosca (talk) 19:51, 10 January 2019 (UTC)

Estimated Disk Space

Emissions fields

Please see our HEMCO data directories wiki page for a list of disk space requirements for each of the emissions inventories read into GEOS-Chem by the HEMCO emissions component.

For GEOS-Chem v11-01, the entire size of the HEMCO data directories is about 375 GB total. Depending on the types of GEOS-Chem simulations you run, you may not need to download the entire set of emissions inventories.

--Bob Yantosca (talk) 21:47, 4 November 2016 (UTC)

Met fields

The amount of disk space that you will need depends on two things:

  1. Which type of met data you will use, and
  2. How many years of met data you will download

Typical disk space requirements are:

Met field   Resolution                                          File type                              Size
MERRA-2     4° x 5°                                             COARDS-compliant netCDF (compressed)   ~ 30 GB/yr
MERRA-2     2° x 2.5°                                           COARDS-compliant netCDF (compressed)   ~ 110 GB/yr
MERRA-2     0.5° x 0.625°, Asia ("AS") nested grid              COARDS-compliant netCDF (compressed)   ~ 115 GB/yr
MERRA-2     0.5° x 0.625°, Europe ("EU") nested grid            COARDS-compliant netCDF (compressed)   ~ 58 GB/yr
MERRA-2     0.5° x 0.625°, North America ("NA") nested grid     COARDS-compliant netCDF (compressed)   ~ 110 GB/yr
GEOS-FP     4° x 5°                                             COARDS-compliant netCDF (compressed)   ~ 30 GB/yr
GEOS-FP     2° x 2.5°                                           COARDS-compliant netCDF (compressed)   ~ 120 GB/yr
GEOS-FP     0.25° x 0.3125°, China ("CH") nested grid           COARDS-compliant netCDF (compressed)   ~ 175 GB/yr
GEOS-FP     0.25° x 0.3125°, Europe ("EU") nested grid          COARDS-compliant netCDF (compressed)   ~ 58 GB/yr
GEOS-FP     0.25° x 0.3125°, North America ("NA") nested grid   COARDS-compliant netCDF (compressed)   ~ 226 GB/yr

--Bob Yantosca (talk) 18:31, 2 January 2019 (UTC)

Do I need to install any libraries for GEOS-Chem?

GEOS-Chem requires the netCDF library, as well as its dependent libraries (such as HDF5).

If you are using GEOS-Chem on a local computer system, then first check to see if netCDF and its dependent libraries have already been installed for you by your IT staff.

If the libraries are not already on your system, then follow these instructions to download and install netCDF and its dependent libraries on your own.
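
Once the libraries are installed, you can verify that the netCDF Fortran interface works with a small test program such as the sketch below. The file name, variable name, and array dimensions here are hypothetical; GEOS-Chem's own netCDF I/O is handled internally by its reader routines.

    ! netCDF read sketch (hypothetical file and variable names)
    ! Compile with, e.g.: gfortran nc_sketch.f90 $(nf-config --fflags --flibs)
    PROGRAM NC_Sketch
      USE netcdf
      IMPLICIT NONE
      INTEGER :: ncid, varid, status
      REAL*4  :: field(72,46)   ! e.g. a 4° x 5° global lon-lat field

      ! Open the file read-only
      status = nf90_open( 'example.nc', NF90_NOWRITE, ncid )
      IF ( status /= NF90_NOERR ) STOP 'Could not open example.nc'

      ! Look up the variable ID and read the data
      status = nf90_inq_varid( ncid, 'PS', varid )
      status = nf90_get_var( ncid, varid, field )
      status = nf90_close( ncid )

      PRINT*, 'Max value: ', MAXVAL( field )
    END PROGRAM NC_Sketch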

If you are using GEOS-Chem on the Amazon Web Services cloud computing platform, then all of the required libraries will be included in either the Amazon Machine Image (AMI) or software container (e.g. Docker or Singularity) that you use to initialize your computational environment. Please see our cloud computing tutorial for more information: cloud.geos-chem.org.

--Bob Yantosca (talk) 20:04, 10 January 2019 (UTC)