Minimum system requirements for GEOS-Chem

Several GEOS-Chem users have asked "What type of machine do I need to buy in order to run GEOS-Chem?" Here are our suggestions.

Hardware Recommendations

Here is some useful information that you can use to determine if your system has sufficient resources to run GEOS-Chem simulations.

Memory requirements

Enough memory to run GEOS-Chem

For the 4° x 5° "standard" simulation:

  • At least 6-8 GB RAM

For the 2° x 2.5° "standard" simulation:

  • About 20-25 GB RAM

Chris Holmes reported that a 2° x 2.5° "standard" simulation required:

  • 20 GB memory (MaxRSS)
  • 26 GB virtual memory (MaxVMSize)

Extra memory for special simulations

You may want to consider at least 30 GB RAM if you plan on running memory-intensive simulations, such as high-resolution nested-grid simulations.

Chris Holmes reported that a GEOS-FP 0.25° x 0.3125° NA tropchem nested simulation required:

  • 31 GB memory (MaxRSS)
  • 38 GB virtual memory (MaxVMSize)

Sufficient disk storage for met fields

See the Estimated Disk Space section below for the amount of disk space required per year of met data.
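
To check whether a machine has enough memory for one of these simulations, and to see how much memory a completed job actually used, a couple of standard commands are usually sufficient. This is only a sketch: the sacct command applies only if your cluster uses the Slurm scheduler (which is what reports the MaxRSS and MaxVMSize values quoted above), and the job ID shown is a placeholder.

    # Show physical memory installed on this machine (values in GB)
    free -g

    # On Slurm clusters: peak memory use of a finished job
    # (replace 12345 with your own job ID)
    sacct -j 12345 --format=JobID,Elapsed,MaxRSS,MaxVMSize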

Computer architecture

Jun Wang wrote:

We have an opportunity to build a large HPC cluster. Do you know what configuration works best for GEOS-Chem?

Bob Yantosca replied:

You won’t need GPU-accelerated nodes for GEOS-Chem. GEOS-Chem "Classic" and GCHP will probably not be able to take much advantage of GPUs because (1) the NVIDIA CUDA instructions are not included in the Intel Fortran compiler, (2) writing GPU code is probably above the GCST’s skill level at this time, and (3) right now the only Fortran compiler that can use GPUs is the PGI compiler.
This is the output of the /proc/cpuinfo file on the computational nodes we use at Harvard (i.e. the Odyssey cluster). You can see how it compares with your system.
    processor       : 0
    vendor_id       : GenuineIntel
    cpu family      : 6
    model           : 63
    model name      : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
    stepping        : 2
    cpu MHz         : 2494.085
    cache size      : 30720 KB
    physical id     : 0
    siblings        : 12
    core id         : 0
    cpu cores       : 12
    apicid          : 0
    initial apicid  : 0
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 15
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 
                      clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp
                      lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf
                      pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid
                      dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
                      lahf_lm abm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
                      fsgsbase bmi1 avx2 smep bmi2 erms invpcid
    bogomips        : 4988.17
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 46 bits physical, 48 bits virtual
    power management:
So our CPUs run at 2.5 GHz, and there are 24 CPUs/node. Each CPU has 30 MB of cache. I think there is 4.5 to 5 GB of memory per CPU available.
Also, if you are going to use the Intel Fortran Compiler, you will always get the best performance when using Intel CPUs. It is a known issue that the Intel Fortran Compiler does not optimize well on AMD CPUs; this was intentional. For more information, see this article.

GEOS-Chem v11-01 and newer versions are compatible with the GNU Fortran compiler, which yields nearly identical results (within the bounds of numerical noise) to simulations that use the Intel Fortran Compiler, at the cost of somewhat slower performance. GNU Fortran should also optimize better than Intel Fortran on AMD CPUs.

--Bob Yantosca (talk) 19:11, 25 April 2017 (UTC)
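
If you want to compare your own hardware against the specifications quoted above without reading the raw /proc/cpuinfo listing, a few standard Linux commands will summarize the same information. This is just a convenience sketch for a typical Linux system:

    # Summary of sockets, cores per socket, threads, clock speed, and caches
    lscpu

    # Number of logical CPUs visible to the operating system
    grep -c '^processor' /proc/cpuinfo

    # Total installed memory (in GB), useful for estimating memory per core
    free -g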

Network and disk

Hong Liao wrote:

We are setting up a new cluster and we have the option of installing InfiniBand network (using Mellanox hardware) and GPFS.

My questions are:

  1. Will GEOS-Chem compile and run all right on clusters based on InfiniBand and GPFS?
  2. Will GEOS-Chem benefit from InfiniBand?

Bob Yantosca replied:

I can say a couple of general things:

  1. On our Harvard cluster (Odyssey) we use Infiniband to connect to a fast network disk (/n/regal). I don't know if that filesystem is GPFS.
  2. For GEOS-Chem "Classic" simulations, with OpenMP parallelization, you can only use one node of the machine. So in that case you won't be doing node-to-node communications. The only I/O would be from the CPU to the disk, and I think in that case Infiniband can make a great difference.
  3. For GCHP, which uses MPI parallelization, then you would also have to be concerned with inter-node communication, as you will be using CPUs across more than 1 node.
  4. The disk where our met fields live at Harvard is on a Lustre file system. This is more efficient for applications like GC that read a large volume of data.

Judit Flo Gaya replied:

What Bob said is correct. I just want to point out that GPFS and Lustre are "similar" file systems in the sense that both are parallel file systems, but they work differently and are licensed and priced differently.

Both are considerably more cumbersome to configure, maintain, and debug than regular NFS, but when well implemented and tuned they provide a significant increase in file read and write performance.

--Bob Yantosca (talk) 20:38, 27 October 2017 (UTC)

Utilizing cloud-computing resources

CAVEAT: Before starting any GEOS-Chem simulations "in the cloud", we STRONGLY advise that you check with your cloud-computing provider whether there are substantial fees to upload and store large amounts of data (such as the GEOS-Chem met fields and emissions data).

Bob Yantosca wrote:

Dear GEOS-Chem Users!

We have some exciting news to share with you! Jiawei Zhuang (Harvard) has proven that GEOS-Chem “Classic” can run on cloud computing resources, such as the Amazon Elastic Compute Cloud (EC2) servers. This important new development will allow you to:

  • Set up GEOS-Chem simulations quickly without having to invest in a local computing infrastructure
  • Run GEOS-Chem simulations that are memory and/or data-intensive (e.g. 1/4 degree nested-grid simulations)
  • Purchase only the computational time and data storage that you need (and easily request more resources)
  • Easily share data with others through the “cloud”

Two important recent developments make GEOS-Chem cloud computing possible:

1. Compatibility with the GNU Fortran compiler

An important (but probably understated) development is that GEOS-Chem v11-01 and newer can now be compiled with the free and open-source GNU Fortran compiler (aka gfortran). Bob Yantosca and Seb Eastham have removed and/or rewritten several sections of legacy code that GNU Fortran could not compile. Due to their diligence, GEOS-Chem v11-01 is now compatible with GNU Fortran v4, and GEOS-Chem v11-02 will be compatible with GNU Fortran v6 (the latest version).

GNU Fortran breaks GEOS-Chem’s dependence on proprietary compilers like Intel Fortran (aka ifort) and PGI Fortran (aka pgfortran), which can be prohibitively expensive to purchase. The GNU Fortran (and C/C++) compilers come pre-installed on most Linux distributions today (and if not, they are easy to install). GNU Fortran can also produce well-optimized executables for many different types of CPUs (Intel, AMD, etc.).

To validate the performance of GEOS-Chem with GNU Fortran, we ran two 1-month benchmark simulations for v11-02, one with Intel Fortran v11.1 and another with GNU Fortran v6.2. We posted a summary of the results on the GEOS-Chem wiki, which you can read by clicking this link.

As you can see from the wiki, the v11-02a benchmark using GNU Fortran gives essentially identical results (within the bounds of numerical noise) to the v11-02a benchmark using Intel Fortran. The run time for several GEOS-Chem operations is somewhat slower, but we believe that this might be improved with some streamlining of code. We believe that having a longer run time is an acceptable tradeoff for not having to purchase an expensive Intel Fortran or PGI Fortran license.

2. Development of a Python-based visualization and regridding software for GEOS-Chem

We are developing a new visualization/regridding package for GEOS-Chem (called GCPy) that is based on the free and open source Python programming language. Our first use of GCPy was to create the plots for the GCHP benchmark simulations. While GCPy is currently not ready for public use (as of April 2017), we will work on improving its usability in the very near future.

Having an option like GCPy will finally let us reduce our dependence on IDL-based software (e.g. GAMAP). An IDL license is now very expensive to purchase and is out of reach for some GEOS-Chem user groups.

In addition to GCPy (which is still in development), there are other Python-based visualization packages for GEOS-Chem (developed by several members of the GEOS-Chem user community) that you can use right away. For more information, please see our Python code for GEOS-Chem wiki page.

Jiawei Zhuang has created a tutorial on how you can set up GEOS-Chem on the Amazon EC2 compute platform. He also has collated several of the input files that you will need to customize your login environment on EC2. For more information, please see his Github site: https://github.com/JiaweiZhuang/cloud_GC.

P.S. At present it is not yet possible to run GCHP (our high-performance GEOS-Chem) on the Amazon EC2 platform, due to various technical issues. We will be looking into this in the near future.

Software Requirements

Overview

Please see this list of required software packages on our GEOS-Chem basics page.

A few notes:

  1. The Linux distribution (RedHat, SuSE, Fedora, Ubuntu, etc.) does not matter, and GEOS-Chem runs without problems on 64-bit architectures.
  2. GEOS-Chem is written in the Fortran-90 language. Fortran-90 is an extension of Fortran-77, which for many years was the standard programming language for scientific computing. GEOS-Chem takes advantage of several powerful features of Fortran-90, including dynamic memory allocation, modular program design, array operation syntax, and derived data types. Please see Appendix 7: GEOS-Chem Style Guide in the GEOS-Chem manual for more tips on how to write effective Fortran-90 code.
  3. We use the Git version control software to manage and track GEOS-Chem software updates. Git allows users at remote sites to easily download GEOS-Chem over the network. It also enables users to keep track of their own changes when developing the code, and to create patches that simplify the implementation of new developments in the standard version. For all of these reasons, you must install Git so that you can download and manage your local GEOS-Chem source code (see the sketch below).

--Bob Yantosca (talk) 19:17, 4 November 2016 (UTC)
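
If you have not used Git before, obtaining the source code is a single clone operation followed by checking out the version you want. The URL below is only illustrative; use the clone URL given in the GEOS-Chem User's Guide or on the GEOS-Chem web site for your version.

    # Clone the GEOS-Chem source code into a local directory
    git clone https://github.com/geoschem/geos-chem.git Code.GEOS-Chem

    # List the available branches and version tags, then check out
    # the one that you want to build
    cd Code.GEOS-Chem
    git branch -a
    git tag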

Supported compilers

As of 2016, the following platforms and compilers are supported. The majority of GEOS-Chem users compile GEOS-Chem with the Intel Fortran Compiler. GEOS-Chem v11-01 and higher versions are now compatible with the GNU Fortran compiler as well.

In GEOS-Chem v11-01 and later versions, the Fortran compiler environment variables must be set.
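
The exact setup is described in the GEOS-Chem User's Guide; as a minimal sketch, assuming the bash shell and the commonly used FC/CC/CXX variable names, you would add lines like these to your .bashrc before compiling:

    # Tell the GEOS-Chem build which compilers to use
    # (use ifort/icc/icpc for Intel, or gfortran/gcc/g++ for GNU)
    export FC=gfortran
    export CC=gcc
    export CXX=g++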

Platform Compiler Status Tested by
Linux ifort 15.0.0 and similar builds Supported GCST
Linux ifort 13.0.079 and similar builds Supported GCST
Linux ifort 12 Supported GCST
Linux ifort 11.1.069 and similar builds Supported GCST
  • NOTE: This is an old version (2008). The GCST uses this version to benchmark GEOS-Chem, in order to ensure numerical compatibility between different benchmark versions.
Linux ifort 10.1 Supported (but this is an old version by now) GCST
Linux gfortran 4.4.7 Supported GCST
Linux gfortran 4.8.2 Supported GCST
Linux gfortran 5.x.x Supported GCST
Linux gfortran 6.x.x Supported GCST
Linux pgfortran 14.10 Supported GCST
  • NOTE: Validation of benchmark results is still needed.

--Bob Yantosca (talk) 20:40, 27 October 2017 (UTC)

Parallelization

OpenMP

You should be aware that because GEOS-Chem uses OpenMP parallelization, it can only run on CPUs that share the same memory, i.e. within a single node. For example, if you had 2 PCs with 4 cores each, you could only run a GEOS-Chem job on 1 PC at a time (i.e. on 4 cores). This is because OpenMP requires that all of the processors running a job must be able to see all of the memory used by that job. In that case you could run 2 jobs simultaneously, each on 4 cores, but not a single job across all 8 cores. See OpenMP.org for more information.

Our traditional configuration of GEOS-Chem, known as "GEOS-Chem Classic", does not use MPI; it is parallelized with OpenMP directives only.
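
As a sketch of a typical run-time setup for GEOS-Chem Classic with OpenMP (bash shell assumed; the executable name and recommended stack settings may differ slightly between versions, so check the User's Guide):

    # Use as many threads as there are cores on ONE node (here: 8);
    # OpenMP jobs cannot span more than one node
    export OMP_NUM_THREADS=8

    # GEOS-Chem needs a large stack for private arrays in parallel loops
    export OMP_STACKSIZE=500m
    ulimit -s unlimited

    # Run the executable built from the GEOS-Chem source code
    ./geos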

MPI

MPI (Message Passing Interface) is required for passing data from one physical system to another. For example, if you wanted to run a GEOS-Chem simulation across several independent machines (or nodes of a cluster), then that requires MPI parallelization. OpenMP parallelization cannot be used unless all of the CPUs have access to all of the memory on the machine (a.k.a. shared-memory architecture).

Our high-performance version of GEOS-Chem (aka GCHP) can use the OpenMPI and MVAPICH2 implementations of the MPI parallelization library.

--Bob Yantosca (talk) 20:09, 4 November 2016 (UTC)
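
To make the contrast with OpenMP concrete, here is a hedged sketch of how a GCHP job is typically launched with an MPI library such as OpenMPI or MVAPICH2. The process count, executable name, and any extra launcher options depend on your GCHP version, run script, and MPI installation, so treat this only as an illustration:

    # Launch 48 MPI processes, which may be spread across several nodes
    # (e.g. 2 nodes with 24 cores each)
    mpirun -np 48 ./gchp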

Performance and scalability

Please see the following resources for more information about the performance and scalability of GEOS-Chem on various platform/compiler combinations:

  1. GEOS-Chem performance wiki page
  2. 1-month benchmark timing results for various platform/compiler combinations
  3. Wiki post about GEOS-Chem scalability on hyperthreaded chipsets
  4. Machine-specific issues and portability issues
  5. GEOS-Chem v9-02 performance

Estimated Disk Space

Emissions fields

Please see our HEMCO data directories wiki page for a list of disk space requirements for each of the emissions inventories read into GEOS-Chem by the HEMCO emissions component.

For GEOS-Chem v11-01, the entire size of the HEMCO data directories is about 375 GB total. Depending on the types of GEOS-Chem simulations you run, you may not need to download the entire set of emissions inventories.

--Bob Yantosca (talk) 21:47, 4 November 2016 (UTC)
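
Before downloading, it can be useful to check how much space your local copy of the HEMCO data directories already occupies and how much room is left on that disk. The path below is only a placeholder for wherever you keep the shared data directories on your system:

    # Size of your local HEMCO data directories
    du -sh /path/to/ExtData/HEMCO

    # Free space remaining on that filesystem
    df -h /path/to/ExtData/HEMCO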

Met fields

The amount of disk space that you will need depends on two things:

  1. Which type of met data you will use, and
  2. How many years of met data you will download

Typical disk space requirements are:

Met field Resolution File type Size
MERRA-2 4° x 5° COARDS-compliant netCDF (compressed) ~ 30 GB/yr
MERRA-2 2° x 2.5° COARDS-compliant netCDF (compressed) ~ 110 GB/yr
MERRA-2 0.5° x 0.625° Asia ("AS") nested grid COARDS-compliant netCDF (compressed) ~ 115 GB/yr
MERRA-2 0.5° x 0.625° Europe ("EU") nested grid COARDS-compliant netCDF (compressed) ~ 58 GB/yr
MERRA-2 0.5° x 0.625° North America ("NA") nested grid COARDS-compliant netCDF (compressed) ~ 110 GB/yr
GEOS-FP 4° x 5° COARDS-compliant netCDF (compressed) ~ 30 GB/yr
GEOS-FP 2° x 2.5° COARDS-compliant netCDF (compressed) ~ 120 GB/yr
GEOS-FP 0.25° x 0.3125° China ("CH") nested grid COARDS-compliant netCDF (compressed) ~ 175 GB/yr
GEOS-FP 0.25° x 0.3125° Europe ("EU") nested grid COARDS-compliant netCDF (compressed) ~ 58 GB/yr
GEOS-FP 0.25° x 0.3125° North America ("NA") nested grid COARDS-compliant netCDF (compressed) ~ 226 GB/yr
MERRA 4° x 5° Binary (uncompressed) ~ 70 GB/yr
MERRA 2° x 2.5° Binary (uncompressed) ~ 200 GB/yr
GEOS-5 4° x 5° Binary (uncompressed) ~ 30 GB/yr
GEOS-5 2° x 2.5° Binary (uncompressed) ~ 120 GB/yr
GEOS-5 0.5° x 0.666° nested CH Binary (uncompressed) ~ 140 GB/yr
GEOS-5 0.5° x 0.666° nested NA Binary (uncompressed) ~ 160 GB/yr

--Bob Yantosca (talk) 21:50, 4 November 2016 (UTC)
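
A quick back-of-the-envelope estimate of the met-field storage you will need is simply the per-year figure from the table above multiplied by the number of years you plan to download. For example, for 5 years of GEOS-FP 2° x 2.5° data at roughly 120 GB/yr:

    # ~120 GB/yr x 5 years of GEOS-FP 2 x 2.5 met data
    echo $(( 120 * 5 )) GB    # prints: 600 GB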

Do I need to install any libraries for GEOS-Chem?

The short answer is yes.

Required libraries for GEOS-Chem

We have begun the process of migrating GEOS-Chem's file I/O from binary files to netCDF files. If you are using GEOS-Chem v9-01-03 or higher versions, you will need a version of the netCDF library.

On many modern computer systems, libraries such as netCDF are pre-installed and available for use via the module command. Be sure to check with your sysadmin or IT staff whether a version of netCDF has already been installed on your system. If so, you can simply load the pre-built netCDF library into your environment with one or more module load statements. You can usually place these statements into your .bashrc or .cshrc startup file so that netCDF is loaded into your environment each time you log in to your account.

If your computer system does NOT already have a pre-built version of the netCDF libraries installed, you can use our GEOS-Chem-Libraries installer to automate much of the netCDF library installation for you. For more information about how to use the GEOS-Chem-Libraries installer, please see our Installing libraries for GEOS-Chem wiki page, as well as Chapter 3 of the GEOS-Chem User's Guide. If you work at an institution with several GEOS-Chem users, then your sysadmin can help you install the netCDF library into a common directory where everyone can access it.

--Bob Yantosca (talk) 16:48, 5 December 2016 (UTC)
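
As a sketch of what loading a pre-built netCDF library typically looks like on a cluster that uses environment modules (module names vary from system to system, so ask your sysadmin for the correct ones):

    # In your .bashrc: load the site-provided netCDF modules at each login
    module load netcdf
    module load netcdf-fortran

    # Verify that the library is visible and see which flags are needed
    # to compile and link against it
    nc-config --version
    nf-config --flibs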

Libraries that you may need for certain data sets

If you are working with large raw data files (e.g. "raw" GMAO met field data or one of the many satellite data products), you may also need to install additional libraries. You can then link these libraries to the Fortran code that you use to process your data.

Here is a table that describes the file formats that are used by the various data products:

Data type File format used
GEOS-4 met ("raw" data) HDF4-EOS
GEOS-5 met ("raw" data) HDF4-EOS
MERRA met ("raw" data) HDF4-EOS
GEOS-FP met ("raw" data) netCDF-4
MERRA-2 met ("raw" data) netCDF-4
MOPITT satellite data HDF4-EOS
AIRS satellite data HDF4-EOS
OMI satellite data HDF5-EOS

NOTES:

  1. HDF4 is an older version of HDF. HDF5 is the newer version.
    • One of the major differences is that with HDF4 the file size may not exceed 2 GB. This restriction has been lifted in HDF5.
  2. HDF-EOS is a "superset" of HDF that was developed by NASA and others to create extra data structures (Grid, Point, and Swath) that are more relevant for earth science applications.
    • HDF4-EOS is HDF-EOS that is based on HDF4.
    • HDF5-EOS is HDF-EOS that is based on HDF5.
  3. You must first install HDF4 before installing HDF4-EOS.
    • HDF4 requires that the ZLIB and JPEG libraries (and optionally, SZLIB) also be installed.
  4. You must first install HDF5 before installing HDF5-EOS.
    • HDF5 requires that the ZLIB and JPEG libraries (and optionally, SZLIB) also be installed.
  5. netCDF-4 is the latest version of netCDF.
    • netCDF-4 is 100% compatible with the older netCDF-3 format.
    • netCDF-4 relies on HDF5, so you have to first build the HDF5 and ZLIB libraries before installing netCDF-4.
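
For those who build these libraries by hand rather than with the GEOS-Chem-Libraries installer, the dependency order in notes 3-5 translates into a build sequence like the following. This is only a sketch: the version numbers, source directories, and install prefixes are placeholders, and you should consult each package's own installation instructions for the full details.

    # 1. ZLIB
    cd zlib-x.y.z
    ./configure --prefix=$HOME/opt/zlib
    make && make install

    # 2. HDF5 (requires ZLIB)
    cd ../hdf5-x.y.z
    ./configure --prefix=$HOME/opt/hdf5 \
                --with-zlib=$HOME/opt/zlib --enable-fortran
    make && make install

    # 3. netCDF-4 (requires HDF5 and ZLIB)
    cd ../netcdf-x.y.z
    CPPFLAGS="-I$HOME/opt/hdf5/include -I$HOME/opt/zlib/include" \
    LDFLAGS="-L$HOME/opt/hdf5/lib -L$HOME/opt/zlib/lib" \
    ./configure --prefix=$HOME/opt/netcdf
    make && make install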

It is also possible to read HDF, HDF-EOS, and netCDF files into IDL with the GAMAP package. See the GAMAP User's Guide for more information.

The NCAR Command Language (NCL) can read data in all of these file formats.

--Bob Y. 16:25, 16 January 2014 (EST)