Minimum system requirements for GEOS-Chem

Revision as of 18:10, 8 April 2015 by Melissa Payer (Talk | contribs) (Supported compilers)

Several GEOS-Chem users have asked "What type of machine do I need to buy in order to run GEOS-Chem?" Here are our suggestions.

Hardware Recommendations

Our hardware recommendations are:

Software Requirements

Overview

GEOS-Chem requires the following software:

  1. Any Unix-style operating system, such as:
    • Linux (Red Hat, SuSE, CentOS, etc.)
    • Ubuntu
    • Fedora
    • Mac OS X (which is POSIX-compliant)
    • NOTE: GEOS-Chem cannot run in a Microsoft Windows (XP, Vista, Windows 7) environment!
  2. A Fortran 90 compiler that supports OpenMP parallelization
    • If you use the Portland Group compiler, then a C compiler (such as the GNU C compiler, gcc) is also required. gcc comes installed with most Unix builds.
  3. GNU Make (chances are your Unix system has this installed already)
  4. Git version control system (free, open-source version control software)
  5. For GEOS-Chem v9-01-03 and newer versions, you will also need to have a netCDF library installation (either netCDF-3 or netCDF-4) on your system.

The Linux flavor (Red Hat, SuSE, Fedora, Ubuntu, etc.) is not important. GEOS-Chem also runs without problems on 64-bit architectures.
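
A quick way to compare a machine against the list above is to look for the required tools on the search path. The Python sketch below is illustrative only and is not part of GEOS-Chem. The function name `find_tools` is hypothetical, and the executable names (`ifort`, `pgf90`, `make`, `gmake`, `git`, `nc-config`) are assumptions about a typical installation that may differ at your site.

```python
import shutil

# Executable names are assumptions about a typical installation;
# adjust them to match your site. Finding "make" on the path does not
# prove that it is GNU Make, so treat the result as a first check only.
REQUIRED_TOOLS = {
    "Fortran compiler": ["ifort", "pgf90"],
    "GNU Make": ["make", "gmake"],
    "Git": ["git"],
    "netCDF library helper": ["nc-config"],
}


def find_tools(required=REQUIRED_TOOLS, which=shutil.which):
    """Return {requirement: path of the first executable found, or None}."""
    found = {}
    for requirement, candidates in required.items():
        paths = (which(name) for name in candidates)
        found[requirement] = next((p for p in paths if p), None)
    return found


if __name__ == "__main__":
    for requirement, path in find_tools().items():
        print(f"{requirement:24s} {path or 'NOT FOUND'}")
```

A `None` entry only means that no executable with one of the assumed names was found; the tool may still be installed under another name or made available through your cluster's module system.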

GEOS-Chem is written in the Fortran-90 language. Fortran-90 is an extension of Fortran-77, which for many years has been the standard programming language for scientific computing. GEOS-Chem takes advantage of several powerful features of Fortran-90, including dynamic memory allocation, modular program design, array operation syntax, and derived data types. Please see Appendix 7: GEOS-Chem Style Guide in the GEOS-Chem manual for more tips on how to write effective Fortran-90 code.

We use the Git version control software to manage and track GEOS-Chem software updates. Git allows users at remote sites to easily download GEOS-Chem over the network. Git also enables users to keep track of their changes when developing the code and enables the creation of patches that would simplify the implementation of new developments in the standard version. For all these reasons, you must install Git so that you can download and manage your local GEOS-Chem source code.

Supported compilers

As of 2013, the following platforms and compilers are supported. (Also note the platform/compiler combinations that have since been de-supported.)

Platform | Compiler | Status | Tested by | Contact info
Linux | Intel Fortran Compiler (IFORT) 12.1 | Supported | GEOS-Chem Support Team | geos-chem-support@as.harvard.edu
Linux | Intel Fortran Compiler (IFORT) 11.1 | Supported | GEOS-Chem Support Team | geos-chem-support@as.harvard.edu
Linux | Intel Fortran Compiler (IFORT) 10.1 | Supported | GEOS-Chem Support Team | geos-chem-support@as.harvard.edu
Linux | Portland Group (PGI) Compiler | Supported | Hongyu Liu | hyl@nianet.org
Linux | GNU Fortran | Not yet ported | Need a volunteer!! |
Sun Solaris | SunStudio Compiler | No longer supported | |
IBM | XLF Compiler | No longer supported | |
SGI IRIX | SGI MIPSPro Compiler | No longer supported | |
HP/Compaq Unix | HP/Compaq Fortran | No longer supported | |

For Linux, we strongly recommend using the Intel Fortran (IFORT) compiler. In our experience, the Intel compiler is a better all-around compiler than PGI, with which we have had various minor compatibility problems. Also, the Intel compiler is available at relatively low cost if you work for an educational institution.

Intel Fortran Compiler versions 11.1.x and higher are highly optimized for multi-core chipsets.

--Bob Y. 10:09, 17 January 2014 (EST)

Parallelization

OpenMP

You should be aware that because GEOS-Chem uses OpenMP parallelization, a single simulation can only use as many cores as share the same memory. For example, if you had 2 PCs with 4 cores each, then a single job could only run on 1 PC at a time (i.e. on 4 cores). This is because OpenMP requires that all of the processors used by a job be able to see all of the memory on the machine. In that case, you could run 2 jobs simultaneously, each on 4 cores, but not a single job on 8 cores. See OpenMP.org for more information.
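
The two-PC example above can be written out as a short Python sketch. It is a toy calculation under the assumption that each machine is described only by its core count; the function names are hypothetical and are not part of GEOS-Chem. `OMP_NUM_THREADS` is the standard OpenMP environment variable that sets the number of threads.

```python
import os


def max_cores_per_job(cores_per_machine):
    """A single OpenMP job is confined to one shared-memory machine,
    so the most cores it can use is the largest single machine."""
    return max(cores_per_machine)


def openmp_env(cores_on_this_machine, base_env=None):
    """Return a copy of the environment with OMP_NUM_THREADS set to the
    number of cores on the machine where the job will run."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(cores_on_this_machine)
    return env


machines = [4, 4]  # two PCs with four cores each
print("cores owned in total:", sum(machines))
print("cores usable by one job:", max_cores_per_job(machines))
print("jobs that can run side by side:", len(machines))
```

The environment returned by `openmp_env` could then be passed to whatever launches the model executable, for example through the `env` argument of `subprocess.run`.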

MPI

MPI (Message Passing Interface) is required for passing data between physical systems that do not share memory. For example, running a GEOS-Chem simulation across several independent machines (or nodes of a cluster) requires MPI parallelization. OpenMP parallelization cannot be used unless all of the CPUs on the machine have access to all of the memory on the machine (a.k.a. shared-memory architecture).

The current standard version of GEOS-Chem does not yet have MPI capability. However, we are working on a Grid-independent GEOS-Chem version that would be compatible with MPI parallelization via the Earth System Modeling Framework (ESMF).

Performance and scalability

Please see the following resources for more information about the performance and scalability of GEOS-Chem on various platform/compiler combinations:

  1. GEOS-Chem performance wiki page
  2. 1-month benchmark timing results for various platform/compiler combinations
  3. Wiki post about GEOS-Chem scalability on hyperthreaded chipsets
  4. Machine-specific issues and portability issues
  5. GEOS-Chem v9-02 performance

How much disk space will I need for GEOS-Chem?

It depends on which type of met data you will use and how many years of data you will download. Typical disk space requirements are:

Met field | Resolution | File type | Size
GEOS-4 | 4° x 5° | bpch (uncompressed) | ~13 GB/yr
GEOS-4 | 2° x 2.5° | bpch (uncompressed) | ~52 GB/yr
GEOS-5 | 4° x 5° | bpch (uncompressed) | ~30 GB/yr
GEOS-5 | 2° x 2.5° | bpch (uncompressed) | ~120 GB/yr
GEOS-5 | 0.5° x 0.666° nested CH | bpch (uncompressed) | ~140 GB/yr
GEOS-5 | 0.5° x 0.666° nested NA | bpch (uncompressed) | ~160 GB/yr
GEOS-FP | 4° x 5° | netCDF (uncompressed) | ~70 GB/yr
GEOS-FP | 2° x 2.5° | netCDF (uncompressed) | ~270 GB/yr
GEOS-FP | 0.25° x 0.3125° nested CH | netCDF (uncompressed) | TBD
GEOS-FP | 0.25° x 0.3125° nested NA | netCDF (uncompressed) | ~930 GB/yr
MERRA | 4° x 5° | bpch (uncompressed) | ~73 GB/yr

You will also need additional disk space for emission inventories and other atmospheric data sets. In GEOS-Chem v10-01 and later versions, you can download these data sets from the HEMCO data directories.
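
As a rough illustration of how the per-year sizes above turn into a storage budget, the following Python sketch multiplies a table entry by the number of years of met data. It copies only a subset of the table, its name `met_disk_gb` is hypothetical, and it leaves out the extra space needed for emission inventories and other data sets.

```python
# Approximate met field sizes in GB per year, copied from a subset of
# the table above. Keys are (met field, resolution).
MET_GB_PER_YEAR = {
    ("GEOS-5", "4x5"): 30,
    ("GEOS-5", "2x2.5"): 120,
    ("GEOS-FP", "4x5"): 70,
    ("GEOS-FP", "2x2.5"): 270,
    ("MERRA", "4x5"): 73,
}


def met_disk_gb(met_field, resolution, years):
    """Estimate the disk space (GB) for `years` of uncompressed met data."""
    try:
        per_year = MET_GB_PER_YEAR[(met_field, resolution)]
    except KeyError:
        raise ValueError(
            f"no size estimate for {met_field} at {resolution}"
        ) from None
    return per_year * years


# Three years of GEOS-FP met fields at 2 x 2.5 degrees
print(met_disk_gb("GEOS-FP", "2x2.5", 3))  # 810
```

Because the table values are approximate, the result is best read as a lower bound when planning a disk purchase.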

Do I need to install any libraries for GEOS-Chem?

The short answer is yes.

Required libraries for GEOS-Chem

We have begun the process of migrating GEOS-Chem's file I/O from binary files to netCDF files. If you are using GEOS-Chem v9-01-03 or a later version, you will have to install a version of the netCDF library. You can install either the netCDF classic library or the netCDF-4 library. (Installing the netCDF-4 library also requires you to install the HDF5 library.)

The GEOS-Chem Support Team has created an external package named GEOS-Chem-Libraries that will automate much of the netCDF library installation for you. For more information about how to use the GEOS-Chem-Libraries installer, please see our Installing libraries for GEOS-Chem wiki page.

If you work at an institution with several GEOS-Chem users, then your sysadmin can help you install the netCDF library into a common directory where everyone can access it.

--Bob Y. 16:25, 16 January 2014 (EST)

Libraries that you may need for certain data sets

If you are working with large raw data files (e.g. "raw" GMAO met field data or one of the many satellite data products), you may also need to install additional libraries. You can then link these libraries to the Fortran code that you use to process your data.

Here is a table that describes the file formats that are used by the various data products:

Data type | File format used
GEOS-4 met ("raw" data) | HDF4-EOS
GEOS-5 met ("raw" data) | HDF4-EOS
MERRA met ("raw" data) | HDF4-EOS
GEOS-FP met ("raw" data) | netCDF-4
MOPITT satellite data | HDF4-EOS
AIRS satellite data | HDF4-EOS
OMI satellite data | HDF5-EOS

NOTES:

  1. HDF4 is an older version of HDF. HDF5 is the newer version.
    • One of the major differences is that with HDF4 the file size may not exceed 2GB. This restriction has been lifted in HDF5.
  2. HDF-EOS is a "superset" of HDF that was developed by NASA and others to create extra data structures (Grid, Point, and Swath) that are more relevant for earth science applications.
    • HDF4-EOS is HDF-EOS that is based on HDF4.
    • HDF5-EOS is HDF-EOS that is based on HDF5.
  3. You must first install HDF4 before installing HDF4-EOS.
    • HDF4 requires the ZLIB and JPEG libraries (and, optionally, SZLIB) to be installed as well.
  4. You must first install HDF5 before installing HDF5-EOS.
    • HDF5 requires the ZLIB and JPEG libraries (and, optionally, SZLIB) to be installed as well.
  5. netCDF-4 is the latest version of netCDF.
    • The netCDF-4 library is fully backward compatible with the older netCDF-3 format: it can still read and write netCDF-3 files.
    • netCDF-4 relies on HDF5, so you have to first build the HDF5 and ZLIB libraries before installing netCDF-4.
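
The prerequisites in the notes above form a small dependency graph, and a valid build order can be derived from it. The Python sketch below uses the standard-library `graphlib` module (Python 3.9 or later). The dictionary simply restates the notes, leaving out the optional SZLIB, and the function name `install_order` is hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each library maps to the libraries that must be installed before it,
# as stated in the notes above (the optional SZLIB is left out).
PREREQUISITES = {
    "HDF4": {"ZLIB", "JPEG"},
    "HDF4-EOS": {"HDF4"},
    "HDF5": {"ZLIB", "JPEG"},
    "HDF5-EOS": {"HDF5"},
    "netCDF-4": {"HDF5", "ZLIB"},
}


def install_order(prerequisites=PREREQUISITES):
    """Return one valid order in which every library comes after
    all of the libraries it depends on."""
    return list(TopologicalSorter(prerequisites).static_order())


if __name__ == "__main__":
    for step, library in enumerate(install_order(), start=1):
        print(step, library)
```

Several orders satisfy the same constraints, so the exact sequence printed may vary; what matters is that ZLIB and JPEG come before HDF4 and HDF5, and that those come before the libraries built on top of them.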

It is also possible to read HDF, HDF-EOS, and netCDF files into IDL with the GAMAP package. See the GAMAP User's Guide for more information.

The NCAR Command Language (NCL) can read data in all of these file formats.

--Bob Y. 16:25, 16 January 2014 (EST)