Minimum system requirements for GEOS-Chem
- 1 Hardware Recommendations
- 2 Software Requirements
- 3 Estimated Disk Space
- 4 Performance and scalability
Hardware Recommendations
Here is some useful information that you can use to determine whether your system has sufficient resources to run GEOS-Chem simulations.
Memory and disk requirements
If you plan to run GEOS-Chem on a local computer system, please make sure that your system has sufficient memory and disk space:
|Requirement||Details|
|Enough memory to run GEOS-Chem||For the 4° x 5° "standard" simulation. Chris Holmes reported memory requirements for a 2° x 2.5° "standard" simulation.|
|Extra memory for special simulations||You may want to consider at least 30 GB RAM if you plan on doing memory-intensive simulations. Chris Holmes reported memory requirements for a GEOS-FP 0.25° x 0.3125° NA tropchem nested simulation.|
|Sufficient disk storage for met fields||See the Estimated Disk Space section below.|
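As a quick first check of whether a local machine meets these requirements, you can query its memory and disk capacity with standard Linux commands. A minimal sketch (run the `df` command from wherever you plan to store met fields and HEMCO data):

```shell
# Total and available memory; GEOS-Chem "Classic" runs on a single node,
# so this is the entire memory pool your simulation will draw from
free -h

# Free space on the current filesystem; run from your intended data directory
df -h .
```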
You may also run GEOS-Chem on the Amazon Web Services EC2 cloud computing platform. For more information, please see our cloud-computing tutorial: cloud.geos-chem.org
Jun Wang wrote:
We have an opportunity to build a large HPC system. Do you know what configuration works best for GEOS-Chem?
Bob Yantosca replied:
You won’t need the GPU-accelerated nodes for GEOS-Chem. GC or GCHP will probably not be able to take much advantage of GPUs because (1) the Nvidia CUDA instructions are not included in the Intel Fortran compiler; (2) writing GPU code is probably above the GCST’s skill level at this time; and (3) right now the only Fortran compiler that can use GPUs is the PGI compiler.
This is the output of the /proc/cpuinfo file on the computational nodes we use at Harvard (i.e. the Odyssey cluster). You can ...see how it compares with [your system].
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 63
model name      : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
stepping        : 2
cpu MHz         : 2494.085
cache size      : 30720 KB
physical id     : 0
siblings        : 12
core id         : 0
cpu cores       : 12
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 15
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid fsgsbase bmi1 avx2 smep bmi2 erms invpcid
bogomips        : 4988.17
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:
So our CPUs run at 2.5 GHz, and there are 24 CPUs/node. Each CPU has 30 MB of cache. I think there is 4.5 to 5 GB of memory per CPU available.
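To make the same comparison on your own Linux system, a few one-liners summarize the fields highlighted above (CPU model, logical core count, and cache size):

```shell
# CPU model name (one line per unique model in the machine)
grep "model name" /proc/cpuinfo | sort -u

# Number of logical CPUs visible to the operating system
nproc

# Per-CPU cache size as reported by the kernel
grep "cache size" /proc/cpuinfo | sort -u
```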
Also - if you are going to use the Intel Fortran Compiler, you will always get the best performance when using Intel CPUs. It is a known issue that the Intel Fortran Compiler does not optimize well on AMD CPUs. This was intentional. For more information, see this article.
GEOS-Chem v11-01 and newer versions are compatible with the GNU Fortran compiler, which yields nearly-identical results (within the bounds of numerical noise) to simulations that use the Intel Fortran Compiler, at the cost of somewhat slower performance. GNU Fortran should also perform better on AMD CPUs than Intel Fortran does.
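Before building, you can confirm which Fortran compilers are available on your system and their versions. GNU Fortran (gfortran) is the free option; Intel Fortran (ifort) may or may not be installed on your cluster:

```shell
# GNU Fortran: free and open source, supported in GEOS-Chem v11-01 and later
command -v gfortran >/dev/null && gfortran --version | head -1 || echo "gfortran not found"

# Intel Fortran: proprietary; only present if a license is installed
command -v ifort >/dev/null && ifort --version | head -1 || echo "ifort not found"
```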
Network and disk
Hong Liao wrote:
We are setting up a new cluster and we have the option of installing InfiniBand network (using Mellanox hardware) and GPFS.
My questions are:
- Will GEOS-Chem compile and run all right on clusters based on InfiniBand and GPFS?
- Will GEOS-Chem benefit from InfiniBand?
Bob Yantosca replied:
I can say a couple of general things:
- On our Harvard cluster (Odyssey) we use Infiniband to connect to a fast network disk (/n/regal). I don't know if that filesystem is GPFS.
- For GEOS-Chem "Classic" simulations, with OpenMP parallelization, you can only use one node of the machine. So in that case you won't be doing node-to-node communications. The only I/O would be from the CPU to the disk, and I think in that case Infiniband can make a great difference.
- For GCHP, which uses MPI parallelization, you would also have to be concerned with inter-node communication, since you will be using CPUs across more than one node.
- The disk where our met fields live at Harvard is on a Lustre file system. This is more efficient for applications like GC that read a large volume of data.
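For the GEOS-Chem "Classic" (OpenMP) case described above, the single-node setup amounts to exporting a few environment variables before launching the executable. A minimal sketch, assuming a bash shell and a run directory containing the compiled executable (here called geos, the traditional name; substitute yours):

```shell
#!/bin/bash
# GEOS-Chem "Classic" uses OpenMP, so all threads must live on one node;
# set the thread count to match the cores on that node (e.g. 24 on Odyssey)
export OMP_NUM_THREADS=24

# Raise per-thread stack limits; GEOS-Chem's large temporary arrays
# can overflow the default stack size on most systems
export OMP_STACKSIZE=500m
ulimit -s unlimited 2>/dev/null || true

# Launch the executable from the run directory, if it has been built
if [ -x ./geos ]; then
    ./geos
fi
```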
Judit Flo Gaya replied:
What Bob said is correct. I just want to point out that GPFS and Lustre are "similar" file systems in the sense that both are parallel file systems, although they work differently and are priced differently. They are both a lot more cumbersome to configure, maintain, and debug than regular NFS, but when well implemented and tuned they provide a significant increase in file read and write performance.
Utilizing cloud-computing resources
Bob Yantosca wrote:
Dear GEOS-Chem Users!
We have some exciting news to share with you! Jiawei Zhuang (Harvard) has proven that GEOS-Chem “Classic” can run on cloud computing resources, such as the Amazon Elastic Compute Cloud (EC2) servers. This important new development will allow you to:
- Set up GEOS-Chem simulations quickly without having to invest in a local computing infrastructure
- Run GEOS-Chem simulations that are memory and/or data-intensive (e.g. 1/4 degree nested-grid simulations)
- Purchase only the computational time and data storage that you need (and easily request more resources)
- Easily share data with others through the “cloud”
Two important recent developments make GEOS-Chem cloud computing possible:
1. Compatibility with the GNU Fortran compiler
An important (but probably understated) development is that GEOS-Chem v11-01 and newer can now be compiled with the free and open-source GNU Fortran compiler (aka gfortran). GNU Fortran breaks GEOS-Chem’s dependence on proprietary compilers like Intel Fortran (aka ifort) and PGI Fortran (aka pgfortran), which can be prohibitively expensive to purchase.
2. Development of Python-based visualization and regridding software for GEOS-Chem
We are developing a new visualization/regridding package for GEOS-Chem (called GCPy) that is based on the free and open source Python programming language. Our first use of GCPy was to create the plots for the GCHP benchmark simulations. For more information, please see our Python tools for use with GEOS-Chem wiki page.
Please see the following resources for more information about using GEOS-Chem on the Amazon EC2 compute platform:
Zhuang, J., D.J. Jacob, J. Flo-Gaya, R.M. Yantosca, E.W. Lundgren, M.P. Sulprizio, and S.D. Eastham, Enabling immediate access to Earth science models through cloud computing: application to the GEOS-Chem model, Bull. Amer. Met. Soc., 2019, doi:10.1175/BAMS-D-18-0243.1 (PDF)
cloud.geos-chem.org: Cloud-computing tutorial by Jiawei Zhuang.
Software Requirements
Please see this list of required software packages on our GEOS-Chem basics page.
Please see our Guide to compilers for GEOS-Chem for detailed information about the compilers that you can use to build GEOS-Chem.
Required software libraries
Please see our Guide to netCDF in GEOS-Chem for information about netCDF and its dependent software libraries that you need to have on your system in order to use GEOS-Chem.
Please see our Parallelizing GEOS-Chem wiki page for more information about how GEOS-Chem is parallelized.
Estimated Disk Space
The following sections will help you assess how much disk space you need to run GEOS-Chem.
For GEOS-Chem v11-01, the entire size of the HEMCO data directories is about 375 GB total. Depending on the types of GEOS-Chem simulations you run, you may not need to download the entire set of emissions inventories.
The amount of disk space that you will need depends on two things:
- Which type of met data you will use, and
- How many years of met data you will download
Typical disk space requirements are:
|Met field||Resolution||File type||Size|
|MERRA-2||4° x 5°||COARDS-compliant netCDF (compressed)||~ 30 GB/yr|
|MERRA-2||2° x 2.5°||COARDS-compliant netCDF (compressed)||~ 110 GB/yr|
|MERRA-2||0.5° x 0.625° Asia ("AS") nested grid||COARDS-compliant netCDF (compressed)||~ 115 GB/yr|
|MERRA-2||0.5° x 0.625° Europe ("EU") nested grid||COARDS-compliant netCDF (compressed)||~ 58 GB/yr|
|MERRA-2||0.5° x 0.625° North America ("NA") nested grid||COARDS-compliant netCDF (compressed)||~ 110 GB/yr|
|GEOS-FP||4° x 5°||COARDS-compliant netCDF (compressed)||~ 30 GB/yr|
|GEOS-FP||2° x 2.5°||COARDS-compliant netCDF (compressed)||~ 120 GB/yr|
|GEOS-FP||0.25° x 0.3125° China ("CH") nested grid||COARDS-compliant netCDF (compressed)||~ 175 GB/yr|
|GEOS-FP||0.25° x 0.3125° Europe ("EU") nested grid||COARDS-compliant netCDF (compressed)||~ 58 GB/yr|
|GEOS-FP||0.25° x 0.3125° North America ("NA") nested grid||COARDS-compliant netCDF (compressed)||~ 226 GB/yr|
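Once you have downloaded met fields, you can verify how much space each product and resolution actually occupies with du. The directory layout below is illustrative only; substitute the actual paths of your shared data (ExtData) directory:

```shell
# Per-product disk usage; these ExtData subdirectory names are examples,
# not a guaranteed layout
for dir in ExtData/GEOS_4x5/MERRA2 ExtData/GEOS_0.25x0.3125_NA/GEOS_FP; do
    if [ -d "$dir" ]; then
        du -sh "$dir"                 # total space used by this met-field set
    else
        echo "$dir: not downloaded yet"
    fi
done
```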
Output generated by GEOS-Chem
For the full-chemistry simulations, we can look to the GEOS-Chem benchmarks as a rough upper limit of how much disk space is needed for diagnostic output. The GEOS-Chem 12.4.0 benchmark simulation generates approximately 760 MB/month of files (monthly-mean output). This includes netCDF-format diagnostic output and restart files only, but excludes binary punch output (which is being phased out).
This would be an upper limit because in the benchmark simulations, we archive the "kitchen sink"—all species concentrations, various aerosol diagnostics, convective fluxes, dry dep fluxes and velocities, J-values, various chemical and meteorological quantities, transport fluxes, wet deposition diagnostics, and emissions diagnostics. Most GEOS-Chem users would not need to archive this much output.
In GEOS-Chem 12.5.0 (ETA late summer 2019), we will introduce the capability of horizontal and vertical subsetting for diagnostics being archived to netCDF output. This will let you save only a sub-region of the globe, or a subset of vertical levels (or both) in case you do not wish to archive diagnostics for the entire globe. This can help to further reduce the amount of diagnostic output being sent to disk.
The GEOS-Chem specialty simulations—simulations for species with first-order loss by prescribed oxidant fields (i.e. Hg, CH4, CO2, CO)—will produce much less output than the benchmark simulations. This is because these simulations typically only have a few species.
Also note: Archiving hourly or daily timeseries output would require much more disk space than the monthly-mean output. The disk space actually used will depend on how many quantities are archived and what the archival frequency is.
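As a back-of-envelope check, the monthly-mean benchmark figure above scales to roughly 9 GB per year of diagnostic output; a sketch of the arithmetic in shell:

```shell
# ~760 MB/month of netCDF diagnostics + restarts (12.4.0 benchmark, monthly means)
mb_per_month=760
mb_per_year=$((mb_per_month * 12))
echo "${mb_per_year} MB per year"     # 9120 MB, i.e. roughly 9 GB
```

Hourly or daily timeseries archival multiplies this figure accordingly, so budget disk space against your own diagnostic list and archival frequency rather than the benchmark total.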
Performance and scalability
Please see our Guide to GEOS-Chem performance for more information about GEOS-Chem's performance and scalability.