- 1 Overview of OpenMP parallelization
- 2 Example using OpenMP directives
- 3 Frequently asked questions about parallelization
- 3.1 How can I tell if a variable has to be declared with PRIVATE or SHARED?
- 3.2 My parallel loop calls a subroutine. Do all of the variables in the subroutine have to be declared PRIVATE?
- 3.3 What does the THREADPRIVATE statement do?
- 3.4 Can I use pointers within an OpenMP parallel loop?
- 3.5 How many cores may I use in my GEOS-Chem simulation?
- 3.6 Why is GEOS-Chem not using all the cores I requested?
- 4 MPI
- 5 Further reading
Overview of OpenMP parallelization
In the late 1990s, a new open standard for parallel computing named OpenMP was developed by several compiler vendors, including Sun, SGI, Compaq, The Portland Group, and Microsoft. The resulting standard allows parallel processing source code (written in either Fortran or C) to be ported between different platforms with minimal effort.
All GEOS-Chem "Classic" parallel processing commands now adhere to the OpenMP standard. Therefore, in order to run GEOS-Chem on your platform, you must make sure that your compiler supports OpenMP.
In GEOS-Chem "Classic", parallelization is achieved by splitting the work contained in a DO-loop across several CPUs. Because most GEOS-Chem arrays represent quantities (meteorological fields, tracer concentrations, chemical species concentrations) on a geospatial grid, when we parallelize, we give each CPU its own region of the "world" to work on. All CPUs can see the entire "world" (i.e. the entire memory on the machine) at once, but each CPU is restricted to working on its own region of the "world".
Also, it is important to remember that OpenMP is a loop-level parallelization. That means that only commands within selected DO loops will execute in parallel. All GEOS-Chem simulations start off on a single CPU (known as the "master CPU"). Upon entering a parallel DO loop, other CPUs will be invoked to share the workload within the loop. At the end of the parallel DO loop, the other CPUs return to standby status and the execution continues only on the "master" CPU.
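The behavior described above can be observed with a small stand-alone program. This is only a minimal sketch and is not taken from the GEOS-Chem source code; the program name and the loop length of 8 are arbitrary choices made for illustration. It assumes a compiler with OpenMP support (for example, GNU Fortran with the -fopenmp switch).

```fortran
PROGRAM fork_join_sketch

  ! omp_lib provides the OpenMP runtime routines used below
  USE omp_lib

  IMPLICIT NONE

  INTEGER :: N

  ! Serial section: only the initial ("master") thread is running here
  PRINT*, 'Before loop: threads active = ', omp_get_num_threads()

  ! Parallel section: the iterations of N are divided among the threads
  !$OMP PARALLEL DO       &
  !$OMP DEFAULT( SHARED ) &
  !$OMP PRIVATE( N )
  DO N = 1, 8
     PRINT*, 'Iteration ', N, ' ran on thread ', omp_get_thread_num()
  ENDDO
  !$OMP END PARALLEL DO

  ! Serial section again: the other threads are back on standby
  PRINT*, 'After loop:  threads active = ', omp_get_num_threads()

END PROGRAM fork_join_sketch
```

Outside of the parallel loop, omp_get_num_threads() reports a single thread. Inside the loop, the thread numbers that are printed (and the order of the lines) will vary from run to run, because the iterations execute concurrently.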
One restriction of OpenMP parallelization is that a simulation can use only as many cores as share the same memory. In practice, this limits GEOS-Chem "Classic" simulations to a single node of a shared computer cluster, which typically means fewer than 64 cores per simulation.
We should also note that GEOS-Chem with the High Performance Option (aka GCHP) uses a different type of parallelization called MPI. This allows GCHP to use hundreds or thousands of cores across several nodes of a computer cluster. See our Getting Started With GCHP page for more information.
Example using OpenMP directives
Here is an example of parallel code written with the OpenMP directives:
!$OMP PARALLEL DO
!$OMP+SHARED( A )
!$OMP+PRIVATE( I, J, B )
!$OMP+SCHEDULE( DYNAMIC )
      DO J = 1, JJPAR
      DO I = 1, IIPAR
         B = A(I,J)
         A(I,J) = B * 2.0
      ENDDO
      ENDDO
!$OMP END PARALLEL DO
!$OMP PARALLEL DO (which must start in column 1) is called a sentinel. It tells the compiler that the following DO-loop is to be executed in parallel. The commands following the sentinel specify further options for the parallelization. These options may be spread across multiple lines by using the OpenMP line continuation command !$OMP+.
The above DO-loop will assign different (I,J) pairs to different processors. Because the sentinel applies to the outer loop, each processor works on its own set of J values. In general, the more processors you specify, the less time the operation will take.
SHARED( A ) tells the compiler that the A array may be shared across all processors. We say that A is a SHARED variable.
Even though A itself can be shared across all CPUs, its indices I and J cannot be shared. Because different CPUs will be handling different (I,J) pairs, each CPU needs its own local copy of (I,J). In this way, the processors will not interfere with each other by overwriting each other's values of I and J. We say that I and J need to be made PRIVATE to the parallel loop; this is achieved with the !$OMP+PRIVATE( I, J ) declaration. (The + is simply a continuation character.)
The B scalar also needs to be declared PRIVATE, since its value will be recomputed for each (I,J) pair. We thus must extend the declaration of !$OMP+PRIVATE( I, J ) to !$OMP+PRIVATE( I, J, B ).
!$OMP END PARALLEL DO is another sentinel. It lets the compiler know where the parallel DO-loop ends. The !$OMP END PARALLEL DO sentinel is optional and thus may be omitted. However, specifying both the beginning and end of parallel sections is not only good style, but also enhances the overall readability of the code.
Many GEOS-Chem routines use the old "Fortran 77" fixed-format style. OpenMP sentinels in fixed-format code, such as the example below,
!$OMP PARALLEL DO
!$OMP+DEFAULT( SHARED )
!$OMP+PRIVATE( I, J, L )
      DO L = 1, LLPAR
      DO J = 1, JJPAR
      DO I = 1, IIPAR
         ... etc ...
      ENDDO
      ENDDO
      ENDDO
!$OMP END PARALLEL DO
must adhere to the following syntax:
- !$OMP statements must ALWAYS start in column 1, regardless of where the DO loop actually starts, and:
- If the !$OMP sentinel extends across more than one line, you must place a line continuation character (we recommend +) in column #6.
--Bob Y. 11:22, 25 February 2013 (EST)
Starting with GEOS-Chem v9-01-03, many GEOS-Chem routines are being converted from the old Fortran-77 fixed-format coding style to the new Fortran-90 free-format style. In free-format code you may line up the !$OMP sentinels so that they are flush with the DO loops, such as in this example:
       !$OMP PARALLEL DO        &
       !$OMP DEFAULT( SHARED )  &
       !$OMP PRIVATE( I, J, L )
       DO L = 1, LLPAR
       DO J = 1, JJPAR
       DO I = 1, IIPAR
          ... etc ...
       ENDDO
       ENDDO
       ENDDO
       !$OMP END PARALLEL DO
For !$OMP sentinels that extend over more than one line, you must place an ampersand (&) character at the end of each line that is continued onto the next one (but not at the end of the last line). The ampersand is the official Fortran-90 line continuation character.
Environment variable settings for OpenMP
Please see our Environment variables for OpenMP parallelization wiki section for more information about how to define the number of cores (aka threads) that you wish your parallelized GEOS-Chem "Classic" code to use.
Frequently asked questions about parallelization
Here are some frequently asked questions about parallelizing GEOS-Chem with the OpenMP directives:
How can I tell if a variable has to be declared with PRIVATE or SHARED?
Here is a quick and dirty rule of thumb for determining which variables in a parallel DO-loop must be declared PRIVATE:
- All loop indices must be declared PRIVATE.
- All array indices must be declared PRIVATE.
- All scalars which are assigned a value within a parallel loop must be declared PRIVATE.
- All arguments to a function or subroutine called within a parallel loop must be declared PRIVATE.
You may also have noticed that the first character of both the !$OMP PARALLEL DO sentinel and the !$OMP+ line continuation command is a legal Fortran comment character (!). This is by design. In order to invoke the parallel processing commands, you must turn on a specific switch in your makefile (for example, -qopenmp for recent versions of the Intel Fortran Compiler, which replaced the older -openmp switch, or -fopenmp for GNU Fortran; check your compiler manual for other platforms). If you do not specify multiprocessor compilation, then the parallel processing directives will be treated as Fortran comments, and the associated DO-loops will be executed on one processor only.
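The "directives are comments" behavior can be tried out with a short test program. The sketch below is not part of GEOS-Chem, and its program and variable names are hypothetical. It relies on the OpenMP conditional compilation sentinel: in free-format code, a line that begins with !$ followed by a space is compiled only when the OpenMP switch is turned on, and is otherwise an ordinary comment.

```fortran
PROGRAM openmp_switch_sketch

  ! Compiled only when OpenMP is enabled; otherwise this is a comment
  !$ USE omp_lib

  IMPLICIT NONE

  INTEGER :: nThreads

  ! Default value, used when the code is compiled without OpenMP
  nThreads = 1

  ! This assignment takes effect only when the OpenMP switch is on
  !$ nThreads = omp_get_max_threads()

  PRINT*, 'This executable can use up to ', nThreads, ' thread(s)'

END PROGRAM openmp_switch_sketch
```

Compiling the same file with and without the OpenMP switch should give two executables: one that reports the number of threads available to it, and one that always reports a single thread.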
It should be noted that OpenMP commands are not the same as MPI (message passing interface). With OpenMP directives, you are able to split a job among several processors on the same machine. You are NOT able to split a job among several processors on different machines. Therefore, OpenMP is not suitable for running on distributed memory architectures.
My parallel loop calls a subroutine. Do all of the variables in the subroutine have to be declared PRIVATE?
Let's say you have the following routine:
      SUBROUTINE mySub( X, Y, Z )

         ! Dummy variables for input
         REAL, INTENT(IN)  :: X, Y

         ! Dummy variable for output
         REAL, INTENT(OUT) :: Z

         ! Add X + Y to make Z
         Z = X + Y

      END SUBROUTINE mySub
and you call this from within an OpenMP parallel loop:
      INTEGER :: N
      REAL    :: A, B, C

!$OMP PARALLEL DO
!$OMP+DEFAULT( SHARED )
!$OMP+PRIVATE( N, A, B, C )
      DO N = 1, nIterations

         ! Get inputs from some array
         A = Input(N,1)
         B = Input(N,2)

         ! Add A + B to make C
         CALL mySub( A, B, C )

         ! Save the output in an array
         Output(N) = C

      ENDDO
!$OMP END PARALLEL DO
As described above, because A, B, and C are scalars that are assigned within the parallel loop, they must be declared with !$OMP PRIVATE. But note that you do not have to declare the variables X, Y, and Z as !$OMP PRIVATE within subroutine mySub. This is because each CPU will call mySub in a separate thread of execution, and each CPU will create its own instance of X, Y, and Z.
What does the THREADPRIVATE statement do?
Let's modify the above example slightly. Let's now suppose that subroutine mySub from the prior example is now part of a Fortran-90 module, which looks like this:
      MODULE myModule

         ! Module variable:
         ! This is global and acts as if it were in a F77-style common block
         REAL, PUBLIC :: Z

      CONTAINS

         SUBROUTINE mySub( X, Y )

            ! Dummy variables for input
            REAL, INTENT(IN) :: X, Y

            ! Add X + Y to make Z
            ! NOTE that Z is now a global variable
            Z = X + Y

         END SUBROUTINE mySub

      END MODULE myModule
Note that Z is now a global scalar variable. Let's now use the same parallel loop as before:
      ! Get mySub and the Z variable from myModule
      USE myModule, ONLY : mySub, Z

      INTEGER :: N
      REAL    :: A, B, C

!$OMP PARALLEL DO
!$OMP+DEFAULT( SHARED )
!$OMP+PRIVATE( N, A, B, C )
      DO N = 1, nIterations

         ! Get inputs from some array
         A = Input(N,1)
         B = Input(N,2)

         ! Add A + B; the result is stored in the module variable Z
         CALL mySub( A, B )

         ! Save the output in an array
         Output(N) = Z

      ENDDO
!$OMP END PARALLEL DO
Because Z is now a global variable, if we do not take special precautions, it will get repeatedly (and mercilessly!) overwritten by each CPU. To prevent this from happening, we must declare Z with the !$OMP THREADPRIVATE statement in the module where it is defined. We add the !$OMP THREADPRIVATE( Z ) line directly after the declaration of Z:
      MODULE myModule

         ! Module variable:
         ! This is global and acts as if it were in a F77-style common block
         REAL, PUBLIC :: Z
!$OMP THREADPRIVATE( Z )

         ... etc ...
When we declare Z with the !$OMP THREADPRIVATE statement, this tells the computer to keep a separate copy of Z for each CPU, so that the CPUs do not interfere with each other. Also note that when you make a variable !$OMP THREADPRIVATE, you should treat it as having no meaning outside of the parallel loop. So you should not rely on using the value of Z elsewhere in your code.
Most of the time you won't have to use the !$OMP THREADPRIVATE statement. You may need to use it if you are trying to parallelize code that came from someone else.
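For readers who want to experiment with !$OMP THREADPRIVATE, the pieces discussed above can be assembled into one free-format test program. This is only a sketch: the program name, the value of nIterations, and the contents of the Input array are assumptions made for illustration and do not come from GEOS-Chem.

```fortran
MODULE myModule

  IMPLICIT NONE

  ! Module variable: each thread keeps its own separate copy of Z
  REAL, PUBLIC :: Z
  !$OMP THREADPRIVATE( Z )

CONTAINS

  SUBROUTINE mySub( X, Y )

    ! Dummy variables for input
    REAL, INTENT(IN) :: X, Y

    ! Add X + Y and store the result in the calling thread's copy of Z
    Z = X + Y

  END SUBROUTINE mySub

END MODULE myModule

PROGRAM threadprivate_sketch

  USE myModule, ONLY : mySub, Z

  IMPLICIT NONE

  ! Hypothetical problem size, chosen only for this sketch
  INTEGER, PARAMETER :: nIterations = 1000

  INTEGER :: N
  REAL    :: A, B
  REAL    :: Input(nIterations,2), Output(nIterations)

  ! Fill the input array with simple test values (serial loop)
  DO N = 1, nIterations
     Input(N,1) = REAL( N )
     Input(N,2) = 2.0 * REAL( N )
  ENDDO

  !$OMP PARALLEL DO       &
  !$OMP DEFAULT( SHARED ) &
  !$OMP PRIVATE( N, A, B )
  DO N = 1, nIterations
     A = Input(N,1)
     B = Input(N,2)
     CALL mySub( A, B )
     Output(N) = Z
  ENDDO
  !$OMP END PARALLEL DO

  ! Every element of Output should equal the sum of its two inputs
  IF ( ALL( Output == Input(:,1) + Input(:,2) ) ) THEN
     PRINT*, 'All results match'
  ELSE
     PRINT*, 'Mismatch: Z was overwritten by another thread'
  ENDIF

END PROGRAM threadprivate_sketch
```

If you remove the !$OMP THREADPRIVATE( Z ) line and run on several threads, all threads write to the same Z. The program may then report a mismatch, although such race conditions do not necessarily show up on every run.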
Can I use pointers within an OpenMP parallel loop?
You may use pointer-based variables (including derived-type objects) within an OpenMP parallel loop. But you must make sure that you point to the target within the parallel loop section AND that you also nullify the pointer within the parallel loop section. For example:
Incorrect (the pointer is nullified outside of the parallel loop):

      ! Declare variables
      REAL, TARGET  :: myArray(IIPAR,JJPAR)
      REAL, POINTER :: myPtr(:)

      ! Declare an OpenMP parallel loop
!$OMP PARALLEL DO
!$OMP+DEFAULT( SHARED )
!$OMP+PRIVATE( I, J, myPtr, … )
      DO J = 1, JJPAR
      DO I = 1, IIPAR

         ! Point to a variable.
         ! This must be done in the parallel loop section.
         myPtr => myArray(:,J)

         . . . do other stuff . . .

      ENDDO
      ENDDO
!$OMP END PARALLEL DO

      ! Nullify the pointer.
      ! NOTE: This is incorrect because we nullify the pointer outside of the loop.
      myPtr => NULL()
Correct (the pointer is nullified inside the parallel loop):

      ! Declare variables
      REAL, TARGET  :: myArray(IIPAR,JJPAR)
      REAL, POINTER :: myPtr(:)

      ! Declare an OpenMP parallel loop
!$OMP PARALLEL DO
!$OMP+DEFAULT( SHARED )
!$OMP+PRIVATE( I, J, myPtr, … )
      DO J = 1, JJPAR
      DO I = 1, IIPAR

         ! Point to a variable.
         ! This must be done in the parallel loop section.
         myPtr => myArray(:,J)

         . . . do other stuff . . .

         ! Nullify the pointer.
         ! This must be done before the end of the parallel loop.
         myPtr => NULL()

      ENDDO
      ENDDO
!$OMP END PARALLEL DO
In other words, pointers used in OpenMP parallel loops only have meaning within the parallel loop.
How many cores may I use in my GEOS-Chem simulation?
You can use as many computational cores as there are on a single node of your cluster. With OpenMP parallelization, the restriction is that all of the cores have to see all the memory on the machine (or node of a larger machine). So if you have 32 CPUs on a single node, you can use them. We have shown that GEOS-Chem run times will continue to decrease (albeit asymptotically) when you increase the number of cores.
For more information, please see our Guide to GEOS-Chem performance.
Why is GEOS-Chem not using all the cores I requested?
The number of threads for an OpenMP simulation is determined by the environment variable OMP_NUM_THREADS. You may have to set this manually in your run script (or in your .bashrc or .cshrc file) to get the proper number of cores. If you are unsure, check with your local IT staff. Typically you set it with:
setenv OMP_NUM_THREADS 8      # For csh or tcsh
export OMP_NUM_THREADS=8      # For bash
--Bob Y. 10:26, 26 August 2014 (EDT)
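To confirm that the OMP_NUM_THREADS setting is actually picked up in your environment, you can compile and run a small stand-alone check such as the one below. It is a sketch rather than a GEOS-Chem utility, and the program name is hypothetical. It must be compiled with the OpenMP switch turned on.

```fortran
PROGRAM check_threads_sketch

  ! omp_lib provides the OpenMP runtime routines used below
  USE omp_lib

  IMPLICIT NONE

  ! Number of cores that the OpenMP runtime can see on this node
  PRINT*, 'Cores available on this node    : ', omp_get_num_procs()

  ! Number of threads that the next parallel loop may use
  ! (this normally follows OMP_NUM_THREADS, if it is set)
  PRINT*, 'Threads a parallel loop may use : ', omp_get_max_threads()

END PROGRAM check_threads_sketch
```

If the second number is smaller than the number of cores you requested from your scheduler, then OMP_NUM_THREADS is probably unset or set to a different value in the shell where the job runs.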
MPI
The OpenMP parallelization used by GEOS-Chem is an example of shared-memory parallelization. As we have seen, we are restricted to using a single node of a computer cluster. This is because all of the cores need to talk with all of the memory on the node.
On the other hand, MPI (Message Passing Interface) parallelization is an example of distributed-memory parallelization. An MPI library installation is required for passing data from one physical system to another (i.e. across nodes).
We have developed a framework (known as GEOS-Chem "HP" or GCHP) that will allow GEOS-Chem to take advantage of distributed computing via the Earth System Modeling Framework (ESMF) and MPI parallelization. We invite all GEOS-Chem users to try installing and running the GCHP framework on their systems. For detailed information, please see our GEOS-Chem HP wiki page.
Further reading
- OpenMP web site
- Introduction to programming with OpenMP (Jason Blevins)
- Guide to compilers for GEOS-Chem
- Specifying parallelization settings for OpenMP parallelization