Parallelizing GEOS-Chem

In the late 1990s, a new open standard for parallel computing named OpenMP was developed by several compiler vendors, including Sun, SGI, Compaq, The Portland Group, and Microsoft. The resulting standard allows parallel processing source code (written in either Fortran or C) to be ported between different platforms with minimal effort.

All GEOS–Chem parallel processing commands now adhere to the new OpenMP standard. Therefore, in order to run GEOS–Chem on your platform, you must make sure that your compiler supports OpenMP.

Overview of OpenMP parallelization

In GEOS–Chem, parallelization is achieved by splitting the work contained in a DO-loop across several CPUs. Because most GEOS-Chem arrays represent quantities (meteorological fields, tracer concentrations, chemical species concentrations) on a geospatial grid, when we parallelize, we give each CPU its own region of the "world" to work on. Every CPU can still see the entire "world" (i.e. the entire memory on the machine) at once, but each CPU is restricted to working on its own region of the "world".

OpenMP Demo.png

Also, it is important to remember that OpenMP is a loop-level parallelization. That means that only commands within selected DO loops will execute in parallel. All GEOS-Chem simulations start off on a single CPU (known as the "master CPU"). Upon entering a parallel DO loop, other CPUs will be invoked to share the workload within the loop. At the end of the parallel DO loop, the other CPUs return to standby status and the execution continues only on the "master" CPU.
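The following free-format program is a minimal sketch of this behavior and is not taken from the GEOS-Chem source code. The program name, the loop size nBoxes, and the owner array are hypothetical. OpenMP refers to each stream of execution as a "thread"; here one thread corresponds to one CPU in the description above. The program must be compiled with OpenMP turned on.

```fortran
PROGRAM ForkJoinDemo

  ! OpenMP runtime library (provides omp_get_thread_num)
  USE OMP_LIB

  IMPLICIT NONE

  INTEGER, PARAMETER :: nBoxes = 8      ! Hypothetical loop size
  INTEGER            :: N
  INTEGER            :: owner(nBoxes)   ! Thread that ran each iteration

  ! Serial section: only the master thread (thread 0) runs here
  PRINT*, 'Before the loop, running on thread ', omp_get_thread_num()

  ! Parallel section: the iterations are divided among the threads
  !$OMP PARALLEL DO       &
  !$OMP DEFAULT( SHARED ) &
  !$OMP PRIVATE( N )
  DO N = 1, nBoxes
     owner(N) = omp_get_thread_num()
  ENDDO
  !$OMP END PARALLEL DO

  ! Serial section again: execution continues on the master thread
  PRINT*, 'After the loop, running on thread  ', omp_get_thread_num()
  PRINT*, 'Thread that ran each iteration:    ', owner

END PROGRAM ForkJoinDemo
```

Both messages outside the loop report thread 0, while the owner array shows how the iterations were divided. The division can change from run to run.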

Example using OpenMP directives

Here is an example of parallel code written with the OpenMP directives:

 !$OMP PARALLEL DO
 !$OMP+SHARED( A )
 !$OMP+PRIVATE( I, J, B )
 !$OMP+SCHEDULE( DYNAMIC )     
       DO J = 1, JJPAR
       DO I = 1, IIPAR
          B = A(I,J)
          A(I,J) = B * 2.0
       ENDDO
       ENDDO
 !$OMP END PARALLEL DO

The !$OMP PARALLEL DO (which must start in column 1 in fixed-format code) is called a sentinel. It tells the compiler that the following DO-loop is to be executed in parallel. The commands following the sentinel specify further options for the parallelization. These options may be spread across multiple lines by using the OpenMP line continuation command !$OMP+.

The above DO-loop will assign different (I,J) pairs to different processors. In general, the more processors you use, the less time the operation will take, although the overhead of coordinating the processors eventually limits the speedup.

The declaration SHARED( A ) tells the compiler that the A array may be shared across all processors. We say that A is a SHARED variable.

Even though A itself can be shared across all CPUs, its indices I and J cannot be shared. Because different CPUs will be handling different (I,J) pairs, each CPU needs its own local copy of (I,J). In this way, the processors will not interfere with each other by overwriting each other's values of I and J. We say that I and J need to be made PRIVATE to the parallel loop; this declaration is achieved with the !$OMP+PRIVATE( I, J ) declaration. (The + is simply a continuation character.)

The B scalar also needs to be declared PRIVATE, since its value will be recomputed for each (I,J) pair. We thus must extend the declaration of !$OMP+PRIVATE( I, J ) to !$OMP+PRIVATE( I, J, B ).

The !$OMP END PARALLEL DO is another sentinel. It lets the compiler know where the parallel DO-loop ends. The !$OMP END PARALLEL DO sentinel is optional and thus may be omitted. However, specifying both the beginning and end of parallel sections is not only good style, but also enhances the overall readability of the code.
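For readers who want to try the example, here is one way it might be turned into a complete program. This is a sketch only: it is written in free-format Fortran-90 rather than fixed format, and the program name and the values given to IIPAR and JJPAR are assumptions chosen for illustration.

```fortran
PROGRAM DoubleArray

  IMPLICIT NONE

  ! Hypothetical grid dimensions, chosen only for this sketch
  INTEGER, PARAMETER :: IIPAR = 72
  INTEGER, PARAMETER :: JJPAR = 46

  INTEGER :: I, J
  REAL    :: B
  REAL    :: A(IIPAR,JJPAR)

  ! Fill the array on the master CPU
  A = 1.0

  ! A is SHARED; the loop indices and the scalar B are PRIVATE
  !$OMP PARALLEL DO        &
  !$OMP SHARED( A )        &
  !$OMP PRIVATE( I, J, B ) &
  !$OMP SCHEDULE( DYNAMIC )
  DO J = 1, JJPAR
  DO I = 1, IIPAR
     B      = A(I,J)
     A(I,J) = B * 2.0
  ENDDO
  ENDDO
  !$OMP END PARALLEL DO

  ! Report whether every element was doubled
  PRINT*, 'All elements doubled? ', ALL( A == 2.0 )

END PROGRAM DoubleArray
```

The program gives the same answer whether or not it is compiled with OpenMP, because each element of A is touched by exactly one loop iteration.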

--Bob Yantosca (talk) 18:19, 2 May 2016 (UTC)

Fortran-77 fixed-format

Many GEOS-Chem routines use the old "Fortran 77" fixed-format style. OpenMP sentinels in fixed-format code, such as the example below,

!$OMP PARALLEL DO
!$OMP+DEFAULT( SHARED )
!$OMP+PRIVATE( I, J, L )
            DO L = 1, LLPAR
            DO J = 1, JJPAR
            DO I = 1, IIPAR
               ... etc ...
            ENDDO 
            ENDDO
            ENDDO
!$OMP END PARALLEL DO

must adhere to the following syntax:

  1. The !$OMP statements must ALWAYS start in column 1, regardless of where the DO loop actually starts, and:
  2. If the !$OMP sentinel extends across more than one line, you must place a line continuation character (we recommend +) in column 6.

--Bob Y. 11:22, 25 February 2013 (EST)

Fortran-90 free-format

Starting with GEOS-Chem v9-01-03, many GEOS-Chem routines are being converted from the old Fortran-77 fixed-format coding style to the newer Fortran-90 free-format style. In free-format code you may line up the !$OMP sentinels so that they are flush with the DO loops, such as in this example:

            !$OMP PARALLEL DO        &
            !$OMP DEFAULT( SHARED  ) &
            !$OMP PRIVATE( I, J, L )
            DO L = 1, LLPAR
            DO J = 1, JJPAR
            DO I = 1, IIPAR
               ... etc ...
            ENDDO 
            ENDDO
            ENDDO
            !$OMP END PARALLEL DO

For !$OMP sentinels that extend over more than one line, you must place an ampersand (&) character at the end of every line that is continued (i.e. every line except the last). The ampersand is the official Fortran-90 line continuation character.

--Bob Yantosca (talk) 18:19, 2 May 2016 (UTC)

Frequently asked questions about parallelization

Here are some frequently asked questions about parallelizing GEOS-Chem with the OpenMP directives:

How can I tell if a variable has to be declared with PRIVATE or SHARED?

Here is a quick and dirty rule of thumb for determining which variables in a parallel DO-loop must be declared PRIVATE:

  • All loop indices must be declared PRIVATE.
  • All array indices must be declared PRIVATE.
  • All scalars which are assigned a value within a parallel loop must be declared PRIVATE.
  • All arguments to a function or subroutine called within a parallel loop must be declared PRIVATE.

You may also have noticed that the first character of both the !$OMP PARALLEL DO sentinel and the !$OMP+ line continuation command is a legal Fortran comment character (!). This is by design. In order to invoke the parallel processing commands, you must turn on a specific switch in your makefile (this is -qopenmp for recent versions of the Intel Fortran Compiler, which replaced the older -openmp switch, and -fopenmp for GNU Fortran; check your compiler manual for other platforms). If you do not specify multiprocessor compilation, then the parallel processing directives will be considered as Fortran comments, and the associated DO-loops will be executed on one processor only.
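One way to check whether the OpenMP switch was actually turned on is the OpenMP conditional compilation sentinel (!$ followed by a space). A line that starts with this sentinel is compiled only when OpenMP is enabled and is otherwise treated as an ordinary comment. The free-format sketch below is not part of GEOS-Chem; the program and variable names are hypothetical.

```fortran
PROGRAM CheckOpenMP

  ! This USE statement is compiled only when OpenMP is enabled
  !$ USE OMP_LIB

  IMPLICIT NONE

  LOGICAL :: usingOpenMP

  ! The second assignment is skipped unless OpenMP is enabled
  usingOpenMP = .FALSE.
  !$ usingOpenMP = .TRUE.

  IF ( usingOpenMP ) THEN
     PRINT*, 'OpenMP is ON'
     !$ PRINT*, 'Maximum number of threads: ', omp_get_max_threads()
  ELSE
     PRINT*, 'OpenMP is OFF: parallel loops will run on one processor'
  ENDIF

END PROGRAM CheckOpenMP
```

Compiling this file with and without the OpenMP switch (for example, with and without -fopenmp under GNU Fortran) should produce the two different messages.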

It should be noted that OpenMP commands are not the same as MPI (message passing interface). With OpenMP directives, you are able to split a job among several processors on the same machine. You are NOT able to split a job among several processors on different machines. Therefore, OpenMP is not suitable for running on distributed memory architectures.

--Bob Yantosca (talk) 17:49, 2 May 2016 (UTC)

My parallel loop calls a subroutine. Do all of the variables in the subroutine have to be declared PRIVATE?

Let's say you have the following routine:

 SUBROUTINE mySub( X, Y, Z )

    ! Dummy variables for input
    REAL, INTENT(IN)  :: X, Y
 
    ! Dummy variable for output
    REAL, INTENT(OUT) :: Z
 
    ! Add X + Y to make Z
    Z = X + Y

 END SUBROUTINE mySub

and you call this from within an OpenMP parallel loop:

 INTEGER :: N
 REAL    :: A, B, C

 !$OMP PARALLEL DO
 !$OMP+DEFAULT( SHARED )
 !$OMP+PRIVATE( N, A, B, C )
 DO N = 1, nIterations

    ! Get inputs from some array
    A = Input(N,1)
    B = Input(N,2)

    ! Add A + B to make C
    CALL mySub( A, B, C )

    ! Save the output in an array
    Output(N) = C 

 ENDDO
 !$OMP END PARALLEL DO

As described above, because N, A, B, and C are scalars that are assigned values within the parallel loop, they must be declared with !$OMP PRIVATE. But note that you do not have to declare the variables X, Y, and Z with !$OMP PRIVATE within subroutine mySub. This is because each CPU will call mySub in a separate thread of execution, and each CPU will create its own instance of X, Y, and Z.

--Bob Yantosca (talk) 17:46, 2 May 2016 (UTC)

What does the THREADPRIVATE statement do?

Let's modify the above example slightly. Let's now suppose that subroutine mySub from the prior example is now part of a Fortran-90 module, which looks like this:

 MODULE myModule

   ! Module variable: 
   ! This is global and acts as if it were in a F77-style common block
   REAL, PUBLIC :: Z

 CONTAINS

   SUBROUTINE mySub( X, Y )

    ! Dummy variables for input
    REAL, INTENT(IN)  :: X, Y
 
    ! Add X + Y to make Z
    ! NOTE that Z is now a global variable
    Z = X + Y
 
   END SUBROUTINE mySub
  
 END MODULE myModule

Note that Z is now a global scalar variable. Let's now use the same parallel loop as before:

 ! Get the mySub routine and the Z variable from myModule
 USE myModule, ONLY : mySub, Z

 INTEGER :: N
 REAL    :: A, B, C

 !$OMP PARALLEL DO
 !$OMP+DEFAULT( SHARED )
 !$OMP+PRIVATE( N, A, B, C )
 DO N = 1, nIterations

    ! Get inputs from some array
    A = Input(N,1)
    B = Input(N,2)

    ! Add A + B to make C
    CALL mySub( A, B )

    ! Save the output in an array
    Output(N) = Z

 ENDDO
 !$OMP END PARALLEL DO

Because Z is now a global variable, if we do not take special precautions, it will get repeatedly (and mercilessly!) overwritten by each CPU. To prevent this from happening, we must declare Z with the !$OMP THREADPRIVATE statement in the module where it is defined. We add the !$OMP THREADPRIVATE( Z ) line immediately after the declaration of Z:

 MODULE myModule

   ! Module variable: 
   ! This is global and acts as if it were in a F77-style common block
   REAL, PUBLIC :: Z
   !$OMP THREADPRIVATE( Z )

   ... etc ...

Declaring Z with the !$OMP THREADPRIVATE statement tells the computer to keep a separate copy of Z for each CPU, so that the CPUs do not interfere with one another. Also note that when you make a variable !$OMP THREADPRIVATE, you should treat it as having no meaning outside of the parallel loop. So you should not rely on using the value of Z elsewhere in your code.
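The pieces of this example can be assembled into one free-format file, as sketched below. The program name, the value of nIterations, and the contents of the Input array are assumptions made so that the result can be checked; they do not come from GEOS-Chem.

```fortran
MODULE myModule

  IMPLICIT NONE
  PRIVATE
  PUBLIC :: mySub, Z

  ! Module variable: each CPU keeps its own copy
  REAL :: Z
  !$OMP THREADPRIVATE( Z )

CONTAINS

  SUBROUTINE mySub( X, Y )
    REAL, INTENT(IN) :: X, Y

    ! Add X + Y to make Z (the copy owned by the calling CPU)
    Z = X + Y
  END SUBROUTINE mySub

END MODULE myModule

PROGRAM ThreadPrivateDemo

  USE myModule, ONLY : mySub, Z

  IMPLICIT NONE

  INTEGER, PARAMETER :: nIterations = 1000   ! Hypothetical loop size
  INTEGER            :: N
  REAL               :: A, B
  REAL               :: Input(nIterations,2), Output(nIterations)
  LOGICAL            :: ok

  ! Hypothetical inputs: A = N and B = 2N, so A + B = 3N
  DO N = 1, nIterations
     Input(N,1) = REAL( N )
     Input(N,2) = 2.0 * REAL( N )
  ENDDO

  !$OMP PARALLEL DO       &
  !$OMP DEFAULT( SHARED ) &
  !$OMP PRIVATE( N, A, B )
  DO N = 1, nIterations
     A = Input(N,1)
     B = Input(N,2)
     CALL mySub( A, B )
     Output(N) = Z
  ENDDO
  !$OMP END PARALLEL DO

  ! Check every output on the master CPU
  ok = .TRUE.
  DO N = 1, nIterations
     IF ( Output(N) /= 3.0 * REAL( N ) ) ok = .FALSE.
  ENDDO
  PRINT*, 'All outputs correct? ', ok

END PROGRAM ThreadPrivateDemo
```

If the !$OMP THREADPRIVATE( Z ) line is removed and the program is run on several CPUs, some outputs may be wrong, but not necessarily on every run. This kind of error depends on timing and can be hard to reproduce.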

Most of the time you won't have to use the !$OMP THREADPRIVATE statement. You may need to use it if you are trying to parallelize code that came from someone else.

--Bob Yantosca (talk) 18:03, 2 May 2016 (UTC)

Can I use pointers within an OpenMP parallel loop?

You may use pointer-based variables (including derived-type objects) within an OpenMP parallel loop. But you must make sure that you point to the target within the parallel loop section AND that you also nullify the pointer within the parallel loop section. For example:

INCORRECT:

 ! Declare variables
 REAL, TARGET  :: myArray(IIPAR,JJPAR)
 REAL, POINTER :: myPtr  (:          )
 
 ! Declare an OpenMP parallel loop
 !$OMP PARALLEL DO
 !$OMP+DEFAULT( SHARED )
 !$OMP+PRIVATE( I, J, myPtr, … )
 DO J = 1, JJPAR
 DO I = 1, IIPAR 
 
    ! Point to a variable.  
    ! This must be done in the parallel loop section.
    myPtr => myArray(:,J)
     
    . . . do other stuff . . . 

 ENDDO
 ENDDO
 !$OMP END PARALLEL DO

 ! Nullify the pointer.
 ! NOTE: This is incorrect because we nullify the pointer outside of the loop.
 myPtr => NULL()

CORRECT:

 ! Declare variables
 REAL, TARGET  :: myArray(IIPAR,JJPAR)
 REAL, POINTER :: myPtr  (:          )
  
 ! Declare an OpenMP parallel loop
 !$OMP PARALLEL DO
 !$OMP+DEFAULT( SHARED )
 !$OMP+PRIVATE( I, J, myPtr, … )
 DO J = 1, JJPAR
 DO I = 1, IIPAR 

    ! Point to a variable.  
    ! This must be done in the parallel loop section
    myPtr => myArray(:,J)
     
    . . . do other stuff . . . 

    ! Nullify the pointer.
    ! This must be done before the end of the parallel loop.
    myPtr => NULL()

 ENDDO
 ENDDO
 !$OMP END PARALLEL DO

In other words, pointers used in OpenMP parallel loops only have meaning within the parallel loop.
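The sketch below shows the same pattern as a complete free-format program. It is an illustration rather than GEOS-Chem code: the program name and the values of IIPAR and JJPAR are hypothetical, and the pointer is assigned once per J iteration instead of once per (I,J) pair. In both cases the pointer is associated and nullified inside the parallel loop.

```fortran
PROGRAM PointerDemo

  IMPLICIT NONE

  ! Hypothetical grid dimensions, chosen only for this sketch
  INTEGER, PARAMETER :: IIPAR = 72
  INTEGER, PARAMETER :: JJPAR = 46

  INTEGER            :: I, J
  REAL, TARGET       :: myArray(IIPAR,JJPAR)
  REAL, POINTER      :: myPtr(:)

  ! Initialize on the master CPU
  myArray = 1.0
  myPtr   => NULL()

  !$OMP PARALLEL DO       &
  !$OMP DEFAULT( SHARED ) &
  !$OMP PRIVATE( I, J, myPtr )
  DO J = 1, JJPAR

     ! Point to column J. This is done inside the parallel loop.
     myPtr => myArray(:,J)

     ! Work on the target through the pointer
     DO I = 1, IIPAR
        myPtr(I) = myPtr(I) * 2.0
     ENDDO

     ! Nullify the pointer before the end of the parallel loop
     myPtr => NULL()

  ENDDO
  !$OMP END PARALLEL DO

  ! Report whether every element was doubled
  PRINT*, 'All elements doubled? ', ALL( myArray == 2.0 )

END PROGRAM PointerDemo
```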

--Bob Yantosca (talk) 17:11, 2 May 2016 (UTC)

How many CPUs may I use in my GEOS-Chem simulation?

You can use as many CPUs as there are on a single node of your cluster. With OpenMP parallelization, the restriction is that all of the CPUs have to see all the memory on the machine (or node of a larger machine). So if you have 32 CPUs on a single node, you can use them. But you may not get good performance beyond about 16 CPUs. The more CPUs you add, the more overhead you add (i.e. all the CPUs have to talk to each other), and that can kill your performance. See these references:

  1. 7-model-day timing tests done with GEOS-Chem
  2. GEOS-Chem scalability
  3. Performance of GEOS-Chem v9-02

--Bob Yantosca (talk) 20:06, 4 January 2017 (UTC)

Why is GEOS-Chem not using all the CPUs I requested?

The number of threads for an OpenMP simulation is determined by the environment variable OMP_NUM_THREADS. You may have to set this manually in your run script (or .bashrc or .cshrc file) to get the proper number of CPUs. Check with your local IT people. Typically you set it with:

setenv OMP_NUM_THREADS 8   # For csh or tcsh
export OMP_NUM_THREADS=8   # For bash
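To confirm that the setting has taken effect, you could compile and run a small test program such as the free-format sketch below (the program name is hypothetical). It uses two standard OpenMP library functions and must be compiled with OpenMP turned on.

```fortran
PROGRAM ReportThreads

  ! OpenMP runtime library
  USE OMP_LIB

  IMPLICIT NONE

  ! Number of threads that a parallel loop may use
  PRINT*, 'Threads available to parallel loops: ', omp_get_max_threads()

  ! Number of processors that the OpenMP runtime can see
  PRINT*, 'Processors visible to this program:  ', omp_get_num_procs()

END PROGRAM ReportThreads
```

If the first number does not match the value you gave OMP_NUM_THREADS, the variable is probably not being set in the environment where the program actually runs, e.g. inside a batch job.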

--Bob Y. 10:26, 26 August 2014 (EDT)

For more information

Please consult the following web pages for more information about the OpenMP parallelization directives:

MPI

We have developed a framework (known as GEOS-Chem "HP" or GCHP) that will allow GEOS-Chem to take advantage of distributed computing via the Earth System Modeling Framework (ESMF) and MPI parallelization. We invite all GEOS-Chem users to try installing and running the GCHP framework on their systems. For detailed information, please see our GEOS-Chem HP wiki page.

--Bob Yantosca (talk) 14:31, 7 June 2017 (UTC)