Bugs and fixes

From Geos-chem
Revision as of 14:12, 13 June 2008 by Bmy (Talk | contribs)

Jump to: navigation, search

On this page we list the GEOS-Chem bugs that users have recently encountered, and how to fix them.

Mis-labeling of SOA quantities in carbon_mod.f

Colette Heald (heald@atmos.colostate.edu) wrote:

I noticed previously that the parameters in (routine) SOA_PARA were confusingly ordered, but it's only today that I realized that in fact they are mis-labeled as well. The actual numbers are all perfectly correct but they should correspond to the information in Tables 3 and 4 for the categories of Table 1 of the Chung and Seinfeld (2002) paper. I have re-organized the code & comments for clarity (lines 1875-1939), they should read as:
     ! Average of ALPHA-PINENE, BETA-PINENE, SABINENE, D3-CARENE
     RALPHA(1,1) = 0.067d0            
     RALPHA(2,1) = 0.35425d0

     ! LIMONENE
     RALPHA(1,2) = 0.239d0
     RALPHA(2,2) = 0.363d0

     ! Average of TERPINENES and TERPINOLENE
     RALPHA(1,3) = 0.0685d0
     RALPHA(2,3) = 0.2005d0

     ! Average of MYRCENE, LINALOOL, TERPINENE-4-OL, OCIMENE
     RALPHA(1,4) = 0.06675d0
     RALPHA(2,4) = 0.135d0

     ! Average of BETA-CARYOPHYLLENE and and ALPHA-HUMULENE
     RALPHA(1,5) = 1.0d0
     RALPHA(2,5) = 0.0d0

     ! Using BETA-PINENE for all species for NO3 oxidation
     ! Data from Table 4 of Griffin, et al., JGR 104 (D3): 3555-3567 (1999)
     RALPHA(3,:) = 1.d0           

     ! Here we define some alphas for isoprene (dkh, bmy, 5/22/06)

     ! high NOX  [Kroll et al, 2005]
     !RALPHA(1,6) = 0.264d0
     !RALPHA(2,6) = 0.0173d0
     !RALPHA(3,6) = 0d0

     ! low NOX   [Kroll et al, 2006; Henze and Seinfeld, 2006]
     RALPHA(1,6) = 0.232d0
     RALPHA(2,6) = 0.0288d0
     RALPHA(3,6) = 0d0

     !=================================================================
     ! Equilibrium gas-particle partition coefficients of 
     ! semi-volatile compounds [ug-1 m**3]
     !=================================================================

     ! Average of ALPHA-PINENE, BETA-PINENE, SABINENE, D3-CARENE
     KOM(1,1) = 0.1835d0
     KOM(2,1) = 0.004275d0

     ! LIMONENE
     KOM(1,2) = 0.055d0
     KOM(2,2) = 0.0053d0

     ! Average of TERPINENES and TERPINOLENE
     KOM(1,3) = 0.133d0
     KOM(2,3) = 0.0035d0

     ! Average of MYRCENE, LINALOOL, TERPINENE-4-OL, OCIMENE
     KOM(1,4) = 0.22375d0
     KOM(2,4) = 0.0082d0

     ! Average of BETA-CARYOPHYLLENE and and ALPHA-HUMULENE
     KOM(1,5) = ( 0.04160d0 + 0.0501d0 ) / 2.d0
     KOM(2,5) = 0.0d0

     ! NOT APPLICABLE -- using BETA-PINENE for all species
     ! Data from Table 4 of Griffin, et al., JGR 104 (D3): 3555-3567 (1999)
     KOM(3,:) = 0.0163d0

Bob Yantosca (heald@atmos.colostate.edu) wrote:

We will put this in the next update of the code. Thanks!

--Bob Y. 10:11, 13 June 2008 (EDT)

Numerical noise bug in upbdflx_mod.f

Ray Nassar (ray@io.as.harvard.edu) wrote:

The run I did last night confirmed that turning off "Use strat O3/NOy BC's" does change O3 and NOx at lower levels so I cannot turn this off for sensitivity tests if comparing to past runs.
I have tried the debugging approach which you have suggested and it indicates that the problem is coming from the triple loop:
        DO L = LMIN, LLPAR
        DO J = 1,    JJPAR
        DO I = 1,    IIPAR

           ! Skip over tropospheric boxes
           IF ( ITS_IN_THE_STRAT( I, J, L ) ) THEN

              ! PNOY = P(NOy) converted from [v/v/s] to [v/v]
              PNOY = STRATPNOY(J,L) * DTDYN

              ! Add [NOx] and [HNO3] to PNOY.
              ! PNOY is now the total [NOy] concentration
              PNOY = PNOY + STT(I,J,L,IDTNOX) + STT(I,J,L,IDTHNO3)

              ! Partition total [NOy] to [NOx], units are [v/v]
              STT(I,J,L,IDTNOX)  = PNOY * XRATIO(J,L)

              ! Partition total [NOy] to [HNO3], units are [v/v]
              STT(I,J,L,IDTHNO3) = PNOY * ( 1d0 - XRATIO(J,L) )

           ENDIF
        ENDDO
        ENDDO
        ENDDO
in the file upbdflx_mod.f. Since XRATIO can have a value of zero, it means that numerical noise sometimes makes 1d0 - XRATIO(J,L) a negative number and then STT(I,J,L,IDTHNO3) becomes negative and triggers the error message. For the two instances where the difference is computed, I have now added the lines:
       ! Ray Nassar (2008-06-11) -------------------------------
       !
       ! The kludge below is to deal with errors which result
       ! from subtracting 1d0 - XRATIO(J,L). When XRATIO should
       ! be equal 1, the binary representation will have a
       ! finite error making it slightly more or less than 1.
       ! For XRATIO slightly more than 1, STT(I,J,L,IDTHNO3)<=0
       ! which triggers a CHECK_STT error message.
       !
       if (STT(I,J,L,IDTHNO3) <= 0d0) then
          STT(I,J,L,IDTHNO3) = 0d0
       endif
       !--------------------------------------------------------
immediately after the calculation of STT(I,J,L,IDTHNO3) and this fixes the problem! I think this will have to be fixed in the next version of the code.

Philippe Le Sager (plesager@seas.as.harvard.edu) wrote:

Thanks for catching that. We will put that in the next batch of fixes, maybe simply with a
   MAX( ( 1d0 -  XRATIO(J,L) ), 1d-20) 
in place of
   ( 1d0 - XRATIO(J,L) ), 
since we want to have strictly positive concentration in the troposphere and some boxes switch b/w troposphere and stratosphere.

--Bob Y. 11:13, 12 June 2008 (EDT)

Problem w/ ND65 diagnostic and dynamic tropopause

Yuxuan Wang (wang3@fas.harvard.edu) wrote:

It looks like that it's the dynamic tropopause in v8-01-01 that causes the unrealistically high AD65 values (prod and loss rate of chemical families). When I turn off the dynamic tropopause in input.geos, the model gives AD65 that's reasonable and also comparable to my previous simulations with code version 7. I took a careful look of tropopause_mod.f, but couldn't find where the problem is.
Yuxuan

Philippe Le Sager (plesager3@seas.harvard.edu) wrote:

I think I found the root of your problem with P/L diagnostic. The line 606 (subroutine do_diag_pl, in diag_pl_mod.f) :
          ! Zero each "fake" ND65 prod/loss family for next iteration
          CSPEC(JLOOP,IFAM(N)) = 0.0d0
is done too late in the code. Here is the full story:
At each time step, the CSPEC_FULL array is used to initialize the CSPEC array before entering SMVGEAR. CSPEC_FULL is filled with CSPEC after SMVGEAR. So it should account for the line 606 above, but it does not since the family P/L diagnostic is done after CSPEC_FULL has been updated.
The fix is easy. In CHEMDR.F you need to move the call to SAVE_FULL_TROP to the end of the routine, so CSPEC_FULL is updated after the P/L diagnostic. You can look at my: ~phs/Code.current/chemdr.f
If you could let me know if that fix you problem with a quick run, that would be perfect
Thanks,
-Philippe

Yuxuan Wang (wang3@fas.harvard.edu) wrote:

I just did a one-day test and it worked! P/L diagnostics now are at the right order of magnitude after I turned on the variable tropopause. That's indeed the fix. Thank you!!
Yuxuan

This fix will go into the next release of GEOS-Chem.

--Bob Y. 16:17, 3 June 2008 (EDT)

Negative tracer in routine WETDEP because of negative RH

See this post: GEOS-5 issues#Small negative RH value in 20060206.a6.2x25 file

Fixes are available at ftp://ftp.as.harvard.edu/pub/exchange/phs/GEOS-Chem-UPDATE/

--phs 16:31, 6 June 2008 (EDT)

Negative tracer in routine WETDEP

Dylan Millet (dbm@umn.edu) wrote:

I'm having a run die consistently at the same time (October 1, 2005; first time step of the month) in large-scale wetdep, with an STT element < 0.
  • Platform: Linux cluster
  • Threads: 8
  • Version: v7-4-13 out of the box.
  • GEOS4, 4x5, 30L, full chemistry
  • IFORT 10.1
In Section 6 (No Downward Precip) of wetscav_mod.f, subroutine safety is getting called.
    WETDEP - STT < 0 at    1   1  29 for tracer    7 in area    6
(First of all it seems odd to do wetdep for L=29, this is 63 km up). Have you seen anything like this? I ran for the whole year starting Jan 1 successfully until this point.
... By the way, the problem persists when I turn off chemistry altogether.

Philippe Le Sager (plesager@seas.harvard.edu) replied:

I used your restart file and the same input.geos (w/ chemistry on and off). My code went thru without problem. I tried both Sun Studio and Ifort 9 compilers, and the later on two different machines (altix and ceres). I used v7-04-13 and v8-01-01. I never reproduced your error.
We just got the new Ifort 10, and tried it too. I run v8-01-01 without an error. But when I tried v7-04-13, I finally reproduced your error, with the exact same negative values!
In other words: the bug happens with IFort 10 and v7-04-13 only.
Also, have a look at this recent development. This is not the reason for your bug (I tried v8 w/ ifort 10 and isorropia -like v7-04-13- and it did not crash), but using RPMARES instead of Isorropia may be a way to fix it.
... More about the Ifort 10 / v7-04-13 issue. When I wanted to debug with TotalView, I could not reproduce the bug anymore.... because I simply suppress any optimization. So, I did more test and found that if the default -O2 optimization is used, GEOS-Chem crashes. But it works fine with -O1. It is hard to tell what happens, since only the emissions step is done between reading the restart file and the crash.
Bob and I will further test Ifort 10 for optimization on our machines. Maybe we will find something... For the time being, you may have to switch to -O1, at least for the run that crashes. You will find the optimization flag at the beginning of the Makefile.ifort.

Long story short: This appears to be an optimization issue with IFORT 10 and v7-04-13. Upgrading to GEOS-Chem v8-01-01 should solve this problem.

--Bmy 10:38, 17 April 2008 (EDT)

ISORROPIA and RPMARES

Please see the discussion about the bugs & fixes for ISORROPIA and RPMARES on the Code Developer's Forum for Aerosol thermodynamical equilibrium.

NaN's in SMVGEAR

Lok Lamsal (lok.lamsal@fizz.phys.dal.ca)wrote:

I ran into a problem while running GEOS-4, version v7-04-09, at 2x2.5. The simulation stops on 15th July 2006 with different error messages on two of our machines tuque and beret. One of the error messages on tuque is like this:
 sum of rrate =  Infinity
 SMVGEAR: CNEW is NaN!
 Species index :            1
 Grid Box      :          121          15           1
 STOP in smvgear.f!
     - CLEANUP: deallocating arrays now...
forrtl: severe (174): SIGSEGV, segmentation fault occurred
And on beret the message is like this:
     - CLEANUP: deallocating arrays now...
 forrtl: severe (174): SIGSEGV, segmentation fault occurred
 Image              PC                Routine            Line
 Source
 geos_4_nv          400000000059EA10  Unknown               Unknown
 Unknown
 libguide.so        200000000039C1A0  Unknown               Unknown
 Unknown
 etc.

</pre>

The message which is repeated in either case is like this:
 SMVGEAR: DELT= 9.96E-16 TOO LOW DEC YFAC. KBLK, KTLOOP, NCS, TIME, TIMREMAIN, YFAC, EPS =
Could you suggest me what the problem could be? Just to inform you: while trying to figure out the problem, I noticed from Bastien that he did not have problem on that day with version v7-04-10, which stopped on September 13 2006.

Bob Yantosca (yantosca@seas.harvard.edu) replied:

I think there is a division by zero somewhere that is causing SMVGEAR to choke. It could be a couple of things:
(1) Make sure that in your In a6_read_mod.f (routine READ_A6) you have the following code to prevent Q from going to zero. This can make logarithms blow up in other places in the code:


         !--------------------------------
         ! Q: 6-h avg specific humidity
         ! (GEOS-4 only)
         !--------------------------------
         CASE ( 'Q' )
            READ( IU_A6, IOSTAT=IOS ) XYMD, XHMS, Q3
            IF ( IOS /= 0 ) CALL IOERROR( IOS, IU_A6, 'read_a6:16' )
    
            IF ( CHECK_TIME( XYMD, XHMS, NYMD, NHMS ) ) THEN
               IF ( PRESENT( Q ) ) CALL TRANSFER_3D( Q3, Q )
               NFOUND = NFOUND + 1
    
               ! NOTE: Now set negative Q to a small positive #
               ! instead of zero, so as not to blow up logarithms
               ! (bmy, 9/8/06)
               WHERE ( Q < 0d0 ) Q = 1d-32
            ENDIF
(2) In fvdas_convect_mod.f, make SMALLEST a smaller number (i.e. 1d-60):
   !=================================================================
   ! MODULE VARIABLES
   !=================================================================
    
   ! Variables
   INTEGER            :: LIMCNV              ! Constants
   LOGICAL, PARAMETER :: RLXCLM   = .TRUE.
   REAL*8,  PARAMETER :: CMFTAU   = 3600.d0
   REAL*8,  PARAMETER :: EPS      = 1.0d-13       
   REAL*8,  PARAMETER :: GRAV     = 9.8d0
   !-------------------------------------------------------
   ! Prior to 12/19/06:
   ! Make SMALLEST smaller (bmy, 12/19/06)
   !REAL*8,  PARAMETER :: SMALLEST = 1.0d-32
   !-------------------------------------------------------
   REAL*8,  PARAMETER :: SMALLEST = 1.0d-60
   REAL*8,  PARAMETER :: TINYALT  = 1.0d-36           
   REAL*8,  PARAMETER :: TINYNUM  = 2*SMALLEST
(3) In "fvdas_convect_mod.f", avoid division by zero in routine CONVTRAN:
            IF ( CDIFR > 1.d-6 ) THEN
    
               ! If the two layers differ significantly.
               ! use a geometric averaging procedure
               CABV = MAX( CMIX(I,KM1), MAXC*TINYNUM, SMALLEST )
               CBEL = MAX( CMIX(I,K),   MAXC*TINYNUM, SMALLEST )
 !-----------------------------------------------------------------
 !  Prior to 12/19/06:
 ! Avoid division by zero (bmy, 12/19/06)
 !                  CHAT(I,K) = LOG( CABV / CBEL)
 !     &                       /   ( CABV - CBEL)
 !     &                       *     CABV * CBEL
 !-----------------------------------------------------------------
    
               ! If CABV-CBEL is zero then set CHAT=SMALLEST
               ! so that we avoid div by zero (bmy, 12/19/06)
               IF ( ABS( CABV - CBEL ) > 0d0 ) THEN
                  CHAT(I,K) = LOG( CABV / CBEL )
  &                         /    ( CABV - CBEL )
  &                         *      CABV * CBEL
               ELSE
                  CHAT(I,K) = SMALLEST
               ENDIF
    
            ELSE                           
               ! Small diff, so just arithmetic mean
               CHAT(I,K) = 0.5d0 * ( CMIX(I,K) + CMIX(I,KM1) )
            ENDIF
(4) Also I had to rewrite the parallel DO loops in the routine HACK_CONV since this was causing some kind of a memory fault.
You may just want to get the most recent version of fvdas_convect_mod.f, which has all of these fixes installed. See:
 ftp ftp.as.harvard.edu
 cd pub/exchange/bmy
 get fvdas_convect_mod.f
So I would recommend trying to implement these fixes and see if this solves your problem.
NOTE: These fixes have been introduced into GEOS-Chem v7-04-10.

regrid_1x1_mod.f

From Mike Barkley (mbarkley@staffmail.ed.ac.uk)

I think I've found an error in the regrid_1x1_mod.f subroutine (attached in the text file):

SUBROUTINE REGRID_MASS_TO_2x25( I1, J1, L1, IN, I2, J2, OUT )

There is a do loop over longitude with the upper limit defined as the input latitude (J1) instead what should (?) be the output longitude (I2) - I've indicated where this in the program, Which is correct? We didn't notice this until we were running multi-processor 2x2.5 simulations on different servers.

The bug was:

  !-----------------------
  ! Non-polar latitudes
  !-----------------------
  DO J = 2, J2-1    
     ...          
     DO I = 1, J1

which needs to be replaced by:

  !-----------------------
  ! Non-polar latitudes
  !-----------------------
  DO J = 2, J2-1    
     ...          
     DO I = 1, I1

NOTE: This bug has now been fixed in GEOS-Chem v7-04-13.

smvgear.f

Bob Yantosca (yantosca@seas.harvard.edu) wrote

Some of you have reported a weird error in SMVGEAR that causes GEOS-Chem simulations to die unexpectedly. The main symptom of this error is that concentrations of some species (e.g CO) appear to go to zero, while other species (e.g. Ox) seem to reach unphysically high values, all within a single chemistry timestep. Then the simulation dies shortly thereafter.
May Fu and Philippe Le Sager have isolated the cause of the problem. They found that in some instances it is possible (e.g. due to locally low OH) to get into a regime where the first derivative of a species goes very negative during SMVGEAR's internal iteration loop. This then causes the new species concentration to be negative. This can sometimes happen even if the local & global error tolerance checks have passed. Then upon exiting the internal iteration loop, SMVGEAR would automatically reset any negative species concentrations to zero (actually a small positive number like 1e-99). A species with zero concentration can adversely affect other species within the SMVGEAR solver process. Furthermore, sometimes these zero concentrations were propagating out of SMVGEAR and into the STT tracer array, which caused problems in other areas of the code.
May & Philippe implemented a fix into the file "smvgear.f" which does the following: If a negative species concentration value is found during an internal iteration, then we don't set it to zero. We instead reduce the internal iteration timestep and do another iteration (i.e. re-evaluate the Jacobian matrix) to solve for the new species concentration. This process is repeated until SMVGEAR converges onto a non-negative solution. May & Philippe also added an extra error trap to stop the simulation if any negative species concentrations still persist upon exiting the subroutine. So the entire process should now be more robust.
You may download the updated "smvgear.f" file from our anonymous FTP site:
   ftp ftp.as.harvard.edu
   cd pub/geos-chem/patches/v7-04-12
   get README
   get smvgear.f
Then copy the "smvgear.f" file to your own source code directory and recompile. Please see the README file for more information on how to locate the places in "smvgear.f" that were modified.
This is not really a "bug" but more of a "design flaw" in the original SMVGEAR package.


NOTE: This bug has now been fixed in GEOS-Chem GEOS-Chem v7-04-13.