Bugs and fixes

On this page we list the GEOS-Chem bugs that users have recently encountered, and how to fix them.

Negative tracer in routine WETDEP

17-Apr-2008

Dylan Millet (dbm@umn.edu) wrote:

I'm having a run die consistently at the same time (October 1, 2005; first time step of the month) in large-scale wetdep, with an STT element < 0.
  • Platform: Linux cluster
  • Threads: 8
  • Version: v7-04-13, out of the box.
  • GEOS4, 4x5, 30L, full chemistry
  • IFORT 10.1
In Section 6 (No Downward Precip) of wetscav_mod.f, subroutine SAFETY is getting called:
    WETDEP - STT < 0 at    1   1  29 for tracer    7 in area    6
(First of all it seems odd to do wetdep for L=29, this is 63 km up). Have you seen anything like this? I ran for the whole year starting Jan 1 successfully until this point.
...
By the way, the problem persists when I turn off chemistry altogether.

Philippe Le Sager (plesager@seas.harvard.edu) replied:

I used your restart file and the same input.geos (with chemistry on and off). My code went through without a problem. I tried both the Sun Studio and IFORT 9 compilers, and the latter on two different machines (altix and ceres). I used v7-04-13 and v8-01-01. I never reproduced your error.
We just got the new IFORT 10 and tried it too. I ran v8-01-01 without an error. But when I tried v7-04-13, I finally reproduced your error, with the exact same negative values!
In other words: the bug happens only with IFORT 10 and v7-04-13.
Also, have a look at this recent development. This is not the reason for your bug (I tried v8 with IFORT 10 and ISORROPIA, like v7-04-13, and it did not crash), but using RPMARES instead of ISORROPIA may be a way to fix it.
...
More about the IFORT 10 / v7-04-13 issue. When I wanted to debug with TotalView, I could not reproduce the bug anymore... because debugging simply suppresses any optimization. So I did more tests and found that GEOS-Chem crashes if the default -O2 optimization is used, but works fine with -O1. It is hard to tell what happens, since only the emissions step is done between reading the restart file and the crash.
Bob and I will further test IFORT 10 optimization on our machines. Maybe we will find something... For the time being, you may have to switch to -O1, at least for the run that crashes. You will find the optimization flag at the beginning of Makefile.ifort, as sketched below.
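For example (an illustrative sketch only; the exact set of flags in your copy of Makefile.ifort may differ):

   # Near the top of Makefile.ifort: replace the default -O2
   # optimization flag with -O1 to work around the IFORT 10 issue
   #FFLAGS = -cpp -w -O2 -auto -noalign -convert big_endian
   FFLAGS  = -cpp -w -O1 -auto -noalign -convert big_endian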

Long story short: This appears to be an optimization issue with IFORT 10 and v7-04-13. Upgrading to GEOS-Chem v8-01-01 should solve this problem.

--Bmy 10:38, 17 April 2008 (EDT)

ISORROPIA and RPMARES

11-Apr-2008

Please see the discussion of the bugs & fixes for ISORROPIA and RPMARES on the Code Developer's Forum page Aerosol thermodynamical equilibrium.

NaNs in SMVGEAR

05-Jul-2007

From Lok Lamsal (lok.lamsal@fizz.phys.dal.ca)

I ran into a problem while running GEOS-4, version v7-04-09, at 2x2.5. The simulation stops on 15 July 2006 with different error messages on two of our machines, tuque and beret. One of the error messages on tuque is like this:

  sum of rrate =  Infinity
  SMVGEAR: CNEW is NaN!
  Species index :            1
  Grid Box      :          121          15           1
  STOP in smvgear.f!
      - CLEANUP: deallocating arrays now...
 forrtl: severe (174): SIGSEGV, segmentation fault occurred

And on beret the message is like this:

      - CLEANUP: deallocating arrays now...
  forrtl: severe (174): SIGSEGV, segmentation fault occurred
  Image              PC                Routine            Line        Source
  geos_4_nv          400000000059EA10  Unknown            Unknown     Unknown
  libguide.so        200000000039C1A0  Unknown            Unknown     Unknown
  etc.

The message that is repeated in either case is like this:

  SMVGEAR: DELT= 9.96E-16 TOO LOW DEC YFAC. KBLK, KTLOOP, NCS, TIME, TIMREMAIN, YFAC, EPS =

Could you suggest what the problem could be? Just to inform you: while trying to figure out the problem, I learned from Bastien that he did not have a problem on that day with version v7-04-10, though his run stopped on September 13, 2006.

Response by Bob Yantosca (yantosca@seas.harvard.edu):

I think there is a division by zero somewhere that is causing SMVGEAR to choke. It could be a couple of things:

(1) Make sure that in your a6_read_mod.f (routine READ_A6) you have the following code to prevent Q from going to zero, since a zero specific humidity can make logarithms blow up in other places in the code:

          !--------------------------------
          ! Q: 6-h avg specific humidity
          ! (GEOS-4 only)
          !--------------------------------
          CASE ( 'Q' )
             READ( IU_A6, IOSTAT=IOS ) XYMD, XHMS, Q3
             IF ( IOS /= 0 ) CALL IOERROR( IOS, IU_A6, 'read_a6:16' )
     
             IF ( CHECK_TIME( XYMD, XHMS, NYMD, NHMS ) ) THEN
                IF ( PRESENT( Q ) ) CALL TRANSFER_3D( Q3, Q )
                NFOUND = NFOUND + 1
     
                ! NOTE: Now set negative Q to a small positive #
                ! instead of zero, so as not to blow up logarithms
                ! (bmy, 9/8/06)
                WHERE ( Q < 0d0 ) Q = 1d-32
             ENDIF


(2) In fvdas_convect_mod.f, make SMALLEST a smaller number (i.e. 1d-60):

    !=================================================================
    ! MODULE VARIABLES
    !=================================================================
     
    ! Variables
    INTEGER            :: LIMCNV
     
    ! Constants
    LOGICAL, PARAMETER :: RLXCLM   = .TRUE.
    REAL*8,  PARAMETER :: CMFTAU   = 3600.d0
    REAL*8,  PARAMETER :: EPS      = 1.0d-13       
    REAL*8,  PARAMETER :: GRAV     = 9.8d0
    !-------------------------------------------------------
    ! Prior to 12/19/06:
    ! Make SMALLEST smaller (bmy, 12/19/06)
    !REAL*8,  PARAMETER :: SMALLEST = 1.0d-32
    !-------------------------------------------------------
    REAL*8,  PARAMETER :: SMALLEST = 1.0d-60
    REAL*8,  PARAMETER :: TINYALT  = 1.0d-36           
    REAL*8,  PARAMETER :: TINYNUM  = 2*SMALLEST


(3) In "fvdas_convect_mod.f", avoid division by zero in routine CONVTRAN:


             IF ( CDIFR > 1.d-6 ) THEN
     
                ! If the two layers differ significantly,
                ! use a geometric averaging procedure
                CABV = MAX( CMIX(I,KM1), MAXC*TINYNUM, SMALLEST )
                CBEL = MAX( CMIX(I,K),   MAXC*TINYNUM, SMALLEST )
  !-----------------------------------------------------------------
  !  Prior to 12/19/06:
  ! Avoid division by zero (bmy, 12/19/06)
  !                  CHAT(I,K) = LOG( CABV / CBEL)
  !     &                       /   ( CABV - CBEL)
  !     &                       *     CABV * CBEL
  !-----------------------------------------------------------------
     
                ! If CABV-CBEL is zero then set CHAT=SMALLEST
                ! so that we avoid div by zero (bmy, 12/19/06)
                IF ( ABS( CABV - CBEL ) > 0d0 ) THEN
                   CHAT(I,K) = LOG( CABV / CBEL )
     &                        /    ( CABV - CBEL )
     &                        *      CABV * CBEL
                ELSE
                   CHAT(I,K) = SMALLEST
                ENDIF
     
             ELSE                           
                ! Small diff, so just arithmetic mean
                CHAT(I,K) = 0.5d0 * ( CMIX(I,K) + CMIX(I,KM1) )
             ENDIF


(4) Also, I had to rewrite the parallel DO loops in routine HACK_CONV, since these were causing some kind of memory fault; see the sketch below.
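A common cause of such memory faults is a loop-local scratch variable that is missing from the PRIVATE clause of the OpenMP directives, so that threads overwrite each other's values. Here is a minimal, hypothetical sketch of the corrected pattern (the array names, loop bounds, and scratch variable below are illustrative, not the actual HACK_CONV code):

  !$OMP PARALLEL DO
  !$OMP+DEFAULT( SHARED )
  !$OMP+PRIVATE( I, J, L, N, TEMP )
        DO N = 1, NTRACE
        DO L = 1, LLPAR
        DO J = 1, JJPAR
        DO I = 1, IIPAR

           ! TEMP must appear in the PRIVATE clause above: if it were
           ! shared, the threads would clobber each other's values and
           ! cause exactly this kind of memory fault
           TEMP       = Q(I,J,L,N)
           Q(I,J,L,N) = MAX( TEMP, 0d0 )

        ENDDO
        ENDDO
        ENDDO
        ENDDO
  !$OMP END PARALLEL DO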

You may just want to get the most recent version of fvdas_convect_mod.f, which has all of these fixes installed. See:

  ftp ftp.as.harvard.edu
  cd pub/exchange/bmy
  get fvdas_convect_mod.f

So I would recommend implementing these fixes to see if this solves your problem.

NOTE: These fixes have been introduced into GEOS-Chem v7-04-10.

regrid_1x1_mod.f

16 Oct 2007

From Mike Barkley (mbarkley@staffmail.ed.ac.uk)

I think I've found an error in a subroutine of regrid_1x1_mod.f (attached in the text file):

SUBROUTINE REGRID_MASS_TO_2x25( I1, J1, L1, IN, I2, J2, OUT )

There is a DO loop over longitude with its upper limit defined as the input latitude (J1) instead of what should (presumably) be the input longitude (I1); I've indicated where this is in the program. Which is correct? We didn't notice this until we were running multi-processor 2x2.5 simulations on different servers.

The bug was:

  !-----------------------
  ! Non-polar latitudes
  !-----------------------
  DO J = 2, J2-1    
     ...          
     DO I = 1, J1

which needs to be replaced by:

  !-----------------------
  ! Non-polar latitudes
  !-----------------------
  DO J = 2, J2-1    
     ...          
     DO I = 1, I1

This bug has now been fixed in GEOS-Chem v7-04-13.

smvgear.f

02 Nov 2007

Bob Yantosca (yantosca@seas.harvard.edu) wrote:

Some of you have reported a weird error in SMVGEAR that causes GEOS-Chem simulations to die unexpectedly. The main symptom of this error is that concentrations of some species (e.g. CO) appear to go to zero, while other species (e.g. Ox) seem to reach unphysically high values, all within a single chemistry timestep. The simulation then dies shortly thereafter.
May Fu and Philippe Le Sager have isolated the cause of the problem. They found that in some instances it is possible (e.g. due to locally low OH) to get into a regime where the first derivative of a species goes very negative during SMVGEAR's internal iteration loop. This then causes the new species concentration to be negative. This can sometimes happen even if the local & global error tolerance checks have passed. Then upon exiting the internal iteration loop, SMVGEAR would automatically reset any negative species concentrations to zero (actually a small positive number like 1e-99). A species with zero concentration can adversely affect other species within the SMVGEAR solver process. Furthermore, sometimes these zero concentrations were propagating out of SMVGEAR and into the STT tracer array, which caused problems in other areas of the code.
May & Philippe implemented a fix into the file "smvgear.f" which does the following: If a negative species concentration value is found during an internal iteration, then we don't set it to zero. We instead reduce the internal iteration timestep and do another iteration (i.e. re-evaluate the Jacobian matrix) to solve for the new species concentration. This process is repeated until SMVGEAR converges onto a non-negative solution. May & Philippe also added an extra error trap to stop the simulation if any negative species concentrations still persist upon exiting the subroutine. So the entire process should now be more robust.
You may download the updated "smvgear.f" file from our anonymous FTP site:
   ftp ftp.as.harvard.edu
   cd pub/geos-chem/patches/v7-04-12
   get README
   get smvgear.f
Then copy the "smvgear.f" file to your own source code directory and recompile. Please see the README file for more information on how to locate the places in "smvgear.f" that were modified.
This is not really a "bug" but more of a "design flaw" in the original SMVGEAR package.


This bug has now been fixed in GEOS-Chem v7-04-13.