Bugs and fixes

On this page we list the GEOS-Chem bugs that users have recently encountered, and how to fix them.

Issues resolved in GEOS-Chem v8-02-05

--Bob Y. 10:39, 12 January 2010 (EST)

Issues resolved in GEOS-Chem v8-02-04

--Bob Y. 10:11, 12 January 2010 (EST)

Issues resolved in GEOS-Chem v8-02-03

--Bob Y. 16:38, 11 January 2010 (EST)

Issues resolved in GEOS-Chem v8-02-02

--Bob Y. 16:40, 11 January 2010 (EST)

Issues resolved in GEOS-Chem v8-02-01

--Bob Y. 09:55, 12 January 2010 (EST)

Issues resolved in GEOS-Chem v8-01-04

--Bob Y. 09:58, 12 January 2010 (EST)

Issues resolved in GEOS-Chem v8-01-03

--Bob Y. 10:29, 12 January 2010 (EST)

Issues resolved in GEOS-Chem v8-01-02

--Bob Y. 10:29, 12 January 2010 (EST)

Issues resolved in GEOS-Chem v8-01-01

--Bob Y. 10:35, 12 January 2010 (EST)

Issues resolved in GEOS-Chem v7-04-13

  • Error in regrid_1x1_mod.f
  • Error in smvgear.f

--Bob Y. 10:49, 12 January 2010 (EST)

Issues resolved in GEOS-Chem v7-04-10

These fixes have been introduced into GEOS-Chem v7-04-10.

Lok Lamsal (lok.lamsal@fizz.phys.dal.ca) wrote:

I ran into a problem while running GEOS-4, version v7-04-09, at 2x2.5. The simulation stops on 15 July 2006 with different error messages on two of our machines, tuque and beret. One of the error messages on tuque is like this:
 sum of rrate =  Infinity
 SMVGEAR: CNEW is NaN!
 Species index :            1
 Grid Box      :          121          15           1
 STOP in smvgear.f!
     - CLEANUP: deallocating arrays now...
forrtl: severe (174): SIGSEGV, segmentation fault occurred
And on beret the message is like this:
     - CLEANUP: deallocating arrays now...
 forrtl: severe (174): SIGSEGV, segmentation fault occurred
 Image              PC                Routine            Line        Source
 geos_4_nv          400000000059EA10  Unknown            Unknown     Unknown
 libguide.so        200000000039C1A0  Unknown            Unknown     Unknown
 etc.
The message that is repeated in both cases is:
 SMVGEAR: DELT= 9.96E-16 TOO LOW DEC YFAC. KBLK, KTLOOP, NCS, TIME, TIMREMAIN, YFAC, EPS =
Could you suggest what the problem might be? Just to let you know: while trying to figure out the problem, I learned from Bastien that he did not have a problem on that day with version v7-04-10; his run instead stopped on September 13, 2006.

Bob Yantosca (yantosca@seas.harvard.edu) replied:

I think there is a division by zero somewhere that is causing SMVGEAR to choke. It could be a couple of things:
(1) Make sure that in a6_read_mod.f (routine READ_A6) you have the following code to prevent Q from going to zero, which can make logarithms blow up in other places in the code:


         !--------------------------------
         ! Q: 6-h avg specific humidity
         ! (GEOS-4 only)
         !--------------------------------
         CASE ( 'Q' )
            READ( IU_A6, IOSTAT=IOS ) XYMD, XHMS, Q3
            IF ( IOS /= 0 ) CALL IOERROR( IOS, IU_A6, 'read_a6:16' )
    
            IF ( CHECK_TIME( XYMD, XHMS, NYMD, NHMS ) ) THEN
               IF ( PRESENT( Q ) ) CALL TRANSFER_3D( Q3, Q )
               NFOUND = NFOUND + 1
    
               ! NOTE: Now set negative Q to a small positive #
               ! instead of zero, so as not to blow up logarithms
               ! (bmy, 9/8/06)
               WHERE ( Q < 0d0 ) Q = 1d-32
            ENDIF
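As a standalone illustration of why this clamp matters, here is a minimal sketch (not GEOS-Chem source; the program name and values are made up): a zero Q makes a downstream LOG(Q) return -Infinity, and a negative Q makes it return NaN (or raise a floating-point exception, depending on compiler settings), while the 1d-32 floor keeps every logarithm finite.

      PROGRAM QCLAMP_DEMO
      !---------------------------------------------------------------
      ! Illustrative sketch only: mimics the READ_A6 clamp above on a
      ! few made-up specific humidity values.
      !---------------------------------------------------------------
      IMPLICIT NONE
      REAL*8 :: Q(3)
 
      ! Hypothetical raw values, including a small negative one
      Q = (/ -1.0d-5, 2.0d-6, 3.0d-3 /)
 
      ! Same clamp as in the READ_A6 fix above
      WHERE ( Q < 0d0 ) Q = 1d-32
 
      ! All logarithms are now finite ( LOG(1d-32) is about -73.7 )
      PRINT*, 'Q      = ', Q
      PRINT*, 'LOG(Q) = ', LOG( Q )
      END PROGRAM QCLAMP_DEMO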
(2) In fvdas_convect_mod.f, make SMALLEST a smaller number (i.e. 1d-60); both this parameter and TINYNUM are exercised in the sketch that follows item (3) below:
   !=================================================================
   ! MODULE VARIABLES
   !=================================================================
    
   ! Variables
   INTEGER            :: LIMCNV              ! Constants
   LOGICAL, PARAMETER :: RLXCLM   = .TRUE.
   REAL*8,  PARAMETER :: CMFTAU   = 3600.d0
   REAL*8,  PARAMETER :: EPS      = 1.0d-13       
   REAL*8,  PARAMETER :: GRAV     = 9.8d0
   !-------------------------------------------------------
   ! Prior to 12/19/06:
   ! Make SMALLEST smaller (bmy, 12/19/06)
   !REAL*8,  PARAMETER :: SMALLEST = 1.0d-32
   !-------------------------------------------------------
   REAL*8,  PARAMETER :: SMALLEST = 1.0d-60
   REAL*8,  PARAMETER :: TINYALT  = 1.0d-36           
   REAL*8,  PARAMETER :: TINYNUM  = 2*SMALLEST
(3) In "fvdas_convect_mod.f", avoid division by zero in routine CONVTRAN:
            IF ( CDIFR > 1.d-6 ) THEN
    
               ! If the two layers differ significantly.
               ! use a geometric averaging procedure
               CABV = MAX( CMIX(I,KM1), MAXC*TINYNUM, SMALLEST )
               CBEL = MAX( CMIX(I,K),   MAXC*TINYNUM, SMALLEST )
 !-----------------------------------------------------------------
 !  Prior to 12/19/06:
 ! Avoid division by zero (bmy, 12/19/06)
 !                  CHAT(I,K) = LOG( CABV / CBEL)
 !     &                       /   ( CABV - CBEL)
 !     &                       *     CABV * CBEL
 !-----------------------------------------------------------------
    
               ! If CABV-CBEL is zero then set CHAT=SMALLEST
               ! so that we avoid div by zero (bmy, 12/19/06)
               IF ( ABS( CABV - CBEL ) > 0d0 ) THEN
                  CHAT(I,K) = LOG( CABV / CBEL )
  &                         /    ( CABV - CBEL )
  &                         *      CABV * CBEL
               ELSE
                  CHAT(I,K) = SMALLEST
               ENDIF
    
            ELSE                           
               ! Small diff, so just arithmetic mean
               CHAT(I,K) = 0.5d0 * ( CMIX(I,K) + CMIX(I,KM1) )
            ENDIF
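The SMALLEST / TINYNUM floors from item (2) and the guard from item (3) can be exercised together in a small standalone sketch. This is illustration only, not GEOS-Chem source: the program name and the mixing-ratio values are made up, and CA / CB stand in for CMIX(I,KM1) / CMIX(I,K).

      PROGRAM CHAT_GUARD_DEMO
      !---------------------------------------------------------------
      ! Illustrative sketch only: mirrors the CONVTRAN fix above.
      ! When the two layers differ, a geometric-type average is used;
      ! the inner guard covers the case where CABV and CBEL end up
      ! identical (e.g. both floored to SMALLEST), which would
      ! otherwise divide by zero.  Nearly equal layers use the
      ! arithmetic mean instead.
      !---------------------------------------------------------------
      IMPLICIT NONE
      REAL*8, PARAMETER :: SMALLEST = 1.0d-60
      REAL*8, PARAMETER :: TINYNUM  = 2*SMALLEST
      REAL*8            :: CA, CB, MAXC, CDIFR, CABV, CBEL, CHAT
 
      ! Hypothetical mixing ratios in the two adjacent layers
      CA = 4.0d-9
      CB = 4.0d-9          ! try 1.0d-12 to exercise the other branch
 
      ! Relative difference between the layers (stand-in for CDIFR)
      MAXC  = MAX( CA, CB )
      CDIFR = ABS( CA - CB ) / MAX( MAXC, SMALLEST )
 
      IF ( CDIFR > 1.d-6 ) THEN
 
         ! Layers differ significantly: geometric averaging, with
         ! floors so neither the LOG argument nor the denominator
         ! can be zero
         CABV = MAX( CA, MAXC*TINYNUM, SMALLEST )
         CBEL = MAX( CB, MAXC*TINYNUM, SMALLEST )
 
         IF ( ABS( CABV - CBEL ) > 0d0 ) THEN
            CHAT = LOG( CABV/CBEL ) / ( CABV-CBEL ) * CABV * CBEL
         ELSE
            CHAT = SMALLEST
         ENDIF
 
      ELSE
 
         ! Nearly identical layers: plain arithmetic mean
         CHAT = 0.5d0 * ( CA + CB )
 
      ENDIF
 
      PRINT*, 'CHAT = ', CHAT
      END PROGRAM CHAT_GUARD_DEMO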
(4) I also had to rewrite the parallel DO loops in routine HACK_CONV, since these were causing some kind of memory fault.
You may just want to get the most recent version of fvdas_convect_mod.f, which has all of these fixes installed. See:
 ftp ftp.as.harvard.edu
 cd pub/geos-chem/patches/v7-04-10
 get fvdas_convect_mod.f
So I would recommend implementing these fixes and seeing if that solves your problem.

--Bob Y. 16:50, 7 May 2009 (EDT)

ND05 diagnostic quantities zeroed unexpectedly

Helen McIntyre (h.macintyre@see.leeds.ac.uk) wrote:

I'm running the new and old versions of GEOS-Chem, and neither seems to output all of the ND05 diagnostics correctly. There are 10 prod/loss diagnostics in this category, but only the 5th, 6th and 7th work. All the rest come out as zero.
I've had a brief look through the code, and it seems that the ones that work are calculated in one part of the routine (sulfate_mod.f), and the zero ones in another (I don't know if this has anything to do with it).
I've just done a 1 day run, with Geos-chem v8-01-01 at 4x5 resolution using GEOS-5 met fields. The old version I'm using is v7-02-04 and I get the same result.
The 'ctm.bpch', 'geos.log' and 'input.geos' files from the v8-01-01 run can be found here: http://homepages.see.leeds.ac.uk/~lechlm/files/

Claire Carouge (ccarouge@seas.harvard.edu) replied:

There are some problems in the calculations. For the 5th element of AD05 (in Fortran notation, not IDL), the diagnostic comes from the value of L1, which is calculated on line 1529 (v8-01-04):
   L1     = ( SO20 - SO2_cd + PSO2_DMS(I,J,L) ) * RK1/RK
But on line 1508, we have:
   RK1 = 0.d0
with the previous comment:
  ! For online runs, SMVGEAR deals w/ this computation,
  ! so we can simply set RK1 = 0 (rjp, bmy, 3/23/03)
So L1 is always 0. I have no idea what RK1 is for; you may have to look into the physics/chemistry behind the code, and we are not qualified to help you with that.
The other values for AD05 are calculated in routine CHEM_DMS, which is only called in an offline aerosol simulation (l. 545 in sulfate_mod.f).
So my guess is that the ND05 diagnostic was designed for offline simulations; if you want it for online chemistry, you will need to implement it yourself.
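As a toy illustration of the first point above (made-up numbers; this is not code from sulfate_mod.f), with RK1 hard-wired to zero the value of L1, and therefore the 5th AD05 slot, can never be anything but zero:

      PROGRAM ND05_ZERO_DEMO
      ! Toy sketch only: with RK1 = 0 (as in the online case), L1 is
      ! zero regardless of the values of SO20, SO2_cd and PSO2_DMS.
      IMPLICIT NONE
      REAL*8 :: SO20, SO2_cd, PSO2_DMS, RK1, RK, L1
 
      SO20     = 1.0d-10     ! hypothetical values
      SO2_cd   = 8.0d-11
      PSO2_DMS = 2.0d-12
      RK       = 5.0d-5
      RK1      = 0.d0        ! as set for online runs
 
      L1 = ( SO20 - SO2_cd + PSO2_DMS ) * RK1 / RK
      PRINT*, 'L1 = ', L1    ! always 0
      END PROGRAM ND05_ZERO_DEMO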

Bad GEOS-4 A6 met data causing segmentation fault

Jesse Kenyon (kenyon@duke.edu) wrote:

In our runs of the GEOS-Chem model using GEOS4 data from 2006, we have run into a corrupt data problem that causes our runs to crash on rundate 20060913. We have been able to isolate this to bad values in the A6 files for the meridional wind (V component). Specifically, it appears two files contain bad data: 20060913.a6.4x5 and 20060915.a6.4x5. The bad data takes the form of unphysically huge values (e.g. 0.75553E+29), and occurs at many gridboxes on levels 6 and 7 of the 20060913 file and levels 18 and 19 of the 20060915 file. In both files, the bad data only occur for the 00 hour (06, 12, and 18 hours appear okay). We also checked the 20060914, 20060916, 20060917, 20060918, and 20060919.a6.4x5 files - they seem okay. A check of the U component wind on 09-13 and 09-15 shows no problems.
For us, the problem manifested itself as a segmentation fault when trying to access address -21474836 of array qtmp in subroutine xtp in module tpcore_fvdas_mod.f90. The address was calculated from xmass, which is calculated from wind and pressure in another subroutine (Init_press_fix). We know some folks at Harvard have been able to get beyond this rundate without crashing and suspect it might be due to a difference in computer system or compiler.
We have not yet tried running with a "repaired" V wind, so we cannot say for sure that there are no other problems in the A6 files besides V (or U, which was also checked).

Philippe Le Sager (plesager@seas.harvard.edu) wrote:

Thanks for reporting the issue. We had the exact same problem with GCAP once, and the problem was solved by repairing the met field (a very bad value in U or V). Then Mike and Bastien got the exact same error with GEOS-4, on the same day as you. My first idea was to test the met fields, but I did not find any bad values. I just did a run with Bastien's inputs and was not able to reproduce the crash. You just confirmed that the first idea was the right one: a bad met field.
Since I seem to have a good met field and you do not, I checked the met fields on the server this time, and I did find a problem with V for 20060913. Here is the output from test_met.pro (it gives the min and max of each met field):
   at Harvard (internal disk):
   20060913 000000 U             -72.816071     144.557083
   20060913 000000 V             -67.133759      61.291386

   on the server:
   20060913 000000 U             -72.816071     144.557083
   20060913 000000 V             -67.133759***************
All other fields give exactly the same min/max. There is a huge or NaN value in the file we put on the server. We do not understand how that happened, since files are simply copied from one location to the other. So we are still investigating the issue, checking the whole archive, and we will let you know as soon as we replace it.
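test_met.pro is an IDL routine; as a rough Fortran illustration of the same kind of sanity check (hypothetical routine, not part of GEOS-Chem, and the 500 m/s bound is made up), one can scan a wind field for its min/max and flag anything unphysical, which is how corrupt values like 0.75553E+29 would show up:

      SUBROUTINE CHECK_WIND( NAME, W, NI, NJ, NL )
      !--------------------------------------------------------------
      ! Illustrative sketch only: print min/max of a wind field and
      ! warn if any value is outside a generous physical range.
      !--------------------------------------------------------------
      IMPLICIT NONE
      CHARACTER(LEN=*), INTENT(IN) :: NAME
      INTEGER,          INTENT(IN) :: NI, NJ, NL
      REAL*8,           INTENT(IN) :: W(NI,NJ,NL)
      REAL*8,           PARAMETER  :: WMAX_OK = 500d0  ! [m/s]
 
      PRINT*, NAME, ' min/max = ', MINVAL( W ), MAXVAL( W )
 
      IF ( MAXVAL( ABS( W ) ) > WMAX_OK ) THEN
         PRINT*, 'WARNING: ', NAME, ' has unphysical values!'
      ENDIF
 
      END SUBROUTINE CHECK_WIND

Such a check could be called on U and V right after each A-6 read to catch a bad file before it reaches the transport code.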

Bob Yantosca (yantosca@seas.harvard.edu) wrote:

I have fixed the bad A-6 data for 2006 (09/13, 09/14, 09/15, and 09/16). For some reason the data on the FTP site was corrupt (bad values in the winds at a couple of GMT times), but the data on our internal disk (behind the firewall) was not. I just copied the relevant data files over and re-created the TAR file.
Please obtain the new TAR file from:
  ftp ftp.as.harvard.edu
  cd pub/geos-chem/data/GEOS_4x5.d/GEOS_4_v4/2006/09
  get 09.tar.gz

--Bob Y. 15:40, 23 September 2008 (EDT)

Too many levels in photolysis code

The Fast-J scattering module (OPMIE.f) requires many additional vertical levels. In some situations the limit (NL, set in jv_mie.h) can be reached, causing the program to stop with a "Too many levels in photolysis code" error message. Sometimes you can increase NL to solve the problem. A new version of OPMIE.f is now available; it still warns you if NL is reached, but keeps working within that limit.

Until it is released into the standard model, you can find the new OPMIE.f at: ftp://ftp.as.harvard.edu/pub/geos-chem/patches/v8-01-01/OPMIE.f

--phs 11:29, 17 June 2008 (EDT)

This can also be an indication that there may be a problem in your visual optical depths, dust emissions, or aerosol emissions. Dust and aerosol optical depths are computed from the concentration array STT. If for some reason you end up emitting too much aerosol or dust (e.g. due to a unit conversion error), then this will result in an abnormally high dust or aerosol optical depth. A very high optical depth will cause FAST-J to keep adding points to the Gaussian quadrature in OPMIE.f. You can get into a situation where the number of points that FAST-J wants to add is greater than the array parameter NL (it may want to add thousands of points!).

Therefore, if you encounter this type of error, it is a good idea to double-check your aerosol and dust emissions to make sure that the monthly and annual totals are reasonable.
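As a rough illustration of that kind of sanity check (hypothetical routine and threshold, not part of GEOS-Chem), one can sum the per-level optical depths into a column total and warn when it is implausibly large, which is exactly the situation that drives FAST-J past NL:

      SUBROUTINE CHECK_COLUMN_OD( OD, NI, NJ, NLEV )
      !--------------------------------------------------------------
      ! Illustrative sketch only: warn about any column whose total
      ! optical depth is far beyond anything physically reasonable.
      ! The OD_MAX bound is made up for illustration.
      !--------------------------------------------------------------
      IMPLICIT NONE
      INTEGER, INTENT(IN) :: NI, NJ, NLEV
      REAL*8,  INTENT(IN) :: OD(NI,NJ,NLEV)  ! optical depth per level
      REAL*8,  PARAMETER  :: OD_MAX = 100d0
      INTEGER             :: I, J
 
      DO J = 1, NJ
      DO I = 1, NI
         IF ( SUM( OD(I,J,:) ) > OD_MAX ) THEN
            PRINT*, 'High column OD at ', I, J, SUM( OD(I,J,:) )
         ENDIF
      ENDDO
      ENDDO
 
      END SUBROUTINE CHECK_COLUMN_OD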

--Bob Y. 11:03, 26 June 2008 (EDT)

Negative tracer in routine WETDEP because of negative RH

See this post: GEOS-5 issues#Small negative RH value in 20060206.a6.2x25 file

Fixes are available at ftp://ftp.as.harvard.edu/pub/geos-chem/patches/v8-01-01.

--phs 16:31, 6 June 2008 (EDT)

Negative tracer in routine WETDEP

Dylan Millet (dbm@umn.edu) wrote:

I'm having a run die consistently at the same time (October 1, 2005; first time step of the month) in large-scale wetdep, with an STT element < 0.
  • Platform: Linux cluster
  • Threads: 8
  • Version: v7-4-13 out of the box.
  • GEOS4, 4x5, 30L, full chemistry
  • IFORT 10.1
In Section 6 (No Downward Precip) of wetscav_mod.f, subroutine safety is getting called.
    WETDEP - STT < 0 at    1   1  29 for tracer    7 in area    6
(First of all, it seems odd to do wetdep at L=29, which is 63 km up.) Have you seen anything like this? I ran successfully for the whole year starting Jan 1 until this point.
... By the way, the problem persists when I turn off chemistry altogether.

Philippe Le Sager (plesager@seas.harvard.edu) replied:

I used your restart file and the same input.geos (with chemistry both on and off). My code went through without a problem. I tried both the Sun Studio and Ifort 9 compilers, and the latter on two different machines (altix and ceres). I used v7-04-13 and v8-01-01. I never reproduced your error.
We just got the new Ifort 10 and tried it too. I ran v8-01-01 without an error. But when I tried v7-04-13, I finally reproduced your error, with the exact same negative values!
In other words: the bug happens with Ifort 10 and v7-04-13 only.
Also, have a look at this recent development. This is not the reason for your bug (I tried v8 with Ifort 10 and ISORROPIA, like v7-04-13, and it did not crash), but using RPMARES instead of ISORROPIA may be a way to fix it.
... More about the Ifort 10 / v7-04-13 issue. When I wanted to debug with TotalView, I could not reproduce the bug anymore... because debugging simply suppresses any optimization. So I did more tests and found that GEOS-Chem crashes if the default -O2 optimization is used, but works fine with -O1. It is hard to tell what is happening, since only the emissions step is done between reading the restart file and the crash.
Bob and I will further test Ifort 10 optimization on our machines. Maybe we will find something... For the time being, you may have to switch to -O1, at least for the run that crashes. You will find the optimization flag at the beginning of Makefile.ifort.

Long story short: This appears to be an optimization issue with IFORT 10 and v7-04-13. Upgrading to GEOS-Chem v8-01-01 should solve this problem.

--Bmy 10:38, 17 April 2008 (EDT)

ISORROPIA and RPMARES

Please see the discussion about the bugs & fixes for ISORROPIA and RPMARES on the Code Developer's Forum for Aerosol thermodynamical equilibrium.

Also, if you are trying to run an aerosol-only simulation, then please see this discussion about a bug that manifested itself only after switching from ISORROPIA to RPMARES.

--Bob Y. 10:44, 26 June 2008 (EDT)