Bugs and fixes

From Geos-chem
Revision as of 22:33, 18 November 2008 by Plesager (Talk | contribs) (bugs fixes in diagnostics, mainly ND47 & ND65)

Jump to: navigation, search

On this page we list the GEOS-Chem bugs that users have recently encountered, and how to fix them. Please also see the following wiki pages:

Outstanding issues yet to be resolved
This page contains a list of known issues (e.g. shortcomings in scientific algorithms, diagnostics, technical problems, etc.) which are slated to be fixed in the standard GEOS-Chem code but have not been done so at this time.
Previous issues that are now resolved
This page contains a list of known issues that have already been fixed in the standard GEOS-Chem code at this time.

Bugs in ND47/ND65/ND45 in v8-01-01

Yuxuan Wang (wang3@fas.harvard.edu) wrote:

I think I found a bug in code version v8-01-01. The bug involves the scale factors used for ND47 (24-hr avg output) and ND65 (24-hr chemical family production and loss rate). The bug occurs when users define ND45 time window to be less than 24 hours.
In diag3.f, the scale factors for ND47 and ND65 are CTOTH and CTO3. In diag_2pm.f, CTOTH and CTO3 are counters for the user-defined time window of ND45 output. In my simulation, for example, I defined ND45 time window to be from 12 to 16 (afternoon concentrations), so CTOTH and CTO3 record the number of afternoon hours in my run. However, ND47 and ND65 are accummulating for 24 hours, independent of the time window selected for ND45. At times when the code writes the diagnostics, ND47 and ND65 are scaled by CTOTH and CTO3 which are accumulating only for the afternoon hours. This incorrect scaling led to unrealistically high concentrations in ND47 and ND65.
I did a test simulation and found the bug did not affect the results if ND45 time-window are defined from 0 to 24 (i.e., 24-hr mean).
Related codes are:
----------in diag3.f-------------------
starting line 2616
      IF ( ND47 > 0 ) THEN
        CATEGORY = 'IJ-24H-$'
        UNIT     = 

        ! Now use SCALE_TMP instead of SCALEDYN
        SCALE_TMP = FLOAT( CTOTH ) + 1d-20

        ! Now account for undefined O3 concentration
        ! at first timestep (phs, 1/24/07)
        IF ( FIRST ) THEN
           SCALE_TMP3D = MAX( FLOAT( CTO3 )-1d0, 1d-20 )
        ELSE
           SCALE_TMP3D = FLOAT( CTO3 ) + 1d-20
        ENDIF
         ......

starting line 2799
     IF ( ND65 > 0 ) THEN
        CATEGORY = 'PORL-L=$'

        ! Note: P/L are defined at first time step, since
        ! they are computed after chemistry (phs, 3/6/07)
        SCALE_TMP3D = FLOAT( CTO3 ) + 1d-20
 
-----------in diag_2pm.f------------------------------------
starting line 68
     DO J = 1, JJPAR
     DO I = 1, IIPAR

        !-----------------------------
        ! ND45 -- mixing ratios
        !-----------------------------
        IF ( IS_ND45 ) THEN

           ! Archive if we fall w/in the local time limits
           IF ( LT(I) >= HR1_OTH .and. LT(I) <= HR2_OTH ) THEN
              LTOTH(I,J) = 1
              CTOTH(I,J) = CTOTH(I,J) + 1

              ! Counter for # of O3 boxes in the troposphere (phs, 1/24/07)
              DO L = 1, LD45
                 IF ( IS_ND45_O3 .and. ITS_IN_THE_TROP( I, J, L )) THEN
                    CTO3(I,J,L) = CTO3(I,J,L) + 1
                 ENDIF
              ENDDO

           ELSE
              LTOTH(I,J) = 0
           ENDIF
        ENDIF
Thank you for your attention!
Yuxuan


And Shiliang Wu (wu18@fas.harvard.edu) also reported a bug when ND47 is turned on:

I recently had problem running GC (v7.4.13 if that matters) - It always crashed at the end of the run before writing the bpch file and the restart file. I tracked down and it looks something weird related to the diagnostic ND47 - I was able to reproduce the problem whenever ND47 is on but everything is OK when ND47 is off.

Philippe Le Sager (wu18@fas.harvard.edu) wrote:

I found your problem. You cannot save ND45 with less level than ND47. Different tracers is ok:
  • ND45 and ND47 all tracers and all levels
    • runs fine
  • ND45 and ND47 all tracers, 5 levels for ND45 and all 23 for ND47
    • crashed after 2h simulation.
  • ND45 and ND47 all levels, 4 tracers for ND45 and all 43 for ND47
    • runs fine
We will either add a check to prevent L45 < L47 or modify the program to allow it in a next GC version.


We did modify the code. Philippe Le Sager (plesager@seas.harvard.edu) wrote:

I think I have fixed several diagnostic issues related to counters of time in the troposphere: ND45's O3, ND65, ND47 and ND20.
(1) When ND45 is not averaging over all time steps, these diagnostics should be correct now.
(2) The restriction on the number of levels for some diagnostics ( LDxx =< LD45 ) has been released.
You can temporarily find the updated routines in:
ftp://ftp.as.harvard.edu/pub/exchange/phs/ND_20_45_47_65_fixes
Please read carefully the README file, since one of the files has additional modifications scheduled for v8-01-03. This update will be part of v8-01-02.

--phs 17:33, 18 November 2008 (EST)


Quick fix for GEOS-5 optical depth

Please see the discussion for a quick fix for the GEOS-5 optical depths.

--Bob Y. 16:13, 10 October 2008 (EDT)

Bad GEOS-4 A6 met data causing segmentation fault

Jesse Kenyon (kenyon@duke.edu) wrote:

In our runs of the GEOS-Chem model using GEOS4 data from 2006, we have run into a corrupt data problem that causes our runs to crash on rundate 20060913. We have been able to isolate this to bad values in the A6 files for the meridional wind (V component). Specifically, it appears two files contain bad data: 20060913.a6.4x5 and 20060915.a6.4x5. The bad data takes the form of unphysically huge values (e.g. 0.75553E+29), and occurs at many gridboxes on levels 6 and 7 of the 20060913 file and levels 18 and 19 of the 20060915 file. In both files, the bad data only occur for the 00 hour (06, 12, and 18 hours appear okay). We also checked the 20060914, 20060916, 20060917, 20060918, and 20060919.a6.4x5 files - they seem okay. A check of the U component wind on 09-13 and 09-15 shows no problems.
For us, the problem manifested itself as a segmentation fault when trying to access address -21474836 of array qtmp in subroutine xtp in module tpcore_fdvas_mod.f90. The address was calculated from xmass which is calculated from wind and pressure in another subroutine (Init_press_fix). We know some folks at Harvard have been able to get beyond this rundate without crashing and suspect it might be due to a difference in computer system or compiler.
We have not yet tried running with a "repaired" V wind, so cannot say for sure that there are no other problems in the A6 files besides V (or U which was also checked).

Philippe Le Sager (plesager@seas.harvard.edu) wrote:

Thanks for reporting the issue. We had the exact same problem with GCAP once and the problem was solved by repairing the met field (a very bad value in U or V). Then Mike and Bastien got the exact same error with GEOS4, on the same day as you. My first idea was to test the met fields but I did not get any bad value. I just did a run with Bastien's inputs, and was not able to reproduce the crash. You just confirmed that the first idea was the good one: a bad met field.
Since I seem to have a good met field and you do not, I check the met fields on the server this time. And I did find a problem with V for 20060913. Here are the output from test_met.pro (it gives min and max of each met fields):
   at Harvard (internal disk):
   20060913 000000 U             -72.816071     144.557083
   20060913 000000 V             -67.133759      61.291386

   on the server:
   20060913 000000 U             -72.816071     144.557083
   20060913 000000 V             -67.133759***************
All others fields give the exact same min/max. There is a huge or NaN value in the file we put on the server. We do not understand how that happened, since files are simply copied from one location to the other. So we are still investigating the issue, checking the whole archive, and will let you know as soon as we replace it.

Bob Yantosca (yantosca@seas.harvard.edu) wrote:

I have fixed the bad A-6 data for 2006 -- 09/13, 09/14, 09/15, 09/16. For some reason the data in the FTP site was corrupt (bad values in the winds at a couple of GMT times) but the data on our internal disk (behind the firewall) was not. I just copied the relevant data files over and re-created the TAR file.
Please obtain the new TAR file from:
  ftp ftp.as.harvard.edu
  cd pub/geos-chem/data/GEOS_4x5.d/GEOS_4_v4/2006/09
  get 09.tar.gz

--Bob Y. 15:40, 23 September 2008 (EDT)

Albedo diagnostic in GEOS-4

Lyatt Jaegle <jaegle@atmos.washington.edu> wrote:

I was saving the daily albedo output in GEOS-Chem (v8-01-01) via diagnostic DAO-FLDS and found that the albedo is greater than 1. In fact, it seems to be a factor of 2 higher than it should be. [...] I was running an offline aerosol simulation. Is there a reason for the albedo output to be multiplied by two or is it a bug?

Philippe Le Sager <plesager@seas.harvard.edu> replied:

This is a bug. ALBEDO is an I6 field in GEOS-3 but an A3 in GCAP, GEOS-4 and GEOS-5. The difference between models was not accounted for and ALBEDO was considered an I6 field in all cases. The fix is easy and will go in the next release. Around line 2996 in diag3.f, Case (14) should read:
              CASE ( 14 )

     #if   defined( GEOS_3 ) 
                 SCALEX = SCALE_I6 
     #else
                 SCALEX = SCALE_A3
     #endif 
                 UNIT   = 'unitless'

             CASE ( 15 )

Note that this bug was only affecting the diagnostic output. The Albedo value was correct within the code.

NOTE: This fix is slated for inclusion into v8-01-02.

--phs 08:53, 3 September 2008 (EDT)

Too many levels in photolysis code

The scattering module (OPMIE.f) for Fast-J requires many additional vertical levels. It happens that the limit (NL set in jv_mie.h) can be reached in some situations, causing the program to stop with a "Too many levels in photolysis code.." error message. Sometimes you can increase NL to solve the problem. Now a new version of OPMIE.f is available, which still warns you if NL is reached, but works with that limit.

Before being released into the standard model, you can find the new OPMIE.f at: ftp://ftp.as.harvard.edu/pub/geos-chem/patches/v8-01-01/OPMIE.f

--phs 11:29, 17 June 2008 (EDT)

This can also be an indication that there may be a problem in your visual optical depths, dust emissions, or aerosol emissions. Dust and aerosol optical depths are computed from the concentration array STT. If for some reason you end up emitting too much aerosol or dust (i.e. a unit conversion error), then this will result in an abnormally high dust or aerosol optical depth. A very high optical depth will cause FAST-J to want to keep adding points to the Gaussian quadrature in OPMIE.f. You can get into a situation where the number of points that FAST-J wants to add is greater than the array parameter NL (it may want to add thousands of points!).

Therefore, if you encounter this type of error, it is a good idea to doublecheck your aerosol & dust emissions to make sure that the monthly and annual totals are reasonable.

--Bob Y. 11:03, 26 June 2008 (EDT)

Mis-labeling of SOA quantities in carbon_mod.f

Colette Heald (heald@atmos.colostate.edu) wrote:

I noticed previously that the parameters in (routine) SOA_PARA were confusingly ordered, but it's only today that I realized that in fact they are mis-labeled as well. The actual numbers are all perfectly correct but they should correspond to the information in Tables 3 and 4 for the categories of Table 1 of the Chung and Seinfeld (2002) paper. I have re-organized the code & comments for clarity (lines 1875-1939), they should read as:
     ! Average of ALPHA-PINENE, BETA-PINENE, SABINENE, D3-CARENE
     RALPHA(1,1) = 0.067d0            
     RALPHA(2,1) = 0.35425d0

     ! LIMONENE
     RALPHA(1,2) = 0.239d0
     RALPHA(2,2) = 0.363d0

     ! Average of TERPINENES and TERPINOLENE
     RALPHA(1,3) = 0.0685d0
     RALPHA(2,3) = 0.2005d0

     ! Average of MYRCENE, LINALOOL, TERPINENE-4-OL, OCIMENE
     RALPHA(1,4) = 0.06675d0
     RALPHA(2,4) = 0.135d0

     ! Average of BETA-CARYOPHYLLENE and and ALPHA-HUMULENE
     RALPHA(1,5) = 1.0d0
     RALPHA(2,5) = 0.0d0

     ! Using BETA-PINENE for all species for NO3 oxidation
     ! Data from Table 4 of Griffin, et al., JGR 104 (D3): 3555-3567 (1999)
     RALPHA(3,:) = 1.d0           

     ! Here we define some alphas for isoprene (dkh, bmy, 5/22/06)

     ! high NOX  [Kroll et al, 2005]
     !RALPHA(1,6) = 0.264d0
     !RALPHA(2,6) = 0.0173d0
     !RALPHA(3,6) = 0d0

     ! low NOX   [Kroll et al, 2006; Henze and Seinfeld, 2006]
     RALPHA(1,6) = 0.232d0
     RALPHA(2,6) = 0.0288d0
     RALPHA(3,6) = 0d0

     !=================================================================
     ! Equilibrium gas-particle partition coefficients of 
     ! semi-volatile compounds [ug-1 m**3]
     !=================================================================

     ! Average of ALPHA-PINENE, BETA-PINENE, SABINENE, D3-CARENE
     KOM(1,1) = 0.1835d0
     KOM(2,1) = 0.004275d0

     ! LIMONENE
     KOM(1,2) = 0.055d0
     KOM(2,2) = 0.0053d0

     ! Average of TERPINENES and TERPINOLENE
     KOM(1,3) = 0.133d0
     KOM(2,3) = 0.0035d0

     ! Average of MYRCENE, LINALOOL, TERPINENE-4-OL, OCIMENE
     KOM(1,4) = 0.22375d0
     KOM(2,4) = 0.0082d0

     ! Average of BETA-CARYOPHYLLENE and and ALPHA-HUMULENE
     KOM(1,5) = ( 0.04160d0 + 0.0501d0 ) / 2.d0
     KOM(2,5) = 0.0d0

     ! NOT APPLICABLE -- using BETA-PINENE for all species
     ! Data from Table 4 of Griffin, et al., JGR 104 (D3): 3555-3567 (1999)
     KOM(3,:) = 0.0163d0

Bob Yantosca (heald@atmos.colostate.edu) wrote:

We will put this in the next update of the code. Thanks!

--Bob Y. 10:11, 13 June 2008 (EDT)

Numerical noise bug in upbdflx_mod.f

Ray Nassar (ray@io.as.harvard.edu) wrote:

The run I did last night confirmed that turning off "Use strat O3/NOy BC's" does change O3 and NOx at lower levels so I cannot turn this off for sensitivity tests if comparing to past runs.
I have tried the debugging approach which you have suggested and it indicates that the problem is coming from the triple loop:
        DO L = LMIN, LLPAR
        DO J = 1,    JJPAR
        DO I = 1,    IIPAR

           ! Skip over tropospheric boxes
           IF ( ITS_IN_THE_STRAT( I, J, L ) ) THEN

              ! PNOY = P(NOy) converted from [v/v/s] to [v/v]
              PNOY = STRATPNOY(J,L) * DTDYN

              ! Add [NOx] and [HNO3] to PNOY.
              ! PNOY is now the total [NOy] concentration
              PNOY = PNOY + STT(I,J,L,IDTNOX) + STT(I,J,L,IDTHNO3)

              ! Partition total [NOy] to [NOx], units are [v/v]
              STT(I,J,L,IDTNOX)  = PNOY * XRATIO(J,L)

              ! Partition total [NOy] to [HNO3], units are [v/v]
              STT(I,J,L,IDTHNO3) = PNOY * ( 1d0 - XRATIO(J,L) )

           ENDIF
        ENDDO
        ENDDO
        ENDDO
in the file upbdflx_mod.f. Since XRATIO can have a value of one, it means that numerical noise sometimes makes 1d0 - XRATIO(J,L) a negative number and then STT(I,J,L,IDTHNO3) becomes negative and triggers the error message. For the two instances where the difference is computed, I have now added the lines:
       ! Ray Nassar (2008-06-11) -------------------------------
       !
       ! The kludge below is to deal with errors which result
       ! from subtracting 1d0 - XRATIO(J,L). When XRATIO should
       ! be equal 1, the binary representation will have a
       ! finite error making it slightly more or less than 1.
       ! For XRATIO slightly more than 1, STT(I,J,L,IDTHNO3)<=0
       ! which triggers a CHECK_STT error message.
       !
       if (STT(I,J,L,IDTHNO3) <= 0d0) then
          STT(I,J,L,IDTHNO3) = 0d0
       endif
       !--------------------------------------------------------
immediately after the calculation of STT(I,J,L,IDTHNO3) and this fixes the problem! I think this will have to be fixed in the next version of the code.

Philippe Le Sager (plesager@seas.as.harvard.edu) wrote:

Thanks for catching that. We will fix this simply with a
   MAX( ( 1d0 -  XRATIO(J,L) ), 1d-20) 
in place of
   ( 1d0 - XRATIO(J,L) ), 
since we want to have strictly positive concentration in the troposphere and some boxes switch b/w troposphere and stratosphere.

You may obtain the modified file at ftp://ftp.as.harvard.edu/pub/geos-chem/patches/v8-01-01/upbdflx_mod.f.

NOTE: This fix is slated for inclusion into GEOS-Chem v8-01-02.

--Bob Y. 11:13, 12 June 2008 (EDT)

Problem w/ ND65 diagnostic and dynamic tropopause

Yuxuan Wang (wang3@fas.harvard.edu) wrote:

It looks like that it's the dynamic tropopause in v8-01-01 that causes the unrealistically high AD65 values (prod and loss rate of chemical families). When I turn off the dynamic tropopause in input.geos, the model gives AD65 that's reasonable and also comparable to my previous simulations with code version 7. I took a careful look of tropopause_mod.f, but couldn't find where the problem is.
Yuxuan

Philippe Le Sager (plesager3@seas.harvard.edu) wrote:

I think I found the root of your problem with P/L diagnostic. The line 606 (subroutine do_diag_pl, in diag_pl_mod.f):
          ! Zero each "fake" ND65 prod/loss family for next iteration
          CSPEC(JLOOP,IFAM(N)) = 0.0d0
is done too late in the code. Here is the full story:
At each time step, the CSPEC_FULL array is used to initialize the CSPEC array before entering SMVGEAR. CSPEC_FULL is filled with CSPEC after SMVGEAR. So it should account for the line 606 above, but it does not since the family P/L diagnostic is done after CSPEC_FULL has been updated.
The fix is easy. In chemdr.f you need to move the call to SAVE_FULL_TROP to the end of the routine, so CSPEC_FULL is updated after the P/L diagnostic. You can look the updated file at: ftp://ftp.as.harvard.edu/pub/geos-chem/patches/v8-01-01/chemdr.f.
If you could let me know if that fix you problem with a quick run, that would be perfect
Thanks,
-Philippe

Yuxuan Wang (wang3@fas.harvard.edu) wrote:

I just did a one-day test and it worked! P/L diagnostics now are at the right order of magnitude after I turned on the variable tropopause. That's indeed the fix. Thank you!!
Yuxuan

NOTE: This fix is slated for inclusion into v8-01-02.

--Bob Y. 16:17, 3 June 2008 (EDT)

Negative tracer in routine WETDEP because of negative RH

See this post: GEOS-5 issues#Small negative RH value in 20060206.a6.2x25 file

Fixes are available at ftp://ftp.as.harvard.edu/pub/geos-chem/patches/v8-01-01.

--phs 16:31, 6 June 2008 (EDT)

Negative tracer in routine WETDEP

Dylan Millet (dbm@umn.edu) wrote:

I'm having a run die consistently at the same time (October 1, 2005; first time step of the month) in large-scale wetdep, with an STT element < 0.
  • Platform: Linux cluster
  • Threads: 8
  • Version: v7-4-13 out of the box.
  • GEOS4, 4x5, 30L, full chemistry
  • IFORT 10.1
In Section 6 (No Downward Precip) of wetscav_mod.f, subroutine safety is getting called.
    WETDEP - STT < 0 at    1   1  29 for tracer    7 in area    6
(First of all it seems odd to do wetdep for L=29, this is 63 km up). Have you seen anything like this? I ran for the whole year starting Jan 1 successfully until this point.
... By the way, the problem persists when I turn off chemistry altogether.

Philippe Le Sager (plesager@seas.harvard.edu) replied:

I used your restart file and the same input.geos (w/ chemistry on and off). My code went thru without problem. I tried both Sun Studio and Ifort 9 compilers, and the later on two different machines (altix and ceres). I used v7-04-13 and v8-01-01. I never reproduced your error.
We just got the new Ifort 10, and tried it too. I run v8-01-01 without an error. But when I tried v7-04-13, I finally reproduced your error, with the exact same negative values!
In other words: the bug happens with IFort 10 and v7-04-13 only.
Also, have a look at this recent development. This is not the reason for your bug (I tried v8 w/ ifort 10 and isorropia -like v7-04-13- and it did not crash), but using RPMARES instead of Isorropia may be a way to fix it.
... More about the Ifort 10 / v7-04-13 issue. When I wanted to debug with TotalView, I could not reproduce the bug anymore.... because I simply suppress any optimization. So, I did more test and found that if the default -O2 optimization is used, GEOS-Chem crashes. But it works fine with -O1. It is hard to tell what happens, since only the emissions step is done between reading the restart file and the crash.
Bob and I will further test Ifort 10 for optimization on our machines. Maybe we will find something... For the time being, you may have to switch to -O1, at least for the run that crashes. You will find the optimization flag at the beginning of the Makefile.ifort.

Long story short: This appears to be an optimization issue with IFORT 10 and v7-04-13. Upgrading to GEOS-Chem v8-01-01 should solve this problem.

--Bmy 10:38, 17 April 2008 (EDT)

ISORROPIA and RPMARES

Please see the discussion about the bugs & fixes for ISORROPIA and RPMARES on the Code Developer's Forum for Aerosol thermodynamical equilibrium.

Also, if you are trying to run an aerosol-only simulation, then please see this discussion about a bug that manifested itself only after switching from ISORROPIA to RPMARES.

--Bob Y. 10:44, 26 June 2008 (EDT)