Bugs and fixes
On this page we list the GEOS-Chem bugs that users have recently encountered, and how to fix them. Please also see the following wiki pages:
- Outstanding issues yet to be resolved
- This page lists known issues (e.g. shortcomings in scientific algorithms, diagnostics, technical problems, etc.) that are slated to be fixed in the standard GEOS-Chem code but have not yet been fixed.
- Previous issues now resolved in v8-01-01
- This page lists known issues that have already been fixed in GEOS-Chem v8-01-01.
- Previous issues now resolved in v8-01-02
- This page lists known issues that have already been fixed in GEOS-Chem v8-01-02.
- Previous issues now resolved in v8-01-03
- This page lists known issues that have already been fixed in GEOS-Chem v8-01-03.
- 1 Error message in partition.f
- 2 Erroneous O3 diagnostic (ND45)
- 3 Seg fault in GEOS-5 China nested grid simulation
- 4 Output of ND51 does not match output in the ctm.bpch file
- 5 Quick fix for GEOS-5 optical depth
- 6 Bad GEOS-4 A6 met data causing segmentation fault
- 7 Too many levels in photolysis code
- 8 Negative tracer in routine WETDEP because of negative RH
- 9 Negative tracer in routine WETDEP
- 10 ISORROPIA and RPMARES
Error message in partition.f
Tzung-May Fu (email@example.com) wrote:
- I ran into a bug in v8-01-03 in partition.f, in lines 285-291:
      ! Get NOx concentration from STT
      CONCNOX = STT(I,J,L,IDTNOX)

      ! Stop w/ error
      IF ( CONCNOX - SUM1 < 0.d0 ) THEN
         CALL ERROR_STOP( 'STOP 30000', 'partition.f' )
      ENDIF
- However, it is possible to run into a situation where CONCNOX = 0d0 but SUM1 = NO2 + NO3 = 1d-99 + 1d-99, due to the underflow check in lines 258-262. If this happens, CONCNOX - SUM1 = -2d-99, and the simulation will stop.
- I think the stop w/ error statement should be revised to something more robust, or at least changed to:
      ! Stop w/ error
      IF ( CONCNOX - SUM1 < -3.d-99 ) THEN
         CALL ERROR_STOP( 'STOP 30000', 'partition.f' )
      ENDIF
- In case CONCNOX-SUM1 is a very small negative value, it can still be caught in the underflow check in line 317.
- Do you agree? I am a little worried that this will propagate unwanted error into the code.
Bob Yantosca (firstname.lastname@example.org) wrote:
- Yes, we found that out recently too, particularly when you use the new TPCORE. I ran into the same problem in the NRT. As you say, it is an underflow error. I might not have posted this on the wiki yet; if so, I'll do it today.
- The cheap solution is to just print a warning message, and then the user can decide if it's a terrible error or not. If the CONCNOX - SUM1 is of order 1e-99 or 2e-99 then it's OK. This will go into the next version (v8-01-04). In the meantime, try this:
      ! Error test
      IF ( CONCNOX - SUM1 < 0.d0 ) THEN

         !------------------------------------------------------
         ! Prior to 1/7/09
         ! Don't stop w/ error, but just print warning msg.
         ! Sometimes the new TPCORE can cause this error to
         ! trap if there CONCNOX = 0, but that can be purely
         ! a numerical condition and not really an error.
         ! (phs, ccc, bmy, 1/7/09)
         !CALL ERROR_STOP( 'STOP 30000', 'partition.f' )
         !------------------------------------------------------

!$OMP CRITICAL
         PRINT*, '### In partition.f: CONCNOX - SUM1 < 0'
         PRINT*, '### If CONCNOX = 0 and SUM1 ~ 1e-99 it is OK'
         PRINT*, '### I, J, L : ', I, J, L
         PRINT*, '### CONCNOX : ', CONCNOX
         PRINT*, '### SUM1    : ', SUM1
!$OMP END CRITICAL
      ENDIF
--Bob Y. 11:19, 3 February 2009 (EST)
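The arithmetic behind this thread can be illustrated with a short Python sketch. This is not code from GEOS-Chem: the function name classify_nox_residual and the default tolerance (taken from the -3.d-99 threshold suggested above) are assumptions made for illustration only. It separates a residual that is a harmless underflow artifact from one that is a real error.

```python
def classify_nox_residual(concnox, sum1, tol=3.0e-99):
    """Classify CONCNOX - SUM1 along the lines discussed in the thread.

    `tol` is a hypothetical tolerance mirroring the -3.d-99 threshold
    proposed above; it is not a value taken from the GEOS-Chem source.
    """
    residual = concnox - sum1
    if residual >= 0.0:
        return "ok"          # nothing to report
    if residual >= -tol:
        return "underflow"   # e.g. 0 - (1e-99 + 1e-99): numerical noise only
    return "error"           # genuinely negative: worth stopping or warning


# The case from the report: CONCNOX = 0 and SUM1 = 1d-99 + 1d-99
print(classify_nox_residual(0.0, 1e-99 + 1e-99))
```

The fix posted above takes a different route and simply prints a warning for any negative residual, leaving the judgment to the user; a tolerance like this one is closer to the alternative Tzung-May proposed.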
Erroneous O3 diagnostic (ND45)
Lee Murray and Claire Carouge reported a typo in the way that v8-01-02 and v8-01-03 output the O3 ND45 diagnostic: Ox is used in place of O3 at line 2601 of diag3.f. You need to replace:
      ARRAY(:,:,1:LD45) = AD45(:,:,1:LD45,N) /
     $                    FLOAT( CTO3 )
with:
      ARRAY(:,:,1:LD45) = AD45(:,:,1:LD45,N_TRACERS+1) /
     $                    FLOAT( CTO3 )
--phs 08:52, 3 February 2009 (EST)
Seg fault in GEOS-5 China nested grid simulation
Xiaoguang Gu (email@example.com) wrote:
- I ran into a problem while running the 0.5x0.66 China nested simulation with GEOS-5 and v8-01-02. I followed the instructions in the GEOS-Chem Manual to set up the simulation, and I successfully ran the model at 4x5 resolution to save out the boundary conditions. However, the China nested simulation stops after saving out the ctm.bpch file and the restart file, with the error message shown below. I run the model on a Linux cluster with the LINUX/IFORT (64-bit) compiler. The processor number I set is 32 for 4 threads.
- The error message was:
---> DATE: 2006/07/03  GMT: 00:00  X-HRS: 48.000
     - DIAG3: Diagnostics written to bpch!
     - MAKE_RESTART_FILE: Writing restart.2006070300.05x0667
     - INITIALIZE: Diag arrays zeroed!
     - INITIALIZE: Diag counters zeroed!
===============================================================================
ND23: Mass-Weighted OH Concentration
Mean OH = 20.8207222069296 [1e5 molec/cm3]
===============================================================================
     - CLEANUP: deallocating arrays now...
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image  PC                Routine            Line  Source
...
geos   0000000000AE3447  tpcore_geos5_wind  318   tpcore_geos5_window_mod.f90
geos   0000000000AFF834  transport_mod_mp_  1256  transport_mod.f
geos   0000000000AF02B7  transport_mod_mp_  140   transport_mod.f
geos   0000000000C47BE5  MAIN__             812   main.f
Bob Yantosca (firstname.lastname@example.org) replied:
- I think the problem is an out-of-bounds error somewhere. If you are going outside of array bounds in an allocatable array then that could cause the error you saw at the end of the run.
- Try adding -traceback -CB to the FFLAGS line in the makefile. Traceback will give you more detailed error output and -CB will turn on checking for "array out of bounds" errors.
Xiaoguang Gu (email@example.com) wrote:
- Thanks for your response! By adding -traceback -CB to the FFLAGS line in the makefile following your suggestion, I have found the problem and solved it.
- The log file showed that the problem originates from the allocatable array 'COSE'. The error messages in the log file are the following:
     - UPBDFLX_NOY: Reading /home/jwang7/xxu/GEOS-Chem/data/GEOS_0.5x0.666_CH/pnoy_200106/pnoy_nox_hno3.geos5.05x0666
forrtl: severe (408): fort: (2): Subscript #1 of the array COSE has value 134 which is greater than the upper bound of 133
- So I checked the array COSE in tpcore_geos5_window_mod.f90 and found these lines:
.......
263  !----------------
264  ! Allocate arrays
265  !----------------
266
267  allocate ( cosp(jm) )
268  allocate ( cose(jm) )
........
308  do j=1,jm+1 !(dan)
309     elat(j) = 0.5*(clat(j-1) + clat(j))
310     sine(j) = sin(elat(j))
311  !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
312  !%%% MODIFICATION by Harvard Atmospheric Chemistry Modeling Group
313  !%%%
314  !%%% Initialize SINE_25 array (bmy, bdf, 10/29/04)
315  !%%%
316     SINE_25(J) = SIN( CLAT(J) )
317  !%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
318     cose(j) = cos(elat(j))
319  enddo
..........
- So, I modified
allocate ( cose(jm) )
- to be
allocate ( cose(jm+1) )
- at line 268. The run completed successfully after this modification.
Dan Chen (firstname.lastname@example.org) wrote:
- I tested Xiaoguang's modification and found that this is indeed a bug that may cause memory allocation problems. It would not change the simulation results, but may cause machines to run out of memory. I think this modification is necessary and correct!
--Bob Y. 09:31, 22 January 2009 (EST)
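The off-by-one above can be reproduced in a few lines. The Python sketch below is only an analogy, not GEOS-Chem code: the helper fill_edge_cosines and the latitude values are made up for illustration, and jm = 133 is taken from the upper bound reported in the log. Python always checks list bounds, so it behaves roughly like a Fortran build with -CB: writing jm+1 edge values into a buffer of jm slots fails, while a buffer of jm+1 slots works.

```python
import math


def fill_edge_cosines(edge_lats, n_alloc):
    """Fill a buffer of n_alloc slots with cos(lat) for every edge latitude."""
    cose = [0.0] * n_alloc          # plays the role of allocate( cose(n_alloc) )
    for j, lat in enumerate(edge_lats):
        cose[j] = math.cos(lat)     # raises IndexError if n_alloc < len(edge_lats)
    return cose


jm = 133                            # upper bound reported in the log above
# jm + 1 made-up edge latitudes (radians); the real values come from the grid
edge_lats = [math.radians(-10.0 + 0.5 * j) for j in range(jm + 1)]

try:
    fill_edge_cosines(edge_lats, jm)         # too small, like allocate( cose(jm) )
    overflowed = False
except IndexError:
    overflowed = True

cose = fill_edge_cosines(edge_lats, jm + 1)  # like allocate( cose(jm+1) )
print(overflowed, len(cose))
```

Without bounds checking, the Fortran code silently writes one element past the end of the array, which is consistent with the crash only appearing later, during cleanup.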
Output of ND51 does not match output in the ctm.bpch file
Please see this discussion about an inconsistency between the ND51 diagnostic (ts_satellite.bpch file) and the ctm.bpch file.
Quick fix for GEOS-5 optical depth
Please see the discussion for a quick fix for the GEOS-5 optical depths.
--Bob Y. 16:13, 10 October 2008 (EDT)
Bad GEOS-4 A6 met data causing segmentation fault
Jesse Kenyon (email@example.com) wrote:
- In our runs of the GEOS-Chem model using GEOS4 data from 2006, we have run into a corrupt data problem that causes our runs to crash on rundate 20060913. We have been able to isolate this to bad values in the A6 files for the meridional wind (V component). Specifically, it appears two files contain bad data: 20060913.a6.4x5 and 20060915.a6.4x5. The bad data takes the form of unphysically huge values (e.g. 0.75553E+29), and occurs at many gridboxes on levels 6 and 7 of the 20060913 file and levels 18 and 19 of the 20060915 file. In both files, the bad data only occur for the 00 hour (06, 12, and 18 hours appear okay). We also checked the 20060914, 20060916, 20060917, 20060918, and 20060919.a6.4x5 files - they seem okay. A check of the U component wind on 09-13 and 09-15 shows no problems.
- For us, the problem manifested itself as a segmentation fault when trying to access address -21474836 of array qtmp in subroutine xtp in module tpcore_fdvas_mod.f90. The address was calculated from xmass which is calculated from wind and pressure in another subroutine (Init_press_fix). We know some folks at Harvard have been able to get beyond this rundate without crashing and suspect it might be due to a difference in computer system or compiler.
- We have not yet tried running with a "repaired" V wind, so cannot say for sure that there are no other problems in the A6 files besides V (or U which was also checked).
Philippe Le Sager (firstname.lastname@example.org) wrote:
- Thanks for reporting the issue. We had the exact same problem with GCAP once, and it was solved by repairing the met field (a very bad value in U or V). Then Mike and Bastien got the exact same error with GEOS4, on the same day as you. My first idea was to test the met fields, but I did not find any bad values. I just did a run with Bastien's inputs, and was not able to reproduce the crash. You just confirmed that the first idea was the right one: a bad met field.
- Since I seem to have a good met field and you do not, I checked the met fields on the server this time, and I did find a problem with V for 20060913. Here is the output from test_met.pro (it gives the min and max of each met field):
at Harvard (internal disk):
20060913 000000 U -72.816071 144.557083
20060913 000000 V -67.133759 61.291386

on the server:
20060913 000000 U -72.816071 144.557083
20060913 000000 V -67.133759***************
- All other fields give the exact same min/max. There is a huge or NaN value in the file we put on the server. We do not understand how that happened, since files are simply copied from one location to the other. So we are still investigating the issue and checking the whole archive, and will let you know as soon as we replace the file.
Bob Yantosca (email@example.com) wrote:
- I have fixed the bad A-6 data for 2006 -- 09/13, 09/14, 09/15, 09/16. For some reason the data in the FTP site was corrupt (bad values in the winds at a couple of GMT times) but the data on our internal disk (behind the firewall) was not. I just copied the relevant data files over and re-created the TAR file.
- Please obtain the new TAR file from:
ftp ftp.as.harvard.edu
cd pub/geos-chem/data/GEOS_4x5.d/GEOS_4_v4/2006/09
get 09.tar.gz
--Bob Y. 15:40, 23 September 2008 (EDT)
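A check like the min/max test described above can be sketched in a few lines of Python. This is not test_met.pro and it does not read A6 files: the function find_bad_values, the 500.0 magnitude limit (assuming winds in m/s), and the sample list are all assumptions chosen for illustration. The idea is simply to flag values that are non-finite or far outside any physically plausible range.

```python
import math


def find_bad_values(values, limit=500.0):
    """Return (index, value) pairs that are non-finite or exceed `limit` in magnitude.

    `limit` is a hypothetical sanity bound for a wind component, not a
    threshold taken from GEOS-Chem or from test_met.pro.
    """
    return [(i, v) for i, v in enumerate(values)
            if not math.isfinite(v) or abs(v) > limit]


# Made-up sample: plausible winds plus one huge value and one NaN
v_wind = [-67.133759, 12.5, 0.75553e29, 61.291386, float("nan")]
bad = find_bad_values(v_wind)
print([i for i, _ in bad])
```

Running such a scan over every field and time slice in a met file, before starting a long simulation, would catch corruption of the kind reported here without waiting for a segmentation fault.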
Too many levels in photolysis code
The scattering module (OPMIE.f) for Fast-J requires many additional vertical levels. It happens that the limit (NL set in jv_mie.h) can be reached in some situations, causing the program to stop with a "Too many levels in photolysis code.." error message. Sometimes you can increase NL to solve the problem. Now a new version of OPMIE.f is available, which still warns you if NL is reached, but works with that limit.
Until it is released in the standard model, you can find the new OPMIE.f at: ftp://ftp.as.harvard.edu/pub/geos-chem/patches/v8-01-01/OPMIE.f
--phs 11:29, 17 June 2008 (EDT)
This error can also indicate a problem in your visual optical depths, dust emissions, or aerosol emissions. Dust and aerosol optical depths are computed from the concentration array STT. If for some reason you end up emitting too much aerosol or dust (e.g. because of a unit conversion error), then this will result in an abnormally high dust or aerosol optical depth. A very high optical depth will cause FAST-J to keep adding points to the Gaussian quadrature in OPMIE.f. You can get into a situation where the number of points that FAST-J wants to add is greater than the array parameter NL (it may want to add thousands of points!).
Therefore, if you encounter this type of error, it is a good idea to double-check your aerosol and dust emissions to make sure that the monthly and annual totals are reasonable.
--Bob Y. 11:03, 26 June 2008 (EDT)
Negative tracer in routine WETDEP because of negative RH
Fixes are available at ftp://ftp.as.harvard.edu/pub/geos-chem/patches/v8-01-01.
--phs 16:31, 6 June 2008 (EDT)
Negative tracer in routine WETDEP
Dylan Millet (firstname.lastname@example.org) wrote:
- I'm having a run die consistently at the same time (October 1, 2005; first time step of the month) in large-scale wetdep, with an STT element < 0.
- Platform: Linux cluster
- Threads: 8
- Version: v7-4-13 out of the box.
- GEOS4, 4x5, 30L, full chemistry
- IFORT 10.1
- In Section 6 (No Downward Precip) of wetscav_mod.f, subroutine safety is getting called.
WETDEP - STT < 0 at 1 1 29 for tracer 7 in area 6
- (First of all it seems odd to do wetdep for L=29, this is 63 km up). Have you seen anything like this? I ran for the whole year starting Jan 1 successfully until this point.
- ... By the way, the problem persists when I turn off chemistry altogether.
Philippe Le Sager (email@example.com) replied:
- I used your restart file and the same input.geos (w/ chemistry on and off). My code ran through without a problem. I tried both the Sun Studio and Ifort 9 compilers, and the latter on two different machines (altix and ceres). I used v7-04-13 and v8-01-01. I never reproduced your error.
- We just got the new Ifort 10, and tried it too. I ran v8-01-01 without an error. But when I tried v7-04-13, I finally reproduced your error, with the exact same negative values!
- In other words: the bug happens with IFort 10 and v7-04-13 only.
- Also, have a look at this recent development. This is not the reason for your bug (I tried v8 w/ ifort 10 and isorropia -like v7-04-13- and it did not crash), but using RPMARES instead of Isorropia may be a way to fix it.
- ... More about the Ifort 10 / v7-04-13 issue. When I wanted to debug with TotalView, I could not reproduce the bug anymore, simply because I had turned off all optimization. So I did more tests and found that GEOS-Chem crashes if the default -O2 optimization is used, but works fine with -O1. It is hard to tell what happens, since only the emissions step is done between reading the restart file and the crash.
- Bob and I will further test Ifort 10 for optimization on our machines. Maybe we will find something... For the time being, you may have to switch to -O1, at least for the run that crashes. You will find the optimization flag at the beginning of the Makefile.ifort.
Long story short: This appears to be an optimization issue with IFORT 10 and v7-04-13. Upgrading to GEOS-Chem v8-01-01 should solve this problem.
--Bmy 10:38, 17 April 2008 (EDT)
ISORROPIA and RPMARES
Please see the discussion about the bugs & fixes for ISORROPIA and RPMARES on the Code Developer's Forum for Aerosol thermodynamical equilibrium.
Also, if you are trying to run an aerosol-only simulation, then please see this discussion about a bug that manifested itself only after switching from ISORROPIA to RPMARES.
--Bob Y. 10:44, 26 June 2008 (EDT)