GAMAP tips and tricks

From Geos-chem

Revision as of 13:43, 8 July 2008

On this page we list some very helpful tips on how to accomplish various tasks with the GAMAP routines.

General GAMAP usage

GAMAP and IDL 7

Version 7 of IDL comes with a new development environment, the IDL Workbench, which is based on Eclipse. If you do not use the Workbench, you will not notice any change. If you do use it, you will see a major upgrade, but you may also run into a problem with the !PATH definition. For GAMAP to work properly, we provide an idl_startup.pro template file that you can modify for your system. In that file, !PATH used to be defined like this:

IdlPath = Expand_Path( '+' + !PATH,  /All_Dirs ) 

!PATH       = EXPAND_PATH('+~/IDL/gamap2/', /ALL_DIRS) + ':' + $
              EXPAND_PATH('+~/IDL/idlpro/')            + ':' + $
              EXPAND_PATH('+~/IDL/jhuapl/')            + ':' + $
              IdlPath

Now, for GAMAP to work with the new IDL Workbench, you must use the following syntax instead:

; Call the IDL routine PATH_SEP to get the path separator token 
; (i.e. the character that you need to separate directories in !PATH). 
; This token is different depending on which OS you are using.
Sep     = Path_Sep( /Search_Path )

; Use the PREF_SET command to append your directories into the default
; IDL search path.  This approach is necessary if you want to use the 
; IDL Workbench in IDL 7.0 and later versions. (phs, bmy, 4/15/08)
PREF_SET, 'IDL_PATH',                                        $
           EXPAND_PATH('+~/IDL/gamap2/', /ALL_DIRS ) + Sep + $
           EXPAND_PATH('+~/IDL/idlpro/'            ) + Sep + $
           EXPAND_PATH('+~/IDL/jhuapl/'            ) + Sep + $
           '<IDL_DEFAULT>', /COMMIT

NOTES:

  1. The PREF_SET command was introduced in IDL 6.2.
  2. This syntax can be added to your idl_startup.pro file even if you do not use the IDL Workbench. More about that issue can be found here.
  3. You can add/delete directories to the IDL_PATH corresponding to the particular directory structure in your space. In this particular example we are adding the gamap2, idlpro, and jhuapl directories to IDL_PATH. If, for example, you don't have an idlpro directory, you can of course omit that.
  4. The EXPAND_PATH function should expand a relative file path (i.e. with a ~) to a full file path. However, there may be some flavors of Linux for which ~ might be problematic.
  5. Adding a + before the directory name in the call to EXPAND_PATH will make IDL search the directory and all of its subdirectories for files of the appropriate type (*.pro, *.sav) for the given path.
  6. Using the /ALL_DIRS keyword in the call to EXPAND_PATH, such as EXPAND_PATH( '+~/IDL/gamap2/', /ALL_DIRS ), will cause IDL to return the full path names of all of the subdirectories under ~/IDL/gamap2, regardless of whether or not they contain *.pro or *.sav files. (IDL's default behavior is to only return path names for those directories containing *.pro or *.sav files.)
  7. If the EXPAND_PATH function for some reason does not expand the filepaths, then you can specify them manually. Windows users might need to start the string with the drive letter (e.g. C:\).
  8. It is best to create the <IDL_DEFAULT> string with PREF_SET. Do not attempt to add the file paths directly to the IDLDE or IDL Workbench GUI.
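
To confirm that the startup file did what you expected, you can inspect the preference and the resulting search path from the IDL prompt. The following is a minimal sketch that uses only built-in IDL routines. It assumes that the GAMAP main program file is named gamap.pro (as mentioned elsewhere on this page) and that it sits somewhere under the directories you added.

```idl
; Show the value stored in the IDL_PATH preference
PRINT, PREF_GET( 'IDL_PATH' )

; List each directory in the expanded search path, one per line
Dirs = STRSPLIT( !PATH, PATH_SEP( /SEARCH_PATH ), /EXTRACT )
PRINT, Dirs, FORMAT='(A)'

; Locate gamap.pro in the search path (an empty string means "not found")
PRINT, FILE_WHICH( 'gamap.pro' )
```

If the last command prints an empty string, the GAMAP directories are probably missing from the search path.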

--phs 16:42, 14 April 2008 (EDT)

--Bob Y. 13:49, 7 July 2008 (EDT)

Usage of TRACERINFO.DAT and DIAGINFO.DAT

User A said he keeps getting this message:

% UPDATE_TRACER_DATAINFO: WARNING: Use_DataInfo appears independent from global array!

User B said:

My ctm.bpch files have duplicate data blocks for some parameters. Something's wrong!

Answer:

These are symptoms of diaginfo.dat and/or tracerinfo.dat files that do not match the binary punch file being read (user A), or of a bpch file that was created with inadequate diaginfo.dat/tracerinfo.dat files and therefore ended up with, for example, duplicate 999 tracers that have no tracer name (user B). The solution is to use the correct diaginfo.dat and tracerinfo.dat files both when reading and when creating bpch files. GEOS-Chem now writes these files to your run directory for each run. If you use them when reading the output of that run, you will not have a problem.

The first time a GAMAP routine is used, it looks for these files and reads them. It searches first in the current directory and then in the directories listed in !PATH (which means that it will fall back on the default files in ../gamap2/input_files/). So, to read the files that correspond to your run, you can:

  1. start your IDL session in your run directory
  2. start IDL anywhere, and then type cd, 'your_run_directory' before using GAMAP
  3. copy your diag/tracerinfo.dat from your run directory to ../gamap2/input_files/ (i.e., overwrite the default ones) before using GAMAP
  4. Advanced users can also, at any time in their session, force GAMAP to read specific diag/tracerinfo.dat files:
ctm_diaginfo, /all, /force_reading, filename='your_diaginfo.dat'
ctm_tracerinfo, /all, /force_reading, filename='your_tracerinfo.dat'

If you are using a default location, double check that it is really used with the following query:

print, file_which('tracerinfo.dat')
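
Note that FILE_WHICH searches only !PATH by default, so the query above does not report a copy sitting in the current directory. The sketch below mimics the search order described earlier (current directory first, then !PATH) for both files. The procedure name Check_Info_Files is hypothetical and is not part of GAMAP; save it to a .pro file before calling it.

```idl
PRO Check_Info_Files

   ; The two files that GAMAP looks for
   Files = [ 'diaginfo.dat', 'tracerinfo.dat' ]

   FOR I = 0, N_ELEMENTS( Files ) - 1 DO BEGIN

      ; Look in the current directory first, then in !PATH
      Found = FILE_WHICH( Files[I], /INCLUDE_CURRENT_DIR )

      ; FILE_WHICH returns an empty string if the file is not found
      IF Found EQ '' THEN Found = '(not found)'
      PRINT, Files[I], ' -> ', Found

   ENDFOR

END
```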

--phs 15:52, 4 April 2008 (EDT)

Date and time

This section has been split off into its own wiki page, Date and Time Computations with GAMAP.

Text manipulation with GAMAP

This section has been split off into its own wiki page, Text manipulation with GAMAP.

Memory management and CTM_MAKE_DATAINFO

When using CTM_MAKE_DATAINFO, you create a few pointers and allocate memory for the data they point to. There are three ways to free that memory.

(1) By default, GAMAP keeps track of the pointers in a global structure, and you can clean up the memory by calling CTM_CLEANUP (with or without the keyword /NO_GC):

    Success = ctm_make_datainfo( data, DataInfo, FileInfo, ... )
    CTM_WriteBpch, DataInfo, FileInfo, FileName=file
    ctm_cleanup

(2) If you use the keyword /NO_GLOBAL when calling CTM_MAKE_DATAINFO, the story is a little more subtle. GAMAP now has no record of the pointers that were created. If you call CTM_CLEANUP without the keyword /NO_GC, everything is cleaned up because CTM_CLEANUP calls heap_gc. But you also lose all other pointers and objects.

    Success = ctm_make_datainfo( data, DataInfo, FileInfo, ..., /No_global )
    CTM_WriteBpch, DataInfo, FileInfo, FileName=file
    ctm_cleanup

Note that if you do not call ctm_cleanup, or if you call it with /No_GC, the memory allocated by CTM_MAKE_DATAINFO stays allocated and the pointers that refer to it stay alive until you exit the routine (unless they are passed back to the caller). Once you are out of the routine, this memory remains allocated but is useless because it is inaccessible. In other words, you have a memory leak.
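
The leak described above is not specific to GAMAP and can be reproduced with plain IDL pointers. The following is a minimal sketch; the procedure name Leak_Demo is hypothetical and the array size is arbitrary.

```idl
PRO Leak_Demo

   ; Allocate a heap variable. P is a local variable of this procedure.
   P = PTR_NEW( FLTARR( 1000 ) )

   ; Returning without calling PTR_FREE, P destroys the local variable P
   ; but leaves the heap variable allocated and unreachable.

END

; At the IDL prompt:
;   IDL> Leak_Demo
;   IDL> HELP, /HEAP_VARIABLES   ; the orphaned heap variable is still listed
;   IDL> HEAP_GC, /VERBOSE       ; reclaims heap variables that nothing references
```

Adding PTR_FREE, P before the END statement avoids the leak without any garbage collection.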

(3) So, if you want to keep some objects and/or pointers alive in your code (i.e., you do not want to call heap_gc, or ctm_cleanup without /NO_GC), you need to free only the pointers that were created, as follows:

    Success = ctm_make_datainfo( data, datainfo, fileinfo, ..., /No_global )
    CTM_WriteBpch, DataInfo, FileInfo, FileName=file
    ptr_free, DataInfo.data
    ptr_free, fileinfo.gridinfo

To free only the heap memory created by multiple calls to CTM_MAKE_DATAINFO, the procedure is:

  for D=0L, NTracers-1L do begin
     
     Success = CTM_Make_DataInfo( Data[*,*,D], DataInfo, FileInfo, ..., /No_Global )
     
     ArrDataInfo = D eq 0l ? [ DataInfo ] : [ ArrDataInfo, DataInfo ]
     
     if D ne Ntracers-1l then ptr_free, Fileinfo.gridinfo
     
  endfor
     
  CTM_WriteBpch, ArrDataInfo, FileInfo, FileName=OutFileName
     
  for d=0, n_elements(ArrDataInfo)-1l do ptr_free, ArrDataInfo[d].data
  ptr_free, fileinfo.gridinfo

--phs 10:28, 13 February 2008

Too many logical unit numbers used by GAMAP

Chris Holmes (cholmes@seas.harvard.edu) wrote:

A lot of gamap routines seem to use up LUNs for file io. This means that after iterating several read cycles, I can no longer open more files, even though I'm closing them. Here's what's going on:
A lot of gamap2/gamap_util/ routines are opening files with the following command
   OPEN_FILE, File, LUN, /GET_LUN
OPEN_FILE checks whether the LUN variable is undefined (lines 151 and 241) and if so it allocates one (line 241).
But then OPEN_FILE also passes the /GET_LUN keyword to OPENW or OPENR via the _EXTRA keyword (lines 244-249), which allocates another LUN without freeing the first one.
I'm not sure what the best solution is. Probably remove the /GET_LUN keyword in all instances of OPEN_FILE. You could also add a GET_LUN keyword to OPEN_FILE that does nothing except prevent it from passing it on to OPENW and OPENR.
Having resolved that, I still run out of LUNs and I've traced the problem to CTM_OPEN_FILE. Actually there's already a comment about this problem on lines 725-731. So yes I think it is a problem and should be fixed.

Philippe Le Sager (plesager@seas.harvard.edu) wrote:

I looked at the issues you pointed out. OPEN_FILE was indeed automatically using get_lun, while the user could pass /get_lun with _extra. To avoid the double LUN assignment, I think the dummy /get_lun keyword is the best solution (users can put whatever they want in the _extra), although we could be more sophisticated and check on the field names in the extra keyword structure.
CTM_OPEN_FILE was more difficult. I did not find it well written, and it had quite a few coding errors and misleading comments. I decided to simplify it, and get rid of all the "testlun" business. It uses all LUN instead of every other one now. I am testing my version now and it seems to work fine. Feel free to look at it if you want.
Thanks for reporting the bugs. These will be fixed in the next version of GAMAP (v2-12).
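
For reference, the double allocation described above can be illustrated with the built-in IDL routines alone. This is a sketch, not the GAMAP code: the procedure name Lun_Demo is hypothetical, and File is assumed to be the name of an existing readable file.

```idl
PRO Lun_Demo, File

   ; Problem pattern (shown as comments only, so that nothing leaks here):
   ;    GET_LUN, Lun                  ; reserves one unit
   ;    OPENR, Lun, File, /GET_LUN    ; reserves a second unit and overwrites Lun
   ; The first unit stays reserved because its number has been lost.

   ; Safe pattern: let OPENR allocate the unit, exactly once
   OPENR, Lun, File, /GET_LUN

   ; List the units that are currently open
   HELP, /FILES

   ; FREE_LUN closes the file and returns the unit to the pool
   FREE_LUN, Lun

END
```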

--Bob Y. 13:19, 1 July 2008 (EDT)

GAMAP routines for statistical analysis

GAMAP ships with several routines for computing various statistical quantities:

CUM_TOTAL
Computes the cumulative total of an array.
MEAN
Computes the mean value of an array with any number of dimensions.
ORG_CORR
Calculates the reduced major axis.
PERCENTILES
Computes percentiles of an array.
QQNORM
Sorts a data array, assigns actual "probability" values, and calculates the expected deviation from the mean.
RUN_AV
Computes the running average or running total of a data vector.

Also, the IDL routine MOMENT computes the 4 statistical moments of a data set: mean, variance, skewness, and kurtosis.
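
As a quick illustration, MOMENT returns a 4-element vector, and the optional SDEV keyword returns the standard deviation. The data values below are arbitrary.

```idl
; A small test vector
X = [ 1.0, 2.0, 3.0, 4.0, 5.0 ]

; M holds [ mean, variance, skewness, kurtosis ]
M = MOMENT( X, SDEV=Sd )

PRINT, 'Mean:     ', M[0]
PRINT, 'Variance: ', M[1]
PRINT, 'Skewness: ', M[2]
PRINT, 'Kurtosis: ', M[3]
PRINT, 'Std dev:  ', Sd
```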

--Bob Y. 09:42, 8 July 2008 (EDT)

Binary punch file output

Corrupted binary file error

Dylan Millet (dbm@umn.edu) wrote:

Hi guys -- I ran into a problem with bpch_link.pro:
   IDL> bpch_link,'2006*bpch','new.bpch' 
   Now Reading 20060101_2x25.ctm.bpch
   % READU: Corrupted f77 unformatted file detected. Unit: 108, File: 20060101_2x25.ctm.bpch
   % Execution halted at: BPCH_LINK         142 /users/dbm/IDL/gamap_v2.10/file_io/bpch_link.pro
   %                      BPCH_LINK         142 /users/dbm/IDL/gamap_v2.10/file_io/bpch_link.pro
   %                      $MAIN$         IDL>

Bob Yantosca (yantosca@seas.harvard.edu) replied:

You may be using an older version of bpch_link.pro. We recently fixed a little-endian/big-endian problem, but the fix may not be in the standard GAMAP release yet. The fix is simple: add the SWAP_ENDIAN keyword to every call to OPEN_FILE:
   ; External functions
   FORWARD_FUNCTION CTM_Grid, CTM_Type, MFindFile, Little_Endian

   ; Open the output file (write as big-endian)
   Open_File, OutFile, Ilun_OUT, /F77, /GET_LUN, /Write, Swap_Endian=Little_Endian()

Specifying SWAP_ENDIAN=LITTLE_ENDIAN() in each call to OPEN_FILE will cause IDL to swap the endian ordering if you are running IDL on a little-endian machine. NOTE: In GAMAP v2-12 (coming soon!), all routines that read binary files use the SWAP_ENDIAN keyword.
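
If you need to do the same thing without the GAMAP wrappers, the built-in OPENR keywords can be used directly. The sketch below assumes that a big-endian file named ctm.bpch exists in the current directory; the byte-order test is a common IDL idiom and is not necessarily how GAMAP's LITTLE_ENDIAN function is implemented.

```idl
; Extract the first byte of the 16-bit integer 1:
; it is 1 on a little-endian machine and 0 on a big-endian one
IsLittle = ( BYTE( 1, 0, 1 ) )[0]
PRINT, 'Little-endian machine: ', IsLittle

; Open a big-endian Fortran unformatted file; the bytes are swapped
; on input only if this machine is little-endian
OPENR, Lun, 'ctm.bpch', /GET_LUN, /F77_UNFORMATTED, /SWAP_IF_LITTLE_ENDIAN

; ... READU calls would go here ...

FREE_LUN, Lun
```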

--Bob Y. 14:34, 6 June 2008 (EDT)

Reading binary punch files directly from IDL

The best way to read a GEOS-Chem binary punch file is by using the GAMAP main program gamap.pro. More experienced users may also use one of the lower-level routines (e.g. CTM_GET_DATA or CTM_GET_DATABLOCK) to read the file. However, if for any reason you need to read a binary punch file directly from IDL (i.e. without using GAMAP, CTM_GET_DATA, or CTM_GET_DATABLOCK), then you have a couple of options:

  1. The GAMAP routine BPCH_TEST (which is located in the file_io subdirectory of the main GAMAP directory) can be used to quickly parse a binary punch file. BPCH_TEST prints out the header information from the file as well as the min and max data values. This is very useful for debugging. You can look at the structure of the file gamap2/file_io/bpch_test.pro to see how the binary punch file is read.
  2. If you are more accustomed to working with R, GrADS, Matlab, or any other visualization package that takes netCDF files as input, then you might want to use the GAMAP routines BPCH2NC or BPCH2COARDS to first convert the binary punch file into netCDF format.

--Bob Y. 13:28, 23 May 2008 (EDT)

Splitting and joining binary punch files

GAMAP comes with two routines for working with GEOS-Chem binary punch file output directly. Routine BPCH_SEP will extract a data block from a big binary punch file and save it to a separate file:

Colette Heald (heald@atmos.colostate.edu) wrote:

I did a year-long run starting from 20011201 and all the output appears to have saved in one file labeled ctm.bpch.v7-04-13-tra54.2001120100
So first off I'm wondering, is there a way to save each monthly mean as a separate ctm.bpch file? Secondly, the output file won't read into gamap - I thought the file might be corrupt, but actually it seems to read in fine with bpch_test. When I do gamap, fi='<filename>', I get this error:
   Error (-261) reading file
   ctm.bpch.v7-04-13-tra54.2001120100
    in block 3912
   POINT_LUN: Negative position argument not allowed. Position: -2147377716,
   Unit: 21
   File: ctm.bpch.v7-04-13-tra54.2001120100
I get the same error when I try to use ctm_get_datablock to pull out one tracer. Do you think the file is salvageable? Have you seen this problem before? I'm wondering if the file is just too big...

Bob Yantosca (yantosca@seas.harvard.edu) replied:

Right now the code can't save to separate bpch files. However, you can split them up with bpch_sep.pro into individual files in post-processing. For example:
   pro split

      ; Splits into monthly bpch files (change as necessary)
  
      ; The big input file
      InFile = 'ctm.bpch.v8-01-01-geos5-Run0.2005010100'
      ctm_cleanup

      ; Split the data in the big file for 20050101 into a separate file
      bpch_sep, InFile, 'ctm.bpch.v8-01-01-geos5-Run0.20050101', $
         tau=nymd2tau( 20050101 )
   end

GAMAP routine BPCH_LINK will concatenate data blocks from several smaller bpch files and save them into a large bpch file.

   ; Consolidates data from the 'ctm.bpch.*' files
   ; into a single file named 'new.ctm.bpch'
   BPCH_LINK, 'ctm.bpch.*', 'new.ctm.bpch'

--Bmy 16:53, 22 April 2008 (EDT)

Renaming and Regridding APROD/GPROD restart files

When using secondary aerosols, you need a restart_gprod_aprod.YYYYMMDDhh file. Unlike the general restart file used for GEOS-Chem runs, it cannot simply be renamed and reused for another simulation date.

Renaming APROD/GPROD restart files

You must rewrite your restart_gprod_aprod.YYYYMMDDhh file so that the date in the filename matches the date in the data block headers. A new routine in GAMAP v2.12 will do this for you: ../gamap2/date_time/rewrite_agprod.pro

Regridding APROD/GPROD restart files

You can regrid a secondary aerosol restart file with the same routines used to regrid the standard restart file. However, you need to tell the routines to regrid all tracers with the keyword diagn=0 (which means "all tracers"):

regridh_restart, ..., diagn=0
regridv_restart, ..., diagn=0

--phs 14:51, 4 April 2008 (EDT)

Combining output from timeseries files

Ray Nassar (ray@io.as.harvard.edu) wrote:

I just have a quick question, is there any GAMAP routine that can average all hourly data blocks in a timeseries file to make a single daily average bpch file?
It appears like I could use the average option of ctm_sum.pro but I do not quite understand where the averaged data goes and afterwards would still have to write to bpch.

Philippe Le Sager (plesager@seas.harvard.edu) replied:

Check the
   /gamap2/timeseries/gc_combine_nd49.pro
   /gamap2/timeseries/gc_combine_nd48.pro 
routines, which combine daily bpch files (nd48 & nd49 output, but also met field input files) into 4D data blocks. They have many options. You can extract a subset of data according to time and/or location, or process the data (moving average, daily max, shift to local time). You can either save the data into a new bpch file or just get the data back as an array.
I wrote a tutorial that gives some pointers here.
-Philippe

--Bmy 09:43, 1 April 2008 (EDT)

Adding tracers to a restart file

Duncan Fairlie (t.d.fairlie@nasa.gov) wrote:

I need to construct a new restart file with 55 tracers from an existing full chem (43 tracer) restart file. The extra 12 tracers will include dust-nitrate, dust-sulfate, and dust-alkalinity (4 tracers each). I just want to start them uniform 1.0e-15 or so.
I can probably figure out how to do this, but wondered if you have something off the shelf that can be used to add several blank tracers to an existing restart file .

Bob Yantosca (yantosca@seas.harvard.edu) replied:

Maybe the easiest way to do this is to use GAMAP and BPCH_RENUMBER. You can load 2 files

   gamap, file=file1
   gamap, file=file2
   gamap, /nofile

then type s1-55 (for all 55 data blocks) and this will save out all data blocks to a new file (you'll be prompted for the name).

Then you can hack into IDL code bpch_renumber.pro to renumber some of the tracer numbers. Or even better yet, renumber the tracer numbers on the file2 before loading it into GAMAP and saving to the 55-tracer file.

Error -262 encountered when reading a bpch file

The following error was encountered when using GAMAP to read a binary punch file:

Error (-262) reading file ctm.bpch in block 2510
POINT_LUN: Negative position argument not allowed.
Position: -2147068380, Unit:21  File: ctm.bpch

The cause of this error is that the file being read had a size of 2.430 GB. Internal GAMAP routine CTM_READ3DB_HEADER uses a long integer variable to compute the offset in bytes from the start of the file to each individual data block, so that it can tell GAMAP to "skip" to each data block directly. However, the long integer type in IDL is a 32-bit signed integer and cannot represent numbers larger than 2,147,483,647. If we try to represent a larger value in a long integer variable, the value can "wrap around" to a negative number, which is what caused the error above.

Here is a demonstration of the maximum value of a long integer type in IDL:

IDL> a = 2430111016L

a = 2430111016
     ^
% Long integer constant must be less than 2147483648.

Therefore, if we try to use GAMAP to read any files larger than 2,147,483,648 bytes, we will encounter this type of error.
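
The wrap-around itself, and a way to check a file's size before opening it with GAMAP, can be sketched with built-in IDL features only. The file name ctm.bpch is taken from the error message above and is assumed to exist in the current directory.

```idl
; The largest value that a 32-bit long integer can hold
A = 2147483647L

; Adding 1 silently wraps around to -2147483648
PRINT, A + 1L

; The LL suffix creates a 64-bit constant, so this value is accepted
B = 2430111016LL
HELP, B

; FILE_INFO reports the file size as a 64-bit integer
Info = FILE_INFO( 'ctm.bpch' )
IF Info.SIZE GE 2LL^31 THEN $
   PRINT, 'ctm.bpch is too large for 32-bit offsets; split it first'
```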

The solution: you should use GAMAP routine BPCH_SEP to split larger bpch files into smaller ones (perhaps one file for each day). Basically you would call BPCH_SEP repeatedly such as:

bpch_sep, 'ctm.bpch', 'ctm.bpch.20060101', tau=nymd2tau( 20060101 )
bpch_sep, 'ctm.bpch', 'ctm.bpch.20060102', tau=nymd2tau( 20060102 )
bpch_sep, 'ctm.bpch', 'ctm.bpch.20060103', tau=nymd2tau( 20060103 )
...

Then you can use GAMAP to read the smaller bpch files.

Philippe Le Sager has pointed out that some of the newer IDL versions support a 64-bit integer type (LONG64). Perhaps in a future version of GAMAP we will make the file pointer variable of this data type.

--Bob Y. 14:29, 25 June 2008 (EDT)

Graphics output

Creating PDF files

The current version of IDL (v7.x) cannot save directly to Adobe PDF format. The best way to create PDF files from IDL is to first create a PostScript file, and then use the utility ps2pdf to create a PDF file from the PostScript file. Most Unix or Linux distributions should come with a version of ps2pdf already installed.

For example:

IDL> open_device, /ps, bits=8, /color, file='myplot.ps'
IDL> plot, findgen(100), color=!myct.black
IDL> close_device
IDL> spawn, 'ps2pdf myplot.ps'

This will create a file named myplot.pdf. The advantage of using PDF files is that they may be displayed from within a web page. Also, PDF files are generally smaller in size than the equivalent PostScript file.
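
The same steps can be written with IDL's built-in direct graphics, together with a check that the conversion actually succeeded. This is a sketch under the assumption that ps2pdf is on your shell's search path; it does not use the GAMAP OPEN_DEVICE and CLOSE_DEVICE routines.

```idl
; Remember the current graphics device, then switch to PostScript
OldDevice = !D.NAME
SET_PLOT, 'PS'
DEVICE, FILENAME='myplot.ps', /COLOR, BITS_PER_PIXEL=8

; Draw the plot and close the PostScript file
PLOT, FINDGEN( 100 )
DEVICE, /CLOSE
SET_PLOT, OldDevice

; Convert to PDF, capturing the shell's exit status and error text
SPAWN, 'ps2pdf myplot.ps myplot.pdf', Output, ErrOutput, EXIT_STATUS=Status

; Report a failure if ps2pdf returned an error or wrote no file
IF Status NE 0 OR ~FILE_TEST( 'myplot.pdf' ) THEN $
   PRINT, 'ps2pdf failed: ', ErrOutput
```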

Making movies from GAMAP output

The best way to make movies with GAMAP is to save out individual frames as GIF images, and then use a 3rd-party GIF utility to concatenate those into an animated GIF.
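
One way to produce the individual frames is to draw each one in IDL's Z-buffer device and write it with WRITE_GIF. The following is a minimal sketch that uses built-in routines only and assumes that your IDL version includes WRITE_GIF. The procedure name Make_Frames, the frame_NNN.gif file names, and the sine-wave plot are placeholders for your own GAMAP plotting calls.

```idl
PRO Make_Frames

   ; Draw into the off-screen Z-buffer (8-bit by default)
   OldDevice = !D.NAME
   SET_PLOT, 'Z'
   DEVICE, SET_RESOLUTION=[ 400, 300 ]

   ; Load a color table and read it back for WRITE_GIF
   LOADCT, 0, /SILENT
   TVLCT, R, G, B, /GET

   X = FINDGEN( 100 ) / 10.0

   FOR I = 0, 9 DO BEGIN

      ; Placeholder plot: a sine wave that shifts with each frame
      PLOT, X, SIN( X + 0.3 * I ), YRANGE=[ -1, 1 ]

      ; Zero-padded names keep the frames in order for shell wildcards
      FileName = 'frame_' + STRING( I, FORMAT='(I3.3)' ) + '.gif'
      WRITE_GIF, FileName, TVRD(), R, G, B

   ENDFOR

   ; Restore the original graphics device
   SET_PLOT, OldDevice

END
```

The resulting frame_*.gif files can then be passed to any of the utilities described in this section.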

GIFsicle

You can obtain the GIFsicle distribution from http://www.lcdf.org/gifsicle/. When you build gifsicle, the following executables will be created:

gifsicle
Utility to concatenate individual GIFs into an animated GIF. Can also be used to extract individual frames from an animated GIF image.
gifview
A lightweight GIF viewer for X. It can display animated GIFs as slideshows, one frame at a time, or as animations.
gifdiff
Compares two GIF images for identical visual appearance

Using GIFsicle to create an animated GIF from individual GIFs:

gifsicle --delay=10 --loop *.gif > anim.gif 

WhirlGIF

You can obtain WhirlGIF from http://hpux.cs.utah.edu/hppd/hpux/Networking/WWW/whirlgif-3.04/.

Using WhirlGIF to create an animated GIF from individual GIFs:

whirlgif -loop -time 10 -o anim.gif *.gif

ImageMagick

Philippe Le Sager (plesager@seas.harvard.edu) wrote:

You can also use "convert" at the command line to create an animated GIF. It is a powerful command, but the basic syntax for an animated GIF is:
     convert -delay 20 image.* image.gif
You need to be in the proper directory. The input images are numbered like image.01, image.02, and so on. If the image file names differ too much, you will have to type them explicitly before the output file name (which comes last). The delay is the time interval between frames; ImageMagick measures it in hundredths of a second, not ms.
You can get more info and tips at:

NOTE: GIFsicle, WhirlGIF, and ImageMagick have already been installed on the Harvard Linux machines.

--Bob Y. 15:50, 9 April 2008 (EDT)