File I/O with GAMAP

From Geos-chem
Revision as of 19:35, 17 March 2010 by Ccarouge (Talk | contribs) (Renaming and Regridding APROD/GPROD restart files)

Jump to: navigation, search

On this page we list some tips & tricks for reading and writing files with GAMAP. Please also see the following pages:

--Bob Y. 16:06, 26 November 2008 (EST)

Binary punch (bpch) file I/O

Reading binary punch files directly from IDL

The best way to read a GEOS-Chem binary punch file is by using the GAMAP main program gamap.pro. More experienced users may also use one of the lower-level routines (e.g. CTM_GET_DATA or CTM_GET_DATABLOCK) to read the file. However, if for any reason you need to read a binary punch file directly from IDL (i.e. without using GAMAP, CTM_GET_DATA, or CTM_GET_DATABLOCK), then you have a couple of options:

  1. The GAMAP routine BPCH_TEST (which is located in the file_io subdirectory of the main GAMAP directory) can be used to quickly parse a binary punch file. BPCH_TEST prints out the header information from the file as well as the min and max data values. This is very useful for debugging. You can look at the structure of the file gamap2/file_io/bpch_test.pro to see how the binary punch file is read.
  2. If you are more accustomed to working with R, GrADS, Matlab, or any other visualization package that takes netCDF files as input, then you might want to use the GAMAP routines BPCH2NC or BPCH2COARDS to first convert the binary punch file into netCDF format.

--Bob Y. 13:28, 23 May 2008 (EDT)

Splitting and joining binary punch files

GAMAP comes with two routines for working with GEOS-Chem binary punch file output directly. Routine BPCH_SEP will extract a data block from a big binary punch file and save it to a separate file:

Colette Heald (heald@atmos.colostate.edu) wrote:

I did a year-long run starting from 20011201 and all the output appears to have saved in one file labeled ctm.bpch.v7-04-13-tra54.2001120100
So first off I'm wondering, is there a way to save each monthly mean as a separate ctm.bpch file? Secondly, the output file won't read into gamap - I thought the file might be corrupt, but actually it seems to read in fine with bpch_test. When I do gamap, fi='<filename>', I get this error:
   Error (-261) reading file
   ctm.bpch.v7-04-13-tra54.2001120100
    in block 3912
   POINT_LUN: Negative postition argument not allowed. Position: -2147377716,
   Unit: 21
   File: ctm.bpch.v7-04-13-tra54.2001120100
I get the same error when I try to use ctm_get_datablock to pull out one tracer. Do you think the file is salvageable? Have you seen this problem before? I'm wondering if the file is just too big...

Bob Yantosca (yantosca@seas.harvard.edu) replied:

Right now the code can't save to separate bpch files. However, you can split them up wtih bpch_sep.pro into individual files in post-processing. For example:
   pro split

      ; Splits into monthly bpch files (change as necesary)
  
      ; The big input file
      InFile = 'ctm.bpch.v8-01-01-geos5-Run0.2005010100'
      ctm_cleanup

      ; Split the data in the big file for 20050101 into a separate file
      bpch_sep, InFile, 'ctm.bpch.v8-01-01-geos5-Run0.20050101', $
         tau=nymd2tau( 20050101 )
   end

GAMAP routine BPCH_LINK will concatenate data blocks from several smaller bpch files and save them into a large bpch file.

   ; Consolidates data from the 'ctm.bpch.*' files
   ; into a single file named 'new.ctm.bpch'
   BPCH_LINK, 'ctm.bpch.*', 'new.ctm.bpch'

--Bmy 16:53, 22 April 2008 (EDT)

Renaming and Regridding APROD/GPROD restart files

When using secondary aerosols, you need a APROD/GPROD restart file:

  • restart_aprod_gprod.YYYYMMDDhh for GEOS-Chem versions v7-04-11 to v8-02-04
  • soaprod.YYYYMMDDhh for GEOS-Chem versions v8-03-01 and after

Unlike the general restart file used for GEOS-Chem runs, it cannot be simply renamed and ready to use for another simulation date.

Requisite for versions v7-04-11 to v8-02-04

Before processing the restart_aprod_gprod.YYYYMMDDhh files with the Gamap package, be sure that your diaginfo.dat and your tracerinfo.dat have the following:

         in diaginfo.dat:
         
           6000 IJ-GPROD                                 GPROD for SOA partitioning
           6000 IJ-APROD                                 APROD for SOA partitioning
         
         
         in tracerinfo.dat:
         
         #==============================================================================
         # OFFSET=9000: SOA PARTITION fractions (IJ-GPROD and IJ-APROD)
         #==============================================================================
         # NAME ]x[FULLNAME                    ][   MOLWT][C][ TRACER][   SCALE]x[UNIT
         PROD01   PROD01                         1.000E+00  1     6001 1.000E+00 kg/kg
         PROD02   PROD02                         1.000E+00  1     6002 1.000E+00 kg/kg
         PROD03   PROD03                         1.000E+00  1     6003 1.000E+00 kg/kg
         PROD04   PROD04                         1.000E+00  1     6004 1.000E+00 kg/kg
         PROD05   PROD05                         1.000E+00  1     6005 1.000E+00 kg/kg
         PROD06   PROD06                         1.000E+00  1     6006 1.000E+00 kg/kg
         PROD07   PROD07                         1.000E+00  1     6007 1.000E+00 kg/kg
         PROD08   PROD08                         1.000E+00  1     6008 1.000E+00 kg/kg
         PROD09   PROD09                         1.000E+00  1     6009 1.000E+00 kg/kg
         PROD10   PROD10                         1.000E+00  1     6010 1.000E+00 kg/kg
         PROD11   PROD11                         1.000E+00  1     6011 1.000E+00 kg/kg
         PROD12   PROD12                         1.000E+00  1     6012 1.000E+00 kg/kg
         PROD13   PROD13                         1.000E+00  1     6013 1.000E+00 kg/kg
         PROD14   PROD14                         1.000E+00  1     6014 1.000E+00 kg/kg
         PROD15   PROD15                         1.000E+00  1     6015 1.000E+00 kg/kg
         PROD16   PROD16                         1.000E+00  1     6016 1.000E+00 kg/kg
         PROD17   PROD17                         1.000E+00  1     6017 1.000E+00 kg/kg
         PROD18   PROD18                         1.000E+00  1     6018 1.000E+00 kg/kg
         

Requisite for versions v8-03-01 and after

Before processing the soaprod.YYYYMMDDhh files with the Gamap package, be sure that your diaginfo.dat and your tracerinfo.dat have the following:

         in diaginfo.dat:
         
           6000 SOAGPROD                                 GPROD for SOA partitioning
           6000 SOAAPROD                                 APROD for SOA partitioning
         
         
         in tracerinfo.dat:
         
         #==============================================================================
         # SOA GPROD & APROD [kg/kg]
         #==============================================================================
         PROD01   PROD01                         1.000E+00  1     6001 1.000E+00 kg/kg
         PROD02   PROD02                         1.000E+00  1     6002 1.000E+00 kg/kg
         PROD03   PROD03                         1.000E+00  1     6003 1.000E+00 kg/kg
         PROD04   PROD04                         1.000E+00  1     6004 1.000E+00 kg/kg
         PROD05   PROD05                         1.000E+00  1     6005 1.000E+00 kg/kg
         PROD06   PROD06                         1.000E+00  1     6006 1.000E+00 kg/kg
         PROD07   PROD07                         1.000E+00  1     6007 1.000E+00 kg/kg
         PROD08   PROD08                         1.000E+00  1     6008 1.000E+00 kg/kg
         PROD09   PROD09                         1.000E+00  1     6009 1.000E+00 kg/kg
         PROD10   PROD10                         1.000E+00  1     6010 1.000E+00 kg/kg
         PROD11   PROD11                         1.000E+00  1     6011 1.000E+00 kg/kg
         PROD12   PROD12                         1.000E+00  1     6012 1.000E+00 kg/kg
         PROD13   PROD13                         1.000E+00  1     6013 1.000E+00 kg/kg
         PROD14   PROD14                         1.000E+00  1     6014 1.000E+00 kg/kg
         PROD15   PROD15                         1.000E+00  1     6015 1.000E+00 kg/kg
         PROD16   PROD16                         1.000E+00  1     6016 1.000E+00 kg/kg
         PROD17   PROD17                         1.000E+00  1     6017 1.000E+00 kg/kg
         PROD18   PROD18                         1.000E+00  1     6018 1.000E+00 kg/kg
         PROD19   PROD19                         1.000E+00  1     6019 1.000E+00 kg/kg
         PROD20   PROD20                         1.000E+00  1     6020 1.000E+00 kg/kg
         PROD21   PROD21                         1.000E+00  1     6021 1.000E+00 kg/kg
         PROD22   PROD22                         1.000E+00  1     6022 1.000E+00 kg/kg
         PROD23   PROD23                         1.000E+00  1     6023 1.000E+00 kg/kg
         PROD24   PROD24                         1.000E+00  1     6024 1.000E+00 kg/kg
         PROD25   PROD25                         1.000E+00  1     6025 1.000E+00 kg/kg
         PROD26   PROD26                         1.000E+00  1     6026 1.000E+00 kg/kg
         PROD27   PROD27                         1.000E+00  1     6027 1.000E+00 kg/kg
         PROD28   PROD28                         1.000E+00  1     6028 1.000E+00 kg/kg
        

Renaming APROD/GPROD restart files

You must rewrite your restart_gprod_aprod.YYYYMMDDhh or soaprod.YYYYMMDDhh so that the date in the filename is the same as the one in the datablock headers. A new routine in GAMAP v2.12 will do it for you: ../gamap2/date_time/rewrite_agprod.pro

Regridding APROD/GPROD restart files

You can regrid a 2ndary aerosol restart file with the same routines used to regrid the standard restart file. However you need to tell the routines to regrid all tracers with the keyword diagn=0 (which means "all tracers"):

regridh_restart, ..., diagn=0
regridv_restart, ..., diagn=0

--phs 14:51, 4 April 2008 (EDT)

Combining output from timeseries files

Ray Nassar (ray.nassar@utoronto.ca) wrote:

I just have a quick question, is there any GAMAP routine that can average all hourly data blocks in a timeseries file to make a single daily average bpch file?
It appears like I could use the average option of ctm_sum.pro but I do not quite understand where the averaged data goes and afterwards would still have to write to bpch.

Philippe Le Sager (plesager@seas.harvard.edu) replied:

Check the
   /gamap2/timeseries/gc_combine_nd49.pro
   /gamap2/timeseries/gc_combine_nd48.pro 
routines, which combines daily bpch files (nd48 & nd49 output, but also met field input files) into 4D data blocks. It has many options. You can extract a subset of data according to time and/or location, or process the data (moving average, daily max, shift to local time). You can either save the data into a new bpch file or just get an array of data in output.
I wrote a tutorial that gives some pointers here.
-Philippe

--Bob Y. 09:43, 1 April 2008 (EDT)

Under certain circumstances, a minor bug can occur in routine gc_combine_nd49.pro. See this description on our GAMAP bugs & fixes page for more information. This bug is slated for release in the next GAMAP release (v2-13).

--Bob Y. 13:48, 8 September 2008 (EDT)

Adding tracers to a restart file

Duncan Fairlie (t.d.fairlie@nasa.gov) wrote:

I need to construct a new restart file with 55 tracers from an existing full chem (43 tracer) restart file. The extra 12 tracers will include dust-nitrate, dust-sulfate, and dust-alkalinity (4 tracers each). I just want to start them uniform 1.0e-15 or so.
I can probably figure out how to do this, but wondered if you have something off the shelf that can be used to add several blank tracers to an existing restart file .

Bob Yantosca (yantosca@seas.harvard.edu) replied:

Maybe the easiest way to do this is to use GAMAP and BPCH_RENUMBER. You can load 2 files

   gamap, file=file1
   gamap, file=file2
   gamap, /nofile

then type s1-55 (for all 55 data blocks) and this will save out all data blocks to a new file (you'll be prompted for the name).

Then you can hack into IDL code bpch_renumber.pro to renumber some of the tracer numbers. Or even better yet, renumber the tracer numbers on the file2 before loading it into GAMAP and saving to the 55-tracer file.

Error -262 encountered when reading a bpch file

The following error was encountered when using GAMAP to read a binary punch file:

Error (-262) reading file ctm.bpch in block 2510
POINT_LUN: Negative position argument not allowed.
Position: -2147068380, Unit:21  File: ctm.bpch

The cause of this error is that the file being read had a size of 2.430 GB. Internal GAMAP routine CTM_READ3DB_HEADER uses a long integer variable to compute the offset in bytes from the start of the file to each individual data block -- that way it can tell GAMAP "skip" to each data block directly. However, the long integer type cannot represent numbers much larger than 2e9. If we try to represent a value larger than this in a long integer variable, the value can "wrap around" to a negative number, which is what caused the error above.

Here is a demonstration of the maximum value of a long integer type in IDL:

IDL> a = 2430111016L

a = 2430111016
     ^
% Long integer constant must be less than 2147483648.

Therefore, if we try to use GAMAP to read any files larger than 2,147,483,648 bytes, we will encounter this type of error.

The solution: you should use GAMAP routine BPCH_SEP to split larger bpch files into smaller ones (perhaps one file for each day). Basically you would call BPCH_SEP repeatedly such as:

bpch_sep, 'ctm.bpch', 'ctm.bpch.%date%', tau0=nymd2tau( 20060101 )
bpch_sep, 'ctm.bpch', 'ctm.bpch.%date%', tau0=nymd2tau( 20060102 )
bpch_sep, 'ctm.bpch', 'ctm.bpch.%date%', tau0=nymd2tau( 20060103 )
...

The %date% token in the output filename will be replaced with the YYYYMMDD date. The above commands will create output files:

ctm.bpch.20060101
ctm.bpch.20060102
ctm.bpch.20060103
...

which will contain all of the data blocks for dates 20060101, 20060102, 20060103, ...

Then you can use GAMAP to read the smaller bpch files. Philippe Le Sager has pointed out that some of the newer IDL versions support 64-bit integer (LON64). Perhaps in a future version of GAMAP we will make the file pointer variable of this data type. However we must be careful not to make a modification that would "break" the existing functionality of GAMAP for those users who may only have the 32-bit version of IDL.

--Bob Y. 09:23, 25 November 2008 (EST)

Endian issues when reading/writing binary punch files

Rynda Hudman (hudman@berkeley.edu) wrote:

Recently, something seemed to change w/r/t reading binary files and writing with ctm_writebpch.pro (on prometheus.as.harvard.edu). I [finally] realized it was something to do with swap_endian from little to big. What happened?

Bob Yantosca (yantosca@seas.harvard.edu) wrote:

Are you using the latest GAMAP? In GAMAP v2-12 we updated all routines that read from binary files to add the
  Open_File, ..., Swap_Endian=Little_Endian(), ...
keyword. This will do the byteswapping if you are reading/writing bpch on a little-endian machine. All of the machines now at Harvard (including prometheus.as.harvard.edu) are little-endian now.
If you were using a version prior to v2-12, then some routines may not have had this fix. So that might have been the problem. Updating your GAMAP should fix it.

--Bob Y. 09:33, 19 November 2008 (EST)

GEOS-Chem met field I/O

Using test_met.pro to check met field files

Moeko Yoshitomi (moeko@fas.harvard.edu) wrote:

I am afraid of that the GCAP met files have some weird values at GMT date & time 2047/07/26 00:00. Is there a way to directly read the GCAP meteorology files without using GAMAP?

Bob Yantosca (yantosca@seas.harvard.edu) replied:

You can read any met field file (GCAP or GEOS) with the GAMAP routine:
   test_met, filename='20470726.a6.4x5'
This will print out the min & max of the data (see example below), which is very useful to see if there are corrupted values, as are listed below. You can also get a plot of the met fields for a given level by using the PLOTLEVEL keyword.
   20470726 030000 T             173.094833     314.135101
   20470726 030000 U            -125.676346     170.471573
   20470726 030000 V            -119.282471     103.119499
   20470726 030000 OPTDEPTH        0.000000      33.720524
   20470726 030000 CLDF            0.000000       0.969475
   20470726 030000 MOISTQ        -46.055706***************
   20470726 030000 Q               0.000000***************
   20470726 030000 UPDE            0.000000       0.578976
   20470726 030000 DNDE           -0.252207      -0.000000
   20470726 030000 ENTRAIN         0.000000       0.177645
   20470726 030000 DETRAINE        0.000000       0.557239
   20470726 030000 UPDN            0.000000       0.225111
   20470726 030000 DNDN           -0.056653      -0.000000
   20470726 030000 DETRAINN        0.000000       0.140580
It looks like there is definitely a problem in that day in the met data, in the MOISTQ and Q fields. If you use the /EXPFORMAT keyword to print out data in exponential rather than floating point format you get:
   test_met, filename='20470726.a6.4x5', /expformat

   20470726 030000 MOISTQ     -4.605571e+01   1.869630e+34
   20470726 030000 Q           8.480171e-33   7.736141e+34

so it looks like you have ~1e34 values for MOISTQ and Q. This can be a floating-point problem, and is near to the maximum representable value for a REAL*4 variable. My guess is that there was a problem in the original regridding of the met data.

--Bob Y. 10:02, 29 May 2009 (EDT)

HDF5 and HDF5-EOS file I/O

H5_BROWSER

IDL 6 and 7 has a function called H5_BROWSER which lets you look at the contents of an HDF5 file interactively:

A = H5_BROWSER( FILE )

This will pop open a dialog box which allows you to navigate through the file's contents. If you click on a dataset in the file, H5_BROWSER will show a quick plot, so that you can see at a glance if the data inthe file matches your expectations.

It is recommended to use H5_BROWSER to examine an HDF file before reading it. You can quickly look up the names of all the variables that you want to extract from the file ahead of time.

Reading data from an HDF5 or HDF5-EOS file

The basic way you read data from an HDF5 file is as follows:

; Specify the file name (it's a GOME file w/ a long name)
; Technically this is an HDF-EOS5 file but we can read it as if it were an HDF5 file!
FILE = 'GOME_xxx_1B_M02_20070501062959Z_20070501081159Z_R_O_20080615193546Z.337p4_356p1.brs.hcho.he5'

; First make sure the file is a HDF5 or HDF5-EOS file
IF ( H5F_IS_HDF5( FILE ) eq 0 ) then MESSAGE, 'Not an HDF-5 file!'

; Open the HDF file and get the file ID # (FID)
fId = H5F_OPEN( FILE ) 
IF ( fId lt 0 ) then MESSAGE, 'Error opening file!'

;----------------------------------------------------------------------------------
; These next 3 commands are contained in convenience function
; "read_h5dataset.pro", to be released in GAMAP v2-13!!!

; Get the dataset ID (dId) for the variable we want to read
; NOTE: "Column" is the HDF5-EOS swath name & "CloudFraction" is the variable name!
dId = H5D_OPEN( fid, "/Column/CloudFraction" )

; Read the data 
DATA = H5D_READ( dId )

; Free the dataset Id
H5D_CLOSE, dId
;----------------------------------------------------------------------------------

; Close the file and free the file Id
H5_CLOSE, fId

However, if you are reading a lot of data from disk, or are reading many HDF5 or HDF5-EOS files, it may be more efficient to read them with Fortran.

--Bob Y. 10:30, 29 May 2009 (EDT)

netCDF file I/O

Requirements as of GAMAP v2.13

For Gamap to recognize a netCDF file, the file must have a suffix nc or NC or ncdf or NCDF. To see the datablocks, categories (ie diagnostics) and tracer names must be in diaginfo.dat and tracerinfo.dat resource files, respectively.

To convert from GEOS-Chem bpch file format to netCDF, we suggest to use BPCH2COARDS. It is COARDS compliant. To convert several files at once, MAKE_MULTI_NC is a convenient wrapper available in GAMAP v2.13. It can travel directories recursively, and automatically recreates the input directory tree in the output directory if needed.

Note that, when writing netCDF, some characters require special handling (like $ for example): BPCH2COARDS replaces them with an underscore. Then, to read the converted files back into GAMAP, underscores must be converted back to their original character.

The later operation is done in CRC_Get_Tracer subroutine of CTM_READ_COARDS. In GAMAP v2.13, few tracer names (air density, tropopause levels) have been added to the list of names to process. The list is not exhaustive. Several warnings in CTM_READ_COARDS and CTM_TRACERINFO have been added in GAMAP v2.13 to help you to quickly find the name(s) that need to be added to the to-be-processed list in CRC_Get_Tracer, were you unable to read netCDF files you created with GAMAP.

--phs 11:25, 9 October 2009 (EDT)