File I/O with GAMAP
On this page we list some tips & tricks for reading and writing files with GAMAP. Please also see the following pages:
- General GAMAP usage
- Color and graphics with GAMAP
- Regridding with GAMAP
- Date and time computations with GAMAP
- Text manipulation with GAMAP
--Bob Y. 16:06, 26 November 2008 (EST)
- 1 Binary punch (bpch) file I/O
- 1.1 Reading binary punch files directly from IDL
- 1.2 Splitting and joining binary punch files
- 1.3 Renaming and Regridding APROD/GPROD restart files
- 1.4 Combining output from timeseries files
- 1.5 Adding tracers to a restart file
- 1.6 Error -262 encountered when reading a bpch file
- 1.7 Endian issues when reading/writing binary punch files
- 2 GEOS-Chem met field I/O
- 3 HDF5 and HDF5-EOS file I/O
- 4 netCDF file I/O
Binary punch (bpch) file I/O
Reading binary punch files directly from IDL
The best way to read a GEOS-Chem binary punch file is by using the GAMAP main program gamap.pro. More experienced users may also use one of the lower-level routines (e.g. CTM_GET_DATA or CTM_GET_DATABLOCK) to read the file. However, if for any reason you need to read a binary punch file directly from IDL (i.e. without using GAMAP, CTM_GET_DATA, or CTM_GET_DATABLOCK), then you have a couple of options:
- The GAMAP routine BPCH_TEST (which is located in the file_io subdirectory of the main GAMAP directory) can be used to quickly parse a binary punch file. BPCH_TEST prints out the header information from the file as well as the min and max data values. This is very useful for debugging. You can look at the structure of the file gamap2/file_io/bpch_test.pro to see how the binary punch file is read.
- If you are more accustomed to working with R, GrADS, Matlab, or any other visualization package that takes netCDF files as input, then you might want to use the GAMAP routines BPCH2NC or BPCH2COARDS to first convert the binary punch file into netCDF format.
--Bob Y. 13:28, 23 May 2008 (EDT)
Splitting and joining binary punch files
GAMAP comes with two routines for working with GEOS-Chem binary punch file output directly. Routine BPCH_SEP will extract a data block from a big binary punch file and save it to a separate file:
Colette Heald (email@example.com) wrote:
- I did a year-long run starting from 20011201 and all the output appears to have saved in one file labeled ctm.bpch.v7-04-13-tra54.2001120100
- So first off I'm wondering, is there a way to save each monthly mean as a separate ctm.bpch file? Secondly, the output file won't read into gamap - I thought the file might be corrupt, but actually it seems to read in fine with bpch_test. When I do gamap, fi='<filename>', I get this error:
Error (-261) reading file ctm.bpch.v7-04-13-tra54.2001120100 in block 3912 POINT_LUN: Negative postition argument not allowed. Position: -2147377716, Unit: 21 File: ctm.bpch.v7-04-13-tra54.2001120100
- I get the same error when I try to use ctm_get_datablock to pull out one tracer. Do you think the file is salvageable? Have you seen this problem before? I'm wondering if the file is just too big...
Bob Yantosca (firstname.lastname@example.org) replied:
- Right now the code can't save to separate bpch files. However, you can split them up wtih bpch_sep.pro into individual files in post-processing. For example:
pro split ; Splits into monthly bpch files (change as necesary) ; The big input file InFile = 'ctm.bpch.v8-01-01-geos5-Run0.2005010100' ctm_cleanup ; Split the data in the big file for 20050101 into a separate file bpch_sep, InFile, 'ctm.bpch.v8-01-01-geos5-Run0.20050101', $ tau=nymd2tau( 20050101 ) end
GAMAP routine BPCH_LINK will concatenate data blocks from several smaller bpch files and save them into a large bpch file.
; Consolidates data from the 'ctm.bpch.*' files ; into a single file named 'new.ctm.bpch' BPCH_LINK, 'ctm.bpch.*', 'new.ctm.bpch'
--Bmy 16:53, 22 April 2008 (EDT)
Renaming and Regridding APROD/GPROD restart files
When using secondary aerosols, you need a restart_aprod_gprod.YYYYMMDDhh file. Unlike the general restart file used for GEOS-Chem runs, it cannot be simply renamed and ready to use for another simulation date.
Renaming APROD/GPROD restart files
You must rewrite you restart_gprod_aprod.YYYYMMDDhh so that the date in the filename is the same as the one in the datablock headers. A new routine in GAMAP v2.12 will do it for you: ../gamap2/date_time/rewrite_agprod.pro
Regridding APROD/GPROD restart files
You can regrid a 2ndary aerosol restart file with the same routines used to regrid the standard restart file. However you need to tell the routines to regrid all tracers with the keyword diagn=0 (which means "all tracers"):
regridh_restart, ..., diagn=0 regridv_restart, ..., diagn=0
--phs 14:51, 4 April 2008 (EDT)
Combining output from timeseries files
Ray Nassar (email@example.com) wrote:
- I just have a quick question, is there any GAMAP routine that can average all hourly data blocks in a timeseries file to make a single daily average bpch file?
- It appears like I could use the average option of ctm_sum.pro but I do not quite understand where the averaged data goes and afterwards would still have to write to bpch.
Philippe Le Sager (firstname.lastname@example.org) replied:
- Check the
- routines, which combines daily bpch files (nd48 & nd49 output, but also met field input files) into 4D data blocks. It has many options. You can extract a subset of data according to time and/or location, or process the data (moving average, daily max, shift to local time). You can either save the data into a new bpch file or just get an array of data in output.
- I wrote a tutorial that gives some pointers here.
--Bob Y. 09:43, 1 April 2008 (EDT)
Under certain circumstances, a minor bug can occur in routine gc_combine_nd49.pro. See this description on our GAMAP bugs & fixes page for more information. This bug is slated for release in the next GAMAP release (v2-13).
--Bob Y. 13:48, 8 September 2008 (EDT)
Adding tracers to a restart file
Duncan Fairlie (email@example.com) wrote:
- I need to construct a new restart file with 55 tracers from an existing full chem (43 tracer) restart file. The extra 12 tracers will include dust-nitrate, dust-sulfate, and dust-alkalinity (4 tracers each). I just want to start them uniform 1.0e-15 or so.
- I can probably figure out how to do this, but wondered if you have something off the shelf that can be used to add several blank tracers to an existing restart file .
Bob Yantosca (firstname.lastname@example.org) replied:
Maybe the easiest way to do this is to use GAMAP and BPCH_RENUMBER. You can load 2 files
gamap, file=file1 gamap, file=file2 gamap, /nofile
then type s1-55 (for all 55 data blocks) and this will save out all data blocks to a new file (you'll be prompted for the name).
Then you can hack into IDL code bpch_renumber.pro to renumber some of the tracer numbers. Or even better yet, renumber the tracer numbers on the file2 before loading it into GAMAP and saving to the 55-tracer file.
Error -262 encountered when reading a bpch file
The following error was encountered when using GAMAP to read a binary punch file:
Error (-262) reading file ctm.bpch in block 2510 POINT_LUN: Negative position argument not allowed. Position: -2147068380, Unit:21 File: ctm.bpch
The cause of this error is that the file being read had a size of 2.430 GB. Internal GAMAP routine CTM_READ3DB_HEADER uses a long integer variable to compute the offset in bytes from the start of the file to each individual data block -- that way it can tell GAMAP "skip" to each data block directly. However, the long integer type cannot represent numbers much larger than 2e9. If we try to represent a value larger than this in a long integer variable, the value can "wrap around" to a negative number, which is what caused the error above.
Here is a demonstration of the maximum value of a long integer type in IDL:
IDL> a = 2430111016L a = 2430111016 ^ % Long integer constant must be less than 2147483648.
Therefore, if we try to use GAMAP to read any files larger than 2,147,483,648 bytes, we will encounter this type of error.
The solution: you should use GAMAP routine BPCH_SEP to split larger bpch files into smaller ones (perhaps one file for each day). Basically you would call BPCH_SEP repeatedly such as:
bpch_sep, 'ctm.bpch', 'ctm.bpch.%date%', tau0=nymd2tau( 20060101 ) bpch_sep, 'ctm.bpch', 'ctm.bpch.%date%', tau0=nymd2tau( 20060102 ) bpch_sep, 'ctm.bpch', 'ctm.bpch.%date%', tau0=nymd2tau( 20060103 ) ...
The %date% token in the output filename will be replaced with the YYYYMMDD date. The above commands will create output files:
ctm.bpch.20060101 ctm.bpch.20060102 ctm.bpch.20060103 ...
which will contain all of the data blocks for dates 20060101, 20060102, 20060103, ...
Then you can use GAMAP to read the smaller bpch files. Philippe Le Sager has pointed out that some of the newer IDL versions support 64-bit integer (LON64). Perhaps in a future version of GAMAP we will make the file pointer variable of this data type. However we must be careful not to make a modification that would "break" the existing functionality of GAMAP for those users who may only have the 32-bit version of IDL.
--Bob Y. 09:23, 25 November 2008 (EST)
Endian issues when reading/writing binary punch files
Rynda Hudman (email@example.com) wrote:
- Recently, something seemed to change w/r/t reading binary files and writing with ctm_writebpch.pro (on prometheus.as.harvard.edu). I [finally] realized it was something to do with swap_endian from little to big. What happened?
Bob Yantosca (firstname.lastname@example.org) wrote:
- Are you using the latest GAMAP? In GAMAP v2-12 we updated all routines that read from binary files to add the
Open_File, ..., Swap_Endian=Little_Endian(), ...
- keyword. This will do the byteswapping if you are reading/writing bpch on a little-endian machine. All of the machines now at Harvard (including prometheus.as.harvard.edu) are little-endian now.
- If you were using a version prior to v2-12, then some routines may not have had this fix. So that might have been the problem. Updating your GAMAP should fix it.
--Bob Y. 09:33, 19 November 2008 (EST)
GEOS-Chem met field I/O
Using test_met.pro to check met field files
Moeko Yoshitomi (email@example.com) wrote:
- I am afraid of that the GCAP met files have some weird values at GMT date & time 2047/07/26 00:00. Is there a way to directly read the GCAP meteorology files without using GAMAP?
Bob Yantosca (firstname.lastname@example.org) replied:
- You can read any met field file (GCAP or GEOS) with the GAMAP routine:
- This will print out the min & max of the data (see example below), which is very useful to see if there are corrupted values, as are listed below. You can also get a plot of the met fields for a given level by using the PLOTLEVEL keyword.
20470726 030000 T 173.094833 314.135101 20470726 030000 U -125.676346 170.471573 20470726 030000 V -119.282471 103.119499 20470726 030000 OPTDEPTH 0.000000 33.720524 20470726 030000 CLDF 0.000000 0.969475 20470726 030000 MOISTQ -46.055706*************** 20470726 030000 Q 0.000000*************** 20470726 030000 UPDE 0.000000 0.578976 20470726 030000 DNDE -0.252207 -0.000000 20470726 030000 ENTRAIN 0.000000 0.177645 20470726 030000 DETRAINE 0.000000 0.557239 20470726 030000 UPDN 0.000000 0.225111 20470726 030000 DNDN -0.056653 -0.000000 20470726 030000 DETRAINN 0.000000 0.140580
- It looks like there is definitely a problem in that day in the met data, in the MOISTQ and Q fields. If you use the /EXPFORMAT keyword to print out data in exponential rather than floating point format you get:
test_met, filename='20470726.a6.4x5', /expformat 20470726 030000 MOISTQ -4.605571e+01 1.869630e+34 20470726 030000 Q 8.480171e-33 7.736141e+34
so it looks like you have ~1e34 values for MOISTQ and Q. This can be a floating-point problem, and is near to the maximum representable value for a REAL*4 variable. My guess is that there was a problem in the original regridding of the met data.
--Bob Y. 10:02, 29 May 2009 (EDT)
HDF5 and HDF5-EOS file I/O
IDL 6 and 7 has a function called H5_BROWSER which lets you look at the contents of an HDF5 file interactively:
A = H5_BROWSER( FILE )
This will pop open a dialog box which allows you to navigate through the file's contents. If you click on a dataset in the file, H5_BROWSER will show a quick plot, so that you can see at a glance if the data inthe file matches your expectations.
It is recommended to use H5_BROWSER to examine an HDF file before reading it. You can quickly look up the names of all the variables that you want to extract from the file ahead of time.
Reading data from an HDF5 or HDF5-EOS file
The basic way you read data from an HDF5 file is as follows:
; Specify the file name (it's a GOME file w/ a long name) ; Technically this is an HDF-EOS5 file but we can read it as if it were an HDF5 file! FILE = 'GOME_xxx_1B_M02_20070501062959Z_20070501081159Z_R_O_20080615193546Z.337p4_356p1.brs.hcho.he5' ; First make sure the file is a HDF5 or HDF5-EOS file IF ( H5F_IS_HDF5( FILE ) eq 0 ) then MESSAGE, 'Not an HDF-5 file!' ; Open the HDF file and get the file ID # (FID) fId = H5F_OPEN( FILE ) IF ( fId lt 0 ) then MESSAGE, 'Error opening file!' ;---------------------------------------------------------------------------------- ; These next 3 commands are contained in convenience function ; "read_h5dataset.pro", to be released in GAMAP v2-13!!! ; Get the dataset ID (dId) for the variable we want to read ; NOTE: "Column" is the HDF5-EOS swath name & "CloudFraction" is the variable name! dId = H5D_OPEN( fid, "/Column/CloudFraction" ) ; Read the data DATA = H5D_READ( dId ) ; Free the dataset Id H5D_CLOSE, dId ;---------------------------------------------------------------------------------- ; Close the file and free the file Id H5_CLOSE, fId
However, if you are reading a lot of data from disk, or are reading many HDF5 or HDF5-EOS files, it may be more efficient to read them with Fortran.
--Bob Y. 10:30, 29 May 2009 (EDT)