Reading binary files in IDL
"Big Endian" vs. "Little Endian" byte ordering
The terms "Big Endian" and "Little Endian" refer to the order in which a computer stores the individual bytes of a number in memory. Each number type is made up of one or more bytes. (A byte consists of 8 bits and can represent 256 distinct values.)
IDL Number Type | Fortran Equivalent | Number of bytes |
---|---|---|
byte | BYTE | 1 |
fix | INTEGER*2 | 2 |
long | INTEGER*4 | 4 |
float | REAL*4 | 4 |
double | REAL*8 | 8 |
Every number type except BYTE uses more than one byte to store each integer or floating-point value in IDL and Fortran, so the order of those bytes matters.
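As a rough cross-check of the table, Python's standard `struct` module uses fixed standard sizes whenever a byte-order prefix such as `>` is given. The format codes chosen below to stand in for each IDL type are an assumption of this sketch, not something IDL defines.

```python
import struct

# Assumed struct format codes standing in for each IDL type.
# The ">" prefix selects big-endian order with standard (fixed) sizes.
IDL_TYPE_CODES = {
    "byte": ">B",    # unsigned 8-bit integer (BYTE)
    "fix": ">h",     # signed 16-bit integer (INTEGER*2)
    "long": ">i",    # signed 32-bit integer (INTEGER*4)
    "float": ">f",   # IEEE 32-bit float (REAL*4)
    "double": ">d",  # IEEE 64-bit float (REAL*8)
}

def type_sizes():
    """Return the number of bytes used by each type listed in the table."""
    return {name: struct.calcsize(code) for name, code in IDL_TYPE_CODES.items()}

if __name__ == "__main__":
    for name, size in type_sizes().items():
        print(f"{name:6s} {size} byte(s)")
```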
There are two ways to order the bytes of a multi-byte number:
- A Big Endian machine stores the most significant byte first, i.e. it orders the bytes from left to right (think: a car's odometer)
- A Little Endian machine stores the least significant byte first, i.e. it orders the bytes from right to left (the reverse of a car's odometer)
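The difference is easiest to see with a concrete value. This short Python sketch writes the 4-byte integer 258 (hexadecimal 0x00000102) in both orders using the built-in `int.to_bytes`; it is only an illustration of the two orderings, not IDL code.

```python
# 258 = 0x00000102, stored as a 4-byte integer
value = 258

big = value.to_bytes(4, byteorder="big")        # most significant byte first
little = value.to_bytes(4, byteorder="little")  # least significant byte first

print(big.hex(" "))     # 00 00 01 02
print(little.hex(" "))  # 02 01 00 00
```

The two byte sequences contain the same bytes, just reversed.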
A good discussion about Big Endian vs. Little Endian may be found here: [1]
Why does it matter?
Normally you do not need to care whether your computer uses Big Endian or Little Endian byte ordering. However, if you are reading or writing unformatted binary data, the byte order becomes important.
When an IDL or Fortran program saves data to an unformatted file, it writes the bytes directly from memory without any conversion. If you save data on a Big Endian machine and then read it on a Little Endian machine, the bytes will be in the wrong order and the values you read from disk will be gibberish.
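To see the gibberish effect, the sketch below packs the REAL*4 value 1.0 in Big Endian order, as a Big Endian machine would write it to disk, then decodes the same four bytes as Little Endian. Reversing the bytes before decoding recovers the original value. Python's `struct` module is used here purely as an illustration; this is not how IDL or GEOS-Chem read data.

```python
import struct

# The four bytes of the REAL*4 value 1.0 as written by a Big Endian machine
raw = struct.pack(">f", 1.0)  # b'\x3f\x80\x00\x00'

# Misinterpreting those bytes as Little Endian gives a meaningless tiny number
wrong = struct.unpack("<f", raw)[0]

# Reversing the byte order before decoding recovers the original value
right = struct.unpack("<f", raw[::-1])[0]

print(wrong)  # a tiny denormal, roughly 4.6e-41
print(right)  # 1.0
```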
The GEOS-Chem binary punch file format is an example of an unformatted binary file format. All binary punch files are Big Endian by default.
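Because punch files are Big Endian by default, a reader should state the byte order explicitly rather than rely on the host machine. A minimal Python sketch of that idea: it writes a few hypothetical REAL*4 values to a temporary file in Big Endian order and reads them back with an explicit `>` format. The file layout is made up for illustration and is not the real punch-file layout.

```python
import os
import struct
import tempfile

values = [1.0, 2.5, -3.25]  # hypothetical data, exactly representable as REAL*4

# Write the values as Big Endian REAL*4, regardless of the host byte order
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(struct.pack(f">{len(values)}f", *values))
    path = f.name

# Read them back, again stating Big Endian explicitly with ">"
with open(path, "rb") as f:
    data = f.read()
os.remove(path)

count = len(data) // 4  # each REAL*4 value occupies 4 bytes
decoded = list(struct.unpack(f">{count}f", data))
print(decoded)  # [1.0, 2.5, -3.25]
```

Since the byte order is part of the format string, the same code gives the same result on Big Endian and Little Endian hosts.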
How do I know which type of machine I am using?
IDL's LITTLE_ENDIAN function returns 1 if you are on a Little Endian machine, and returns 0 if you are on a Big Endian machine:
    if ( Little_Endian() ) $
       then print, "I'm running IDL on a little-endian machine." $
       else print, "I'm running IDL on a big-endian machine."
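For comparison, outside IDL the host byte order can be checked in Python with `sys.byteorder`. The function name `little_endian` below is hypothetical and only mirrors the IDL check above; the second check derives the same answer by looking at how the integer 1 is laid out in native byte order.

```python
import struct
import sys

def little_endian():
    """Return True on a Little Endian host (hypothetical helper)."""
    return sys.byteorder == "little"

# Cross-check: in native order ("="), the 2-byte integer 1 starts with
# byte 0x01 only on a Little Endian machine.
native_first_byte = struct.pack("=H", 1)[0]

if little_endian():
    print("I'm running Python on a little-endian machine.")
else:
    print("I'm running Python on a big-endian machine.")
```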
In general:
- Machines with Intel or AMD chipsets tend to be Little Endian:
  - PCs & Macs
  - SGI Altix
  - Sun X4100
- Machines with RISC chipsets tend to be Big Endian:
  - Sun/SPARC
  - Cray
  - SGI Power Challenge/Origin
I used to be able to read a binary file in IDL but now I can't. What's wrong?
Chances are, you are reading a GEOS-Chem binary punch file