Reading binary files in IDL

From Geos-chem
Revision as of 19:59, 1 April 2008 by Bmy (Talk | contribs) (How do I know which type of machine I am using?)

"Big Endian" vs. "Little Endian" byte ordering

The terms "Big Endian" and "Little Endian" refer to the way that computers order bytes internally. Normally each number type is composed of one or more bytes. (A byte consists of 8 bits and can be used to represent 256 distinct values.)

IDL Number Type   Fortran Equivalent   Number of bytes
byte              BYTE                 1
fix               INTEGER*2            2
long              INTEGER*4            4
float             REAL*4               4
double            REAL*8               8

For all of these number types (except BYTE), you need more than one byte to create integer or floating point numbers in IDL and Fortran.

There are two ways that the bytes can be ordered: most significant byte first, or least significant byte first.

  • A Big Endian machine stores the most significant byte first (at the lowest memory address)
    • Think of a car's odometer: digit order is "thousands", "hundreds", "tens", "ones"
  • A Little Endian machine stores the least significant byte first
    • Reverse of a car's odometer: digit order is "ones", "tens", "hundreds", "thousands"
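The byte order described above can be made visible from the IDL command line. The following is a minimal sketch, not part of the original page; the variable name Value is only an illustration. It uses IDL's BYTE function with an offset argument, which returns the bytes of a variable in the order in which they sit in memory.

```idl
; A 4-byte LONG whose bytes are easy to tell apart (hexadecimal 01020304)
Value = '01020304'xL

; Byte( Value, 0, 4 ) extracts the 4 bytes of Value, starting at offset 0,
; in the order in which they are stored in memory
Print, Byte( Value, 0, 4 )

; Expected byte order:
;   Big Endian machine:     1  2  3  4   (most significant byte first)
;   Little Endian machine:  4  3  2  1   (least significant byte first)
```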

A good discussion about Big Endian vs. Little Endian may be found here:

Why does it matter?

Normally, you do not need to care whether the computer you are using has Big Endian or Little Endian byte ordering. However, if you are trying to read or write binary unformatted data, then the endian ordering becomes important.

When an IDL or Fortran program saves data to an unformatted file, it is saving the bytes directly from memory without any further data processing. If you save data from a Big Endian machine and try to read it back into a Little Endian machine, then the bytes will be in the wrong order and the data will be interpreted as gibberish.
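To see what "gibberish" means in practice, the sketch below reverses the bytes of a single number in memory with IDL's SWAP_ENDIAN function. This imitates what happens when bytes written on one type of machine are read unchanged on the other type. The variable names X and Y are only illustrations.

```idl
; A 4-byte FLOAT (hexadecimal 3F800000)
X = 1.0

; The same 4 bytes in reversed order, as a machine of the opposite
; endian would interpret them
Y = Swap_Endian( X )

; Y is no longer 1.0: the reversed bytes decode to a tiny
; denormalized number (on the order of 1e-41)
Print, X, Y

; The two variables hold the same bytes, only in opposite order
Print, Byte( X, 0, 4 )
Print, Byte( Y, 0, 4 )
```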

The GEOS-Chem binary punch file format is an example of unformatted binary files. All binary punch files are Big Endian by default.

How do I know which type of machine I am using?

IDL's LITTLE_ENDIAN function returns 1 if you are on a Little Endian machine, and returns 0 if you are on a Big Endian machine:

 if ( Little_Endian() )                                       $
    then print, "I'm using IDL on a little-endian machine." $
    else print, "I'm using IDL on a big-endian machine."
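The same test can also be done by hand, which shows what the check amounts to. This is only a sketch of the idea, not necessarily how LITTLE_ENDIAN is implemented; the variable name FirstByte is an illustration.

```idl
; The integer 1 occupies 2 bytes.  On a Little Endian machine the least
; significant byte comes first in memory, so the first byte is 1.
; On a Big Endian machine the first byte is 0.
FirstByte = ( Byte( 1, 0, 1 ) )[0]

if ( FirstByte eq 1 )                                 $
   then print, 'First byte is 1: little-endian machine.' $
   else print, 'First byte is 0: big-endian machine.'
```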

In general:

  • Machines with Intel or AMD chipsets tend to be Little Endian
    • PCs & Intel-based Macs
    • SGI Altix
    • Sun X4100
  • Machines with RISC chipsets tend to be Big Endian
    • Sun/SPARC
    • Cray
    • SGI Power Challenge/Origin

I used to be able to read a binary file in IDL but now I can't. What's wrong?

Chances are, you are reading a binary file that was created on a machine with the opposite byte order from the machine that you are using. For example, you may be trying to read a GEOS-Chem binary punch file (which is Big Endian) from IDL on a PC or Linux cluster machine (which is Little Endian).

Fortunately, there is a simple way to read binary files of the opposite byte order in IDL. If you are using GAMAP's OPEN_FILE command, then be sure to use the SWAP_ENDIAN keyword as follows:

; For reading an existing binary file (or binary-punch file)
Open_File, FileName, Ilun, $
   /Get_LUN, /F77_Unformatted, Swap_Endian=Little_Endian()

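or if you are using IDL's OPENU procedure directly (add the /F77_Unformatted keyword for Fortran sequential unformatted files such as binary punch files):

 ; For reading an existing binary file
 OpenU, Ilun, FileName, /Get_Lun, Swap_Endian=Little_Endian()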

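Because Little_Endian() returns 1 only on a Little Endian machine, this tells IDL to byte-swap the data when reading a Big Endian binary file on a Little Endian machine, and to leave the bytes untouched on a Big Endian machine.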
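A complete round trip might look like the sketch below. It is an illustration under stated assumptions rather than GAMAP code: the scratch file name endian_demo.bin and the variable names are hypothetical, and it uses IDL's built-in /Swap_If_Little_Endian keyword, which has the same effect as Swap_Endian=Little_Endian(). The file is written in Big Endian order as one Fortran-style unformatted record, then read back.

```idl
; Hypothetical scratch file in the system temporary directory
FileName = FilePath( 'endian_demo.bin', /Tmp )

; Write three floats as one Fortran-style unformatted record.
; /Swap_If_Little_Endian makes the file Big Endian on any machine.
Data = [ 1.0, 2.0, 3.0 ]
OpenW, Ilun, FileName, /Get_Lun, /F77_Unformatted, /Swap_If_Little_Endian
WriteU, Ilun, Data
Free_Lun, Ilun

; Read the record back with the same keywords, so that the bytes are
; swapped again on a Little Endian machine
Result = FltArr( 3 )
OpenR, Ilun, FileName, /Get_Lun, /F77_Unformatted, /Swap_If_Little_Endian
ReadU, Ilun, Result
Free_Lun, Ilun

; Result should match Data
Print, Result

; Remove the scratch file
File_Delete, FileName
```

Leaving out the swap keyword on the OpenR call (on a Little Endian machine) is a quick way to reproduce the read failure described in this section.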

--Bob Yantosca 15:49, 1 April 2008 (EDT)