Reading binary files in IDL


"Big Endian" vs. "Little Endian" byte ordering

The terms "Big Endian" and "Little Endian" refer to the way that computers order bytes internally. Normally each number type is composed of one or more bytes. (A byte consists of 8 bits and can be used to represent 256 distinct values.)

IDL Number Type    Fortran Equivalent    Number of bytes
byte               BYTE                  1
fix                INTEGER*2             2
long               INTEGER*4             4
float              REAL*4                4
double             REAL*8                8

For all of these number types (except BYTE), you need more than one byte to create integer or floating point numbers in IDL and Fortran.

There are two ways that the bytes of a multi-byte number can be ordered in memory: most significant byte first, or least significant byte first.

  • A Big Endian machine stores the most significant byte first (bytes in descending order of significance, from left to right)
    • Think of a car's odometer: digit order is "ten-thousands", "thousands", "hundreds", "tens", "ones"
  • A Little Endian machine stores the least significant byte first (bytes in ascending order of significance)
    • Reverse of a car's odometer: digit order is "ones", "tens", "hundreds", "thousands", "ten-thousands"
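
To see the byte ordering on your own machine, you can look at the individual bytes of a multi-byte number. The following is a minimal sketch that uses IDL's BYTE function to extract the 4 bytes of a LONG value in the order in which they sit in memory. The variable names are illustrative only.

```idl
; A 4-byte LONG whose bytes are easy to tell apart (hexadecimal 01020304)
Value = '01020304'xL

; Extract the 4 bytes of Value, starting at offset 0, in memory order
Bytes = Byte( Value, 0, 4 )

; Print each byte as a 2-digit hexadecimal number
print, Bytes, Format='(4(Z2.2,1x))'

; On a Big Endian machine this should print:    01 02 03 04
; On a Little Endian machine this should print: 04 03 02 01
```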

A good discussion about Big Endian vs. Little Endian may be found here: http://www.dfanning.com/tips/endian_machines.html

Why does it matter?

Normally, you would not be concerned whether the computer you are using has Big Endian or Little Endian byte ordering. However, if you are trying to read or write unformatted binary data, then the endian ordering becomes important.

When an IDL or Fortran program saves data to an unformatted file, it is saving the bytes directly from memory without any further data processing. If you save data from a Big Endian machine and try to read it back into a Little Endian machine, then the bytes will be in the wrong order and the data will be interpreted as gibberish.
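
For example, the 4-byte integer 1 turns into a very different number when its bytes are reversed. The sketch below uses IDL's SWAP_ENDIAN library function to mimic what happens when a value is read with the wrong byte order. The variable names are illustrative only.

```idl
; A 4-byte LONG with the value 1
Value = 1L

; Reverse the byte order, as if the value had been
; read from a file of the opposite endian ordering
Swapped = Swap_Endian( Value )

; The value 1 (hex 00000001) becomes 16777216 (hex 01000000)
print, Value, Swapped
```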

The GEOS-Chem binary punch file format is an example of an unformatted binary file format. All binary punch files are Big Endian by default.

How do I know which type of machine I am using?

The LITTLE_ENDIAN function that ships with the GAMAP package returns 1 if you are on a Little Endian machine, and returns 0 if you are on a Big Endian machine:

 if ( Little_Endian() )                                     $
    then print, "I'm using IDL on a little-endian machine." $
    else print, "I'm using IDL on a big-endian machine."
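
If GAMAP is not installed, a similar check can be sketched with IDL built-ins alone. This is only an illustration of the idea, not necessarily how GAMAP's LITTLE_ENDIAN is implemented, and the variable names are hypothetical.

```idl
; Extract the first byte (in memory order) of the 2-byte integer 1.
; On a Little Endian machine the least significant byte comes first,
; so the first byte is 1; on a Big Endian machine it is 0.
FirstByte = ( Byte( 1, 0, 1 ) )[0]
IsLittle  = ( FirstByte eq 1 )

if ( IsLittle )                                            $
   then print, "I'm using IDL on a little-endian machine." $
   else print, "I'm using IDL on a big-endian machine."
```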

In general:

  • Machines with Intel or AMD chipsets tend to be Little Endian
    • PCs & Intel-based Macs
    • SGI Altix
    • Sun X4100
  • Machines with RISC chipsets tend to be Big Endian
    • Sun/SPARC
    • Cray
    • SGI Power Challenge/Origin

It is pretty safe to say that most of the newer machines being built today are Little Endian.

I used to be able to read a binary file in IDL but now I can't. What's wrong?

Chances are that you are reading a binary file that was created on a machine with the opposite byte ordering from the machine that you are using. For example, you may be trying to read a GEOS-Chem binary punch file (which is Big Endian) from IDL on a PC or Linux cluster machine (which is Little Endian).

Fortunately, there is a simple way to read binary files of the opposite byte ordering in IDL. If you are using GAMAP's OPEN_FILE command, then be sure to use the SWAP_ENDIAN keyword as follows:

; Open a binary file for reading
Open_File, FileName, Ilun, $
   /Get_LUN, /F77_Unformatted, Swap_Endian=Little_Endian()

; Open a binary file for writing
Open_File, FileName, Ilun, $
   /Get_LUN, /F77_Unformatted, /Write, Swap_Endian=Little_Endian()

or if you are using IDL's OPENR and OPENW procedures directly:

; Open a binary file for reading
OpenR, Ilun, FileName, /Get_Lun, Swap_Endian=Little_Endian()

; Open a binary file for writing
OpenW, Ilun, FileName, /Get_Lun, Swap_Endian=Little_Endian()

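Setting Swap_Endian=Little_Endian() tells IDL to byte-swap the data only when the program is running on a Little Endian machine, so the file is read or written as Big Endian no matter which type of machine you are using.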
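
IDL's OPENR, OPENW and OPENU procedures also accept a SWAP_IF_LITTLE_ENDIAN keyword, which swaps bytes only on Little Endian machines and therefore does not need GAMAP's LITTLE_ENDIAN function. The following is a minimal sketch, assuming the file is a Big Endian Fortran unformatted file. The file name is a placeholder.

```idl
; Placeholder file name (an assumption for illustration)
FileName = 'myfile.bpch'

; Open a Big Endian Fortran unformatted file for reading.
; Bytes are swapped only if this machine is Little Endian.
OpenR, Ilun, FileName, /Get_Lun, /F77_Unformatted, /Swap_If_Little_Endian

; ... read the data here ...

; Close the file and release the logical unit number
Free_Lun, Ilun
```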

--Bmy 12:43, 4 April 2008 (EDT)