Reading binary files in IDL

From Geos-chem

Revision as of 19:38, 1 April 2008

"Big Endian" vs. "Little Endian" byte ordering

The terms "Big Endian" and "Little Endian" refer to the order in which a computer stores the bytes of a number in memory. Each number type is made up of one or more bytes. (A byte consists of 8 bits and can represent 256 distinct values.)

IDL type (conversion function)   Fortran equivalent   Number of bytes
byte (BYTE)                      BYTE                 1
int (FIX)                        INTEGER*2            2
long (LONG)                      INTEGER*4            4
float (FLOAT)                    REAL*4               4
double (DOUBLE)                  REAL*8               8

Every type in this table except byte needs more than one byte to represent a single integer or floating-point value in IDL and Fortran, so for those types the order of the bytes matters.

There are two ways to order the bytes of a multi-byte number:

  • A Big Endian machine stores the most significant byte first, i.e. it orders the bytes from left to right (think: a car's odometer)
  • A Little Endian machine stores the least significant byte first, i.e. it orders the bytes from right to left (the reverse of a car's odometer)
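To see the two orderings concretely, here is a minimal sketch that prints the four bytes of a LONG (INTEGER*4) value in the order the current machine keeps them in memory. The function name show_byte_order is hypothetical; it relies on the Offset form of IDL's BYTE function, which reinterprets the bytes of a value without converting the number.

```idl
; Hypothetical helper: print and return the 4 bytes of a LONG value
; in the order in which this machine stores them in memory.
function show_byte_order, value
  compile_opt idl2
  ; BYTE(expr, offset, n) extracts n bytes of expr, starting at byte
  ; OFFSET, as a BYTE array; the number itself is not converted.
  bytes = byte(long(value), 0, 4)
  print, 'Bytes in memory order: ', bytes
  return, bytes
end
```

Calling b = show_byte_order('01020304'xl) should print the bytes 1 2 3 4 on a Big Endian machine and 4 3 2 1 on a Little Endian machine.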

A good discussion about Big Endian vs. Little Endian may be found here: http://www.dfanning.com/tips/endian_machines.html

Why does it matter?

Normally, you would not need to know whether the computer you are using has Big Endian or Little Endian byte ordering. However, if you are reading or writing unformatted binary data, the byte ordering becomes important.

When an IDL or Fortran program saves data to an unformatted file, it writes the bytes exactly as they are stored in memory, without any conversion. If you save data on a Big Endian machine and try to read it back on a Little Endian machine (or vice versa), the bytes of each value will be in the wrong order and the data you read from disk will be interpreted as gibberish.
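The effect described above can be imitated with IDL's SWAP_ENDIAN function, which reverses the byte order of each element. The minimal sketch below (misread_bytes is a hypothetical name) returns what a program would effectively see if it read the values raw from a file written on a machine with the opposite byte ordering.

```idl
; Hypothetical helper: return the values a program would see if DATA had
; been written on a machine with the opposite byte ordering and read back
; without any byte swapping.
function misread_bytes, data
  compile_opt idl2
  ; SWAP_ENDIAN reverses the bytes of every element, regardless of which
  ; kind of machine IDL is running on.
  return, swap_endian(data)
end
```

For example, misread_bytes(1L) returns 16777216 instead of 1, and misread_bytes(1.0) returns a tiny number of about 4.6e-41 instead of 1.0.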

The GEOS-Chem binary punch file format (http://www.as.harvard.edu:16080/ctm/gamap/doc/Chapter_6.html#6.2) is one example of an unformatted binary file. By default, all binary punch files are Big Endian.
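Because such files are Big Endian, IDL on a Little Endian machine has to swap their bytes while reading. One way to request this is the /SWAP_IF_LITTLE_ENDIAN keyword of OPENR and OPENW, which swaps only when IDL runs on a Little Endian machine. The sketch below is a self-contained round trip, not the GAMAP punch-file reader: the function name and the temporary file name are hypothetical.

```idl
; Hypothetical demo: write a small Big Endian file, then read it back
; correctly on either kind of machine.
function read_big_endian_demo
  compile_opt idl2
  file = filepath('endian_demo.bin', /tmp)   ; scratch file name (hypothetical)

  ; Write three REAL*4 values; /SWAP_IF_LITTLE_ENDIAN makes the bytes on
  ; disk Big Endian no matter which machine runs this.
  data_out = [1.0, 2.5, -3.0]
  openw, lun, file, /get_lun, /swap_if_little_endian
  writeu, lun, data_out
  free_lun, lun

  ; Read them back, again telling IDL that the file is Big Endian.
  data_in = fltarr(3)
  openr, lun, file, /get_lun, /swap_if_little_endian
  readu, lun, data_in
  free_lun, lun

  file_delete, file
  return, data_in
end
```

Reading the same file without /SWAP_IF_LITTLE_ENDIAN on a Little Endian machine would return byte-swapped gibberish. A Fortran unformatted file such as a binary punch file also contains record markers; OPENR's /F77_UNFORMATTED keyword tells IDL to handle those, so it would typically be combined with /SWAP_IF_LITTLE_ENDIAN when opening one.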

How do I know which type of machine I am using?

IDL's LITTLE_ENDIAN function returns 1 if you are on a Little Endian machine, and returns 0 if you are on a Big Endian machine:

 if ( Little_Endian() )                                       $
    then print, "I'm running IDL on a little-endian machine." $
    else print, "I'm running IDL on a big-endian machine."
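The LITTLE_ENDIAN function may be supplied by an add-on library such as GAMAP rather than by IDL itself. If it is not available in your IDL session, a similar test can be sketched with core IDL routines only; is_little_endian_sketch is a hypothetical name.

```idl
; Hypothetical helper: return 1 on a Little Endian machine, 0 on a Big
; Endian machine, using only core IDL routines.
function is_little_endian_sketch
  compile_opt idl2
  ; The INT value 1 is stored as the bytes 01 00 on a Little Endian
  ; machine and 00 01 on a Big Endian machine, so the first byte in
  ; memory tells us which kind of machine this is.
  first_byte = byte(1S, 0, 1)
  return, first_byte[0] eq 1b
end
```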

In general:

  • Machines with Intel or AMD chipsets tend to be Little Endian
    • PCs and Intel-based Macs
    • SGI Altix
    • Sun X4100
  • Machines with RISC chipsets tend to be Big Endian
    • Sun/SPARC
    • Cray
    • SGI Power Challenge/Origin

I used to be able to read a binary file in IDL but now I can't. What's wrong?

Chances are, you are reading a GEOS-Chem binary punch file