Text manipulation with GAMAP

From Geos-chem
Revision as of 20:14, 15 April 2008

Text strings in IDL

Creating strings

We may form a string of text characters in IDL either with IDL's STRING function or by placing the text between single or double quotes. For example:

IDL> str1 = 'hello world'
IDL> help, str1
STR1            STRING    = 'hello world'
 
IDL> num2 = 3.14159
IDL> str2 = string( num2 )   
IDL> help, str2  
STR2            STRING    = '      3.14159'
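A few more ways of building strings may be useful. The sketch below shows a double-quoted string, STRTRIM to strip the leading blanks that STRING adds to numbers, the FORMAT keyword for explicit layout, and concatenation with the + operator. All of these are built-in IDL features, so GAMAP is not needed; the variable names are only illustrative.

```idl
; Double quotes work the same way as single quotes
str3 = "hello world"
; STRING pads numbers with leading blanks; STRTRIM(..., 2) removes leading and trailing blanks
str4 = strtrim( string( 3.14159 ), 2 )        ; '3.14159'
; The FORMAT keyword gives explicit control over the layout
str5 = string( 3.14159, format='(f6.3)' )     ; ' 3.142'
; Strings are concatenated with the + operator
str6 = 'pi = ' + str4                         ; 'pi = 3.14159'
print, str6
```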

Equivalence of strings and byte arrays

In IDL, a string of text characters is equivalent to an array of byte values. A byte is a collection of 8 bits and may express 256 values, from 0 to 255. The original ASCII table defined 128 characters (codes 0-127); extended 8-bit character sets later used codes 128-255 for special characters. One byte represents a single ASCII text character.

This means that it is easy to convert between strings and bytes in IDL. If you have an array of bytes, you can use any of the IDL string routines on them, for example:

IDL> byte_array = [ 72B, 69B, 76B, 76B, 79B ]
IDL> help, byte_array    
BYTE_ARRAY      BYTE      = Array[5]
IDL> print, strtrim( byte_array, 2 ) 
HELLO
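Going the other way, IDL's built-in STRING function turns a byte array straight into text. Because each character is just a number, arithmetic on the bytes changes the characters. The sketch below relies on the ASCII layout, in which each lowercase letter comes 32 codes after its uppercase counterpart; it only illustrates the byte/string equivalence, and in real code STRLOWCASE is the proper tool.

```idl
byte_array = [ 72B, 69B, 76B, 76B, 79B ]   ; ASCII codes for 'HELLO'
upper = string( byte_array )               ; 'HELLO'
; In ASCII, each lowercase letter is 32 codes after its uppercase form
lower = string( byte_array + 32B )         ; 'hello'
print, upper, ' ', lower
```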

GAMAP comes with a useful routine called str2byte.pro, which converts a text string into the equivalent array of bytes.

IDL> str = 'IDL is neat!'
IDL> byte_array = str2byte( str, strlen( str ) )
IDL> help, byte_array
BYTE_ARRAY      BYTE      = Array[12]
IDL> print, byte_array   
  73  68  76  32 105 115  32 110 101  97 116  33

Note that we used IDL's STRLEN function to return the length of the string.
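For a whole-string conversion like the one above, IDL's built-in BYTE function returns the same ASCII codes, and the result can be searched with ordinary array operations. A minimal sketch, using WHERE to find the spaces (ASCII code 32) and STRING to convert back; the variable names are illustrative.

```idl
str = 'IDL is neat!'
; IDL's built-in BYTE function returns the ASCII code of each character
codes = byte( str )                       ; 73 68 76 32 105 115 32 110 101 97 116 33
; 32B is the space character; WHERE returns the positions that match
spaces = where( codes eq 32B, nspace )    ; positions 3 and 6, nspace = 2
print, spaces
; STRING converts the byte array back to text
back = string( codes )                    ; 'IDL is neat!'
```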

Representing special characters

Some special non-printing ASCII characters must be specified by their byte value. For example, the horizontal tab character has ASCII code 9, so we may specify it as:

IDL> tab = 9B
IDL> help, tab
TAB             BYTE      =    9
IDL> str = 'hello' + string(tab) + 'world' 
IDL> print, str
hello   world
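Special characters built this way behave like any other text. As an illustration (the record contents are made up), the sketch below joins three fields with tabs and splits them again with IDL's STRSPLIT, which by default treats each character of its pattern argument as a delimiter. A line feed could be handled the same way with string(10B).

```idl
tab  = string( 9B )                       ; one-character string holding a horizontal tab
line = 'O3' + tab + 'NO2' + tab + 'CO'    ; a tab-separated record
; Split the record at every tab character
fields = strsplit( line, tab, /extract )  ; ['O3', 'NO2', 'CO']
print, n_elements( fields ), ' fields: ', fields
```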

For more information about IDL's string functions, please see http://idlastro.gsfc.nasa.gov/idl_html_help/Strings.html.

Replacing characters in a string

STRPUT

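IDL's STRPUT procedure is one way to put new text into a string. It overwrites the characters of the string starting at a given position, and the length of the string does not change: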

IDL> str1 = 'Now is the winter of our discontent'
IDL> strput, str1, 'summer', 11
IDL> print, str1
Now is the summer of our discontent

However, this requires that you provide the location in the string where the text replacement will take place. In the above example, we write the text starting at character 11 (IDL numbers string positions from 0, so the 1st character in a string is always character 0).
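Two refinements may help when using STRPUT. IDL's STRPOS can find the position for you, and because STRPUT overwrites rather than inserts, a replacement shorter than the original word leaves part of the old word behind. A small sketch (the shorter word 'fall' is just for illustration):

```idl
str1 = 'Now is the winter of our discontent'
; Find the position with STRPOS instead of counting characters by hand
pos = strpos( str1, 'winter' )            ; 11
str2 = str1
strput, str2, 'summer', pos               ; 'Now is the summer of our discontent'
; STRPUT overwrites in place and never changes the string length,
; so a shorter replacement leaves the tail of the old word behind
str3 = str1
strput, str3, 'fall', pos                 ; 'Now is the faller of our discontent'
print, str2
print, str3
```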

REPLACE_TOKEN

The above task is much more easily accomplished with GAMAP's REPLACE_TOKEN function:

IDL> str1 = 'Now is the winter of our discontent'
IDL> str2 = replace_token( str1, 'winter', 'summer', delim='' )
IDL> print, str2
Now is the summer of our discontent

With REPLACE_TOKEN you do not need to know the position in the string where the replacement text will be inserted.
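If GAMAP is not available, a similar word replacement can be done with IDL's built-in STRSPLIT and STRJOIN: split the string at each occurrence of the word, then join the pieces with the new word. This is a swapped-in technique, not how REPLACE_TOKEN itself works. It assumes the word contains no regular-expression special characters (because of the /REGEX keyword); /PRESERVE_NULL keeps empty pieces so that occurrences at the very start or end of the string are still replaced.

```idl
str1 = 'Now is the winter of our discontent'
; Split at every occurrence of the word, keeping empty pieces
pieces = strsplit( str1, 'winter', /regex, /extract, /preserve_null )
; Glue the pieces back together with the new word in between
str2 = strjoin( pieces, 'summer' )        ; 'Now is the summer of our discontent'
print, str2
```

Unlike STRPUT, this replaces every occurrence of the word and lets the old and new words differ in length.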

STRREPL

GAMAP also has another function called STRREPL that allows you to replace multiple instances of a single character in a string. For example:

IDL> print, strrepl( 'Mississippi', 'i', 'a' )
Massassappa

But if you need to replace an entire word rather than just single characters it's better to use REPLACE_TOKEN.
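Single-character replacement can also be done with built-in IDL routines by working on the byte array directly, using the string/byte equivalence described earlier. A minimal sketch, not STRREPL's actual implementation, using the ASCII codes 105 ('i') and 97 ('a'):

```idl
b = byte( 'Mississippi' )
; Find every 'i' (ASCII 105) ...
idx = where( b eq 105B, count )
; ... and overwrite it with 'a' (ASCII 97); guard against no matches (WHERE returns -1)
if count gt 0 then b[idx] = 97B
print, string( b )                        ; Massassappa
```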

String inquiry functions

GAMAP ships with the following string inquiry functions:

ISALGEBRAIC
Locates the position of algebraic characters in a string (i.e. locations that are EITHER digits, '.', OR +/- signs).
ISALNUM
Locates the position of alphanumeric characters ( A...Z, a...z, 0...9 ) in a string.
ISALPHA
Locates the positions of alphabetic characters ( A...Z, a...z ) in a string.
ISDIGIT
Locates the positions of numeric characters ( '0' ... '9') in a string.
ISGRAPH
Locates the positions of graphic characters (i.e. printable characters excluding SPACE) in a string.
ISLOWER
Locates the positions of lowercase alphabetic characters in a string.
ISPRINT
Locates the positions of all printable characters (including SPACE) in a string.
ISSPACE
Locates the positions of all white space characters in a string.
ISUPPER
Locates the positions of all uppercase alphabetic characters in a string.

Each of the above routines returns a vector of 0's and 1's with one element per character in the string: 1 where that character satisfies the given criterion and 0 where it does not.

Some examples:

IDL> str = '#99# Bottles of *Beer* on the Wall!'  
IDL> print, isalgebraic( str ), format='(35i1)'
01100000000000000000000000000000000
IDL> print, isalnum( str ), format='(35i1)'
01100111111101100111100110111011110
IDL> print, isalpha( str ), format='(35i1)'
00000111111101100111100110111011110
IDL> print, isdigit( str ), format='(35i1)'
01100000000000000000000000000000000
IDL> print, isgraph( str ), format='(35i1)'
11110111111101101111110110111011111
IDL> print, islower( str ), format='(35i1)'
00000011111101100011100110111001110
IDL> print, isprint( str ), format='(35i1)'
11111111111111111111111111111111111
IDL> print, isspace( str ), format='(35i1)'
00001000000010010000001001000100000
IDL> print, isupper( str ), format='(35i1)'
00000100000000000100000000000010000
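The 0/1 vectors are most useful combined with IDL's WHERE and TOTAL, which turn them into character positions and counts; for example, WHERE applied to the ISUPPER result above would give positions 5, 17 and 30. So that the sketch runs without GAMAP, the flags below are computed directly from ASCII code ranges on the byte array. This is an illustrative stand-in for ISDIGIT and ISUPPER, not GAMAP's implementation, and it only covers plain ASCII letters and digits.

```idl
str = '#99# Bottles of *Beer* on the Wall!'
b   = byte( str )
; A 0/1 flag per character, computed from ASCII ranges
; (digits '0'-'9' are codes 48-57, uppercase 'A'-'Z' are codes 65-90)
is_dig = ( b ge 48B ) and ( b le 57B )
is_up  = ( b ge 65B ) and ( b le 90B )
; WHERE turns the flags into character positions; TOTAL counts them
up_pos = where( is_up, n_up )             ; 5, 17, 30
n_dig  = total( is_dig )                  ; 2
; Index the byte array with the positions to pull out the matching characters
digits = string( b[ where( is_dig ) ] )   ; '99'
print, up_pos, n_dig, digits
```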