Text manipulation with GAMAP

Latest revision as of 19:08, 16 September 2022


GAMAP is now obsolete. We recommend using GCPy for analyzing output from recent GEOS-Chem versions. But we will preserve the GAMAP wiki documentation for reference.



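On this page we list some general tips & tricks for working with characters and strings with GAMAP. Please also see the following pages:

  • General GAMAP usage
  • File I/O with GAMAP
  • Color and graphics with GAMAP
  • Regridding with GAMAP
  • Date and time computations with GAMAP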

--Bob Y. 16:11, 26 November 2008 (EST)

String basics

Creating text and numeric strings

We may form a string of text characters in IDL in the following ways:

  1. by placing text between single or double quotes
  2. by converting a number with IDL's STRING function
  3. by concatenating one string with another

For example:

; Create a text string 
IDL> str1 = 'hello world'
IDL> help, str1
STR1            STRING    = 'hello world'
 
; Create a numeric string
IDL> num2 = 3.14159
IDL> str2 = string( num2 )   
IDL> help, str2  
STR2            STRING    = '      3.14159'

; Strip leading and trailing white space 
IDL> str2 = strtrim( str2, 2 )
IDL> help, str2
STR2            STRING    = '3.14159'

; Create a new string by concatenating 2 strings
IDL> a = 'I am string 1'
IDL> b = 'I am string 2'
IDL> c = a + ':' + b
IDL> print, c
I am string 1:I am string 2

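In the example above, the second argument to IDL's STRTRIM function (2) strips both leading and trailing whitespace. A value of 1 strips only leading whitespace, and 0 (the default) strips only trailing whitespace.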
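
If you need more control over how a number is converted, STRING also accepts a FORMAT keyword. The lines below are a minimal sketch that is not part of the original examples; the format code (f8.3) is just one illustrative choice.

```idl
; Convert a number to a string with an explicit format code
num = 3.14159
str = string( num, format='(f8.3)' )   ; 8 characters wide, 3 decimal places
print, '[' + str + ']'                 ; prints [   3.142]
print, '[' + strtrim( str, 2 ) + ']'   ; prints [3.142]
```

The square brackets are only there to make the padding visible.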

Equivalence of strings and byte arrays

In IDL, a string of text characters is equivalent to an array of byte values. A byte is a collection of 8 bits and may express values from 0-255. The extended ASCII collating sequence therefore has 256 values. (Actually, the original ASCII table had 128 values, but this was later extended to 256 values to include special characters.) One byte represents a single ASCII text character.

This means that it is easy to convert between strings and bytes in IDL. If you have an array of bytes, you can use any of the IDL string routines on them, for example:

IDL> byte_array = [ 72B, 69B, 76B, 76B, 79B ]
IDL> help, byte_array    
BYTE_ARRAY      BYTE      = Array[5]
IDL> print, strtrim( byte_array, 2 ) 
HELLO

GAMAP comes with a very useful routine called STR2BYTE. This allows you to take a text string and to convert it into the equivalent array of bytes.

IDL> str = 'IDL is neat!'
IDL> byte_array = str2byte( str, strlen( str ) )
IDL> help, byte_array
BYTE_ARRAY      BYTE      = Array[12]
IDL> print, byte_array   
  73  68  76  32 105 115  32 110 101  97 116  33

Note that we used IDL's STRLEN function to return the length of the string.
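
If GAMAP is not available, the same round trip can be sketched with IDL's built-in BYTE and STRING functions. This is an alternative we are adding for illustration, not the method used by STR2BYTE itself.

```idl
str   = 'IDL is neat!'
bytes = byte( str )       ; 12-element BYTE array of ASCII codes
print, bytes[0:2]         ; codes for 'I', 'D', 'L': 73 68 76
back  = string( bytes )   ; converts the byte array back to a string
print, back eq str        ; prints 1
```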

Representing special characters

We must specify some special non-printing ASCII characters with their byte value. For example, the horizontal tab character has the value 9 in the ASCII table, so we may specify that as:

IDL> tab = 9B
IDL> help, tab
TAB             BYTE      =    9
IDL> str = 'hello' + string(tab) + 'world' 
IDL> print, str
hello   world
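
The same approach works for other non-printing characters. As a small sketch (our own example, assuming a Unix-style line feed is what you want), a line break can be built from byte value 10:

```idl
newline = string( 10B )   ; line feed (ASCII value 10)
print, 'line 1' + newline + 'line 2'   ; the text appears on two lines
```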

For more information about IDL's string functions, please see http://idlastro.gsfc.nasa.gov/idl_html_help/Strings.html.

Locating text within a string

The following routines can be used to locate text within a string variable:

STRCMP
IDL routine to compare text in one string to another
STRPOS
IDL routine to test for the existence of a substring within a string
STRMATCH
IDL routine that will test for strings that match certain patterns
STRMID
IDL routine that will return a substring from a string
STRWHERE
GAMAP routine that returns the locations of a single character within a string
RSEARCH
GAMAP routine to look for text in a string starting from the end of the string
STRRIGHT
GAMAP routine that returns the last N characters from a string

STRCMP can be used to test if two strings are equivalent. You can do the test for all characters in a string, or just for the first N characters. For example:

; Test to see if 2 strings are equivalent
; Specifying /FOLD_CASE keyword will do a case-insensitive comparison
IDL> str1 = 'My baloney has a first name, it is O.S.C.A.R.'
IDL> str2 = 'My baloney has a second name, it is M.A.Y.E.R.'
IDL> print, strcmp( str1, str2, /fold_case )
   0

; This time only test the 1st 16 characters in both strings
IDL> print, strcmp( str1, str2, 16, /fold_case )
   1

STRPOS is an easy way to test if a given substring is located within a larger string:

IDL> print, strpos( 'She sells seashells by the seashore', 'sea' )
          10

Note that even though the substring "sea" occurs twice in the above string, STRPOS will only return the location of the first occurrence. If the substring is not found at all, STRPOS returns -1.
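
To reach the later occurrence, STRPOS can be given a starting position or the /REVERSE_SEARCH keyword. The following sketch is our own addition and uses only built-in IDL behavior:

```idl
str = 'She sells seashells by the seashore'
print, strpos( str, 'sea', /reverse_search )   ; 27, the last occurrence
print, strpos( str, 'sea', 11 )                ; 27, first occurrence at or after position 11
print, strpos( str, 'lake' )                   ; -1, substring not found
```

Testing the result against -1 is a simple way to check whether a substring is present before calling STRMID.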

STRMATCH can be used to test if certain strings match a given pattern. You can use the following wild cards in your search string.

  • * matches any string
  • ? matches any single character
  • [abc] matches any of the enclosed characters; a range may be given with a hyphen (e.g. [a-c] matches a thru c)

For example:

; Find ALL 4-LETTER WORDS in a string array 
; that begin with “f” or “F” and end with “t” or “T”:
IDL> str = ['foot', 'Feet', 'fate', 'FAST', 'ferret', 'fort']
IDL> ind = where( strmatch( str, 'f??t', /fold_case ) eq 1 )
IDL> print, ind
          0           1           3           5
IDL> print, str[ind]
foot Feet FAST fort

; Find all words OF ANY LENGTH in a string array 
; that begin with “f” or “F” and end with “t” or “T”:
IDL> ind = where( strmatch( str, 'f*t', /fold_case ) eq 1 )
IDL> print, ind
           0           1           3           4           5
IDL> print, str[ind]
foot Feet FAST ferret fort
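
The bracket wild card is not shown in the examples above, so here is a hedged sketch of how it might be used. We lowercase the words with STRLOWCASE first so that the bracket test only has to list lowercase letters; that step is our own choice, not a requirement of STRMATCH.

```idl
str = ['foot', 'Feet', 'fate', 'FAST', 'ferret', 'fort']
; Match words that start with "f" and whose second letter is "a" or "o"
ind = where( strmatch( strlowcase( str ), 'f[ao]*' ) eq 1, count )
if count gt 0 then print, str[ind]   ; foot fate FAST fort
```

The COUNT argument to WHERE guards against indexing with -1 when nothing matches.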

STRMID can be used to extract any sized substring from a larger string. You need to specify the starting location and number of characters to be extracted. For example:

IDL> str = 'The quick brown fox jumped over the lazy dog'
IDL> print, strmid( str, 4, 5 )
quick
IDL> print, strmid( str, 0, 3 )
The
IDL> print, strmid( str, 10, 5 )
brown

TIP: Always remember that the first character has index 0!

STRWHERE returns the locations of every occurrence of a single character in a larger string.

IDL> print, strwhere( 'anthony aardvark asked about auditory access', 'a' )
    0    8    9   13   17   23   29   38

If you want to search for text starting from the end of the string, you can use the RSEARCH function. This can be useful in removing or replacing the extension of a file name. For example:

; Replace the *.pro extension with *.txt
IDL> str = '/home/bmy/IDL/gamap2/gamap_util/gamap.pro'
IDL> ind = rsearch ( str, '.' )
IDL> str2 = strmid( str, 0, ind ) + '.txt'
IDL> print, str2
/home/bmy/IDL/gamap2/gamap_util/gamap.txt

Finally, STRRIGHT returns the last N characters from a string.

IDL> print, strright( 'anthony aardvark asked about auditory access', 6 )              
access

Replacing characters in a string

The following routines can be used to replace text within a string variable:

STRPUT
IDL routine to insert text into a string
REPLACE_TOKEN
GAMAP routine that replaces occurrences of tokens with text. Can also be used to expand wildcards with a name list.
STRREPL
GAMAP routine that replaces all occurrences of one character in a string with another character.

IDL's STRPUT function is one way to insert characters into a string of text:

IDL> str1 = 'Now is the winter of our discontent'
IDL> strput, str1, 'summer', 11
IDL> print, str1
Now is the summer of our discontent

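However, this requires that you provide the location in the string where the text replacement will take place. In the above example, we insert the text at character 11 (the 1st character in a string is always character 0). Note also that STRPUT overwrites characters in place and never changes the length of the string, so the example works cleanly only because "summer" and "winter" have the same number of characters.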

The above task is much more easily accomplished with GAMAP's REPLACE_TOKEN function:

IDL> str1 = 'Now is the winter of our discontent'
IDL> str2 = replace_token( str1, 'winter', 'summer', delim="" )
IDL> print, str2
Now is the summer of our discontent

With REPLACE_TOKEN you do not need to know the position in the string where the replacement text will be inserted. Also, if the text to be replaced occurs more than once in the string, REPLACE_TOKEN will do the text replacement for the whole string in one fell swoop. For example:

IDL> print, replace_token( 'She sells seashells by the seashore', 'sea', 'ocean', delim="")
She sells oceanshells by the oceanshore

Note that REPLACE_TOKEN can replace strings that are of different lengths (e.g. in the above example "sea" is replaced by "ocean").
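
If GAMAP is not on your path, a similar replacement can be sketched with IDL's built-in STRSPLIT and STRJOIN functions. This is an alternative idiom that we are adding for illustration; it is not how REPLACE_TOKEN is implemented.

```idl
str   = 'She sells seashells by the seashore'
; Split on the whole substring "sea", keeping empty pieces, then rejoin
parts = strsplit( str, 'sea', /regex, /extract, /preserve_null )
print, strjoin( parts, 'ocean' )   ; She sells oceanshells by the oceanshore
```

Because /REGEX treats the search text as a regular expression, characters such as "." or "*" in the search text would need to be escaped.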

STRREPL allows you to replace multiple instances of a single character in a string. For example:

IDL> print, strrepl( 'Mississippi', 'i', 'a' )
Massassappa

But if you need to replace an entire word rather than just single characters it's better to use REPLACE_TOKEN.

Splitting strings into substrings

You can split a string into individual substrings with GAMAP's STRBREAK function.

; Use STRBREAK to split the line by spaces
IDL> result = strbreak( 'The sunshine of our li..ii..ii..ii..ife', ' ' )
IDL> for j = 0, n_elements( result )-1 do print, fix(j), ':', result[j]
       0:The
       1:sunshine
       2:of
       3:our
       4:li..ii..ii..ii..ife

; Use STRBREAK to split the line by commas
IDL> result = strbreak( 'Parsley,Sage,Rosemary,and Thyme', ',' )
IDL> for j = 0, n_elements( result )-1 do print, fix(j), ':', result[j]
       0:Parsley
       1:Sage
       2:Rosemary
       3:and Thyme

STRBREAK will return an array of values. The first element is the first word, the second element is the second word, etc.

We recommend that you use GAMAP's STRBREAK rather than IDL's STRSPLIT or STR_SEP routines. STR_SEP was the standard routine to separate strings until IDL 5.2. In IDL 5.3 and higher, STR_SEP was obsoleted and replaced with the new STRSPLIT routine.

  • If you are using IDL 5.2 or lower, then STRBREAK will call STR_SEP to break the string.
  • If you are using IDL 5.3 or higher, then STRBREAK will call STRSPLIT to break the string.

Therefore, STRBREAK will work properly regardless of which version of IDL you are using.
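
For comparison, here is a minimal sketch of calling STRSPLIT directly on IDL 5.3 or higher. The variable names are our own; the /EXTRACT keyword makes STRSPLIT return the substrings rather than their starting positions.

```idl
words = strsplit( 'Parsley,Sage,Rosemary,and Thyme', ',', /extract, count=n )
print, n          ; 4
print, words[3]   ; and Thyme
```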

GAMAP's string inquiry functions

GAMAP ships with the following string inquiry functions:

ISALGEBRAIC
Locates the position of algebraic characters in a string (e.g. locations that are EITHER digits, '.', OR +/- signs).
ISALNUM
Locates the position of alphanumeric characters ( A...Z, a...z, 0..9 ) in a string.
ISALPHA
Locates the positions of alphabetic characters ( A...Z, a...z ) in a string.
ISDIGIT
Locates the positions of numeric characters ( '0' ... '9') in a string.
ISGRAPH
Locates the positions of graphics characters (i.e. printable characters excluding SPACE) in a string.
ISLOWER
Locates the positions of lowercase alphabetic characters in a string.
ISPRINT
Locates the positions of all printable characters (including SPACE) in a string.
ISSPACE
Locates the positions of all white space characters in a string.
ISUPPER
Locates the positions of all uppercase alphabetic characters in a string.

Each of the above routines returns a vector of 0's and 1's with one element per character in the string. A value of 1 marks a character that satisfies the given criterion.

Some examples:

IDL> str = '#99# Bottles of *Beer* on the Wall!'  

IDL> print, isalgebraic( str ), format='(35i1)'
01100000000000000000000000000000000

IDL> print, isalnum( str ), format='(35i1)'
01100111111101100111100110111011110

IDL> print, isalpha( str ), format='(35i1)'
00000111111101100111100110111011110

IDL> print, isdigit( str ), format='(35i1)'
01100000000000000000000000000000000

IDL> print, isgraph( str ), format='(35i1)'
11110111111101101111110110111011111

IDL> print, islower( str ), format='(35i1)'
00000011111101100011100110111001110

IDL> print, isprint( str ), format='(35i1)'
11111111111111111111111111111111111

IDL> print, isspace( str ), format='(35i1)'
00001000000010010000001001000100000

IDL> print, isupper( str ), format='(35i1)'
00000100000000000100000000000010000
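
To show what such a mask is and how it can be used, the sketch below builds a digit mask by hand from the byte values of the string and then extracts the matching characters. It relies only on built-in IDL functions and is our own illustration, not the GAMAP source code.

```idl
str   = '#99# Bottles of *Beer* on the Wall!'
bytes = byte( str )
mask  = ( bytes ge 48B ) and ( bytes le 57B )   ; 1 where the character is '0'..'9'
print, mask, format='(35i1)'                    ; same pattern as the ISDIGIT output above
ind   = where( mask, count )
if count gt 0 then print, string( bytes[ind] )  ; 99
```

The mask returned by any of the GAMAP inquiry functions can be passed to WHERE in the same way.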

GAMAP's string formatting functions

GAMAP ships with the following string formatting functions:

STRSCI
Converts a number to a string in scientific notation format ( e.g. A x 10^B )
STRCHEM
Superscripts or subscripts numbers and special characters ('x', 'y') found in strings containing names of chemical species.

STRSCI can be used to convert a number into a string in scientific notation. The string will contain the appropriate embedded formatting codes (e.g. !u to start a superscript and !n to return to normal text) so that it can be passed to PLOT or XYOUTS.

IDL> str = STRSCI( 2000000, format='(i1)' )
IDL> print, str
2 x 10!u6!n

STRCHEM can be used to create strings with superscripts and subscripts (e.g. H2O, 222Rn) for plotting purposes:

IDL> print, strchem( 'NOx', /sub )            
NO!lx!n

IDL> print, strchem( '222Rn', /sup )
!u2!n!u2!n!u2!nRn

Functions for working with file and path names

The following string-handling functions are specially geared towards working with file and path names:

EXPAND_PATH
IDL routine to expand wild cards in a file name or path name
EXTRACT_FILENAME
GAMAP routine to extract a file name from a fully qualified file path
EXTRACT_PATH
GAMAP routine to extract the directory from a fully qualified file path
ADD_SEPARATOR
GAMAP routine to make sure a directory name ends with a directory path separator

Sometimes it is necessary to fully expand the wild cards in a filename. (For example, the IDL routine HDF_BROWSER will choke if it finds the Unix wild card character ~ in the file path, so you have to expand to the full file path.) This can be easily done with EXPAND_PATH:

IDL> print, expand_path( '~bmy/IDL/gamap2/gamap_util/gamap.pro' )
/home/bmy/IDL/gamap2/gamap_util/gamap.pro

With EXTRACT_FILENAME, you can extract just the filename part from a full file path:

IDL> print, extract_filename( '/home/bmy/IDL/gamap2/gamap_util/gamap.pro' )
gamap.pro

and with EXTRACT_PATH, you can extract just the directory part from a full file path:

IDL> print, extract_path( '/home/bmy/IDL/gamap2/gamap_util/gamap.pro' )
/home/bmy/IDL/gamap2/gamap_util/

Finally, with ADD_SEPARATOR, you can ensure that your directory name always ends in a separator. For example:

; Make sure the directory has a separator character
; before we append the file name
IDL> pwd, mydir                    
IDL> print, mydir
/home/bmy/IDL
IDL> myfile = add_separator( mydir ) + 'myfile.pro'
IDL> print, myfile
/home/bmy/IDL/myfile.pro
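
More recent versions of IDL (6.0 and later, to the best of our knowledge) also ship with the built-in FILE_BASENAME and FILE_DIRNAME functions, which cover similar ground. The following sketch is an addition for readers who do not have GAMAP installed:

```idl
path = '/home/bmy/IDL/gamap2/gamap_util/gamap.pro'
print, file_basename( path )                   ; gamap.pro
print, file_basename( path, '.pro' )           ; gamap
print, file_dirname( path )                    ; /home/bmy/IDL/gamap2/gamap_util
print, file_dirname( path, /mark_directory )   ; same, with a trailing separator
```

Note that FILE_DIRNAME omits the trailing separator unless /MARK_DIRECTORY is set, whereas EXTRACT_PATH in the example above keeps it.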

--Bmy 13:02, 24 April 2008 (EDT)