Chapter 9. Parsing

Table of Contents
Simple Templates for Parsing into Words
Templates Containing String Patterns
Templates Containing Positional (Numeric) Patterns
Parsing with Variable Patterns
Using UPPER, LOWER, and CASELESS
Parsing Instructions Summary
Parsing Instructions Examples
Advanced Topics in Parsing

The parsing instructions are ARG, PARSE, and PULL (see ARG, PARSE, and PULL).

The data to be parsed is a source string. Parsing splits the data in a source string and assigns pieces of it to the variables named in a template. A template is a model specifying how to split the source string. The simplest kind of template consists of a list of variable names. Here is an example:

variable1 variable2 variable3

This kind of template parses the source string into blank-delimited words. More complicated templates contain patterns in addition to variable names:

String patterns

Match the characters in the source string to specify where it is to be split. (See Templates Containing String Patterns for details.)

Positional patterns

Indicate the character positions at which the source string is to be split. (See Templates Containing Positional (Numeric) Patterns for details.)

Parsing is essentially a two-step process:

  1. Parse the source string into appropriate substrings using patterns.

  2. Parse each substring into words.
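The two steps can be seen in a minimal sketch. The source string and the variable names last, first, and middle are illustrative assumptions, not taken from this chapter. The string pattern "," first splits the source string into two substrings, and each substring is then parsed into words:

```rexx
/* Step 1: the pattern "," splits the source string into the     */
/*         substrings "Smith" and " John Q"                      */
/* Step 2: each substring is parsed into blank-delimited words   */
parse value "Smith, John Q" with last "," first middle
say last                                        /* Smith         */
say first                                       /* John          */
say middle                                      /* Q             */
```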

Simple Templates for Parsing into Words

Here is a parsing instruction:

parse value "time and tide" with var1 var2 var3

The template in this instruction is var1 var2 var3. The data to be parsed, the source string time and tide, lies between the keywords PARSE VALUE and the keyword WITH. Parsing divides the source string into blank-delimited words and assigns them to the variables named in the template as follows:

var1="time"
var2="and"
var3="tide"

In this example, the source string to be parsed is a literal string, time and tide. In the next example, the source string is a variable.

/* PARSE VALUE using a variable as the source string to parse    */
string="time and tide"
parse value string with var1 var2 var3           /* same results */

PARSE VALUE does not convert lowercase a-z in the source string to uppercase A-Z. If you want to convert characters to uppercase, use PARSE UPPER VALUE. See Using UPPER, LOWER, and CASELESS for a summary of the effect of parsing instructions on the case.
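As a short sketch of the difference, the following parses the same literal with and without UPPER. The variable names up1, up2, and up3 are illustrative only:

```rexx
/* PARSE VALUE leaves the case of the source string unchanged;   */
/* PARSE UPPER VALUE translates it to uppercase before parsing   */
parse value "time and tide" with var1 var2 var3
parse upper value "time and tide" with up1 up2 up3
say var1 var2 var3                     /* time and tide          */
say up1 up2 up3                        /* TIME AND TIDE          */
```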

Note that if you specify the CASELESS option on a PARSE instruction, the string comparisons during the scanning operation are made independently of the alphabetic case. That is, a letter in uppercase is equal to the same letter in lowercase.
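The effect of CASELESS only shows up when the template contains a pattern to compare against, so this sketch borrows a string pattern (described in Templates Containing String Patterns). The source string and the variable names are illustrative assumptions:

```rexx
/* Without CASELESS, the pattern "and" does not match "AND", so  */
/* the first variable receives the entire source string          */
parse value "Time AND tide" with before1 "and" after1

/* With CASELESS, the pattern "and" matches "AND"                */
parse caseless value "Time AND tide" with before2 "and" after2

say "[" || before1 || "] [" || after1 || "]" /* [Time AND tide] []  */
say "[" || before2 || "] [" || after2 || "]" /* [Time ] [ tide]     */
```

Note that CASELESS affects only the comparison. The assigned values keep their original case.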

All of the parsing instructions assign the parts of a source string to the variables named in a template. There are various parsing instructions because of the differences in the nature or origin of source strings. For a summary of all the parsing instructions, see Parsing Instructions Summary.

The PARSE VAR instruction is similar to PARSE VALUE except that the source string to be parsed is always a variable. In PARSE VAR, the name of the variable containing the source string follows the keywords PARSE VAR. In the next example, the variable stars contains the source string. The template is star1 star2 star3.

/* PARSE VAR example                                             */
stars="Sirius Polaris Rigil"
parse var stars star1 star2 star3             /* star1="Sirius"  */
                                              /* star2="Polaris" */
                                              /* star3="Rigil"   */

All variables in a template receive new values. If there are more variables in the template than words in the source string, the leftover variables receive null (empty) values. This is true for all parsing: both for parsing into words with simple templates and for parsing with templates containing patterns. Here is an example of parsing into words:

/* More variables in template than (words in) the source string  */
satellite="moon"
parse var satellite Earth Mercury               /* Earth="moon"  */
                                                /* Mercury=""    */
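Because every variable in the template receives a new value, a leftover variable loses whatever it held before the instruction ran. A minimal sketch, in which the initial value given to Mercury is an illustrative assumption:

```rexx
/* Mercury holds a value before the parsing instruction runs     */
Mercury="closest planet to the Sun"
satellite="moon"
parse var satellite Earth Mercury
/* The template has assigned Mercury a new, null value           */
say Earth                                       /* moon          */
say length(Mercury)                             /* 0             */
```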

If there are more words in the source string than variables in the template, the last variable in the template receives all leftover data. Here is an example:

/* More (words in the) source string than variables in template  */
satellites="moon Io Europa Callisto..."
parse var satellites Earth Jupiter              /* Earth="moon"  */
                              /* Jupiter="Io Europa Callisto..." */

Parsing into words removes leading and trailing blanks from each word before it is assigned to a variable. The exception to this is the word or group of words assigned to the last variable. The last variable in a template receives leftover data, preserving extra leading and trailing blanks. Here is an example:

/* Preserving extra blanks                                       */
solar5="Mercury Venus  Earth   Mars     Jupiter  "
parse var solar5 var1 var2 var3 var4
/* var1  ="Mercury"                                              */
/* var2  ="Venus"                                                */
/* var3  ="Earth"                                                */
/* var4  ="  Mars     Jupiter  "                                 */

In the source string, Earth has two leading blanks. Parsing removes both of them (the word-separator blank and the extra blank) before assigning var3="Earth". Mars has three leading blanks. Parsing removes one word-separator blank and keeps the other two leading blanks. It also keeps all five blanks between Mars and Jupiter and both trailing blanks after Jupiter.

Parsing removes no blanks if the template contains only one variable. For example:

parse value "   Pluto   " with var1        /* var1="   Pluto   "*/

Message Term Assignments

In addition to variable names, a PARSE template can contain any message term that is valid on the left side of an assignment instruction (see Assignments and Symbols). For example:

/* Assigning parsed words to message terms                       */
solar5="Mercury Venus  Earth   Mars     Jupiter  "
d = .directory~new
parse var solar5 d~var1 d~var2 d~var3 d~var4
/* d~var1  ="Mercury"                                            */
/* d~var2  ="Venus"                                              */
/* d~var3  ="Earth"                                              */
/* d~var4  ="  Mars     Jupiter  "                               */
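The following sketch reads the parsed values back from the directory. It assumes the Open Object Rexx Directory class, which treats an otherwise unknown message such as VAR1= as a request to set the entry of that name; the shortened source string is illustrative:

```rexx
/* Each message term in the template acts as an assignment       */
/* target, so the parsed words become entries in the directory   */
d = .directory~new
parse value "Mercury Venus" with d~var1 d~var2
say d~var1                                      /* Mercury       */
say d~entry("var2")                             /* Venus         */
say d~items                                     /* 2             */
```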

The Period as a Placeholder

A period in a template is a placeholder. It takes the place of a variable name, but it receives no data. It is useful as a "dummy variable" in a list of variables or to collect unwanted information at the end of a string. It also saves the overhead of unneeded variables.

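Each period in the first example is a placeholder. Be sure to separate adjacent periods with spaces; otherwise, an error results. The second example produces the same value for brightest by using ordinary variables (drop, junk, and rest) whose values are simply not used.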

/* Period as a placeholder                                       */
stars="Arcturus Betelgeuse Sirius Rigil"
parse var stars . . brightest .            /* brightest="Sirius" */

/* Alternative to period as placeholder                          */
stars="Arcturus Betelgeuse Sirius Rigil"
parse var stars drop junk brightest rest   /* brightest="Sirius" */
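A trailing period is also a convenient way to obtain a single word without its surrounding blanks. In this sketch (the variable names are illustrative), the period becomes the last item in the template, so it, rather than the variable, absorbs the leftover data:

```rexx
/* With a single variable, all blanks are kept                   */
parse value "   Pluto   " with var1

/* With a trailing period, var2 is no longer the last item in    */
/* the template, so its leading and trailing blanks are removed  */
parse value "   Pluto   " with var2 .

say "[" || var1 || "]"                      /* [   Pluto   ]     */
say "[" || var2 || "]"                      /* [Pluto]           */
```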