Tcl also supports string operations known as regular expressions. Several commands can access these methods with a -regexp argument; see the man pages for which commands support regular expressions.
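As a rough illustration of the -regexp argument, the sketch below uses lsearch and switch, two commands that accept it. The file names and the variable names (files, idx, cfiles, kind) are made up for the example.

```tcl
# Illustrative data; any list of strings would do.
set files {main.c util.h notes.txt parser.c}

# Index of the first element that matches the expression.
set idx [lsearch -regexp $files {\.c$}]
puts "First C file is at index $idx: [lindex $files $idx]"

# All matching elements, returned as values rather than indices.
set cfiles [lsearch -all -inline -regexp $files {\.c$}]
puts "C files: $cfiles"

# switch -regexp runs the first branch whose pattern matches.
switch -regexp -- "util.h" {
    {\.c$}  { set kind source }
    {\.h$}  { set kind header }
    default { set kind other }
}
puts "util.h is a $kind file"
```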
There are also two explicit commands for parsing regular expressions.
regexp ?switches? exp string ?matchVar? ?subMatch1 ... subMatchN?
Searches string for the regular expression exp. If a parameter matchVar is given, then the substring that matches the regular expression is copied to matchVar. If subMatchN variables exist, then the parenthetical parts of the matching string are copied to the subMatch variables, working from left to right.
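A minimal sketch of how matchVar and the subMatch variables are filled in. The input string and the variable names (line, whole, key, value) are invented for the example.

```tcl
set line "width=640 height=480"

# whole receives the full match; key and value receive the two
# parenthesized parts, taken from left to right.
set found [regexp {([a-z]+)=([0-9]+)} $line whole key value]
puts "found=$found whole=$whole key=$key value=$value"
# prints: found=1 whole=width=640 key=width value=640

# regexp returns 0 when the expression does not match anywhere.
set missing [regexp {depth=([0-9]+)} $line]
puts "missing=$missing"
# prints: missing=0
```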
regsub ?switches? exp string subSpec varName
Searches string for substrings that match the regular expression exp and replaces them with subSpec. The resulting string is copied into varName.
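A short sketch of regsub with and without the -all switch. The sample text and variable names are invented for the example. When varName is given, regsub returns the number of replacements it made.

```tcl
set text "the cat sat on the mat"

# Without -all only the first match is replaced.
set count [regsub {the} $text "a" first]
puts "$count replacement: $first"
# prints: 1 replacement: a cat sat on the mat

# With -all every match is replaced.
set countAll [regsub -all {the} $text "a" every]
puts "$countAll replacements: $every"
# prints: 2 replacements: a cat sat on a mat
```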
Regular expressions can be expressed in just a few rules.
Regular expressions are similar to the globbing that was discussed in lessons 16 and 18. The main difference is in the way that sets of matched characters are handled. In globbing, the only way to select a run of unknown text is the * symbol, which matches any quantity of any character.
In regular expression parsing, the * symbol matches zero or more occurrences of the character immediately preceding the *. For example, a* would match a, aaaaa, or an empty string. If the item directly before the * is a set of characters within square brackets, then the * will match any quantity of any of these characters. For example, [a-c]* would match aa, abc, aabcabc, or again, an empty string.
The + symbol behaves roughly the same as the *, except that it requires at least one character to match. For example, [a-c]+ would match a, abc, or aabcabc, but not an empty string.
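The difference between * and + can be checked with a small loop. This sketch adds the ^ and $ anchors, which are not covered above, so that each expression has to account for the whole test string; the results list is only there to collect the outcome.

```tcl
set results {}
foreach s {aabcabc abc a ""} {
    # ^ and $ force the expression to span the entire string.
    set star [regexp {^[a-c]*$} $s]
    set plus [regexp {^[a-c]+$} $s]
    lappend results [list $s $star $plus]
    puts "\"$s\": \[a-c\]* -> $star, \[a-c\]+ -> $plus"
}
# Only the empty string is treated differently:
# [a-c]* accepts it (1), [a-c]+ rejects it (0).
```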
Regular expression parsing is more powerful than globbing. With globbing you can use square brackets to enclose a set of characters, any of which will be a match. Regular expression parsing also includes a method of selecting any character not in a set. If the first character after the [ is a caret (^), then the regular expression parser will match any character not in the set of characters between the square brackets. A caret can be included in the set of characters to match (or not) by placing it in any position other than the first.
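A brief sketch of negated sets and of a literal caret inside a set. The sample strings and variable names are invented for the example.

```tcl
# [^:]+ matches a run of characters that are not a colon.
regexp {[^:]+} "key: value" beforeColon
puts "beforeColon=$beforeColon"
# prints: beforeColon=key

# [^a-z]+ matches a run of characters that are not lowercase letters.
regexp {[^a-z]+} "abc123def" nonLetters
puts "nonLetters=$nonLetters"
# prints: nonLetters=123

# A caret that is not first in the set is an ordinary character.
regexp {[a-z^]+} "x^y z" withCaret
puts "withCaret=$withCaret"
# prints: withCaret=x^y
```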
The regexp command is similar to the string match command in that it matches an exp against a string. It is different in that it can match a portion of a string, instead of the entire string, and will place the characters matched into the matchVar variable.
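The contrast with string match can be sketched as follows. The sample string and variable names are invented for the example.

```tcl
set s "Tcl version 8.6"

# A glob pattern must account for the entire string.
set globWhole [string match {[0-9]*} $s]
set globPart  [string match {*[0-9]*} $s]
puts "glob: $globWhole $globPart"
# prints: glob: 0 1

# regexp succeeds if any portion matches, and reports that portion.
set reHit [regexp {[0-9]+} $s digits]
puts "regexp: $reHit digits=$digits"
# prints: regexp: 1 digits=8
```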
If a match is found for a portion of the regular expression enclosed within parentheses, regexp will copy the matching characters to the corresponding subMatch variable. This can be used to parse simple strings.
regsub will copy the contents of string to a new variable, substituting the characters that match exp with the characters in subSpec. By default only the first match is replaced; the -all switch replaces every match. If subSpec contains a & or \0, then those characters will be replaced by the characters that matched exp. If the number following a backslash is 1-9, then that backslash sequence will be replaced by the portion of the string that matched the corresponding parenthesized part of exp.
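A small sketch of the &, \0, and \1-\9 sequences in subSpec. The sample string and variable names are invented; subSpec is written in braces here so that Tcl passes the backslashes through to regsub unchanged.

```tcl
set name "John Smith"

# \1 and \2 stand for the first and second parenthesized parts.
regsub {([A-Za-z]+) ([A-Za-z]+)} $name {\2, \1} swapped
puts $swapped
# prints: Smith, John

# & stands for the whole match; \0 means the same thing.
regsub {Smith} $name {<&>} marked
regsub {Smith} $name {<\0>} marked0
puts "$marked / $marked0"
# prints: John <Smith> / John <Smith>
```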
Note that the exp argument to regexp or regsub is processed by the Tcl substitution pass. Therefore quite often the expression is enclosed in braces to prevent any special processing by Tcl.
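To show why braces are usually preferred, this sketch writes the same expression twice, once in braces and once in double quotes. The sample string and variable names are invented for the example.

```tcl
# The string item[3]; the brackets are escaped so Tcl does not
# treat them as command substitution.
set s "item\[3\]"

# In braces the expression reaches regexp exactly as written.
regexp {\[([0-9]+)\]} $s whole idx
puts "braces: $idx"
# prints: braces: 3

# In double quotes Tcl substitutes first, so every backslash and
# bracket meant for regexp needs an extra layer of escaping.
regexp "\\\[(\[0-9\]+)\\\]" $s whole2 idx2
puts "quotes: $idx2"
# prints: quotes: 3
```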
set sample "Where there is a will, There is a way."

#
# Match the first substring with lowercase letters only
#
set result [regexp {[a-z]+} $sample match]
puts "Result: $result match: $match"

#
# Match the first two words, the first one allows uppercase
#
set result [regexp {([A-Za-z]+) +([a-z]+)} $sample match sub1 sub2]
puts "Result: $result Match: $match 1: $sub1 2: $sub2"

#
# Replace a word
#
regsub "way" $sample "lawsuit" sample2
puts "New: $sample2"

#
# Use the -all option to count the number of "words"
# (each run of non-space characters counts as one word)
#
puts "Number of words: [regexp -all {[^ ]+} $sample]"