
Syntax Summary

Table 11-1 summarizes the syntax of regular expressions available in all versions of Tcl:

Table 11-1. Basic regular expression syntax

.

Matches any character.

*

Matches zero or more instances of the previous pattern item.

+

Matches one or more instances of the previous pattern item.

?

Matches zero or one instances of the previous pattern item.

( )

Groups a subpattern. The repetition and alternation operators apply to the preceding subpattern.

|

Alternation. Matches either the pattern on its left or the pattern on its right.

[ ]

Delimit a set of characters. Ranges are specified as [x-y]. If the first character in the set is ^, the set matches any single character that is not among the remaining characters in the set.

^

Anchor the pattern to the beginning of the string. Only when first.

$

Anchor the pattern to the end of the string. Only when last.
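The following sketch shows the Table 11-1 operators in use with the regexp command. The input strings and variable names (line, level, message, nonDigits) are illustrative only. The variable named -> receives the whole match, which is a common Tcl idiom. Each pattern is enclosed in braces so that Tcl passes brackets and backslashes to the regular expression engine unchanged.

```tcl
# Illustrative input; the variable names are hypothetical.
set line "ERROR: disk full"

# ^ and $ anchor the match, ( ) captures, | chooses between alternatives,
# and .* consumes the rest of the line.
if {[regexp {^(ERROR|WARNING): (.*)$} $line -> level message]} {
    puts "level=$level message=$message"   ;# level=ERROR message=disk full
}

# A leading ^ inside [ ] negates the set: match a run of non-digits.
regexp {[^0-9]+} "abc123" nonDigits
puts $nonDigits                            ;# abc

# ? makes the previous item optional; + requires at least one.
puts [regexp {^colou?r$} "color"]          ;# 1
puts [regexp {^a+$} ""]                    ;# 0
```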

Advanced regular expressions, which were introduced in Tcl 8.1, add more syntax that is summarized in Table 11-2:

Table 11-2. Additional advanced regular expression syntax

{m}

Matches m instances of the previous pattern item.

{m}?

Matches m instances of the previous pattern item. Nongreedy.

{m,}

Matches m or more instances of the previous pattern item.

{m,}?

Matches m or more instances of the previous pattern item. Nongreedy.

{m,n}

Matches m through n instances of the previous pattern item.

{m,n}?

Matches m through n instances of the previous pattern item. Nongreedy.

*?

Matches zero or more instances of the previous pattern item. Nongreedy.

+?

Matches one or more instances of the previous pattern item. Nongreedy.

??

Matches zero or one instances of the previous pattern item. Nongreedy.

(?:re)

Groups a subpattern, re, but does not capture the result.

(?=re)

Positive look-ahead. Matches the point where re begins.

(?!re)

Negative look-ahead. Matches the point where re does not begin.

(?abc)

Embedded options, where abc is any number of option letters listed in Table 11-5. Embedded options are allowed only at the start of the pattern.

\c

One of many backslash escapes listed in Table 11-4.

[: :]

Delimits a character class within a bracketed expression. See Table 11-3.

[. .]

Delimits a collating element within a bracketed expression.

[= =]

Delimits an equivalence class within a bracketed expression.
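The sketch below exercises several Table 11-2 items: greedy versus nongreedy repetition, bounded counts, noncapturing groups, and look-ahead. The input strings and variable names are illustrative assumptions. The patterns are kept deliberately simple, each with a single quantifier, because in Tcl's advanced regular expressions the greediness of the first quantifier can set the preference for the whole match.

```tcl
# Illustrative input string.
set html {<b>bold</b> and <i>italic</i>}

# Greedy .* extends to the last '>'; nongreedy .*? stops at the first one.
regexp {<.*>} $html greedy
regexp {<.*?>} $html lazy
puts $greedy   ;# <b>bold</b> and <i>italic</i>
puts $lazy     ;# <b>

# {m,n} bounds the repetition count: three or four digits.
puts [regexp {^[0-9]{3,4}$} 1234]     ;# 1
puts [regexp {^[0-9]{3,4}$} 12345]    ;# 0

# (?:...) groups without capturing, so only the name is stored.
regexp {(?:Mr|Ms)\. ([A-Za-z]+)} "Ms. Smith" -> name
puts $name     ;# Smith

# (?=re) requires re to follow but does not include it in the match.
regexp {[a-z]+(?=:)} "key: value" key
puts $key      ;# key

# (?!re) fails if re follows this point.
puts [regexp {^(?!tmp)[a-z]+$} "tmpfile"]    ;# 0
puts [regexp {^(?!tmp)[a-z]+$} "datafile"]   ;# 1
```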

Table 11-3 lists the named character classes defined in advanced regular expressions and their associated backslash sequences, if any. Character class names are valid inside bracketed character sets with the [:class:] syntax.

Table 11-3. Character classes

alnum

Upper and lower case letters and digits.

alpha

Upper and lower case letters.

blank

Space and tab.

cntrl

Control characters: \u0001 through \u001F.

digit

The digits zero through nine. Also \d.

graph

Printing characters that are not in cntrl or space.

lower

Lowercase letters.

print

The same as alnum.

punct

Punctuation characters.

space

Space, newline, carriage return, tab, vertical tab, form feed. Also \s.

upper

Uppercase letters.

xdigit

Hexadecimal digits: zero through nine, a-f, A-F.
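The named classes in Table 11-3 are used inside an ordinary bracketed set, so they appear with doubled brackets. The following sketch illustrates this; the input strings and variable names are hypothetical. The -all and -inline switches it uses were added to regexp in Tcl 8.3, so that line assumes 8.3 or later.

```tcl
# Illustrative input string.
set text "Order 66 shipped on 2004-05-17"

# [[:digit:]] matches one digit; the outer brackets form the set.
regexp {[[:digit:]]+} $text firstNumber
puts $firstNumber   ;# 66

# -all -inline (Tcl 8.3 and later) returns every match as a list.
puts [regexp -all -inline {[[:digit:]]+} $text]   ;# 66 2004 05 17

# Classes combine with other characters in one set, e.g. an identifier.
puts [regexp {^[[:alpha:]_][[:alnum:]_]*$} my_var1]   ;# 1
puts [regexp {^[[:alpha:]_][[:alnum:]_]*$} 1st_var]   ;# 0

# Collapse each run of white space into a single space.
regsub -all {[[:space:]]+} "a  b\t c" " " collapsed
puts $collapsed     ;# a b c
```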

Table 11-4 lists the backslash sequences supported by advanced regular expressions, starting with Tcl 8.1.

Table 11-4. Backslash escapes in regular expressions

\a

Alert, or "bell", character.

\A

Matches only at the beginning of the string.

\b

Backspace character, \u0008.

\B

Synonym for backslash.

\cX

Control-X.

\d

Digits. Same as [[:digit:]].

\D

Not a digit. Same as [^[:digit:]].

\e

Escape character, \u001B.

\f

Form feed, \u000C.

\m

Matches the beginning of a word.

\M

Matches the end of a word.

\n

Newline, \u000A.

\r

Carriage return, \u000D.

\s

White space. Same as [[:space:]].

\S

Not white space. Same as [^[:space:]].

\t

Horizontal tab, \u0009.

\uXXXX

A 16-bit Unicode character code.

\v

Vertical tab, \u000B.

\w

Letters, digits, and underscore. Same as [[:alnum:]_].

\W

Not a letter, digit, or underscore. Same as [^[:alnum:]_].

\xhh

An 8-bit hexadecimal character code. Consumes all hex digits after \x.

\y

Matches the beginning or end of a word.

\Y

Matches a point that is not the beginning or end of a word.

\Z

Matches the end of the string.

\0

The null character, \u0000.

\x

Where x is a nonzero digit, this is a back-reference to the xth captured subpattern. Not to be confused with the \xhh hexadecimal escape.

\xy

Where x and y are digits, either a decimal back-reference or an 8-bit octal character code.

\xyz

Where x, y and z are digits, either a decimal back-reference or an 8-bit octal character code.
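The sketch below exercises a few of the Table 11-4 escapes: word boundaries, the class shorthands, a back-reference, and a Unicode escape. The input strings and variable names are illustrative. Keeping each pattern in braces matters here. Inside double quotes, Tcl would process sequences such as \y or \u itself before regexp ever saw them.

```tcl
# \y is a word boundary, so "cat" inside "concatenate" does not match.
puts [regexp {\ycat\y} "concatenate"]   ;# 0
puts [regexp {\ycat\y} "the cat sat"]   ;# 1

# \d, \s and \w are shorthands for bracketed character classes.
regexp {(\d+)\s*(\w+)} "42   apples" -> count unit
puts "$count $unit"                     ;# 42 apples

# \m and \M mark the start and end of a word; \1 refers back to the
# first captured group, so this finds a doubled word.
if {[regexp {\m(\w+)\s+\1\M} "this is is a typo" -> word]} {
    puts "doubled: $word"               ;# doubled: is
}

# \uXXXX names a character by its Unicode code point.
puts [regexp {caf\u00e9} "caf\u00e9"]   ;# 1
```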

Table 11-5 lists the embedded option characters used with the (?abc) syntax.

Table 11-5. Embedded option characters used with the (?x) syntax

b

The rest of the pattern is a basic regular expression (a la vi or grep).

c

Case sensitive matching. This is the default.

e

The rest of the pattern is an extended regular expression (a la Tcl 8.0).

i

Case insensitive matching.

m

Synonym for the n option.

n

Newline-sensitive matching. Both lineanchor and linestop mode.

p

Partial newline-sensitive matching. Only linestop mode.

q

The rest of the pattern is a literal string.

s

No newline sensitivity. This is the default.

t

Tight syntax; no embedded comments. This is the default.

w

Inverse partial newline-sensitive matching. Only lineanchor mode.

x

Expanded syntax with embedded white space and comments.
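The following sketch shows how some of the Table 11-5 option letters change matching when placed in a (?abc) prefix at the start of the pattern. The input strings and variable names are illustrative assumptions. The regexp command also offers switches with similar effects, such as -nocase and -expanded, but the embedded form keeps the option with the pattern itself.

```tcl
# (?i) makes the whole pattern case insensitive.
puts [regexp {(?i)^hello} "HELLO world"]   ;# 1

# (?x) allows white space and # comments inside the pattern.
set datePattern {(?x)
    ^(\d{4})    # year
    -(\d{2})    # month
    -(\d{2})$   # day
}
regexp $datePattern "2004-05-17" -> year month day
puts "$year/$month/$day"                   ;# 2004/05/17

# (?n) makes ^ match after each newline and keeps . from crossing one.
set text "first line\nsecond line"
puts [regexp {^second} $text]              ;# 0
puts [regexp {(?n)^second} $text]          ;# 1
regexp {(?n)first.*} $text firstOnly
puts $firstOnly                            ;# first line

# (?q) treats the rest of the pattern as a literal string.
puts [regexp {(?q)a.b} "a.b"]              ;# 1
puts [regexp {(?q)a.b} "axb"]              ;# 0
```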
