
The string Command

The string command is really a collection of operations you can perform on strings. The following example calculates the length of the value of a variable.

set name "Brent Welch"
string length $name
=> 11

The first argument to string determines the operation. You can ask string for valid operations by giving it a bad one:


string junk
=> bad option "junk": should be bytelength, compare, equal, first, index, is, last, length, map, match, range, repeat, replace, tolower, totitle, toupper, trim, trimleft, trimright, wordend, or wordstart

This trick of feeding a Tcl command bad arguments to find out its usage is common across many commands. Table 4-1 summarizes the string command.
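A script can use the same trick by trapping the error with catch. This is a minimal sketch; the variable names failed and message are illustrative, and the exact wording of the message varies between Tcl versions.

```tcl
# Deliberately pass an invalid operation and capture the error text.
# catch returns a nonzero code when the command raises an error.
set failed [catch {string junk} message]
if {$failed} {
    puts "usage hint: $message"
}
```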

Table 4-1. The string command

string bytelength str

Returns the number of bytes used to store a string, which may be different from the character length returned by string length because of UTF-8 encoding. See page 220 of Chapter 15 about Unicode and UTF-8.

string compare ?-nocase? ?-length len? str1 str2

Compares strings lexicographically. Use -nocase for case insensitive comparison. Use -length to limit the comparison to the first len characters. Returns 0 if equal, -1 if str1 sorts before str2, else 1.

string equal ?-nocase? str1 str2

Compares strings and returns 1 if they are the same. Use -nocase for case insensitive comparison.

string first subString string ?startIndex?

Returns the index in string of the first occurrence of subString, or -1 if subString is not found. startIndex may be specified to start in the middle of string.

string index string index

Returns the character at the specified index. An index counts from zero. Use end for the last character.

string is class ?-strict? ?-failindex varname? string

Returns 1 if string belongs to class. If -strict, then empty strings never match, otherwise they always match. If -failindex is specified, then varname is assigned the index of the character in string that prevented it from being a member of class. See Table 4-3 on page 54 for character class names.

string last subString string ?startIndex?

Returns the index in string of the last occurrence of subString, or -1 if subString is not found. startIndex may be specified to start in the middle of string.

string length string

Returns the number of characters in string.

string map ?-nocase? charMap string

Returns a new string created by mapping characters in string according to the input, output list in charMap. See page 55.

string match ?-nocase? pattern str

Returns 1 if str matches the pattern, else 0. Glob-style matching is used. See page 53.

string range str i j

Returns the range of characters in str from i to j.

string repeat str count

Returns str repeated count times.

string replace str first last ?newstr?

Returns a new string created by replacing characters first through last with newstr, or nothing.

string tolower string ?first? ?last?

Returns string in lower case. first and last determine the range of string on which to operate.

string totitle string ?first? ?last?

Capitalizes string by replacing its first character with the Unicode title case, or upper case, and the rest with lower case. first and last determine the range of string on which to operate.

string toupper string ?first? ?last?

Returns string in upper case. first and last determine the range of string on which to operate.

string trim string ?chars?

Trims the characters in chars from both ends of string. chars defaults to whitespace.

string trimleft string ?chars?

Trims the characters in chars from the beginning of string. chars defaults to whitespace.

string trimright string ?chars?

Trims the characters in chars from the end of string. chars defaults to whitespace.

string wordend str ix

Returns the index in str of the character after the word containing the character at index ix.

string wordstart str ix

Returns the index in str of the first character in the word containing the character at index ix.

These are the string operations I use most:

  • The equal operation, which is shown in Example 4-2 on page 53.

  • String match. This pattern matching operation is described on page 53.

  • The tolower, totitle, and toupper operations convert case.

  • The trim, trimright, and trimleft operations are handy for cleaning up strings.
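As a short sketch of the case conversion and trimming operations, the following cleans up a padded string. The variable names and sample values are made up for illustration.

```tcl
set raw "  hello WORLD  "

# Remove the surrounding whitespace, then normalize the case.
set clean [string trim $raw]          ;# "hello WORLD"
set lower [string tolower $clean]     ;# "hello world"
set upper [string toupper $clean]     ;# "HELLO WORLD"
set title [string totitle $clean]     ;# "Hello world"

# An explicit character set trims something other than whitespace.
set path [string trimright "/usr/local///" /]   ;# "/usr/local"
```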

These new operations were added in Tcl 8.1 (actually, they first appeared in the 8.1.1 patch release):

  • The equal operation, which is simpler than using string compare.

  • The is operation, which tests for kinds of strings. String classes are listed in Table 4-3 on page 54.

  • The map operation, which translates characters (much like the Unix tr command).

  • The repeat and replace operations.

  • The totitle operation, which is handy for capitalizing words.
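The repeat and replace operations are easiest to see on small inputs. The following sketch uses made-up sample strings:

```tcl
# Build a separator line of 20 dashes.
set rule [string repeat - 20]

# Replace characters 0 through 4 with a new substring.
set greeting [string replace "Hello World" 0 4 Howdy]   ;# "Howdy World"

# With no replacement string, the range is simply deleted.
set short [string replace abcdef 1 3]                   ;# "aef"
```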

String Indices

Several of the string operations involve string indices that are positions within a string. Tcl counts characters in strings starting with zero. The special index end is used to specify the last character in a string:

string range abcd 2 end
=> cd

Tcl 8.1 added syntax for specifying an index relative to the end. Specify end-N to get the Nth character before the end. For example, the following command returns a new string that drops the first and last characters from the original:

string range $string 1 end-1
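A few more index forms, sketched on a sample six-character string (the variable names are illustrative):

```tcl
set s "abcdef"

set first  [string index $s 0]          ;# a
set last   [string index $s end]        ;# f
set penult [string index $s end-1]      ;# e
set inner  [string range $s 1 end-1]    ;# bcde
set tail   [string range $s end-2 end]  ;# def
```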

There are several operations that pick apart strings: first, last, wordstart, wordend, index, and range. If you find yourself using combinations of these operations to pick apart data, it may be faster if you can do it with the regular expression pattern matcher described in Chapter 11.
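As a rough sketch of picking a string apart with these operations, the following splits a key=value line at the separator and then isolates one word of the value. The input format and variable names are assumptions made for the example.

```tcl
set line "color=dark blue"

# Locate the separator, then take the text on either side of it.
set ix    [string first = $line]                        ;# 5
set key   [string range $line 0 [expr {$ix - 1}]]       ;# "color"
set value [string range $line [expr {$ix + 1}] end]     ;# "dark blue"

# wordstart and wordend bracket the word that contains index 6.
# wordend returns the index just past the word, so subtract 1.
set start [string wordstart $value 6]                   ;# 5
set stop  [string wordend $value 6]                     ;# 9
set word  [string range $value $start [expr {$stop - 1}]]   ;# "blue"
```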

Strings and Expressions

Strings can be compared with expr, if, and while using the comparison operators eq, ne, ==, !=, < and >. However, there are a number of subtle issues that can cause problems. First, you must quote the string value so that the expression parser can identify it as a string type. Then, you must group the expression with curly braces to prevent the double quotes from being stripped off by the main interpreter:

if {$x == "foo"} command

expr is only reliable for string comparison when using eq or ne.


Despite the quotes, the expression operators that work on both numbers and strings first try converting their operands to numbers if possible, and then convert them back if they detect a case of string comparison. The conversion back is always done as a decimal number. This can lead to unexpected conversions between strings that look like hexadecimal or octal numbers. The following boolean expression is true!

if {"0xa" == "10"} { puts stdout ack! }
=> ack!
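The same effect shows up with other numeric notations. This sketch evaluates a few comparisons with expr; the variable names are illustrative only.

```tcl
# Both operands look like numbers, so == compares them numerically.
set hex   [expr {"0xa" == "10"}]    ;# 1: hexadecimal a is decimal 10
set float [expr {"1.0" == "1"}]     ;# 1
set exp   [expr {"1e2" == "100"}]   ;# 1

# "0xg" is not a valid number, so this one is compared as strings.
set text  [expr {"0xg" == "10"}]    ;# 0
```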

A safe way to compare strings is to use the string compare and string equal operations. The eq and ne expr operators were introduced in Tcl 8.4 to allow more compact strict string comparison. These operations also work faster because the unnecessary conversions are eliminated. Like the C library strcmp function, string compare returns 0 if the strings are equal, -1 if the first string is lexicographically less than the second, or 1 if the first string is greater than the second:

Example 4-1 Comparing strings with string compare
if {[string compare $s1 $s2] == 0} {
   # strings are equal
}

The string equal command added in Tcl 8.1 makes this simpler:

Example 4-2 Comparing strings with string equal
if {[string equal $s1 $s2]} {
   # strings are equal
}

The eq operator added in Tcl 8.4 is semantically equivalent, but more compact. It also avoids any internal format conversions. There is also a ne operator to efficiently test for inequality.

Example 4-3 Comparing strings with eq
if {$s1 eq $s2} {
   # strings are equal
}
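To see the three forms side by side, this sketch applies them to a pair of strings that == would treat as equal numbers, and then tries the -nocase and -length options from Table 4-1. The sample values are made up, and the eq line assumes Tcl 8.4 or later.

```tcl
set s1 "0xa"
set s2 "10"

# All three compare the characters, so none reports a match.
set viaCompare [expr {[string compare $s1 $s2] == 0}]   ;# 0
set viaEqual   [string equal $s1 $s2]                   ;# 0
set viaEq      [expr {$s1 eq $s2}]                      ;# 0

# string compare also reports ordering and accepts options.
set order  [string compare apple banana]                ;# -1
set nocase [string equal -nocase Tcl TCL]               ;# 1
set prefix [string compare -length 3 abcdef abcxyz]     ;# 0
```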

String Matching

The string match command implements glob-style pattern matching that is modeled after the file name pattern matching done by various UNIX shells. The heritage of the word "glob" is rooted in UNIX, and Tcl preserves this historical oddity in the glob command that does pattern matching on file names. The glob command is described on page 122. Table 4-2 shows the three constructs used in string match patterns:

Table 4-2. Matching characters used with string match

*

Match any number of any characters.

?

Match exactly one character.

[chars]

Match any character in chars.

Any other characters in a pattern are taken as literals that must match the input exactly. The following example matches all strings that begin with a:

string match a* alpha
=> 1

To match all two-letter strings:

string match ?? XY
=> 1

To match all strings that begin with either a or b:

string match {[ab]*} cello
=> 0

Be careful! Square brackets are also special to the Tcl interpreter, so you will need to wrap the pattern up in curly braces to prevent it from being interpreted as a nested command. Another approach is to put the pattern into a variable:

set pat {[ab]*x}
string match $pat box
=> 1

You can specify a range of characters with the syntax [x-y]. For example, [a-z] represents the set of all lower-case letters, and [0-9] represents all the digits. You can include more than one range in a set. Any letter, digit, or the underscore is matched with:

string match {[a-zA-Z0-9_]} $char

The set matches only a single character. To match more complicated patterns, such as one or more characters from a set, you need to use regular expression matching, which is described on page 158.
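Because each set matches exactly one character, a fixed-width format can still be checked by writing one set per position. The following is only a sketch; the LooksLikeTime procedure is a hypothetical helper, not a built-in command.

```tcl
# Returns 1 if s has the form DD:DD, where each D is a digit.
proc LooksLikeTime {s} {
    return [string match {[0-9][0-9]:[0-9][0-9]} $s]
}

set good  [LooksLikeTime 12:34]    ;# 1
set short [LooksLikeTime 1:34]     ;# 0: only one digit before the colon
set long  [LooksLikeTime 12:345]   ;# 0: extra trailing character
```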

If you need to include a literal *, ?, or bracket in your pattern, preface it with a backslash:

string match {*\?} what?
=> 1

In this case the pattern is quoted with curly braces because the Tcl interpreter is also doing backslash substitutions. Without the braces, you would have to use two backslashes. They are replaced with a single backslash by Tcl before string match is called.

string match *\\? what?
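The same quoting applies to literal brackets and asterisks. In this sketch the patterns are kept in braces so that a single backslash reaches string match; the sample strings are made up.

```tcl
# Match a string that is enclosed in literal square brackets.
set bracketed [string match {\[*\]} {[abc]}]   ;# 1
set plain     [string match {\[*\]} abc]       ;# 0

# Match any string that contains a literal asterisk.
set star   [string match {*\**} 2*3]           ;# 1
set nostar [string match {*\**} 23]            ;# 0
```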

Character Classes

The string is command tests a string to see whether it belongs to a particular class. This is useful for input validation. For example, to make sure something is a number, you do:

if {![string is integer -strict $input]} {
    error "Invalid input. Please enter a number."
}
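The -strict and -failindex options summarized in Table 4-1 can be sketched as follows. The variable names are illustrative.

```tcl
# Without -strict, the empty string belongs to every class.
set loose  [string is integer ""]            ;# 1
set strict [string is integer -strict ""]    ;# 0

# -failindex stores the index of the first offending character.
set ok [string is alpha -failindex bad "abc1def"]   ;# 0, and bad is 3

# Other classes test whole values rather than individual characters.
set flag [string is boolean -strict yes]     ;# 1
set num  [string is double -strict 3.14]     ;# 1
```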

Classes are defined in terms of the Unicode character set, which means they are more general than specifying character sets with ranges over the ASCII encoding. For example, alpha includes many characters outside the range of [A-Za-z] because of different characters in other alphabets. The classes are listed in Table 4-3.

Table 4-3. Character class names

alnum

Any alphabet or digit character.

alpha

Any alphabet character.

ascii

Any character with a 7-bit character code (i.e., less than 128).

boolean

A valid Tcl boolean value, such as 0, 1, true, false (in any case).

control

Character code less than 32, and not NULL.

digit

Any digit character.

double

A valid floating point number.

false

A valid Tcl boolean false value, such as 0 or false (in any case).

graph

Any printing characters, not including space characters.

integer

A valid integer.

lower

A string in all lower case.

print

Any printing characters, including space characters.

punct

Any punctuation character.

space

Space, tab, newline, carriage return, vertical tab, form feed.

true

A valid Tcl boolean true value, such as 1 or true (in any case).

upper

A string all in upper case.

wordchar

Alphabet, digit, and the underscore.

xdigit

Valid hexadecimal digits.

Mapping Strings

The string map command translates a string based on a character map. The map is in the form of an input, output list. Wherever a string contains an input sequence, that is replaced with the corresponding output. For example:

string map {f p d l} food
=> pool

The inputs and outputs can be more than one character and they do not have to be the same length:

string map {f p d ll oo u} food
=> pull
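Two properties of string map are worth a sketch: the string is scanned only once, so replacement text is not mapped again, and the inputs are tried in list order at each position. The HtmlEscape procedure below is a hypothetical helper written for this illustration.

```tcl
# Replace the characters that are special in HTML. The & inside
# each output is not escaped again because of the single pass.
proc HtmlEscape {text} {
    return [string map {& &amp; < &lt; > &gt;} $text]
}
set escaped [HtmlEscape "a < b && c > d"]
# escaped is: a &lt; b &amp;&amp; c &gt; d

# The first input that matches at a position wins, so list the
# longer inputs first when one is a prefix of another.
set longFirst  [string map {abc 1 ab 2 a 3} abcaba]   ;# 123
set shortFirst [string map {a 3 ab 2 abc 1} abcaba]   ;# 3bc3b3
```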

Example 4-4 is more practical. It uses string map to replace fancy quotes and hyphens produced by Microsoft Word with ASCII equivalents. It uses the open, read, and close file operations that are described in Chapter 9, and the fconfigure command described on page 234 to ensure that the file format is UNIX friendly.

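Example 4-4 Mapping Microsoft Word special characters to ASCII
proc Dos2Unix {filename} {
   set input [open $filename]
   # Open the output file for writing; open defaults to read-only.
   set output [open $filename.new w]
   fconfigure $output -translation lf
   # The double quotes are backslash-escaped so that each one is a
   # list element of its own, not the start of a quoted element.
   puts $output [string map {
      \223   \"
      \224   \"
      \222   '
      \226   -
   } [read $input]]
   close $input
   close $output
}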