The string Command
The string command is really a collection of operations you can perform on strings. The following example calculates the length of the value of a variable.
set name "Brent Welch"
string length $name
=> 11
The first argument to string determines the operation. You can ask string for valid operations by giving it a bad one:
string junk
=> bad option "junk": should be bytelength, compare, equal, first, index, is, last, length, ...
This trick of feeding a Tcl command bad arguments to find out its usage is common across many commands. Table 4-1 summarizes the string command.
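Because the usage message is an ordinary Tcl error, a script can capture it with catch instead of letting it abort. A minimal sketch; the exact wording of the message differs between Tcl versions (newer releases report an "unknown or ambiguous subcommand"), so treat the printed text as illustrative:

```tcl
# catch returns 1 when the script raises an error and stores
# the error message in the variable msg.
if {[catch {string junk} msg]} {
    puts "usage: $msg"
}
```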
These are the string operations I use most:
These new operations were added in Tcl 8.1 (actually, they first appeared in the 8.1.1 patch release):
String Indices
Several of the string operations involve string indices that are positions within a string. Tcl counts characters in strings starting with zero. The special index end is used to specify the last character in a string:
string range abcd 2 end
=> cd
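A few more index cases make the zero-based counting concrete. This is a small sketch; the comments show the expected values, including the boundary behavior where an index past the end yields an empty string and a range is clipped to the string:

```tcl
set s abcd
puts [string index $s 0]       ;# a  (first character is index 0)
puts [string index $s end]     ;# d  (end names the last character)
puts [string range $s 0 1]     ;# ab (both endpoints are inclusive)
puts [string index $s 10]      ;# empty: the index is past the end
puts [string range $s 2 100]   ;# cd (the last index is clipped to end)
```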
Tcl 8.1 added syntax for specifying an index relative to the end. Specify end-N to get the Nth character before the end. For example, the following command returns a new string that drops the first and last characters from the original:
string range $string 1 end-1
There are several operations that pick apart strings: first, last, wordstart, wordend, index, and range. If you find yourself using combinations of these operations to pick apart data, it may be faster to do the job with the regular expression pattern matcher described in Chapter 11.
Strings and Expressions
Strings can be compared with expr, if, and while using the comparison operators eq, ne, ==, !=, <, and >. However, there are a number of subtle issues that can cause problems. First, you must quote the string value so that the expression parser can identify it as a string type. Then, you must group the expression with curly braces to prevent the double quotes from being stripped off by the main interpreter:
if {$x == "foo"} command
Despite the quotes, the expression operators that work on both numbers and strings first try to convert items to numbers if possible, and convert them back only if they detect a case of string comparison. The conversion back is always done as a decimal number. This can lead to unexpected conversions between strings that look like hexadecimal or octal numbers. The following boolean expression is true!
if {"0xa" == "10"} { puts stdout ack! }
=> ack!
A safe way to compare strings is to use the string compare and string equal operations. The eq and ne expr operators were introduced in Tcl 8.4 to allow more compact strict string comparison. These operations also work faster because the unnecessary conversions are eliminated. Like the C library strcmp function, string compare returns 0 if the strings are equal, -1 if the first string is lexicographically less than the second, or 1 if the first string is greater than the second:
Example 4-1 Comparing strings with string compare
if {[string compare $s1 $s2] == 0} {
# strings are equal
}
The string equal command added in Tcl 8.1 makes this simpler:
Example 4-2 Comparing strings with string equal
if {[string equal $s1 $s2]} {
# strings are equal
}
The eq operator added in Tcl 8.4 is semantically equivalent to string equal, but more compact. It also avoids any internal format conversions. There is also an ne operator to efficiently test for inequality.
Example 4-3 Comparing strings with eq
if {$s1 eq $s2} {
# strings are equal
}
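The following sketch puts the comparisons side by side, reusing the 0xa and 10 values from the earlier pitfall. The == operator converts both operands to numbers, while eq and string equal compare the characters; string compare also reports the ordering:

```tcl
set a 0xa
set b 10
puts [expr {$a == $b}]               ;# 1: both operands become the number 10
puts [expr {$a eq $b}]               ;# 0: compared as strings
puts [string equal $a $b]            ;# 0
puts [string compare apple banana]   ;# -1: apple sorts before banana
puts [string compare banana apple]   ;# 1
```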
String Matching
The string match command implements glob-style pattern matching that is modeled after the file name pattern matching done by various UNIX shells. The heritage of the word "glob" is rooted in UNIX, and Tcl preserves this historical oddity in the glob command that does pattern matching on file names. The glob command is described on page 122. Table 4-2 shows the three constructs used in string match patterns:
Any other characters in a pattern are taken as literals that must match the input exactly. The following example matches all strings that begin with a:
string match a* alpha
=> 1
To match all two-letter strings:
string match ?? XY
=> 1
To match all strings that begin with either a or b:
string match {[ab]*} cello
=> 0
Be careful! Square brackets are also special to the Tcl interpreter, so you will need to wrap the pattern up in curly braces to prevent it from being interpreted as a nested command. Another approach is to put the pattern into a variable:
set pat {[ab]*x}
string match $pat box
=> 1
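To see why the protection matters, the sketch below compares a braced pattern, a pattern with backslash-escaped brackets inside double quotes, and an unprotected pattern. In the last case Tcl attempts command substitution on [ab] before string match ever runs, so catch normally reports an error unless your interpreter happens to define a command named ab:

```tcl
puts [string match {[ab]*x} box]      ;# 1: braces pass [ab] through untouched
puts [string match "\[ab\]*x" box]    ;# 1: \[ and \] become literal brackets
# Unprotected: the interpreter tries to call a command named "ab".
if {[catch {string match [ab]*x box} err]} {
    puts "error: $err"
}
```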
You can specify a range of characters with the syntax [x-y]. For example, [a-z] represents the set of all lower-case letters, and [0-9] represents all the digits. You can include more than one range in a set. Any letter, digit, or the underscore is matched with:
string match {[a-zA-Z0-9_]} $char
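Because a bracketed set stands for exactly one character, it is easy to write a pattern that is stricter or looser than intended. A short sketch:

```tcl
puts [string match {[0-9]} 7]          ;# 1
puts [string match {[0-9]} 42]         ;# 0: the set matches one character only
puts [string match {[0-9]*} 42]        ;# 1
puts [string match {[0-9]*} 4x]        ;# 1: * accepts anything after the digit
puts [string match {[a-zA-Z0-9_]} _]   ;# 1
```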
The set matches only a single character. To match more complicated patterns, like one or more characters from a set, you need to use regular expression matching, which is described on page 158. If you need to include a literal *, ?, or bracket in your pattern, preface it with a backslash:
string match {*\?} what?
=> 1
In this case the pattern is quoted with curly braces because the Tcl interpreter also does backslash substitution. Without the braces, you would have to use two backslashes, which Tcl replaces with a single backslash before string match is called:
string match *\\? what?
=> 1
Character Classes
The string is command tests a string to see whether it belongs to a particular class. This is useful for input validation. For example, to make sure something is an integer, you do:
if {![string is integer -strict $input]} {
error "Invalid input. Please enter a number."
}
Classes are defined in terms of the Unicode character set, which means they are more general than specifying character sets with ranges over the ASCII encoding. For example, alpha includes many characters outside the range of [A-Za-z] because of different characters in other alphabets. The classes are listed in Table 4-3.
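The sketch below exercises the integer and alpha classes. Note how -strict changes the result for an empty string, which otherwise passes every class test, and how alpha accepts an accented letter that the ASCII range [A-Za-z] rejects (written here with a \u escape to avoid source-encoding issues):

```tcl
puts [string is integer 42]              ;# 1
puts [string is integer 4x]              ;# 0
puts [string is integer ""]              ;# 1: empty input passes without -strict
puts [string is integer -strict ""]      ;# 0
set word "caf\u00e9"
puts [string is alpha $word]             ;# 1: the final character is a Unicode letter
puts [string match {*[A-Za-z]} $word]    ;# 0: it is outside A-Z and a-z
```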
Mapping Strings
The string map command translates a string based on a character map. The map is a list of alternating input and output strings. Wherever the string contains an input sequence, it is replaced with the corresponding output. For example:
string map {f p d l} food
=> pool
The inputs and outputs can be more than one character and they do not have to be the same length:
string map {f p d ll oo u} food
=> pull
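Two details of string map are worth checking by hand: the string is scanned only once, so replacement text is never remapped, and when several inputs could match at the same position, the one listed first in the map wins. A small sketch:

```tcl
puts [string map {a b b a} abab]        ;# baba: swaps a and b in one pass
puts [string map {abc 1 ab 2} abcab]    ;# 12: abc is tried before ab
puts [string map {ab 2 abc 1} abcab]    ;# 2c2: ab now wins at position 0
```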
Example 4-4 is more practical. It uses string map to replace the fancy quotes and hyphens produced by Microsoft Word with ASCII equivalents. It uses the open, read, and close file operations that are described in Chapter 9, and the fconfigure command described on page 234 to ensure that the file format is UNIX friendly.
Example 4-4 Mapping Microsoft Word special characters to ASCII
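proc Dos2Unix {filename} {
    set input [open $filename]
    # Read raw 8-bit characters so the Windows-1252 codes below
    # (\223, \224, ...) arrive unchanged; the default input
    # translation (auto) already turns CRLF into LF.
    fconfigure $input -encoding iso8859-1
    # Mode w is required: without it the file is opened read-only.
    set output [open $filename.new w]
    # Write LF line endings and keep other 8-bit characters as-is.
    fconfigure $output -translation lf -encoding iso8859-1
    # \" rather than a bare " so the list parser sees one element.
    puts -nonewline $output [string map {
        \223 \"
        \224 \"
        \222 '
        \226 -
    } [read $input]]
    close $input
    close $output
}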
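If the file is instead read with a Windows code page, for example by assuming fconfigure $input -encoding cp1252, Word's punctuation arrives as Unicode characters rather than the raw \223-style codes, and the map must use their code points. MapSmartPunct below is a hypothetical helper sketching that variant on an in-memory string; building the map with list means the straight double quote needs only a backslash:

```tcl
# Hypothetical helper: maps Word's curly quotes and en dash (as Unicode
# characters, e.g. after reading with -encoding cp1252) to ASCII.
proc MapSmartPunct {text} {
    set map [list \
        \u201c \" \u201d \" \
        \u2018 ' \u2019 ' \
        \u2013 -]
    return [string map $map $text]
}
puts [MapSmartPunct "\u201cIt\u2019s 1\u20132\u201d"]   ;# "It's 1-2"
```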