Building Data Structures with Arrays

This section describes several data structures you can build with Tcl arrays. These examples are presented as procedures that implement access functions to the data structure. Wrapping up your data structures in procedures is good practice. It shields the user of your data structure from the details of its implementation.

graphics/common_icon.gif
Use arrays to collect related variables.

A good use for arrays is to collect together a set of related variables for a module, much as one would use a record in other languages. By collecting these together in an array that has the same name as the module, name conflicts between different modules are avoided. Also, in each of the module's procedures, a single global statement will suffice to make all the state variables visible. You can also use upvar to manage a collection of arrays, as shown in Example 8-9 on page 101.

Simple Records

Suppose we have a database of information about people. The following examples show three different ways to store the employee name, ID, manager, and phone number. Each example implements Emp_AddRecord that stores the values, and one example accessor function that returns information about the employee (e.g., Emp_Manager.) By using simple procedures to return fields of the record, the implementation is hidden so that you can change it more easily. Example 8-5 uses on array for each field. The name of the person is the index into each array:

Example 8-5 Using arrays for records, version 1

proc Emp_AddRecord {id name manager phone} {
   global employeeID employeeManager \
      employeePhone employeeName
   set employeeID($name) $id
   set employeeManager($name) $manager
   set employeePhone($name) $phone
   set employeeName($id) $name
}
proc Emp_Manager {name} {
   global employeeManager
   return $employeeManager($name)
}

The employeeName array provides a secondary key. It maps from the employee ID to the name so that the other information can be obtained if you have an ID instead of a name. Example 8-6 implements the same little database using a single array with more complex indices:

Example 8-6 Using arrays for records, version 2

proc Emp_AddRecord {id name manager phone} {
   global employee
   set employee(id,$name) $id
   set employee(manager,$name) $manager
   set employee(phone,$name) $phone
   set employee(name,$id) $name
}
proc Emp_Manager {name} {
   global employee
   return $employee(manager,$name)
}

Example 8-7 shows the last approach. Each array element is a list of fields, and the accessor functions hide the lindex command used to pick out the right field. Here the cross referencing by ID is implement differently. If we can assume that names and IDs are distinct, we can keep the cross reference in the same array:

Example 8-7 Using arrays for records, version 3

proc Emp_AddRecord {id name manager phone} {
   global employee
   set employee($name) [list $name $id $manager $phone]
   set employee($id) $name
}
proc Emp_Manager {name} {
   global employee
   return [lindex $employee($name) 2]
}

The difference between these three approaches is partly a matter of taste. Using a single array can be more convenient because there are fewer variables to manage. Using the lists for the fields is probably the most space efficient because there are fewer elements in the array, but maintaining the lindex offsets is tedious. In any case, you should hide the implementation in a small set of procedures.

A Stack

A stack can be implemented with either a list or an array. If you use a list, then the push and pop operations have a runtime cost that is proportional to the size of the stack. If the stack has a few elements this is fine. If there are a lot of items in a stack, you may wish to use arrays instead.

Example 8-8 Using a list to implement a stack

proc Push { stack value } {
   upvar $stack list
   lappend list $value
}
proc Pop { stack } {
   upvar $stack list
   set value [lindex $list end]
   set list [lrange $list 0 [expr [llength $list]-2]]
   return $value
}

In these examples, the name of the stack is a parameter, and upvar is used to convert that into the data used for the stack. The variable is a list in Example 8-8 and an array in Example 8-9. The user of the stack module does not have to know.

The array implementation of a stack uses one array element to record the number of items in the stack. The other elements of the array have the stack values. The Push and Pop procedures both guard against a nonexistent array with the info exists command. When the first assignment to S(top) is done by Push, the array variable is created in the caller's scope. The example uses array indices in two ways. The top index records the depth of the stack. The other indices are numbers, so the construct $S($S(top)) is used to reference the top of the stack.

Example 8-9 Using an array to implement a stack

proc Push { stack value } {
   upvar $stack S
   if {![info exists S(top)]} {
      set S(top) 0
   }
   set S($S(top)) $value
   incr S(top)
}
proc Pop { stack } {
   upvar $stack S
   if {![info exists S(top)]} {
      return {}
   }
   if {$S(top) == 0} {
      return {}
   } else {
      incr S(top) -1
      set x $S($S(top))
      unset S($S(top))
      return $x
   }
}

A List of Arrays

Suppose you have many arrays, each of which stores some data, and you want to maintain an overall ordering among the data sets. One approach is to keep a Tcl list with the name of each array in order. Example 8-10 defines RecordInsert to add an array to the list, and an iterator function, RecordIterate, that applies a script to each array in order. The iterator uses upvar to make data an alias for the current array. The script is executed with eval, which is described in detail in Chapter 10. The Tcl commands in script can reference the arrays with the name data:

Example 8-10 A list of arrays

proc RecordAppend {listName arrayName} {
   upvar $listName list
   lappend list $arrayName
}
proc RecordIterate {listName script} {
   upvar $listName list
   foreach arrayName $list {
      upvar #0 $arrayName data
      eval $script
   }
}

Another way to implement this list-of-records structure is to keep references to the arrays that come before and after each record. Example 8-11 shows the insert function and the iterator function when using this approach. Once again, upvar is used to set up data as an alias for the current array in the iterator. In this case, the loop is terminated by testing for the existence of the next array. It is perfectly all right to make an alias with upvar to a nonexistent variable. It is also all right to change the target of the upvar alias. One detail that is missing from the example is the initialization of the very first record so that its next element is the empty string:

Example 8-11 A list of arrays

proc RecordInsert {recName afterThis} {
   upvar $recName record $afterThis after
   set record(next) $after(next)
   set after(next) $recName
}
proc RecordIterate {firstRecord body} {
   upvar #0 $firstRecord data
   while {[info exists data]} {
      eval $body
      upvar #0 $data(next) data
   }
}

A Simple In-Memory Database

Suppose you have to manage a lot of records, each of which contain a large chunk of data and one or more key values you use to look up those values. The procedure to add a record is called like this:

Db_Insert keylist datablob

The datablob might be a name, value list suitable for passing to array set, or simply a large chunk of text or binary data. One implementation of Db_Insert might just be:

foreach key $keylist {
    lappend Db($key) $datablob
}

The problem with this approach is that it duplicates the data chunks under each key. A better approach is to use two arrays. One stores all the data chunks under a simple ID that is generated automatically. The other array stores the association between the keys and the data chunks. Example 8-12, which uses the namespace syntax described in Chapter 14, illustrates this approach. The example also shows how you can easily dump data structures by writing array set commands to a file, and then load them later with a source command:

Example 8-12 A simple in-memory database

namespace eval db {
   variable data       ;# Array of data blobs
   variable uid 0      ;# Index into data
   variable index      ;# Cross references into data
}
proc db::insert {keylist datablob} {
   variable data
   variable uid
   variable index
   set data([incr uid]) $datablob
   foreach key $keylist {
      lappend index($key) $uid
   }
}
proc db::get {key} {
   variable data
   variable index
   set result {}
   if {![info exist index($key)]} {
      return {}
   }
   foreach uid $index($key) {
      lappend result $data($uid)
   }
   return $result
}
proc db::save {filename} {
   variable uid
   set out [open $filename w]
   puts $out [list namespace eval db \
      [list variable uid $uid]]
   puts $out [list array set db::data [array get db::data]]
   puts $out [list array set db::index [array get db::index]]
   close $out
}
proc db::load {filename} {
   source $filename
}

Alternatives to Using Arrays

While Tcl arrays are flexible and general purpose, they are not always the best solution to your data structure problems. If you find yourself building elaborate data structures, you should consider implementing a C library to encapsulate the data structure and expose it to the scripting level with Tcl commands. For example, Chapter 47 implements a blob data structure in C. You can also use the SWIG code generator can quickly generate a Tcl command interface for a C API. Find out about SWIG at http://www.swig.org.

The Metakit embedded database provides an efficient, easy, scriptable database for Tcl. It is more powerful than the simple "flat file" databases implemented in this Chapter, but it is not a full SQL database. It is part of Tclkit, or you can use it with the mk4tcl extension. Tclkit and Metakit are described in Chapter 22.

[ Team LiB ]