ISEARCH TUTORIAL
The Complete Guide to Using and Configuring the CNIDR 
Isearch System with Examples, Advice, and Editorial 
Comment
In the beginning, there was "grep".
Grep was good, but it lacked subtlety. It lacked speed. And while grep was cheap and hence 
widely used, it wasn't a text searching system.
It wasn't even a text searching program.
It was a regular expression matching program. It didn't sort or rank or try to get along well with 
others.
So the computer scientists of the world created Information Retrieval. It wasn't a product, it was a 
subject. They wrote many papers.
From the Information Retrieval people (who now called themselves "IR" so their colleagues 
would think they were developing remote controls for Tvs) came text search systems. Systems 
such as "SMART". Systems such as "WAIS".
WAIS was written by a guy who worked for a supercomputer company. He wanted to create an 
application for those big, hulking machines. He gave away a watered-down version of his program 
so people would see how nice a full-strength version of it would be on big, hulking machines. The 
supercomputer company that he worked for was going bankrupt, so he took his WAIS and he 
started his own company to sell WAIS.
The National Science Foundation saw hope in the free version of WAIS, so they worked with 
MCNC (it doesn't stand for anything, they're just MCNC) to create CNIDR (and it does stand for 
something: Truth, Justice, and the... no, actually, it used to stand for the Clearinghouse for 
Networked Information Discovery and Retrieval, but lately they've changed the "Clearinghouse" to 
"Center" since that's easier to type with one hand). CNIDR begat freeWAIS, which was the old, 
free WAIS (hence the name) with a series of enhancements.
But freeWAIS had problems. At its heart was the crippled stump of a search system. Small desktop 
machines had caught up with the old, hulking machines, but freeWAIS still only handled small 
collections. CNIDR said, "This sucks." I know, I heard them. So they created a new text search 
system from the ground up. And they called it Isite. Isearch is the part of Isite that actually does the 
searching.
And it was good.
 
ISEARCH TUTORIAL TABLE OF 
CONTENTS
1. A Quick Example of Isearch Use.
2. Indexing Collections.
2.1 Arranging the Text.
2.2 Running the Indexer.
2.3 Deciding on Doctypes.
3. Searching.
3.1 Simple Searches.
3.2 Searching in Subfields.
3.3 Boolean Searches.
3.4 Notes on Ranking.
3.5 Wildcards
3.6 Prefixes and Suffixes
4. Maintaining Your Data.
4.1 Removing old Files.
4.2 Adding new Files.
4.3 Moving Files Around.
4.4 Viewing Information.
5. Performance Issues.
5.1 Location of Files for Speed.
5.2 How Much RAM is Enough?
5.3 CPU Load vs. User Skill
 
1. A Quick Example of Isearch Use.
This section assumes that Isearch has already been installed. We'll assume that Isearch has been 
installed in the directory /local/project/Isearch-1.09.09. Naturally, that name will change 
depending on the version number and the preferences of your site, so remember to substitute your 
own directory name. If you haven't installed Isearch yet, see the file "QuickStart" in the Isearch 
documentation directory.
In this example, we'll index two text files from the Isearch distribution, "COPYRIGHT" and 
"README". We only picked these files because we know everyone has them. Create a directory 
for the textbase:
sti-gw% mkdir /local/text/example1
sti-gw% cp COPYRIGHT README /local/text/example1
 
Now create a directory to hold the indexes:
 
sti-gw% mkdir /local/indexes/example1
 
Now we can index the text:
sti-gw% Iindex -d /local/indexes/example1/EX1 /local/text/example1/*
Iindex 1.20
Building document list ...
Building database /local/indexes/example1/EX1:
   Parsing files ...
   Parsing /local/text/example1/COPYRIGHT ...
   Parsing /local/text/example1/README ...
   Indexing 1004 words ...
   Merging index ...
Database files saved to disk.
sti-gw% ls -l /local/indexes/example1/
total 16
-rw-r--r--   1 escott   staff         85 Apr 11 12:24 EX1.dbi
-rw-r--r--   1 escott   staff       4016 Apr 11 12:24 EX1.inx
-rw-r--r--   1 escott   staff         24 Apr 11 12:24 EX1.mdg
-rw-r--r--   1 escott   staff         40 Apr 11 12:24 EX1.mdk
-rw-r--r--   1 escott   staff        520 Apr 11 12:24 EX1.mdt
 
That indexed the text in /local/text/example1 and created indexes in /local/indexes/example1. Now 
we can use those indexes to search for text:
Isearch -d /local/indexes/example1/EX1 fee
Isearch 1.20
Searching database /local/indexes/example1/EX1:

1 document(s) matched your query, 1 document(s) displayed.

      Score   File
   1.   100   /local/text/example1/COPYRIGHT

Select file #: 
 
The word "fee" occurs only in the file COPYRIGHT.
2. Indexing Collections.
This section will discuss the issues surrounding indexing of text with Isearch, including how to 
arrange the text, decide on a doctype to use, and actually run the indexer.
2.1 Arranging the Text.
Arranging the text to be indexed is fairly straightforward. Good practice dictates that the textbase 
be given its own directory hierarchy to make maintenance simple. It's also a good idea to not mix 
up the text files with the indexes. Maintenance is simpler when they are separate, and there is also 
some performance gain to be realized when the indexes and text files are on separate disks.
The Iindex command will need to be given a list of files to index and directories to traverse. If you 
give Iindex the "-r" option then it will recursively plunge headlong into subdirectories indexing 
everything it can find. For simple collections this is probably a good thing, but for more subtle 
applications you'll probably want to do the filespace traversal yourself. This is especially true if 
you are using some kind of version control system like SCCS or RCS.
Unix filesystems impose virtually no penalty for using subdirectories, so feel free to impose quite 
a bit of organization on your tree of text.
Note that if you have a huge number of small files, you'll want to either use a lot of subdirectories 
or else combine the small files into larger ones. Consider the case of 100,000 files of one line 
each (an extreme example, but we see it all the time). If you put 100,000 files into one Unix 
directory, file access will be incredibly slow. Consider one of two solutions:
1) Create 100 subdirectories and put 1000 files into each subdirectory.
-or-
2) Concatenate all the files into one 100,000 line file. Index this file using the "ONELINE" doctype 
(discussed in Section 2.3).
The second solution is strongly preferred.
Do not put text or index files into a network-mounted filesystem. These files will be hit hard and 
hit often, and can bring a network to its knees. Don't let vendor claims of caching performance fool 
you either: Isearch and AFS (or DFS) do not get along well (the files are accessed in ways 
generally guaranteed to *not* be cached). So far, experiments with using CacheFS to back up 
accesses to NFS 3.0 mounted partitions hasn't been very promising either. Cheap SCSI disks are 
below 30 cents per megabyte now. Consider dedicating two spindles, preferably on separate SCSI 
controllers, to Isearch alone and you'll see a lot better performance.
2.2 Running the Indexer.
Iindex takes several command line options. Here is a list of the flags along with a brief discussion 
of each:
-d (X) 	This option specifies the name of the index. The index name doesn't have to have 
anything to do with the textbase, but it's a good idea to make it descriptive. For example, if 
you have a company phone book you want to make searchable, it would be a good idea to 
use a name like "PHONENUMBERS" or "DIRECTORY" instead of a name like "INDEX". 
Index names may have mixed upper and lower case, as long as you use the cases 
consistently. The indexes "Phone" and "phone" refer to entirely separate collections. The 
index names may also (and very often will) contain path names to point to a directory. An 
option like "-d /local/index/CompanyPhoneBook" is typical. This option is required for 
every Iindex command.
-a 	This option means "add to an existing textbase". Use this if you want to add a few 
files later on. Note that you should use this sparingly, since it can be a little slow. Also 
avoid adding files one at a time if possible. It's a lot better to add a whole bunch at once 
like:
Iindex -d INDEXNAME -a newFile1 newFile2 newFile3
instead of adding the files one command at a time.
-m (X) 	This option tells Iindex how many megabytes of text to read and index in 
memory at once. It doesn't tell Iindex how many megabytes total to use, so this figure 
should generally be about 4 megabytes less than the total amount of memory for an 
otherwise quiescent system. If you're indexing 5.5 megabytes of text, use "-m 6". It's a good 
idea from a performance standpoint to have at least as much RAM available as the size of 
the textbase to be indexed. Searching is almost independent of the amount of RAM 
available, but indexing needs lots and lots of memory to go quickly. Note that this means 
you can lease/borrow/steal time on a Cray CS6400 with 4 gigs of RAM for indexing and 
then use those indexes on a modest SparcStation 5 for searching if you have to. We've done 
it. If you're short on RAM for indexing, then there will be a lot of disk activity while 
smaller indexes are merged.
-s (X) 	This option informs Iindex that you have multiple logical documents per physical 
file, and that they are separated by something. Consider this example file:
Hi, I'm the first document.
###
Greetings to all from the second document.
 
If you index with:
Iindex -d mySeparatorExample -s "###" exampleFile
then you'll have two logical documents. A search for "greetings" will match the second 
paragraph, but not the first.
Use of the -s option can give you, in essence, a quick way to fake having new doctypes.
-t (X) 	This option tells Iindex what doctype to use when indexing the files. A doctype is a 
module that explains how to find logical documents and subfields within those documents. 
It also has code to display documents that it finds. For example, the "ONELINE" doctype 
tells Iindex that the file consists of a bunch of single line logical documents. The "PARA" 
doctype says that each paragraph is a document. The "SGMLTAG" doctype tells Isearch 
how to find the "<title>" fields and so forth inside an SGML document.
If you don't specify a doctype, then the "SIMPLE" doctype is used by default. SIMPLE 
doesn't really do much; it assumes that there is one document per file, no fields, and that 
presentation is handled by just dumping the contents of the file.
For more information, see the files "dtconf.inf", "BSn.doc", and "STI.doc" in the doctype 
directory.
-f (X) 	This option causes Iindex to read a list of file names to be indexed from a file. For 
example:
ls /local/text/example2 > myListOfFiles
Iindex -d exampleIndex -f myListOfFiles
causes Iindex to read the file "myListOfFiles" and then index every file it sees in there. 
This is very useful for huge lists of files. By default, older Unix systems can't pass 
command lines more than 10240 characters long. If you have several thousand files to 
index, the command line could quickly become too long.
(Note to the confused: No, you would never type a line that long. But consider how big a 
command line can get through filename expansion with wildcards. Try this:
echo /usr/man/*/* | wc
the wordcount utility reports that "echo /usr/man/*/* expanded into a line 122634 
characters long.
(Subnote to the cluefull: So, if that command expanded to 122K, how could it be passed to 
"echo" so I could wordcount it? Simple. I used Solaris 2.5, which allows 1 megabyte 
command lines. Nonetheless, there are a lot of Ultrix boxes still in use so it's worth 
noting.)).
-r 	This section says to recursively descend into subdirectories. If you have a whole 
tree of text then you can use this option and just give the top level directory name to index. 
Iindex will do the rest. Do not use this option if you're using SCCS or RCS. Iindex will 
blithely index the SCCS "s." files and will plunge into RCS directories and will just 
generally not do what you expected. How to escape this dilemma? Use "find". Find is your 
friend.
Iindex -d watchThisItsCool `find /deeptree -t f \! -name "s.*""`
will cause Iindex to index all files below "/local/text/deeptree" except for those that begin 
with "s." (SCCS files).
If you're using the "sccs" convenience shell, then you'll want to ignore "SCCS" directories.
Find is such a powerful command that you should spend a while getting to know it better if 
you haven't already. It especially useful for maintaining collections of text by doing things 
like weeding out old versions of files automatically.
-o (X) 	The -o option is used to pass information to specific doctypes. Each doctype treats 
this option differently (if at all) so check the documentation for your specific doctype 
before worrying about this.
Once you've decided what options to use, let Iindex bang away creating indexes. Don't be 
surprised if this takes a long time, especially if you have more text than will fit into RAM.
2.3 Deciding on Doctypes.
Doctypes are modules that explains how to find logical documents and fields within those 
documents. It also has code to display documents that it finds. 
The default doctype is "SIMPLE". It tells Iindex that there is one logical document per file, that 
there are no fields to index separately, and that when displaying these documents they will just be 
dumped out without any formatting. SIMPLE really is the simplest doctype you can have.
The other CNIDR-supplied doctype is "SGMLTAG". This doctype will make one logical 
document per file, will index text occurring inside of SGML-like markup pairs as fields, and will 
present the text by dumping it out without a lot of special formatting. This doctype can be used to 
index HTML web pages for making searchable WWW sites.
There are many other doctypes available. See the files "BSn.doc" and "STI.doc" in the doctype 
directory for more information. New doctypes are being added all the time, so look for other 
"*.doc" files in that directory.
3. Searching.
This section describes the "Isearch" command itself. Isearch is used to search through the text 
collection after it has been indexed with Iindex.
3.1 Simple Searches.
The simplest case searches in Isearch are to look for either one word or any of a list of words in 
any location within every file in the collection. Consider searching for "poetry" or "rhyme" or 
"verse" in a collection of journal articles:
 
Isearch -d MLAabstracts poetry rhyme verse
 
This will display a list of matching items. The items will have their "headlines" displayed. To 
select one for viewing, enter the number of the item and press "Return". To exit, press "Return" by 
itself.
Note that adding more search terms will generally cause more items to be returned, rather than 
fewer. This is counterintuitive, but isn't the big problem that it seems to be. Isearch has a very 
sophisticated ranking technique that will try to decide what documents are most likely to be useful 
and will return them at the top of the list. This ranking method actually works better for longer 
queries than for short ones.
3.2 Searching in Subfields.
Consider the following HTML text:
<title> Cool Page </title>
This is my excellent Web Page.
 
If you index this text with the SGMLTAG doctype, then you can search for words that occur 
anywhere in the document, or you can search for words that only occur in the title (inside the 
<title> and </title> markers).
For example:
sti-gw% Iindex -d why -t sgmltag /local/text/testfile
Iindex 1.20
Building document list ...
Building database why:
   Parsing files ...
   Parsing /local/text/testfile ...
   Indexing 7 words ...
   Merging index ...
Database files saved to disk.

sti-gw% Isearch -d why web
Isearch 1.20
Searching database why:

1 document(s) matched your query, 1 document(s) displayed.

      Score   File
   1.   100   /local/text/testfile
 Cool Page 

Select file #: 
sti-gw% Isearch -d why title/web
Isearch 1.10
Searching database why:

0 document(s) matched your query, 0 document(s) displayed.
 
In the first example, we searched for "web", and lo and behold we found it. In the second example 
we looked for "title/web", which means "find me occurrences of 'web' inside the 'title' field". 
There aren't any of those, so Isearch says there were no matching documents.
3.3 Boolean Searches.
This discussion of boolean searches assumes you have at least version 1.10.01. If you don't, then 
the brief section of "rpn" queries will apply to you but not the rest of this section. And if you aren't 
using at least 1.10.01 then I strongly urge you to upgrade because you're living with too many bugs.
3.3.1 "Infix" Notation Boolean Searches
Infix notation is the old familiar notation you remember from Cobol, Fortran, C, and TI calculators.
Reusing our example from section 3.2, we still have the Cool Page document. To search for a 
document that contained both "cool" and "page", we could use a query like:
 
sti-gw% Isearch -d why -infix cool and page
 
To search for a document with either cool or page (or even both) we can use:
 
sti-gw% Isearch -d why -infix cool or page
 
To search for documents that contain "cool" but not "page", we can say:
 
sti-gw% Isearch -d why -infix cool andnot page
 
Note that "andnot" is one word.
Note further than you can say "&&" instead of "and", "||" instead of "or", and "&!" instead of 
"andnot". This is handy for recovering C programmers.
You can group things with parenthesis to make more complicated queries, but remember that most 
Unix shells place special meaning in parenthesis. Protect them with quotation marks like this:
 
sti-gw% Isearch -d why -infix "(page or doc)" and "(cool or cold)"
3.3.2 "RPN" Notation Boolean Searches
RPN notation is the old familiar notation you remember from Forth, HP calculators, and your 
compiler writing class.
You probably shouldn't be using RPN unless you know what you're doing, so here's that last query 
from the previous section in rpn:
 
sti-gw% Isearch -d why -rpn page doc or cool cold or and
 
In a nutshell, you "push" search terms onto a stack and then feed enough operators to that stack to 
pop them all off. What could be more obvious?
3.4 Notes on Ranking.
In sections above, we have alluded to a sophisticated ranking algorithm. We use an algorithm 
developed by Gerald Salton for the SMART retrieval system. Without going into great detail, both 
the search terms and the resulting documents are scored. Documents that contain more instances of 
higher scoring words gets ranked higher.
What makes a high-scoring word? Basically, the fewer documents the word occurs in, the more 
important it must be. This is often bogus, but more often it's useful.
A high-scoring document is one that contains a lot of instances of search terms.
To combine the scores, the document and the search term scores are multiplied for each matching 
search term. These products are totaled over the set of all the search terms. Those who are good at 
math will recognize this as a "dot product". Finally, the scores are normalized so the highest score 
is 1.0.
Generally speaking, using a search word that appears in nearly every document won't affect the 
ranking of results. It will, however, slow down the query tremendously so it still isn't a good idea.
While we're on the subject, it's possible manually alter the search term weightings, albeit crudely. 
Consider this case:
 
sti-gw% Isearch -d why cool:5 page
 
This tells Isearch to look for "cool" or "page", but give "cool" five times the normal weight. The 
following example does the opposite:
 
sti-gw% Isearch -d why cool:-5 page
 
This tells Isearch to find documents that contain "page", but if they contain "cool" then subtract the 
value of that match five times. In other words, "I really want to see documents with "page" in them, 
but if they also contain "cool" then rank them really low". This is more useful than boolean queries 
with "andnot". Consider a set of documents on, say, databases. You might want a query like:
 
sti-gw% Isearch -d databases OODBMS andnot RDBMS
 
Problem is, if there is a good article on OODBMSes that just happens to say "Unlike 
braindamaged RDBMSes, our system can solve world hunger" then the above query will skip that 
document. Using a negative term weight will push that document down on the list, but at least it 
will still be there.
3.5 Wildcards
Isearch supports a limited wildcard facility. You can append an asterix to the end of a search term 
to match any word that begins with a given prefix. For example:
 
sti-gw% Isearch -d diseases chol*
 
will find any document with a word that begins with "chol", including "cholesterol", and yes, even, 
"cholesteral". This is useful for finding misspelled words in addition to its obvious uses.
3.6 Prefixes and Suffixes
Isearch will allow you to slightly change the output when it prints a document. The following query 
shows this:
sti-gw% Isearch -d why -prefix "<large>" -suffix "</large>" cool
Isearch 1.20
Searching database why:

1 document(s) matched your query, 1 document(s) displayed.

      Score   File
   1.   100   /local/text/testfile
 Cool Page 

Select file #: 1
<title> <large>Cool</large> Page </title>
This is my excellent Web Page.
 
The -prefix option lets you specify a string to be printed immediately before a word that matches. 
The -suffix option is for a string to be printed after the word. Note the SGML markup on either 
side of "Cool" in the above example.
NOTE: If you're going to specify SGML or HTML on a command line, protect it with quotation 
marks. Otherwise, the arrows will be interpreted as file redirection operators with completely 
unexpected results.
4. Maintaining Your Data.
After you index your collection and start searching in it, you'll eventually have to update the data. 
This section describes how to add, remove, and relocate your files. The command used for this is 
"Iutil".
4.1 Removing old Files.
To remove an old file from being considered for a search requires two steps. The first step is to 
find the document "key" for the one to be removed:
sti-gw% Iutil -d huh -v
Iutil 1.20
DocType: [Key] (Start - End) File
(* indicates deleted record)
USMARC: [10] (0 - 838) /local/text/marc/demomarc
USMARC: [1839] (839 - 1577) /local/text/marc/demomarc
 
In this case, we used Iutil's -v option to display the table of documents. The key for each document 
appears inside the brackets. We'll delete the second document, key number 1839:
sti-gw% Iutil -d huh -del 1839
Iutil 1.20
Marking documents as deleted ...
1 document(s) marked as deleted.
 
To see the deletion:
sti-gw% Iutil -d huh -v
Iutil 1.20
DocType: [Key] (Start - End) File
(* indicates deleted record)
USMARC: [10] (0 - 838) /local/text/marc/demomarc
USMARC: [1839] (839 - 1577) /local/text/marc/demomarc *
 
The asterisk at the end of the last line means that entry has been marked as deleted.
After deleting one or more documents, it's a good idea to force the deletion with Iutil -c:
sti-gw% Iutil -d huh -c

Iutil 1.20
Cleaning up database (removing deleted documents) ...
1 document(s) were removed.
sti-gw% Iutil -d huh -v
Iutil 1.20
DocType: [Key] (Start - End) File
(* indicates deleted record)
USMARC: [10] (0 - 838) /local/text/marc/demomarc
 
You should NEVER remove an actual data file from the collection until you run Iutil -c to commit 
the changes. Otherwise, Isearch will display an error message when you try to search.
4.2 Adding new Files.
There are two ways to add a new file to an existing collection: either reindex the whole collection 
from scratch or use incremental indexing. If you have enough RAM to reindex the whole 
collection, you should probably do that. It's actually faster to start from scratch than it is to add a 
few files piecemeal. If you don't have that much RAM, then you have no choice.
Here's an example, adding the file /local/text/example1/COPYRIGHT:
sti-gw% Iindex -d huh -a /local/text/example1/COPYRIGHT
Iindex 1.20
Building document list ...
Adding to database huh:
   Parsing files ...
   Parsing /local/text/example1/COPYRIGHT ...
   Indexing 5100 words ...
   Merging index ...
Database files saved to disk.
 
As a general rule, you want to specify as many new files to add at once as possible. Don't do this 
one at a time for even as little as two files because you'll be here for many minutes as it is.
And a note: You (generally) can't modify an existing data file and expect to get correct results. 
You'll have to delete the old one and index the new one.
4.3 Moving Files Around.
Sometimes you'll want to move files around in your directory structure, and the last thing you want 
to have to do is reindex all of them. Here's how to relocate the files from the above example:
sti-gw% Iutil -d huh -newpaths
Iutil 1.20
Scanning database for file paths ...
Enter new path or <Return> to leave unchanged:
Path=[/local/text/example1/]
    > /export/home/escott
Done.
4.4 Viewing Information.
We've already seen "Iutil -v" to look at index files to see what they contain. Here are some other 
options to Iutil to see different things:
sti-gw% Iutil -d huh -v
Iutil 1.20
DocType: [Key] (Start - End) File
(* indicates deleted record)
USMARC: [10] (0 - 838) /local/text/marc/demomarc
USMARC: [1839] (839 - 1577) /local/text/marc/demomarc
 
Pretty basic, right? The "-vi" option shows some summary information:
sti-gw% Iutil -d huh -vi
Iutil 1.20
Database name: /local/project/Isearch-1.10.02-nrn/bin/huh
Global document type: USMARC
Total number of documents: 2
Documents marked as deleted: 0
 
And the "-vf" option shows information about fields:
sti-gw% Iutil -d huh -vf
Iutil 1.20
The following fields are defined in this database:
001
008
010
LCCN
040
020
ISBN
043
 
[and so forth. The numbers above are also field names for the USMARC record format.]
5. Performance Issues.
There are several performance issues with Isearch. They can be roughly divided into "indexing 
concerns" and "search concerns". Indexing concerns only pop when files are indexed, but 
searching issues will show up every day.
5.1 Location of Files for Speed.
Location of files in your system is important for good speed. Good speed is, in turn, essential for 
supporting a lot of simultaneous users.
In general, it's a good idea to locate the index and the data files on separate disk drives. Do not 
make the mistake of putting them in separate partitions on the same drive: this is worse than doing 
nothing. Generally, the disk usage pattern will alternate between hitting the index files and the data 
files in short, alternating bursts. Near the conclusion of a search the index files will be accessed 
sequentially, but until then the pattern appears essentially random to the operating system. Putting 
the two sets of files on separate spindles will help minimize the amount of head motion. This is 
most noticable for searching, but affects indexing to some degree. The biggest gains will be for the 
largest collections. If you only have 100megs of data or so, then you'll be hard pressed to measure 
the gain.
In a related note: consider RAID. In particular, consider using RAID level 0 with a small 
granularity so you can get the most I/O operations per second. Most RAID management software 
will let you crank the configuration toward either "faster streaming speed" or "more operations per 
second" and you definitely want the latter. The only time Isearch will stream a device is during the 
result set generation phase of a search when the index file is essentially streamed (and, 
incidentally, the textbase storage device is being hit in a purely random fashion). These bursts 
don't last long, so most of the time the small granule size is a winner. Since most Unixes will 
transfer one page of data from a device at a minimum, use the "pagesize" command and set your 
chunk size to that.
5.2 How Much RAM is Enough?
Answer: how much can you fit into the box?
Better answer: Even that may not be enough.
Indexing is pretty slow with Isearch. To speed it up, Iindex will try to use lots of RAM. The "-m" 
option is used to specify how much RAM to allocate for buffering data. Note that Iindex will use 
considerably more than what you specify. On my little Sparc5 with 48 megs of RAM and an 
additional 16 megs of swap space, I can usually only use "-m 6" even though I have considerably 
more virtual memory available:
elvis% swap -s
total: 41304k bytes allocated + 8720k reserved = 50024k used, 15620k available
elvis% Iindex -d huh -m 7 -a /local/text/example1/COPYRIGHT
Iindex 1.20
Building document list ...
Adding to database huh:
   Parsing files ...
Virtual memory exceeded in `new'
 
Granted, I'm running CDE/Motif, netscape, and a few goodies, but nothing really serious. To index 
on this machine I usually shut down to single-user mode and run Iindex from the console. It's ugly, 
but it's also a lot faster.
If you need to index a lot of data, you need a lot of RAM to make it go quickly.
Searching is another issue entirely. More RAM helps, essentially without limit, but the reward isn't 
as great. The basic search algorithm runs in about 50 lines of code. The problem starts when that 
basic algorithm starts finding matches. When it does, those matches are stored in a result set. These 
result sets can get pretty big in a hurry. Imagine searching for "knee" in a textbase on knee injuries: 
the "knee" result set will get huge. Booleans won't help you, either. In fact, they make it worse. 
Imagine searching for "knee" and "dislocation". You'll still have to drag around the monster "knee" 
result set, and then you'll have to make a second one for "dislocation", and then finally you'll have 
to create a third one that is the union of the two. Isearch will delete the first two when the third one 
is created, but only after the third one has been finished. It can add up in a hurry.
While the relational database people have spent years optimizing for this situation, Isearch isn't 
that clever yet. Moral to the story: make sure your users know to use "good quality" search terms. 
They'll get better results and your office won't look like SIMM City.
