Newsgroups: comp.sys.amiga.programmer
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!think.com!snorkelwacker.mit.edu!ira.uka.de!news
From: S_ASCHMIDT@iravcl.ira.uka.de (|S| Angela Schmidt)
Subject: Re: The Boyer-Moore string searching algorithm
In-Reply-To: bojsen@moria.uucp's message of 8 Jun 91 23: 51:12 GMT
Message-ID: <1991Jun10.140028.9664@ira.uka.de>
Sender: news@ira.uka.de (USENET News System)
Organization: University of Karlsruhe, FRG (Informatik Rechnerabteilung)
References: <492@regina.uregina.ca> <19453c80.ARN360b@moria.uucp>
Date: Mon, 10 Jun 1991 14:41:59 GMT
X-News-Reader: VMS NEWS 1.07
Lines: 46

In <19453c80.ARN360b@moria.uucp> bojsen@moria.uucp writes:

> In article <492@regina.uregina.ca>, Dave Plummer writes:
> 
> > Would anyone happen to have source code for a (very) fast search routine?
> > Basically, what I need is something like strstr(), but I need to be able
> > to modify it to make it non-case sensitive.  I'm basically looking through
> > a small (512-2048 bytes) buffer for an ASCII string.
> >
> The fastest-known single-keyword string searching algorithm both in theory
> and practice is the Boyer-Moore string matching algorithm described in
> Boyer, R.S. and J.S. Moore, ``A Fast String Searching Algorithm'', Comm.
> ACM 20(10) (1977) pp. 62-72.
> 
Not totally correct. Boyer-Moore works in O(n*m), another algorithm, KMP only
needs O(n+m), where n is the length of the text you are searching in and m is
the length of the text you are searching for. So Boyer&Moore is not the fastest
in theory, only in practice:

I've already implemented both algorithms. I've also implemented the most
trivialest one which needs about O(n*m) and looks like this:
   for (i=0; i<n-m; i++) {
     j=0;
     while (j<m && a[i+j]==b[j]) {
       ...
       j++;
     }
   }
I stopped the time in all three cases. I searched ASCII-Strings on a harddisk,
using the device. So there were a lot of differnent characters. KMP needed
about twice as much(!!!) time as the normal algorithm which needs about
O(n*m) - like Boyer&Moore. But when I used Boyer&Moore, it run much faster -
depending of the length of the searching string. So Boyer&Moore is really the
fastest - in practice.

> [...]

Nice, your description, bit too long to quote it once more ;-)
>
> --
> .------------------------------------------------------------------------------.
> |  Greetings from Per Bojsen.                                                  |
> +------------------------------+-----------------------------------------------+
> |  EMail: cbmehq!lenler!bojsen | "Names do have power, after all, that of      |
> |     Or: bojsen@dc.dth.dk     |  conjuring images of places we have not seen" |
> `------------------------------+-----------------------------------------------'
