Newsgroups: comp.sys.amiga.programmer
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!think.com!snorkelwacker.mit.edu!ira.uka.de!news
From: S_ASCHMIDT@iravcl.ira.uka.de (|S| Angela Schmidt)
Subject: Re: The Boyer-Moore string searching algorithm
In-Reply-To: bojsen@moria.uucp's message of 11 Jun 91 00: 24:51 GMT
Message-ID: <1991Jun12.143556.4127@ira.uka.de>
Sender: news@ira.uka.de (USENET News System)
Organization: University of Karlsruhe, FRG (Informatik Rechnerabteilung)
References: <492@regina.uregina.ca> <19453c80.ARN360b@moria.uucp> <1991Jun10.140028.9664@ira.uka.de> <1947e763.ARN36eb@moria.uucp>
Date: Wed, 12 Jun 1991 15:16:19 GMT
X-News-Reader: VMS NEWS 1.08
Lines: 66

In <1947e763.ARN36eb@moria.uucp> bojsen@moria.uucp writes:

> In article <1991Jun10.140028.9664@ira.uka.de>, |S| Angela Schmidt writes:
> 
> > In <19453c80.ARN360b@moria.uucp> bojsen@moria.uucp writes:
> >
> > > The fastest-known single-keyword string searching algorithm both in theory
> > > and practice is the Boyer-Moore string matching algorithm described in
> > > Boyer, R.S. and J.S. Moore, ``A Fast String Searching Algorithm'', Comm.
> > > ACM 20(10) (1977) pp. 62-72.
> > >
> > Not totally correct. Boyer-Moore works in O(n*m), another algorithm, KMP only
> > needs O(n+m), where n is the length of the text you are searching in and m is
> > the length of the text you are searching for. So Boyer&Moore is not the
> > fastest in theory, only in practice:
> >
> I'm sorry, but I believe Boyer-Moore is O(m+n) like KMP.  But Boyer-Moore has
> a much better average behavior than KMP.  For sufficiently large alphabets
> Boyer-Moore only needs to examin about n/m characters of the input string
> on the average.
> 
Sorry, but I'm *very* shure it has O(n*m) since - in the worst case - it always
tries to match the whole string and only recognizes at the last character, that
there is a mismatch. Doing this n times causes O(m*n). Of course, normally it
only needs a little bit more than n/m - that's why in practice it's so quick.

aaaaaaa String you search in called A
baaa    String you search for called B
^ means that the character matches
! means that the character does not match

Strings    Characters that will be checked     Number of checks
---------------------------------------------------------------
aaaaaaa
baaa
!^^^       A[3] A[2] A[1] - mismatch at A[0]   m checks in 1st round

aaaaaaa
 baaa      A[4] A[3] A[2] - mismatch at A[1]   m checks in 2nd round
 !^^^

...

(n-m+1 times)

This will be done n-m+1 times, so you need O(m*(n-m+1)) and since m<<n
(otherwise nobody would implementat Boyer&Moore) I said that it needs O(n*m).
That does not mean that I dislike Boyer&Moore to the contrary - I like it very
much. But O(..) discusses the worst case and there Boyer&Moore is not very
useful.

> [...]
> --
> .------------------------------------------------------------------------------.
> |  Greetings from Per Bojsen.                                                  |
> +------------------------------+-----------------------------------------------+
> |  EMail: cbmehq!lenler!bojsen | "Names do have power, after all, that of      |
> |     Or: bojsen@dc.dth.dk     |  conjuring images of places we have not seen" |
> `------------------------------+-----------------------------------------------'

---------------------------------------------------------+---------------------
Angela Schmidt   Internet: S_ASchmidt@iravcl.ira.uka.de  |        //
(Nessy in IRC)   BITNET:   UK8B@DKAUNI2.BITNET           | Amiga //
                 Phone:    +49 731 712316 & 721 6904-263 |   \\ // only 1000
                           MEZ: (10am-9pm)  (12pm-10pm)  |    \X/ the real one!
---------------------------------------------------------+---------------------
