Newsgroups: comp.sys.amiga.programmer
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!think.com!snorkelwacker.mit.edu!ira.uka.de!news
From: S_ASCHMIDT@iravcl.ira.uka.de (|S| Angela Schmidt)
Subject: Re: Source code for search
In-Reply-To: Lee's message of Friday, 7 Jun 1991 18: 06:42 EDT
Message-ID: <1991Jun10.123235.7395@ira.uka.de>
Sender: news@ira.uka.de (USENET News System)
Organization: University of Karlsruhe, FRG (Informatik Rechnerabteilung)
References: <492@regina.uregina.ca> <DsN635w164w@undrground.UUCP> <91158.180643UH2@psuvm.psu.edu>
Date: Mon, 10 Jun 1991 13:05:18 GMT
X-News-Reader: VMS NEWS 1.07
Lines: 58

In <91158.180643UH2@psuvm.psu.edu> Lee writes:

> >> Would anyone happen to have source code for a (very) fast search routine?
> >> Basically, what I need is something like strstr(), but I need to be able
> >> to modify it to make it non-case sensitive.  I'm basically looking through
> >> a small (512-2048 bytes) buffer for an ASCII string.
> >>
> 
> If you mean what I think you mean, you probably want Boyer-Moore search.
> There is probably source for it buried deep in a copy of egrep, or
> somewhere, but I'm too lazy to dig it out.  You could look
> in a good data structures book---I think Sedgewick covers it.
> 
> Another likely source would be a compiler book.  Look in the chapter that
> covers token parsing.  There are techniques where you compile your
> search string into a FSM, and then run it using the buffer as the tape.

I've already implemented Boyer & Moore. It works very fast with long
searching-strings. Let me describe what Boyer & Moore does: "A" shall be
the string you are searching for, "B" the string you are searching in.

1) Create a field (here called h) with 256 elements, all set to FALSE.
2) for (i=strlen (A)-1; i>=0; i--) h[A[i]]=TRUE;
   So afterwards, if there appears the character  "A" in your
   searchingstring, the field 65 will be set to TRUE and so you'll get
   a table, which tells you wether there exists a special character or not.
   Set k to 0.
3) Now say, "A" is "a" characters long.
   Let c be B[a-1+k]

   ;;;;;;; (String A) => a=7
   ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,... (B)
   ^     ^
   k     c=B[a-1+k]
   (0)   (6)

   Now look if the field hv[c] is FALSE. If it's FALSE, the string must
   not touch B[a-1+k], so let k=k+a, and continue with (3)
   else there might be any touch. Test this:
   for (i=0; i<a; i++) if (A[i]==B[i+k]) ...
   If the string does not match, let k=k+1 and continue with (3)

I hope there were no mistakes. Sorry, my routine is at home, otherwise I would
simply post it. Especially with long strings and a lot of different characters,
it works very fast. I use it in my diskmonitor DisKey. If the string is long
enough, the LED of my harddrive doesn't blink, it just burns!

Enjoy

Angela   


---------------------------------------------------------+---------------------
Angela Schmidt   Internet: S_ASchmidt@iravcl.ira.uka.de  |        //
(Nessy in IRC)   BITNET:   UK8B@DKAUNI2.BITNET           | Amiga //
                 Phone:    +49 731 712316 & 721 6904-263 |   \\ // only 1000
                           MEZ: (10am-9pm)  (12pm-10pm)  |    \X/ the real one!
---------------------------------------------------------+---------------------
