Subj : Re: String searching with a twist To : comp.programming From : CBFalconer Date : Sat Aug 06 2005 02:15 pm Roger Willcocks wrote: > .... snip ... > > Using a KMP search on a binary string isn't particularly helpful > in any case since there are only two different symbols in the > dictionary, so you don't get the full benefit of skipping past > characters found in the target string that aren't in the search > string. The important point about a KMP search is that it eliminates all backtracking in the searched data, thus making it easy to search a stream on the fly. Here's an example: /* Leor Zolman wrote: > On 25 Feb 2004 07:34:40 -0800, joan@ljungh.se (spike) wrote: > >> Im trying to write a program that should read through a binary >> file searching for the character sequence "\name\" >> >> Then it should read the characters following the "\name\" >> sequence until a NULL character is encountered. >> >> But when my program runs it gets a SIGSEGV (Segmentation >> vioalation) signal. >> >> Whats wrong? And is there a better way than mine to solve >>this task (most likely) > > I think so. Here's a version I just threw together: */ /* And heres another throw -- binfsrch.c by CBF */ #include #include #include #include #include /* The difference between a binary and a text file, on read, is the conversion of end-of-line delimiters. What those delimiters are does not affect the action. In some cases the presence of 0x1a EOF markers (MsDos) does. This is a version of Knuth-Morris-Pratt algorithm. The point of using this is to avoid any backtracking in file reading, and thus avoiding any use of buffer arrays. */ size_t chrcount; /* debuggery, count of input chars, zeroed */ /* --------------------- */ /* Almost straight out of Sedgewick */ /* The next array indicates what index in id should next be compared to the current char. Once the (lgh - 1)th char has been successfully compared, the id has been found. The array is formed by comparing id to itself. */ void initnext(int *next, const char *id, int lgh) { int i, j; assert(lgh > 0); next[0] = -1; i = 0; j = -1; while (i < lgh) { while ((j >= 0) && (id[i] != id[j])) j = next[j]; i++; j++; next[i] = j; } #if (0) for (i = 0; i < lgh; i++) printf("id[%d] = '%c' next[%d] = %d\n", i, id[i], i, next[i]); #endif } /* initnext */ /* --------------------- */ /* reads f without rewinding until either EOF or *marker has been found. Returns EOF if not found. At exit the last matching char has been read, and no further. */ int kmpffind(const char *marker, int lgh, int *next, FILE *f) { int j; /* char position in marker to check */ int ch; /* current char */ assert(lgh > 0); j = 0; while ((j < lgh) && (EOF != (ch = getc(f)))) { chrcount++; while ((j >= 0) && (ch != marker[j])) j = next[j]; j++; } return ch; } /* kmpffind */ /* --------------------- */ /* Find marker in f, display following printing chars up to some non printing character or EOF */ int binfsrch(const char *marker, FILE *f) { int *next; int lgh; int ch; int items; /* count of markers found */ lgh = strlen(marker); if (!(next = malloc(lgh * sizeof *next))) { puts("No memory"); exit(EXIT_FAILURE); } else { initnext(next, marker, lgh); items = 0; while (EOF != kmpffind(marker, lgh, next, f)) { /* found, take appropriate action */ items++; printf("%d %s : \"", items, marker); while (isprint(ch = getc(f))) { chrcount++; putchar(ch); } puts("\""); if (EOF == ch) break; else chrcount++; } free(next); return items; } } /* binfsrch */ /* --------------------- */ int main(int argc, char **argv) { FILE *f; f = stdin; if (3 == argc) { if (!(f = fopen(argv[2], "rb"))) { printf("Can't open %s\n", argv[2]); exit(EXIT_FAILURE); } argc--; } if (2 != argc) { puts("Usage: binfsrch name [binaryfile]"); puts(" (file defaults to stdin text mode)"); } else if (binfsrch(argv[1], f)) { printf("\"%s\" : found\n", argv[1]); } else printf("\"%s\" : not found\n", argv[1]); printf("%lu chars\n", (unsigned long)chrcount); return 0; } /* main binfsrch */ -- Chuck F (cbfalconer@yahoo.com) (cbfalconer@worldnet.att.net) Available for consulting/temporary embedded and systems. USE worldnet address! .