Subj: Re: Complicated email parse or text extraction and database insertion
To: comp.programming
From: Rob Thorpe
Date: Tue Aug 16 2005 10:41 am

craftereric@hotmail.com wrote:
> Would you please post the CodeWorker script sample here. I am trying
> to extract business data from various email formats in a UNIX
> environment. I was trying to using lex and yacc but have been looking
> for a way to easily generate the yacc grammar.

Yacc and bison are generally not very suitable for this kind of work.
They're most useful for parsing languages with lots of structure and
storing the result in trees.

This problem is simple in concept but difficult in practice. I'd do it
in a language that supports regular expressions. If you have to use a
language without them, use lex to help.

I'd do it like this:
* Pass over the email to see which parser to send it to
* If type1, send to parser1; if type2, send to parser2; ...

So you have to write several parsers, but each is quite simple. The
first pass only needs to read a few lines to guess the format of the
message. Each of the other parsers then reads only the format it's
built to read, and emits errors when given something else.
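
To make the two-stage idea concrete, here is a minimal sketch in Python. Everything specific in it is an assumption for illustration only: the two message formats ("Order ID:" lines and a pipe-delimited "INVOICE|..." line), the field names, and the function names classify, parse_type1, parse_type2 and extract are all made up, not taken from any real mail format. The point is the shape: a cheap first pass that looks at a few lines, then one small strict parser per format.

```python
import re


class FormatError(ValueError):
    """Raised when a message does not match the format a parser expects."""


# Hypothetical format 1: "Key: value" lines somewhere in the body.
ORDER_RE = re.compile(r"^Order ID:\s*(\d+)\s*$", re.MULTILINE)
TOTAL_RE = re.compile(r"^Total:\s*\$?(\d+(?:\.\d+)?)\s*$", re.MULTILINE)


def parse_type1(body):
    """Parse the made-up 'Order ID:' format; reject anything else."""
    order = ORDER_RE.search(body)
    total = TOTAL_RE.search(body)
    if not (order and total):
        raise FormatError("not a type1 message")
    return {"order_id": int(order.group(1)), "total": float(total.group(1))}


# Hypothetical format 2: one pipe-delimited line, "INVOICE|<id>|<amount>".
INVOICE_RE = re.compile(r"^INVOICE\|(\d+)\|(\d+(?:\.\d+)?)\s*$", re.MULTILINE)


def parse_type2(body):
    """Parse the made-up 'INVOICE|' format; reject anything else."""
    match = INVOICE_RE.search(body)
    if not match:
        raise FormatError("not a type2 message")
    return {"order_id": int(match.group(1)), "total": float(match.group(2))}


# First pass: a cheap pattern per format, paired with the parser to use.
SNIFFERS = [
    (re.compile(r"^Order ID:", re.MULTILINE), parse_type1),
    (re.compile(r"^INVOICE\|", re.MULTILINE), parse_type2),
]


def classify(body, max_lines=5):
    """Look only at the first few lines and pick a parser."""
    head = "\n".join(body.splitlines()[:max_lines])
    for pattern, parser in SNIFFERS:
        if pattern.search(head):
            return parser
    raise FormatError("unrecognised message format")


def extract(body):
    """Classify the message, then hand it to the matching parser."""
    return classify(body)(body)
```

Adding a new format then means writing one more small parser and one more entry in SNIFFERS. Because each parser raises FormatError on input it doesn't recognise, a wrong guess in the first pass shows up as an error instead of as bad data.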