Subj : Parsing regular languages
To   : comp.programming
From : Omri Barel
Date : Wed Aug 03 2005 11:20 pm

Hi,

I was wondering about parsing regular languages. I know a regular 
grammar (right regular grammar) has production rules of the form:

A -> a
A -> aB
A -> epsilon


However, what happens if I have something that looks like a context free 
grammar, but the language it describes is regular? In my case it's even 
easier - the grammar is not recursive. So I have something like:

S -> CDE
S -> CDxF
C -> abcG
D -> m
D -> n
E -> xyz
F -> rG
G -> q

As you can see, I can simply expand the entire thing, because it's not 
recursive:

S -> abcqmxyz
S -> abcqnxyz
S -> abcqmxrq
S -> abcqnxrq

All very finite and simple. If each of the non-terminal has an action 
associated with it, I can keep that action in the expansion:

S -> abcq[CG]m[D]xyz[E]
S -> abcq[CG]n[D]xyz[E]
S -> abcq[CG]m[D]xrq[FG]
S -> abcq[CG]n[D]xrq[FG]

However, the grammar is quite large, and I would like to avoid this 
expansion. I can always use a general context-free parsing algorithm 
(Earley's algorithm, for example), but I *know* the grammar describes a 
regular language, so I'm hoping I could do better than that.

Is there anything I can do in this situation? I'm looking for a 
linear-time algorithm which doesn't take a lot of memory, and can use 
the grammar directly.

Note that although there's no left-recursion problem (it's not 
recursive), I'm trying to avoid any left-factoring, due to memory 
requirements and processing time. So, if my grammar is:

S -> a
S -> a a
S -> a a a
S -> a a a a
S -> a a a a a

I don't want to convert it first into:

S -> a A
A -> a B | epsilon
B -> a C | epsilon
C -> a D | epsilon
D -> a | epsilon

And one more thing: the grammar is not stable - it's actually computer 
generated, so anything manual is completely out of the question.


Thanks!



P.S. If you think this question is too specific for this newsgroup, 
please let me know where I can post it.

(And hopefully I don't have to say that this is NOT homework... ;-)

.