Newsgroups: comp.compilers
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!sdd.hp.com!decwrl!world!iecc!compilers-sender
From: ressler@cs.cornell.edu (Gene Ressler)
Subject: Re: YACC, going the other way
Message-ID: <1991May1.022140.28194@cs.cornell.edu>
Followup-To: ressler@cs.cornell.edu
Summary: Deterministic enumeration of CFLs
Keywords: yacc, testing, Lisp, theory
Sender: compilers-sender@iecc.cambridge.ma.us
Reply-To: ressler@cs.cornell.edu (Gene Ressler)
Organization: Cornell Univ. CS Dept, Ithaca NY 14853
References: <1991Apr23.140427.5416@iecc.cambridge.ma.us> <72058@microsoft.UUCP>
Date: Wed, 1 May 1991 02:21:40 GMT
Approved: compilers@iecc.cambridge.ma.us

In article <1991Apr23.140427.5416@iecc.cambridge.ma.us> elk@cblpn.att.com (Edwin Lewis King +1 614 860 3394) writes:
>I'm interesting in generating strings that are described by a BNF (OK,
>YACC) grammar.

Many decidablility results rely on Turing machines enumerating CFLs, so
we'd better be able to do it without random numbers!  I know this may not
be what you want for testing, but I think it's interesting anyway.
Following is a rough hack in Common Lisp (runs under Lucid) that keeps a
queue of sentential forms sorted by length, pulling off the shortest one
to expand next.  `Expand' means replace each non-terminal A by the rhs of
each A-production in all combinations.  If a resulting form has only
non-terminals, print it; otherwise queue it for deeper expansion.  If you
have no epsilon productions, this generates strings of strictly
non-decreasing length.

Gene
----

; Enumerate strings generated by a CFL.
; Warning: Rough and probably buggy code.

(in-package 'user)

(let ((look-ahead nil)
      (stream t)
      (rules (make-hash-table))
      (start nil)
      (q nil))

(defmacro lex ()
  `(setq look-ahead (read stream nil)))

(defun parse (in-stream)
  ; set up
  (clrhash rules)
  (setq stream in-stream)
  (lex)
  ; assume first sym is start sym
  (setq start look-ahead)
  (loop
    ; done when lex returns nil for eof
    (unless look-ahead
      (return))
    ; look ahead is lhs. gather rhs.
    (let ((prod look-ahead)
          (rhs nil))
      (lex)
      (unless (eq look-ahead '->)
        (error "missing ->"))
      (lex)
      (loop
        ; gather up to ! or !! (`or' or end of prod)
        (loop
          (when (member look-ahead '(! !! ->))
	    (return))
          (push look-ahead rhs)
          (lex))
        ; put production in rules hash table indexed by lhs
	; (prod) so each entry is a list of rhs's
        (push (reverse rhs) (gethash prod rules))
        (case look-ahead
	  (!! (lex) (return))
	  (! (lex))
	  (t (error "unexpected ~A" look-ahead)))
	; start new rhs for same lhs (prod)
        (setq rhs nil))))
  ; check for undefined symbols
  (maphash #'(lambda (prod rhss)
	       (declare (ignore prod))
	       (dolist (rhs rhss)
		 (dolist (x rhs)
		   (unless (or (stringp x)
			       (gethash x rules))
		     (error "~A undefined" rhs)))))
           rules)
  ; return start symbol
  start)

; don't put any sentential form longer than this on the queue
(defparameter *cutoff-length* 10)

; insert form in queue sorted ascending by length
(defun enq (s-form)
  (when (< (list-length s-form) *cutoff-length*)
    (setq q (merge 'list
		   (list s-form) q
		     #'(lambda (x y)
		         (< (list-length x)
			    (list-length y)))))))

; get shortest sentential form from queue
(defmacro qpop () `(pop q))

; expand rhs every way that is possible by
; expanding each non-terminal exactly once.
; accumulate result in `sofar'.  when rhs
; is gone, look at `sofar' to see if it's all
; terminals (strings).  if so, print it; if not,
; queue it for deeper expansion.
(defun expand (rhs sofar)
  (cond
    ((null rhs)
       (if (every #'stringp sofar)
         (format t "~&~A"
		 (reduce #'(lambda (x y)
			     (concatenate 'string x y))
			 (reverse sofar)))
	 (enq (reverse sofar))))
    ((stringp (car rhs))
       (expand (cdr rhs) (cons (car rhs) sofar)))
    (t (dolist (rrhs (gethash (car rhs) rules))
	 (expand (cdr rhs) (revappend rrhs sofar))))))

; expand start symbol, iterate (expand)
; until the queue is empty.
(defun enum (start)
  (setq q nil)
  (dolist (rhs (gethash start rules))
    (expand rhs nil))
  (loop
    (let ((rhs (qpop)))
      (unless rhs (return))
      (expand rhs nil))))

) ; end (let (look-ahead ...

; tester assumes a grammar is in file "grammar"
(defun test ()
  (with-open-file (grammar "grammar" :direction :input)
    (enum (parse grammar))))

; a sample grammar.
; if you need epsilon, just say "".

#|

S -> expr
  !!

expr -> term
     !  expr "+" term
     !!

term -> factor
     !  term "*" factor
     !!

factor -> "1"
       !  "a"
       !  "(" expr ")"
       !!

|#
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.
