CAMLYACC PARSER SKELETON WITH ERROR RECOVERY

Mike Spivey
Oxford University Computing Laboratory


This parsing engine for camlyacc is compatible with the one in the
CAML Light distribution, but adds the ability to specify error
recovery in parser scripts.


INVENTORY

    README		This file

    yyparse.ml		New parse engine (ML part)
    yyparse.mli		
    yyprim.c		New parse engine (C part)
    yyprim.mli
    yyfix.awk		AWK script to fix camlyacc output

    Makefile		A little example with error recovery
    lexer.mll
    main.ml
    parser.mly
    tree.mli

    fac.b		Example input for the example parser

    diffs		Context diffs for a version of CAML Light
			system that uses the new parser skeleton


WRITING THE GRAMMAR

The grammar file must define or import (via #open) a function

	yyerror : string -> unit

This function is called automatically with the argument "syntax error"
when a syntax error is detected.  It can also be called from parser
actions (see below).
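
A minimal sketch of such a definition, placed in the header of the
grammar file, might look like this.  Printing the message on the
standard error channel is only an assumption made for illustration:

    %{
    (* Illustrative only: report each error on standard error *)
    let yyerror msg =
      prerr_endline msg;;
    %}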

As in ordinary yacc, the special token 'error' denotes a point where
error recovery may take place.  When the parser detects an error, it
first calls yyerror, then discards states from the stack until it
reaches a place where the error token can be shifted.  It then
discards tokens from the input until it finds three successive tokens
that can be accepted, and starts processing with the first of these.
If no state can be uncovered where the error token can be shifted,
then the parser terminates by raising the Parse_error exception.
See the documentation for ordinary yacc for guidance on how to use
error recovery.
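
As a sketch, a statement list that resynchronises at semicolons could
be written as follows.  The token SEMI and the nonterminals stmts and
stmt are hypothetical names, not taken from the example parser:

    stmts :
        /* empty */             { [] }
      | stmt SEMI stmts         { $1 :: $3 }
      | error SEMI stmts        { $3 }    /* skip the faulty statement */
    ;

With a rule of this shape, an error inside a statement should cause
the parser to pop back to a state where a statement list is expected,
shift the error token, and discard input up to a semicolon before
carrying on with the remaining statements.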

Parser actions can also call the user-supplied yyerror function to
output error messages, and can raise the exception Parse_error to
initiate error recovery; this is like the YYERROR action in ordinary
yacc.
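
For instance, an action might reject a value that is syntactically
valid but out of range.  In this sketch the token NUMBER (assumed to
carry an int) and the limit 255 are both hypothetical:

    const :
        NUMBER
          { if $1 > 255 then begin
              yyerror "constant too large";   (* report the problem *)
              raise Parse_error               (* then start recovery *)
            end;
            $1 }
    ;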


BUILDING THE PARSER

The new parsing engine comes as the two modules yyparse (written in
ML) and yyprim (written in C).  There's a little AWK program yyfix.awk
that modifies the output of camlyacc to use the new parsing engine in
place of the standard one.  The final link must use the -custom
option so that the C code is included.

Here's a typical makefile:

    OBJ = yyprim.o yyparse.zo lexer.zo parser.zo main.zo

    go : $(OBJ)
	    camlc -g -o go -custom $(OBJ)

    lexer.ml : lexer.mll
	    camllex lexer.mll

    parser.mli parser.ml : parser.mly yyfix.awk
	    camlyacc -v -b yyout parser.mly
	    mv yyout.mli parser.mli
	    gawk -f yyfix.awk yyout.ml >parser.ml
	    @rm yyout.ml

    %.zi : %.mli
	    camlc -c -g $<

    %.zo : %.ml
	    camlc -c -g $<

    %.o : %.c
	    camlc -c $<

    ###

    parser.zi     : tree.zi
    lexer.zo      : parser.zi
    main.zo       : parser.zi lexer.zo yyparse.zi
    parser.zo     : parser.zi lexer.zo yyparse.zi
    yyprim.zi     : yyparse.zi
    yyparse.zo    : yyparse.zi yyprim.zi

Alternatively, it's a routine matter to replace the standard parsing
engine in the CAML Light system with the new one, and make a small
change to the output routine of camlyacc.


COMPATIBILITY

Old programs don't use error recovery, and issue error messages by
handling the Parse_error exception raised by the parser.  The new
parse engine can be used with such programs by adding the single line

    let yyerror msg = raise Parse_error;;

in the first part of the grammar file.  When the new engine calls
yyerror, the Parse_error exception is raised just as before.  This
hack makes it possible to rebuild the whole CAML Light system to use
the new skeleton in place of the standard one.  The next step is to
add some error recovery to the parser of the CAML Light compiler!
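
For illustration, an old-style driver of the kind described above
might be sketched as follows.  The entry points parser__program and
lexer__token, the function name parse_channel and the exit status are
assumptions, not part of this distribution:

    (* Illustrative only: no recovery, stop at the first error *)
    let parse_channel chan =
      let lexbuf = lexing__create_lexer_channel chan in
      try
        parser__program lexer__token lexbuf
      with parsing__Parse_error ->
        prerr_endline "syntax error";
        exit 2;;

With the one-line yyerror shown above in the grammar file, a driver
like this should behave the same under the new engine as under the
standard one.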
