[HN Gopher] Automatic Syntax Error Recovery (2020)
       ___________________________________________________________________
        
       Automatic Syntax Error Recovery (2020)
        
       Author : evertedsphere
       Score  : 13 points
       Date   : 2024-01-10 20:57 UTC (2 hours ago)
        
 (HTM) web link (tratt.net)
 (TXT) w3m dump (tratt.net)
        
       | Joker_vD wrote:
       | Why are people so enamoured with LR parsers, again?
       | 
       | Anyway, I wanted to comment about the (2) note in the post, about
       | using the FOLLOW sets for errror recovery in LL parsers: it's
       | actually a bit more nuanced than just "when parsing non-terminal
       | A, on unrecognized input: skip all tokens until a token in
       | FOLLOW(A) appears".
       | 
       | The actual strategy (which I've first learned from Per Brinch
       | Hansen's "On Pascal Compilers", sec. 5.8. "Error recovery" and
       | then re-encountered much later when studdying the internals of
       | the Go compiler) instead involves considering the FIRST sets of
       | the sibling non-terminals in the call stack. A simple and
       | efficient way to implement it is by augmenting each non-terminal
       | recognizing procedure with a "stop" parameter holding the "stop"
       | set of tokens, which would start as just set([EOF]) at the very
       | top level. Then, if you're parsing a rule of "A ::= B1 B2 ... Bn"
       | kind, you do it like this:                   def parseA(stop):
       | parseB1(FIRST(B2) + FIRST(B3) + ... + stop)
       | parseB2(FIRST(B3) + ... + stop)             ...
       | parseBn(stop)
       | 
       | and for a rule of "A ::= B1 | B2 | ... | Bn" kind you do it like
       | this:                   def parseA(stop):             if
       | atOneOf(FIRST(B1)): parseB1(stop)             elif
       | atOneOf(FIRST(B2)): parseB2(stop)             ...
       | elif atOneOf(FIRST(Bn)): parseBn(stop)             else:
       | setErrorAndSkipUntil(stop)
       | 
       | This approach allows for slightly more precise errory recovery
       | because it basically ends up using union of FOLLOW(A) and all of
       | its parent non-terminals as the stop sets. You can also see this
       | idea proposed e.g. in [0], at the paragraph starting with "What
       | is a reasonable RECOVERY set in a general case?" but it's not
       | implemented there in the way I've described.
       | 
       | [0] https://matklad.github.io/2023/05/21/resilient-ll-parsing-
       | tu...
        
       ___________________________________________________________________
       (page generated 2024-01-10 23:00 UTC)