hngopher.com

       [HN Gopher] Applicative Parsing
       ___________________________________________________________________
        
       Applicative Parsing
        
       Author : gbrown_
       Score  : 74 points
       Date   : 2021-03-01 10:59 UTC (12 hours ago)
        
 (HTM) web link (jobjo.github.io)
 (TXT) w3m dump (jobjo.github.io)
        
       | lindig wrote:
       | I believe the interesting twist is the use of a GADT to represent
       | the parse result. I see this in this line of history:
       | 
       | "Higher-Order Functions for Parsing", Graham Hutton - parser
       | combinators before monadic parsing
       | 
       | "Monadic parser combinators", Graham Hutton and Erik Meijer - the
       | discovery of monadic parsing, which is now the standard for
       | parser combinators
        
       | gambler wrote:
       | I've recently played with making simple JS-based parsers without
       | reading anything about the theory. My conclusion was that making
       | a parser by itself is a relatively easy task. The hard part is
       | making grammars easy to read and write, as well as handling
       | various errors.
       | 
       | My best stab at representing something like EBNF in JS looked
       | like this:                   class ArrayGrammar { //e.g: [1, 2,
       | 3, [4, 5,],]           array =()=> ['[', this.entries, ']'];
       | entries =()=> many(this.entry);           entry =()=>
       | [either(this.array, this.number, ''), /\s*,\s*/];
       | number =()=> /\d+/;         }
       | 
       | This is a grammar for "arrays" of positive integers. I tested it
       | out and concluded that this is too awful to be of any practical
       | use. Now I'm seeing something with significantly more involved
       | representation making it to the front page of HN. Hm.
       | 
       | ---
       | 
       | Edit: Not much to look at, but here is the source code of the toy
       | implementation if anyone is interested:
       | 
       | https://jsfiddle.net/306jrnt9/1/
       | 
       | The only "cool" part about it is that the parser implementation
       | is close to 100 lines of code.
        
         | rmasters wrote:
         | Tree-sitter uses JS for grammars and I'm sure there are others:
         | 
         | https://tree-sitter.github.io/tree-sitter/creating-parsers#t...
        
           | gambler wrote:
           | I was aiming for sub-100 lines of code implementation. IMO, a
           | full-fledged library would be much better off if they just
           | supported EBNF or McKeeman Form.
        
         | hardmath123 wrote:
         | Parsers in nearley.js [1] are written in a very readable EBNF-
         | like DSL; then they get desugared down to a JS file that's a
         | lot like your snippet.
         | 
         | [1] https://github.com/kach/nearley
        
       | platz wrote:
       | Does this fall under the category of a recursive descent parser?
        
         | chowells wrote:
         | No.
         | 
         | The other answer is also correct, that it's Yes.
         | 
         | In other words, the question is wrong.
         | 
         | Applicative parser combinators are an interface. The backing
         | implementation is pretty open. It might be recursive descent.
         | It might be the Earley algorithm. It might be a packrat parser.
         | 
         | The interesting thing about Applicative parser combinators is
         | that the way they are composed is amenable to static analysis.
         | That allows a wide variety of implementation strategies, all
         | behind a simple uniform interface...
         | 
         | ... So long as you ignore that there are still things left
         | unspecified by the interface, especially related to
         | backtracking behavior. (I really hate how much mindspace
         | Parsec-like libraries have in Haskell. They have the worst
         | backtracking behavior.)
        
           | platz wrote:
           | And yet all the implementations are actually just functional
           | recursion, in practice, no?
        
             | chowells wrote:
             | Yes, but that isn't saying much. It's basically the same as
             | saying "they're all assembly instructions when they're sent
             | to the CPU".
             | 
             | Functional recursion is just looping. "Recursive Descent"
             | is a very specific parsing algorithm.
        
               | platz wrote:
               | then you'd also claim that a parser generator grammer
               | could also be implemented with functional recursion. so
               | what is the value of the original claim though?
        
               | chowells wrote:
               | The original question was about "recursive descent".
               | 
               | You keep talking about "functional recursion".
               | 
               | These are not the same words, nor do they mean the same
               | thing. I do not understand your underlying question, or
               | what point you might be trying to make if the question is
               | intended to be rhetorical.
        
           | seagreen wrote:
           | > ... So long as you ignore that there are still things left
           | unspecified by the interface, especially related to
           | backtracking behavior. (I really hate how much mindspace
           | Parsec-like libraries have in Haskell. They have the worst
           | backtracking behavior.)
           | 
           | The one that auto-backtracks on Alternative is attoparsec
           | (IIRC), parsec and megaparsec don't unless you use `try`. Is
           | your problem with the attoparsec style or with all of them,
           | and if the latter what do you prefer instead?
        
             | kreetx wrote:
             | Came here to say this (that parsec, the old standard,
             | didn't backtrack).
        
             | chowells wrote:
             | Attoparsec and megaparsec don't backtrack. They hard commit
             | to choices and never reconsider them after having done so.
             | 
             | Let's say I want to match the strings "a" or "aa" followed
             | by "ab". Let's just go ahead and write that the most
             | trivially obviously correct way:                   thing =
             | (string "a" <|> string "aa") <> string "ab"
             | 
             | Woo! See how easy it is? ... except that's utterly broken
             | in any parsec-derived library. They don't backtrack.
             | There's nothing in the world you can do to make them
             | backtrack. The only thing you can do is refactor the
             | structure of the grammar to reflect the parser's
             | limitations.
             | 
             | Yes, this example is artificial, but that makes it easy to
             | understand. The broader issue of which this is one example
             | is that you cannot apply distributive laws in parsec or its
             | derivatives.                   (a <*> c) <|> (b <*> c)
             | 
             | Is not interchangeable with                   (a <|> b) <*>
             | c
             | 
             | On the other hand,                   (a <*> b) <|> (a <*>
             | c)
             | 
             | Is interchangeable with                   a <*> (b <|> c)
             | 
             | This might be good for efficiency, but it's not very good
             | for making it pleasant to write parsers.
        
               | seagreen wrote:
               | > Attoparsec and megaparsec don't backtrack.
               | 
               | There are two notion of backtracking at play here. One is
               | backtracking out of a failing first branch of <|> even
               | though it's consumed some input. The megaparsec authors
               | consider this a form of backtracking (see https://hackage
               | .haskell.org/package/megaparsec-9.0.1/docs/Te...).
               | 
               | The other is backtracking from a failure in the second
               | argument to <> or <*>.
               | 
               | I agree that none of the libraries being considered do
               | the latter. But the effect on memory use wouldn't just be
               | slightly inefficient, it would mean keeping the entire
               | input following a <|> in memory until the whole thing
               | completes. Do you think this is the best way to do things
               | in general, or just for specialized situations like
               | parsing things with a fixed size such as config files?
        
               | chowells wrote:
               | Very, very few parsers need to handle multiple gigabyte
               | inputs. For those, sure, focus on performance first. Your
               | tools should always be suited to your problem.
               | 
               | None of that changes the fact that Parsec-like designs
               | are especially obnoxious when it comes to backtracking.
               | You need to be aware of their limitations and design the
               | structure of your grammar around them.
               | 
               | As an idea for a design somewhere in between, maybe try
               | taking inspiration from Prolog. Add a cut combinator for
               | limiting the scope of backtracking explicitly. It still
               | solves the size issues on large inputs, but it allows
               | more freedom in refactoring as long as you don't cross
               | one of those explicit boundaries.
        
         | lirazel wrote:
         | Yes
        
       ___________________________________________________________________
       (page generated 2021-03-01 23:01 UTC)