[HN Gopher] Parser Combinators in Haskell
       ___________________________________________________________________
        
       Parser Combinators in Haskell
        
       Author : aroccoli
       Score  : 95 points
       Date   : 2021-12-22 13:36 UTC (9 hours ago)
        
 (HTM) web link (serokell.io)
 (TXT) w3m dump (serokell.io)
        
       | ra-mos wrote:
       | Novice question, when would you use something like this over
       | something like lex/yacc?
       | 
       | https://tldp.org/HOWTO/Lex-YACC-HOWTO-4.html
        
       | cosmic_quanta wrote:
       | I love parser combinators. It's a much more natural 'format' than
       | parsing with regular expressions which I have to do at $DAY_JOB.
       | 
       | One aspect that I see discussed less often is the idea of
       | 'builder combinators', which are kind of the inverse of parser
       | combinators. There are multiple libraries in Haskell which make
       | it very easy to write HTML/SVG/... this way:
       | 
       | https://hackage.haskell.org/package/blaze-html
       | 
       | https://hackage.haskell.org/package/blaze-svg
       | 
       | I haven't needed to do the same with SQL but I imagine that this
       | already exists as well
        
         | jayshua wrote:
         | I like using Elm's JSON library as an example because it's very
         | small. Since there are only a handful of functions you can
         | understand the entire thing pretty easily. Plus, encoding JSON
         | is a great jump-off into the decoding half of the library which
         | is basically a simple parser combinator system.
         | 
         | https://package.elm-lang.org/packages/elm/json/latest/
        
         | CornCobs wrote:
         | What are the benefits of these monadic builder interfaces
         | instead of simply pure functions? One can imagine a simple Html
         | type with html tag functions e.g.
         | 
         | `div :: [Attributes] -> [Html] -> Html`
         | 
         | Instead of using do notation to build up multiple children one
         | would simply supply a list of children. Added benefit of using
         | simpler map/filter/filterMap/mapMaybe vs their monadic
         | counterparts
        
           | louthy wrote:
           | You can pass contextual state/environment in the monad for
           | trivial data binding
        
           | jaspervdj wrote:
           | The builder interface has important performance benefits over
           | working with lists: similar to e.g. concatenating many small
           | Strings vs using StringBuilder in Java.
           | 
           | The builder interface indeed does not need to be monadic, you
           | can simply append things using <> and get the same benefits.
           | 
           | The monadic interface is purely a syntactic enhancement on
           | top of that, to reduce the number of operators you need.
        
         | jy14898 wrote:
         | Along the same line, there's also invertible syntax
         | descriptions
         | https://gist.github.com/tfausak/a8d7f135bf76e64ea6f35d3be692...
         | , which allow one definition to create both a parser and
         | pretty-printer
        
         | marginalia_nu wrote:
         | I'm normally a recursive descent-man, but I implemented parsec-
         | style parsing in Java once, just because I was curious to see
         | what the world looked like when it was on fire. I don't have
         | the code as this was just me wasting company time during a
         | hackathon they insisted we all attended. I did get it working,
         | despite the type system I was working in.
         | 
         | But yeah. Mistakes were made.
        
       | spekcular wrote:
       | Why are parser combinators not given any attention in traditional
       | academic treatments of parsing? Is there some drawback compared
       | to other popular methods? (And why are they relatively more
       | popular on HN and in functional programming communities?)
       | 
       | For instance, in the book of Grune and Jacobs on parsing
       | techniques, they are presented as essentially a curiosity (with
       | 2.5 pages of description in a 600+ page book).
        
         | auggierose wrote:
         | There is the PEG paper [1], which is basically a formalisation
         | of parser combinators.
         | 
         | But really, otherwise there is not much to say about parser
         | combinators, as they are just code. The cool thing about
         | parsing is that usually you can give much nicer descriptions of
         | the grammar than via parser combinators, and you can analyse
         | these descriptions (uniqueness, for example), und generate
         | optimised code.
         | 
         | [1] https://dl.acm.org/doi/10.1145/964001.964011
        
         | chowells wrote:
         | Most replies to this are making a category error. Parser
         | combinators aren't a parsing technology, they're a library
         | interface. You can use any parsing technology as a backend to
         | an interface that is built in terms of combinators. That most
         | of them are essentially recursive-descent is a result of common
         | implementation decisions, not a requirement of the interface.
         | 
         | And that's why parsing research doesn't pay much attention to
         | it - it's an interface, not a different approach to handling
         | data while parsing. It's all about ergonomics of using a
         | parser, not about the capability and performance
         | characteristics of parsing.
         | 
         | Should more research pay attention to that? Probably.
         | BISON/YACC/ANTLR are all hell to use. Combinator-based
         | libraries are easier, but current designs lack things like
         | extending parsers without modifying them. There's a lot of
         | research that could be done that would care about this aspect
         | of parsers, but it doesn't seem to be for now.
        
         | Athas wrote:
         | From an academic point of view, most parser combinator
         | libraries are basically LL parsers with backtracking. This is
         | eminently practical (especially if the backtracking part is
         | done well), but from an academic point of view that means
         | they're not very interesting - really just a nicer way of
         | implementing recursive descent parsers.
         | 
         | There is of course plenty of academic work specifically on how
         | to implement parser combinator libraries efficiently, how to
         | make them invertible (so they can also produce prettyprinters),
         | provide good error messages, etc.
        
         | kccqzy wrote:
         | Parser combinators are just a nicer syntax for implementing
         | recursive descent parsing+. That's not enough academic content
         | for the subset of academia focused on Algorithm and Data
         | Structures. But of course if you are interested in another
         | subset interested in Programming Languages and Types, you'll
         | find plenty of people interested not in the algorithm (which is
         | easy) but also the way it's written in a real programming
         | language, and leveraging type systems to make the code nice;
         | that's where you'll find most of the academic discussions on
         | it.
         | 
         | Also, higher-kinded types are pretty much required to make
         | parser combinators nice, so that restricts the appeal of parser
         | combinators to just Haskell and a few other niche languages.
         | 
         | Yet another reason is that like all recursive descent parsing
         | with backtracking, it's easy to accidentally make your parser
         | run in exponential time. It certainly requires discipline and
         | deliberation to avoid that.
         | 
         | +: I only know of one library in Haskell that has a parser-
         | combinator-like interface without recursive descent parsing,
         | and that's https://github.com/ollef/Earley but I'll quote from
         | its README the difference between that and typical parser
         | combinators:
         | 
         | > The grammar language is similar to that of many parser
         | combinators (Parsec, Attoparsec, parallel parsing processes,
         | etc.), providing an applicative interface, but the parser
         | gracefully handles all finite CFGs, including those with left-
         | recursion. On the other hand, its productions are not monadic
         | meaning that it does not support context-sensitive or infinite
         | grammars, which are supported by many parser combinator
         | libraries.
        
         | marcosdumay wrote:
         | As people said, they are simpler, less strict (what means you
         | can get bad performance if you use them wrong), and just an
         | abstraction over the very general concept of recursive context-
         | dependent parsing.
         | 
         | There isn't much to cover about them, they are just very
         | ergonomic. Even guaranteeing a good performance is covered on
         | the theory of other types of parser, that is them trivial to
         | replicate on them, but it's not about them.
        
         | Joker_vD wrote:
         | They have an exponential complexity in general case, I guess
         | that's the reason. There _is_ a way to make them be of
         | polynomial complexity in the worst case, but it requires quite
         | an ingenious memoization technique, coupled with lazy
         | evaluation.
        
       | ddek wrote:
       | Parser combinators in F# were where I really grasped the utility
       | of a Monad. F# doesn't have HKT's, and computational expressions
       | are a little different from Haskell 'do'.
       | 
       | It was the '(>>=) = Parser<'a> -> ('a -> Parser<'b>) ->
       | Parser<'b>' that flicked the lightswitch in my head.
        
         | contravariant wrote:
         | In .NET I still find it weird that people sometimes implement
         | the visitor pattern without the output type. You'd have to
         | truly think imperatively to consider:                   void
         | Structure(IVisitor visitor);
         | 
         | a good function signature, instead of:                   A
         | Structure(IParser<A> parser);
         | 
         | I mean fair enough that you can't be bothered to implement the
         | functor and monad methods, but it's downright silly to define a
         | function that is incapable of returning output.
        
           | frabert wrote:
           | In .NET specifically a reason I can think about doing that is
           | that you'd need to implement your visitor twice, as the type
           | argument cannot be void (if I recall correctly).
        
             | Joker_vD wrote:
             | It can't be void, but writing                   public
             | sealed class Unit {             private static Unit m_value
             | = new Unit();                  public static Unit Value =>
             | m_value;                  public Unit() { }
             | public override string ToString() => "()";
             | public override bool Equals(object obj) => obj is Unit;
             | public override int GetHashCode(object obj) => 0;         }
             | 
             | and then using it ("return Unit.Value") in place of void is
             | really not that big a problem.
        
             | contravariant wrote:
             | True, but I just never quite managed to understand why
             | you'd want your visitor to return nothing. And if you did
             | why it was too much hassle to just return '0' or something.
        
       | pizza wrote:
       | Is there such a thing as a stochastic parser combinator? E.g. to
       | do Viterbi/HMM decoding, etc.
       | 
       | Off-topic: I'd really like to get more and more Haskell features
       | in Python. Playing with the adt library for algebraic data types,
       | and I'd also like to find something to reproduce
       | monads/applicatives/functors/arrows etc
        
       | ashton314 wrote:
       | Rust has a parser combinator library called Nom [^1] that is
       | really a joy to use. I know a _little_ Rust--enough to know what
       | a lifetime is but not so much that I don't have to keep referring
       | to the documentation when building something. It took me just a
       | few hours with the docs and examples to get a decent parser
       | together. So nice to work with combinators!
       | 
       | [^1]: https://github.com/Geal/nom
        
       | lou1306 wrote:
       | Since it is not mentioned in the article: Python users may also
       | want to check out pyparsing [0]. It is slightly different from
       | Parsec/FParsec (for instance, it ignores all whitespace by
       | default), but I think it is a really good project.
       | 
       | [0]: https://github.com/pyparsing/pyparsing/
        
       | danidiaz wrote:
       | This "Design Patterns for Parser Combinators"
       | https://www.youtube.com/watch?v=RwzX1XltOGY video (an the
       | associated pdf
       | https://dl.acm.org/doi/abs/10.1145/3471874.3472984) explains some
       | techniques that are good to know--how to deal with left
       | recursion, for example.
       | 
       | It's also notable for being a continuous integration-driven talk!
        
       | mirekrusin wrote:
       | Parser combinators are great ie. in TypeScript [0].
       | 
       | Combinators in general are concise and practical ie. runtime type
       | assertions [1] or generators [2] etc.
       | 
       | We use them in several places in production.
       | 
       | [0] https://github.com/preludejs/parser
       | 
       | [1] https://github.com/appliedblockchain/assert-combinators
       | 
       | [2] https://github.com/preludejs/generator/tree/master/src
        
       | Zababa wrote:
       | An example of a concrete usage of parser combinators and their
       | performance: OCaml's http/af library [1] is a web server that
       | uses parser combinators to implement HTTP 1.1. As for the
       | performance, it's in the ballpark of Go's net/http and Rust's
       | hyper [2]. There's also h2 [3] that implements HTTP 2. They are
       | based on Angstrom [4], which started as a direct port of
       | attoparsec [5], cited in the article. It's great to see all the
       | cross-pollination happening in that space!
       | 
       | [1]: https://github.com/inhabitedtype/httpaf
       | 
       | [2]: https://github.com/ocaml-multicore/retro-httpaf-
       | bench/pull/1...
       | 
       | [3]: https://github.com/anmonteiro/ocaml-h2
       | 
       | [4]: https://github.com/inhabitedtype/angstrom
       | 
       | [5]: https://github.com/haskell/attoparsec
        
       ___________________________________________________________________
       (page generated 2021-12-22 23:01 UTC)