[HN Gopher] Parser Combinators in Haskell
___________________________________________________________________
Parser Combinators in Haskell
Author : aroccoli
Score : 95 points
Date : 2021-12-22 13:36 UTC (9 hours ago)
(HTM) web link (serokell.io)
(TXT) w3m dump (serokell.io)
| ra-mos wrote:
| Novice question, when would you use something like this over
| something like lex/yacc?
|
| https://tldp.org/HOWTO/Lex-YACC-HOWTO-4.html
| cosmic_quanta wrote:
| I love parser combinators. It's a much more natural 'format' than
| parsing with regular expressions which I have to do at $DAY_JOB.
|
| One aspect that I see discussed less often is the idea of
| 'builder combinators', which are kind of the inverse of parser
| combinators. There are multiple libraries in Haskell which make
| it very easy to write HTML/SVG/... this way:
|
| https://hackage.haskell.org/package/blaze-html
|
| https://hackage.haskell.org/package/blaze-svg
|
| I haven't needed to do the same with SQL but I imagine that this
| already exists as well
| jayshua wrote:
| I like using Elm's JSON library as an example because it's very
| small. Since there are only a handful of functions you can
| understand the entire thing pretty easily. Plus, encoding JSON
| is a great jump-off into the decoding half of the library which
| is basically a simple parser combinator system.
|
| https://package.elm-lang.org/packages/elm/json/latest/
| CornCobs wrote:
| What are the benefits of these monadic builder interfaces
| instead of simply pure functions? One can imagine a simple Html
| type with html tag functions e.g.
|
| `div :: [Attributes] -> [Html] -> Html`
|
| Instead of using do notation to build up multiple children one
| would simply supply a list of children. Added benefit of using
| simpler map/filter/filterMap/mapMaybe vs their monadic
| counterparts
| louthy wrote:
| You can pass contextual state/environment in the monad for
| trivial data binding
| jaspervdj wrote:
| The builder interface has important performance benefits over
| working with lists: similar to e.g. concatenating many small
| Strings vs using StringBuilder in Java.
|
| The builder interface indeed does not need to be monadic, you
| can simply append things using <> and get the same benefits.
|
| The monadic interface is purely a syntactic enhancement on
| top of that, to reduce the number of operators you need.
| jy14898 wrote:
| Along the same line, there's also invertible syntax
| descriptions
| https://gist.github.com/tfausak/a8d7f135bf76e64ea6f35d3be692...
| , which allow one definition to create both a parser and
| pretty-printer
| marginalia_nu wrote:
| I'm normally a recursive descent-man, but I implemented parsec-
| style parsing in Java once, just because I was curious to see
| what the world looked like when it was on fire. I don't have
| the code as this was just me wasting company time during a
| hackathon they insisted we all attended. I did get it working,
| despite the type system I was working in.
|
| But yeah. Mistakes were made.
| spekcular wrote:
| Why are parser combinators not given any attention in traditional
| academic treatments of parsing? Is there some drawback compared
| to other popular methods? (And why are they relatively more
| popular on HN and in functional programming communities?)
|
| For instance, in the book of Grune and Jacobs on parsing
| techniques, they are presented as essentially a curiosity (with
| 2.5 pages of description in a 600+ page book).
| auggierose wrote:
| There is the PEG paper [1], which is basically a formalisation
| of parser combinators.
|
| But really, otherwise there is not much to say about parser
| combinators, as they are just code. The cool thing about
| parsing is that usually you can give much nicer descriptions of
| the grammar than via parser combinators, and you can analyse
| these descriptions (uniqueness, for example), und generate
| optimised code.
|
| [1] https://dl.acm.org/doi/10.1145/964001.964011
| chowells wrote:
| Most replies to this are making a category error. Parser
| combinators aren't a parsing technology, they're a library
| interface. You can use any parsing technology as a backend to
| an interface that is built in terms of combinators. That most
| of them are essentially recursive-descent is a result of common
| implementation decisions, not a requirement of the interface.
|
| And that's why parsing research doesn't pay much attention to
| it - it's an interface, not a different approach to handling
| data while parsing. It's all about ergonomics of using a
| parser, not about the capability and performance
| characteristics of parsing.
|
| Should more research pay attention to that? Probably.
| BISON/YACC/ANTLR are all hell to use. Combinator-based
| libraries are easier, but current designs lack things like
| extending parsers without modifying them. There's a lot of
| research that could be done that would care about this aspect
| of parsers, but it doesn't seem to be for now.
| Athas wrote:
| From an academic point of view, most parser combinator
| libraries are basically LL parsers with backtracking. This is
| eminently practical (especially if the backtracking part is
| done well), but from an academic point of view that means
| they're not very interesting - really just a nicer way of
| implementing recursive descent parsers.
|
| There is of course plenty of academic work specifically on how
| to implement parser combinator libraries efficiently, how to
| make them invertible (so they can also produce prettyprinters),
| provide good error messages, etc.
| kccqzy wrote:
| Parser combinators are just a nicer syntax for implementing
| recursive descent parsing+. That's not enough academic content
| for the subset of academia focused on Algorithm and Data
| Structures. But of course if you are interested in another
| subset interested in Programming Languages and Types, you'll
| find plenty of people interested not in the algorithm (which is
| easy) but also the way it's written in a real programming
| language, and leveraging type systems to make the code nice;
| that's where you'll find most of the academic discussions on
| it.
|
| Also, higher-kinded types are pretty much required to make
| parser combinators nice, so that restricts the appeal of parser
| combinators to just Haskell and a few other niche languages.
|
| Yet another reason is that like all recursive descent parsing
| with backtracking, it's easy to accidentally make your parser
| run in exponential time. It certainly requires discipline and
| deliberation to avoid that.
|
| +: I only know of one library in Haskell that has a parser-
| combinator-like interface without recursive descent parsing,
| and that's https://github.com/ollef/Earley but I'll quote from
| its README the difference between that and typical parser
| combinators:
|
| > The grammar language is similar to that of many parser
| combinators (Parsec, Attoparsec, parallel parsing processes,
| etc.), providing an applicative interface, but the parser
| gracefully handles all finite CFGs, including those with left-
| recursion. On the other hand, its productions are not monadic
| meaning that it does not support context-sensitive or infinite
| grammars, which are supported by many parser combinator
| libraries.
| marcosdumay wrote:
| As people said, they are simpler, less strict (what means you
| can get bad performance if you use them wrong), and just an
| abstraction over the very general concept of recursive context-
| dependent parsing.
|
| There isn't much to cover about them, they are just very
| ergonomic. Even guaranteeing a good performance is covered on
| the theory of other types of parser, that is them trivial to
| replicate on them, but it's not about them.
| Joker_vD wrote:
| They have an exponential complexity in general case, I guess
| that's the reason. There _is_ a way to make them be of
| polynomial complexity in the worst case, but it requires quite
| an ingenious memoization technique, coupled with lazy
| evaluation.
| ddek wrote:
| Parser combinators in F# were where I really grasped the utility
| of a Monad. F# doesn't have HKT's, and computational expressions
| are a little different from Haskell 'do'.
|
| It was the '(>>=) = Parser<'a> -> ('a -> Parser<'b>) ->
| Parser<'b>' that flicked the lightswitch in my head.
| contravariant wrote:
| In .NET I still find it weird that people sometimes implement
| the visitor pattern without the output type. You'd have to
| truly think imperatively to consider: void
| Structure(IVisitor visitor);
|
| a good function signature, instead of: A
| Structure(IParser<A> parser);
|
| I mean fair enough that you can't be bothered to implement the
| functor and monad methods, but it's downright silly to define a
| function that is incapable of returning output.
| frabert wrote:
| In .NET specifically a reason I can think about doing that is
| that you'd need to implement your visitor twice, as the type
| argument cannot be void (if I recall correctly).
| Joker_vD wrote:
| It can't be void, but writing public
| sealed class Unit { private static Unit m_value
| = new Unit(); public static Unit Value =>
| m_value; public Unit() { }
| public override string ToString() => "()";
| public override bool Equals(object obj) => obj is Unit;
| public override int GetHashCode(object obj) => 0; }
|
| and then using it ("return Unit.Value") in place of void is
| really not that big a problem.
| contravariant wrote:
| True, but I just never quite managed to understand why
| you'd want your visitor to return nothing. And if you did
| why it was too much hassle to just return '0' or something.
| pizza wrote:
| Is there such a thing as a stochastic parser combinator? E.g. to
| do Viterbi/HMM decoding, etc.
|
| Off-topic: I'd really like to get more and more Haskell features
| in Python. Playing with the adt library for algebraic data types,
| and I'd also like to find something to reproduce
| monads/applicatives/functors/arrows etc
| ashton314 wrote:
| Rust has a parser combinator library called Nom [^1] that is
| really a joy to use. I know a _little_ Rust--enough to know what
| a lifetime is but not so much that I don't have to keep referring
| to the documentation when building something. It took me just a
| few hours with the docs and examples to get a decent parser
| together. So nice to work with combinators!
|
| [^1]: https://github.com/Geal/nom
| lou1306 wrote:
| Since it is not mentioned in the article: Python users may also
| want to check out pyparsing [0]. It is slightly different from
| Parsec/FParsec (for instance, it ignores all whitespace by
| default), but I think it is a really good project.
|
| [0]: https://github.com/pyparsing/pyparsing/
| danidiaz wrote:
| This "Design Patterns for Parser Combinators"
| https://www.youtube.com/watch?v=RwzX1XltOGY video (an the
| associated pdf
| https://dl.acm.org/doi/abs/10.1145/3471874.3472984) explains some
| techniques that are good to know--how to deal with left
| recursion, for example.
|
| It's also notable for being a continuous integration-driven talk!
| mirekrusin wrote:
| Parser combinators are great ie. in TypeScript [0].
|
| Combinators in general are concise and practical ie. runtime type
| assertions [1] or generators [2] etc.
|
| We use them in several places in production.
|
| [0] https://github.com/preludejs/parser
|
| [1] https://github.com/appliedblockchain/assert-combinators
|
| [2] https://github.com/preludejs/generator/tree/master/src
| Zababa wrote:
| An example of a concrete usage of parser combinators and their
| performance: OCaml's http/af library [1] is a web server that
| uses parser combinators to implement HTTP 1.1. As for the
| performance, it's in the ballpark of Go's net/http and Rust's
| hyper [2]. There's also h2 [3] that implements HTTP 2. They are
| based on Angstrom [4], which started as a direct port of
| attoparsec [5], cited in the article. It's great to see all the
| cross-pollination happening in that space!
|
| [1]: https://github.com/inhabitedtype/httpaf
|
| [2]: https://github.com/ocaml-multicore/retro-httpaf-
| bench/pull/1...
|
| [3]: https://github.com/anmonteiro/ocaml-h2
|
| [4]: https://github.com/inhabitedtype/angstrom
|
| [5]: https://github.com/haskell/attoparsec
___________________________________________________________________
(page generated 2021-12-22 23:01 UTC)