[HN Gopher] Parsing in JavaScript: all the tools and libraries y...
___________________________________________________________________
Parsing in JavaScript: all the tools and libraries you can use
Author : jasim
Score : 62 points
Date : 2021-05-24 14:32 UTC (8 hours ago)
(HTM) web link (tomassetti.me)
(TXT) w3m dump (tomassetti.me)
| archibaldJ wrote:
| hmm this article is a bit outdated; peg.js (mentioned in the
| article) has been abandoned by the maintainer for a few years now
| (and it never reached a stable 1.0); recently the project was
| picked up by another team under the name peggy.js
| https://github.com/peggyjs/peggy
| codeismath wrote:
| I recently used "Arcsecond" JavaScript Parser Combinator library
| to output some abstract syntax based on John Reynolds's
| "Definitional interpreters for higher-order programming
| languages".
|
| It's based on Haskell's Parsec parser combinator library, and is
| zero-dependency.
|
| I was convinced to give it a try based on watching the author's
| YouTube videos "Parser Combinators From Scratch" on "Low Level
| JavaScript". Enjoyable series - recommended.
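For readers who have not met the parser combinator style mentioned above, here is a minimal hand-rolled sketch in TypeScript. It is not Arcsecond's API: every name in it (`Parser`, `str`, `regex`, `pair`, `alt`, `many`, `map`) is hypothetical and chosen only for illustration. It shows the core idea, that small parsers are plain functions and larger ones are built by composing them.

```typescript
// A parser takes the input and a start position. It returns the parsed value
// plus the new position, or null on failure. All names are illustrative.
type Result<T> = { value: T; pos: number } | null;
type Parser<T> = (input: string, pos: number) => Result<T>;

// Match a literal string at the current position.
const str = (s: string): Parser<string> => (input, pos) =>
  input.slice(pos, pos + s.length) === s ? { value: s, pos: pos + s.length } : null;

// Match a regular expression anchored at the current position (sticky flag).
const regex = (re: RegExp): Parser<string> => {
  const sticky = new RegExp(re.source, "y");
  return (input, pos) => {
    sticky.lastIndex = pos;
    const m = sticky.exec(input);
    return m ? { value: m[0], pos: pos + m[0].length } : null;
  };
};

// Run `a` then `b`, keeping both results as a tuple.
const pair = <A, B>(a: Parser<A>, b: Parser<B>): Parser<[A, B]> => (input, pos) => {
  const ra = a(input, pos);
  if (ra === null) return null;
  const rb = b(input, ra.pos);
  return rb === null ? null : { value: [ra.value, rb.value], pos: rb.pos };
};

// Try alternatives in order; the first success wins.
const alt = <T>(...ps: Parser<T>[]): Parser<T> => (input, pos) => {
  for (const p of ps) {
    const r = p(input, pos);
    if (r !== null) return r;
  }
  return null;
};

// Zero or more repetitions; stops on failure or when no input is consumed.
const many = <T>(p: Parser<T>): Parser<T[]> => (input, pos) => {
  const values: T[] = [];
  for (;;) {
    const r = p(input, pos);
    if (r === null || r.pos === pos) break;
    values.push(r.value);
    pos = r.pos;
  }
  return { value: values, pos };
};

// Transform the value of a successful parse.
const map = <A, B>(p: Parser<A>, f: (a: A) => B): Parser<B> => (input, pos) => {
  const r = p(input, pos);
  return r === null ? null : { value: f(r.value), pos: r.pos };
};

// Example: a comma-separated list of integers such as "1,22,333".
const int = map(regex(/[0-9]+/), Number);
const commaInt = map(pair(str(","), int), ([, n]) => n);
const list = map(pair(int, many(commaInt)), ([first, rest]) => [first, ...rest]);
```

Calling `list("1,22,333", 0)` yields the three numbers and the end position. A parser built this way succeeds on a prefix of the input, so the caller still has to check that the returned position equals the input length.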
| heroku wrote:
| See my library for parsing text:
| https://github.com/eguneys/tamcher
| dahart wrote:
| The link has an anchor "#chevrotain" that goes directly to near
| the bottom of the article, can it be removed?
| jasim wrote:
| Sorry, I had tried changing it immediately after posting the
| link, but HN allows only changing the title.
| sehugg wrote:
| I'd also suggest using jsfuzz to find sharp corners in your
| parser. I used it extensively for my recursive-descent BASIC
| interpreter (along with a test suite) and it found tons of
| issues.
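The idea behind fuzzing a parser can be sketched without any library. The TypeScript below is a hand-rolled random-input loop, not jsfuzz (which is coverage-guided and considerably smarter). The parser `parseSum`, its planted bug, and the helper `fuzz` are all hypothetical. The convention assumed here is that the parser returns `null` for input it rejects, so any thrown exception counts as a bug.

```typescript
// Hypothetical parser under test: sums single digits separated by '+',
// e.g. "1+2+3" gives 6. Returns null for input it rejects.
function parseSum(input: string): number | null {
  let i = 0;
  let total = 0;
  for (;;) {
    // PLANTED BUG: no bounds check, so this throws a TypeError when the
    // input is empty or ends with '+'.
    const code = input[i].charCodeAt(0);
    if (code < 48 || code > 57) return null; // not a digit: reject
    total += code - 48;
    i++;
    if (i === input.length) return total;
    if (input[i] !== "+") return null; // expected '+': reject
    i++;
  }
}

// Feed `runs` random strings over `alphabet` to `parse` and collect every
// input that makes it throw. A small seeded LCG keeps the runs reproducible.
function fuzz(
  parse: (s: string) => unknown,
  alphabet: string,
  runs: number,
  seed: number = 1
): string[] {
  let state = seed >>> 0;
  const next = (n: number): number => {
    state = (state * 1664525 + 1013904223) >>> 0;
    return Math.floor((state / 4294967296) * n);
  };
  const crashes: string[] = [];
  for (let run = 0; run < runs; run++) {
    let input = "";
    const length = next(8); // random length from 0 to 7
    for (let k = 0; k < length; k++) input += alphabet[next(alphabet.length)];
    try {
      parse(input);
    } catch (e) {
      crashes.push(input); // the parser threw instead of rejecting cleanly
    }
  }
  return crashes;
}
```

Running `fuzz(parseSum, "0123456789+x", 500)` collects the inputs that crash the parser. Each one can then be turned into a regression case in the test suite.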
| jchw wrote:
| This is pretty cool and useful! I had been doing bootleg parser
| combinators in JavaScript up until now. This works, although
| being someone who doesn't write Haskell I assume that the magic
| that makes parser combinators efficient is not easy to implement
| in most languages. (I've been using nom with Rust on the side too
| and it seems to do the same naive stuff that I do.)
|
| If you ever want to parse _binary_ data using JavaScript, I will
| always recommend the excellent Kaitai Struct project.
|
| https://kaitai.io/
| jevgeni wrote:
| Although this is an interesting topic to me, I struggle to read
| the site behind all the cookie warnings and newsletter begging.
| FractalHQ wrote:
| No Tree-sitter? I thought it was one of the most performant modern
| options with an amazing ecosystem around it.
| edko wrote:
| Tree-sitter is not JS (although the Wasm version can be used
| from JS). Lezer (https://github.com/lezer-parser/lezer) is
| similar (it was inspired by Tree-sitter) and is written in TS.
| Zababa wrote:
| Another thing that you can use if you're doing JS: any language
| that compiles to JavaScript. Here are a few options:
|
| - Clojure with ClojureScript
|
| - F# with Fable
|
| - Haskell with ghcjs
|
| - OCaml with js_of_ocaml and ReScript
|
| - Racket with RacketScript
|
| - Scala with Scala.js
| lgessler wrote:
| It's a Clojure(Script) library, but it's still so good it's worth
| mentioning: Instaparse[0] generates a parser for you from a BNF-y
| grammar specification. Sample from the readme:
|   (def as-and-bs
|     (insta/parser
|       "S = AB*
|        AB = A B
|        A = 'a'+
|        B = 'b'+"))
|
|   => (as-and-bs "aaaaabbbaaaabb")
|   [:S
|    [:AB [:A "a" "a" "a" "a" "a"] [:B "b" "b" "b"]]
|    [:AB [:A "a" "a" "a" "a"] [:B "b" "b"]]]
|
| [0]: https://github.com/Engelberg/instaparse
| dahart wrote:
| Anyone have experience with Nearley? I recently did some reading
| on JS parsers and ended up feeling like it was easier to use than
| most of the others where you have to pay attention to and know
| some tricks for handling recursive grammar rules. What I don't
| have a sense for yet is what the downsides of Nearley might be
| down the road once my grammar gets bigger and/or goes into
| production, if the performance issues might pop up suddenly and
| be hard to fix.
| inbx0 wrote:
| I've done a little toy language implementation with Nearley and
| really enjoyed it. If you have some JS experience under your
| belt, Nearley should be pretty easy to pick up and I felt it
| was easy to build on top of the basic building blocks. It's
| almost too easy to drop into custom JS with Nearley's inline JS
| syntax, so you can do all sorts of stuff with JS regexes or
| what not if you can't find a built-in feature that would do
| what you want.
|
| Unfortunately my project was not big enough that I'd be able to
| say anything about performance at scale, though. Being JS based
| has the advantage that you can (depending on your application
| needs) sometimes move some of that workload to the client :)
| matheusmoreira wrote:
| Nearley is amazing. The best implementation of the Earley
| algorithm I know. It's a great algorithm because you can use
| any context-free grammar as input and it doesn't require code
| generation. It's the perfect algorithm for a parse(grammar,
| input) standard library function.
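To make the "parse(grammar, input)" shape concrete, here is a minimal sketch of an Earley recognizer in TypeScript. It is not Nearley's implementation and has none of the published optimizations. It assumes a grammar with no empty productions and single-character terminals, and it only answers yes or no instead of building a parse tree. The names `Grammar`, `Item`, and `recognize` are hypothetical.

```typescript
// Each nonterminal maps to its alternatives. Any symbol that is not a key of
// the grammar is treated as a terminal matching one input character.
// Assumption: no alternative is empty (no epsilon rules).
type Grammar = Record<string, string[][]>;

// An Earley item: alternative `alt` of `head`, with the dot before position
// `dot` in its body, started at input position `origin`.
interface Item { head: string; alt: number; dot: number; origin: number; }

function recognize(grammar: Grammar, start: string, input: string): boolean {
  const sets: Item[][] = [];
  const seen: Record<string, true>[] = [];
  for (let i = 0; i <= input.length; i++) { sets.push([]); seen.push({}); }

  // Add an item to set k unless it is already there.
  const add = (k: number, item: Item): void => {
    const key = `${item.head}:${item.alt}:${item.dot}:${item.origin}`;
    if (!seen[k][key]) { seen[k][key] = true; sets[k].push(item); }
  };

  grammar[start].forEach((_, alt) => add(0, { head: start, alt, dot: 0, origin: 0 }));

  for (let k = 0; k <= input.length; k++) {
    // sets[k] grows while it is scanned, so index it instead of iterating.
    for (let j = 0; j < sets[k].length; j++) {
      const item = sets[k][j];
      const body = grammar[item.head][item.alt];
      if (item.dot < body.length) {
        const sym = body[item.dot];
        if (grammar[sym] !== undefined) {
          // Predict: start every alternative of the expected nonterminal here.
          grammar[sym].forEach((_, alt) => add(k, { head: sym, alt, dot: 0, origin: k }));
        } else if (k < input.length && input[k] === sym) {
          // Scan: the next character matches the expected terminal.
          add(k + 1, { ...item, dot: item.dot + 1 });
        }
      } else {
        // Complete: advance every item that was waiting for this nonterminal.
        for (const parent of sets[item.origin]) {
          const parentBody = grammar[parent.head][parent.alt];
          if (parentBody[parent.dot] === item.head) add(k, { ...parent, dot: parent.dot + 1 });
        }
      }
    }
  }

  // Accept if a start rule spanning the whole input has been completed.
  return sets[input.length].some(
    (it) => it.head === start && it.origin === 0 && it.dot === grammar[it.head][it.alt].length
  );
}
```

With the ambiguous, left-recursive grammar `{ S: [["S", "+", "S"], ["a"]] }`, the call `recognize(g, "S", "a+a+a")` accepts the input. Many top-down parsers cannot take that grammar as written.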
|
| As far as I know, Nearley implements all optimizations
| published in the literature. Worst case time complexity is
| cubic for ambiguous grammars, quadratic for unambiguous
| grammars and linear for grammars suitable for deterministic
| algorithms. Performance is still going to be worse than
| constrained parsers that can handle only a subset of context-
| free grammars such as deterministic LL(1) parsers. Here's what
| the Parsing Techniques book says:
|
| > If one has the luxury of being in a position to design the
| grammar oneself, the choice is simple:
|
| > design the grammar to be LL(1) and use a predictive recursive
| descent parser.
|
| > This can be summarized as: parsing is a problem only if
| someone else is in charge of the grammar.
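The advice quoted above can be illustrated with a small predictive recursive descent parser in TypeScript. The arithmetic grammar and the function name `evaluate` are assumptions made for this sketch, not something from the book. The grammar is LL(1), so each function decides what to do by looking at a single upcoming token.

```typescript
// Assumed grammar (LL(1): one token of lookahead picks every branch):
//   expr   -> term   (('+' | '-') term)*
//   term   -> factor (('*' | '/') factor)*
//   factor -> NUMBER | '(' expr ')'
function evaluate(src: string): number {
  // Tokenize. Whitespace is matched too, so that any character the regex
  // skipped shows up as a mismatch between the joined tokens and the source.
  const raw = src.match(/\s+|\d+(?:\.\d+)?|[-+*\/()]/g) || [];
  if (raw.join("") !== src) throw new SyntaxError("unexpected character in input");
  const tokens = raw.filter((t) => t.trim() !== "");
  let pos = 0;

  const peek = (): string | undefined => tokens[pos];

  function expr(): number {
    let value = term();
    for (;;) {
      const t = peek();
      if (t === "+") { pos++; value += term(); }
      else if (t === "-") { pos++; value -= term(); }
      else return value;
    }
  }

  function term(): number {
    let value = factor();
    for (;;) {
      const t = peek();
      if (t === "*") { pos++; value *= factor(); }
      else if (t === "/") { pos++; value /= factor(); }
      else return value;
    }
  }

  function factor(): number {
    const t = peek();
    if (t === "(") {
      pos++;
      const value = expr();
      if (peek() !== ")") throw new SyntaxError("expected ')'");
      pos++;
      return value;
    }
    if (t !== undefined && /^\d/.test(t)) { pos++; return Number(t); }
    throw new SyntaxError("expected a number or '('");
  }

  const result = expr();
  if (pos !== tokens.length) throw new SyntaxError("unexpected trailing input");
  return result;
}
```

For example, `evaluate("1 + 2 * (3 - 1)")` applies the usual precedence and left associativity because those are built into the shape of the grammar. No backtracking or parse table is involved.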
| fjfaase wrote:
| Earlier this year, I wrote a small interpreting parser in
| JavaScript, which takes a grammar from a string and parses some
| input from a string according to the grammar into an AST. It is
| embedded in the page
| https://fransfaase.github.io/ParserWorkshop/Online_inter_par...
___________________________________________________________________
(page generated 2021-05-24 23:01 UTC)