[HN Gopher] Parsing in JavaScript: all the tools and libraries y...
       ___________________________________________________________________
        
       Parsing in JavaScript: all the tools and libraries you can use
        
       Author : jasim
       Score  : 62 points
       Date   : 2021-05-24 14:32 UTC (8 hours ago)
        
 (HTM) web link (tomassetti.me)
 (TXT) w3m dump (tomassetti.me)
        
       | archibaldJ wrote:
       | hmm this article is a bit outdated; peg.js (mentioned in the
       | article) has been abandoned by the maintainer for a few years now
       | (and it never reached a stable 1.0); recently the project was
       | picked up by another team under the name peggy.js
       | https://github.com/peggyjs/peggy
        
       | codeismath wrote:
       | I recently used "Arcsecond" JavaScript Parser Combinator library
       | to output some abstract syntax based on John Reynolds's
       | "Definitional interpreters for higher-order programming
       | languages".
       | 
       | It's based on Haskell's Parsec parser combinator library, and is
       | zero-dependency.
       | 
       | I was convinced to give it a try based on watching the author's
       | YouTube videos "Parser Combinators From Scratch" on "Low Level
       | JavaScript". Enjoyable series - recommended.
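
For readers who have not met the parser-combinator style mentioned above, here is a minimal from-scratch sketch. It is not Arcsecond's API. The names `satisfy`, `seq`, `choice`, `many`, `map` and `parseList` are made up for illustration. A parser is assumed to be a plain function from `(input, pos)` to a result object.

```javascript
// A parser is a function (input, pos) => { ok: true, value, pos } | { ok: false, pos }.
// All names here are hypothetical; this is not the API of any library.

// Match one character satisfying a predicate.
const satisfy = (pred) => (input, pos) =>
  pos < input.length && pred(input[pos])
    ? { ok: true, value: input[pos], pos: pos + 1 }
    : { ok: false, pos };

const char = (c) => satisfy((x) => x === c);

// Run parsers in order and collect their values in an array.
const seq = (...parsers) => (input, pos) => {
  const values = [];
  for (const p of parsers) {
    const r = p(input, pos);
    if (!r.ok) return r;
    values.push(r.value);
    pos = r.pos;
  }
  return { ok: true, value: values, pos };
};

// Try each alternative from the same position; the first success wins.
const choice = (...parsers) => (input, pos) => {
  for (const p of parsers) {
    const r = p(input, pos);
    if (r.ok) return r;
  }
  return { ok: false, pos };
};

// Zero or more repetitions; a failed repetition is discarded (backtracking).
const many = (p) => (input, pos) => {
  const values = [];
  for (;;) {
    const r = p(input, pos);
    if (!r.ok || r.pos === pos) break; // stop on failure or no progress
    values.push(r.value);
    pos = r.pos;
  }
  return { ok: true, value: values, pos };
};

// Transform the value of a successful parse.
const map = (p, f) => (input, pos) => {
  const r = p(input, pos);
  return r.ok ? { ok: true, value: f(r.value), pos: r.pos } : r;
};

// Example grammar: a comma-separated list of non-negative integers.
const digit = satisfy((c) => c >= "0" && c <= "9");
const number = map(seq(digit, many(digit)), ([d, ds]) => Number(d + ds.join("")));
const list = map(
  seq(number, many(map(seq(char(","), number), ([, n]) => n))),
  ([first, rest]) => [first, ...rest]
);

// Succeeds only if the whole input is consumed; returns null otherwise.
const parseList = (text) => {
  const r = list(text, 0);
  return r.ok && r.pos === text.length ? r.value : null;
};
```

With these definitions, `parseList("1,22,333")` returns `[1, 22, 333]` and `parseList("1,,2")` returns `null`. This naive backtracking version is the simple approach; real libraries add error reporting and other refinements.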
        
       | heroku wrote:
       | See my library for parsing text:
       | https://github.com/eguneys/tamcher
        
       | dahart wrote:
       | The link has an anchor "#chevrotain" that jumps to near the
       | bottom of the article. Can it be removed?
        
         | jasim wrote:
         | Sorry, I had tried changing it immediately after posting the
         | link, but HN allows only changing the title.
        
       | sehugg wrote:
       | I'd also suggest using jsfuzz to find sharp corners in your
       | parser. I used it extensively for my recursive-descent BASIC
       | interpreter (along with a test suite) and it found tons of
       | issues.
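
The idea behind fuzzing a parser can be sketched without any library: feed it random strings and flag anything other than a clean parse or the parser's own documented error type. The loop below is a hand-rolled illustration, not jsfuzz's API (jsfuzz is coverage-guided and will find far more). `ParseError`, `parseNumberList`, `makeRng` and `fuzz` are hypothetical names, and the toy parser contains a deliberate bug for the fuzzer to find.

```javascript
// The only error type the parser is supposed to throw.
class ParseError extends Error {}

// Hypothetical toy parser under test: "1,2,3" -> [1, 2, 3].
// Deliberate bug: fields starting with "-" skip validation.
function parseNumberList(text) {
  return text.split(",").map((field) => {
    const m = /^\d+$/.exec(field);
    if (field[0] !== "-" && !m) throw new ParseError(`bad field: "${field}"`);
    return Number(m[0]); // TypeError when m is null
  });
}

// Small deterministic generator (an LCG) so that runs are reproducible.
function makeRng(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// Feed random short strings to `parse`; collect inputs that raise
// anything other than ParseError.
function fuzz(parse, iterations, seed = 1) {
  const rand = makeRng(seed);
  const alphabet = "0123456789,-";
  const crashes = [];
  for (let i = 0; i < iterations; i++) {
    const length = Math.floor(rand() * 8);
    let input = "";
    for (let j = 0; j < length; j++) {
      input += alphabet[Math.floor(rand() * alphabet.length)];
    }
    try {
      parse(input);
    } catch (err) {
      if (!(err instanceof ParseError)) crashes.push({ input, error: err });
    }
  }
  return crashes;
}

const crashes = fuzz(parseNumberList, 2000);
console.log(`${crashes.length} crashing inputs found`);
```

Every input collected in `crashes` is a bug report: the parser threw a `TypeError` instead of rejecting the input with `ParseError`. Keeping the seed fixed makes a failing run repeatable while the fix is being written.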
        
       | jchw wrote:
       | This is pretty cool and useful! I had been doing bootleg parser
       | combinators in JavaScript up until now. This works, although
       | being someone who doesn't write Haskell I assume that the magic
       | that makes parser combinators efficient is not easy to implement
       | in most languages. (I've been using nom with Rust on the side too
       | and it seems to do the same naive stuff that I do.)
       | 
       | If you ever want to parse _binary_ data using JavaScript, I will
       | always recommend the excellent Kaitai Struct project.
       | 
       | https://kaitai.io/
        
       | jevgeni wrote:
       | Although this is an interesting topic to me, I struggle to read
       | the site behind all the cookie warnings and newsletter begging.
        
       | FractalHQ wrote:
       | No Treesitter? I thought it was one of the most performant modern
       | options with an amazing ecosystem around it.
        
         | edko wrote:
         | Treesitter is not JS (although the Wasm version can be used
         | from JS). Lezer (https://github.com/lezer-parser/lezer) is
         | similar (it was inspired by Treesitter) and is written in TS.
        
       | Zababa wrote:
       | Another thing that you can use if you're doing JS: any language
       | that compiles to JavaScript. Here are a few options:
       | 
       | - Clojure with ClojureScript
       | 
       | - F# with Fable
       | 
       | - Haskell with ghcjs
       | 
       | - OCaml with js_of_ocaml and ReScript
       | 
       | - Racket with RacketScript
       | 
       | - Scala with Scala.js
        
       | lgessler wrote:
       | It's a Clojure(Script) library, but it's still so good it's worth
       | mentioning: Instaparse[0] generates a parser for you from a BNF-y
       | grammar specification. Sample from the readme:
       | 
       |   (def as-and-bs
       |     (insta/parser
       |       "S = AB*
       |        AB = A B
       |        A = 'a'+
       |        B = 'b'+"))
       | 
       |   => (as-and-bs "aaaaabbbaaaabb")
       |   [:S
       |    [:AB [:A "a" "a" "a" "a" "a"] [:B "b" "b" "b"]]
       |    [:AB [:A "a" "a" "a" "a"] [:B "b" "b"]]]
       | 
       | [0]: https://github.com/Engelberg/instaparse
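
For comparison, the same four-rule grammar can be parsed by hand in JavaScript with a small predictive recursive-descent parser. This is only a sketch of what a generator like Instaparse saves you from writing, not a port of it. `parseAsAndBs` and its nested-array tree format are assumptions chosen to mirror the output in the sample.

```javascript
// Hand-written parser for:  S = AB*   AB = A B   A = 'a'+   B = 'b'+
// Returns nested arrays shaped like the Instaparse output in the sample.
function parseAsAndBs(text) {
  let pos = 0;

  // One or more copies of `ch`, tagged with a rule name (A = 'a'+, B = 'b'+).
  function run(tag, ch) {
    const node = [tag];
    while (text[pos] === ch) {
      node.push(ch);
      pos++;
    }
    if (node.length === 1) {
      throw new SyntaxError(`expected '${ch}' at position ${pos}`);
    }
    return node;
  }

  // AB = A B
  const ab = () => ["AB", run("A", "a"), run("B", "b")];

  // S = AB*  (an AB can only start with 'a', so one character of
  // lookahead is enough to decide whether to continue).
  const tree = ["S"];
  while (text[pos] === "a") tree.push(ab());
  if (pos !== text.length) {
    throw new SyntaxError(`unexpected input at position ${pos}`);
  }
  return tree;
}

console.log(JSON.stringify(parseAsAndBs("aaaaabbbaaaabb")));
```

The call above yields one `"S"` node holding two `"AB"` nodes, matching the tree in the readme sample. Every grammar change means editing this code by hand, which is the trade-off against a grammar-driven tool.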
        
       | dahart wrote:
       | Anyone have experience with Nearley? I recently did some reading
       | on JS parsers and ended up feeling like it was easier to use than
       | most of the others where you have to pay attention to and know
       | some tricks for handling recursive grammar rules. What I don't
       | have a sense for yet is what the downsides of Nearley might be
       | down the road once my grammar gets bigger and/or goes into
       | production, and whether performance issues might pop up suddenly and
       | be hard to fix.
        
         | inbx0 wrote:
         | I've done a little toy language implementation with Nearley and
         | really enjoyed it. If you have some JS experience under your
         | belt, Nearley should be pretty easy to pick up and I felt it
         | was easy to build on top of the basic building blocks. It's
         | almost too easy to drop into custom JS with Nearley's inline JS
         | syntax, so you can do all sorts of stuff with JS regexes or
         | what not if you can't find a built-in feature that would do
         | what you want.
         | 
         | Unfortunately my project was not big enough that I'd be able to
          | say anything about performance at scale, though. Being JS based
         | has the advantage that you can (depending on your application
         | needs) sometimes move some of that workload to the client :)
        
         | matheusmoreira wrote:
         | Nearley is amazing. The best implementation of the Earley
         | algorithm I know. It's a great algorithm because you can use
         | any context-free grammar as input and it doesn't require code
         | generation. It's the perfect algorithm for a parse(grammar,
         | input) standard library function.
         | 
         | As far as I know, Nearley implements all optimizations
         | published in the literature. Worst case time complexity is
         | cubic for ambiguous grammars, quadratic for unambiguous
         | grammars and linear for grammars suitable for deterministic
         | algorithms. Performance is still going to be worse than
         | constrained parsers that can handle only a subset of context-
         | free grammars such as deterministic LL(1) parsers. Here's what
         | the Parsing Techniques book says:
         | 
         | > If one has the luxury of being in a position to design the
         | grammar oneself, the choice is simple:
         | 
         | > design the grammar to be LL(1) and use a predictive recursive
         | descent parser.
         | 
         | > This can be summarized as: parsing is a problem only if
         | someone else is in charge of the grammar.
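
The "parse(grammar, input)" idea in the comment above can be made concrete with a compact Earley recognizer. This is a textbook-style sketch, not Nearley's implementation or API. It applies none of the published optimizations, assumes a grammar with no empty productions and single-character terminals, and only answers yes or no instead of building a parse tree. The name `recognize` and the grammar object shape are assumptions.

```javascript
// grammar: { Nonterminal: [[symbol, ...], ...] }. Any symbol that is not
// a key of `grammar` is treated as a single-character terminal.
// Assumption: no production has an empty right-hand side.
function recognize(grammar, start, input) {
  const isNonterminal = (sym) =>
    Object.prototype.hasOwnProperty.call(grammar, sym);

  // chart[i] holds the Earley items that are valid after reading i characters.
  const chart = Array.from({ length: input.length + 1 }, () => []);
  const add = (i, item) => {
    const seen = chart[i].some(
      (x) =>
        x.lhs === item.lhs &&
        x.rhs === item.rhs && // same production array, compared by reference
        x.dot === item.dot &&
        x.origin === item.origin
    );
    if (!seen) chart[i].push(item);
  };

  for (const rhs of grammar[start]) add(0, { lhs: start, rhs, dot: 0, origin: 0 });

  for (let i = 0; i <= input.length; i++) {
    // chart[i] may grow while it is being scanned, hence the index loop.
    for (let j = 0; j < chart[i].length; j++) {
      const { lhs, rhs, dot, origin } = chart[i][j];
      if (dot < rhs.length) {
        const sym = rhs[dot];
        if (isNonterminal(sym)) {
          // Predict: start every production of the expected nonterminal here.
          for (const r of grammar[sym]) add(i, { lhs: sym, rhs: r, dot: 0, origin: i });
        } else if (i < input.length && input[i] === sym) {
          // Scan: the expected terminal matches the next character.
          add(i + 1, { lhs, rhs, dot: dot + 1, origin });
        }
      } else {
        // Complete: advance every item that was waiting for `lhs`.
        for (const p of chart[origin]) {
          if (p.dot < p.rhs.length && p.rhs[p.dot] === lhs) {
            add(i, { lhs: p.lhs, rhs: p.rhs, dot: p.dot + 1, origin: p.origin });
          }
        }
      }
    }
  }

  return chart[input.length].some(
    (x) => x.lhs === start && x.dot === x.rhs.length && x.origin === 0
  );
}

// A left-recursive, ambiguous grammar that a plain LL(1) parser cannot handle.
const sums = { E: [["E", "+", "E"], ["n"]] };
console.log(recognize(sums, "E", "n+n+n")); // true
console.log(recognize(sums, "E", "n+")); // false
```

The grammar is plain data passed at run time, so no code generation step is needed. The cost is the chart bookkeeping, which is why a hand-designed LL(1) grammar with a recursive-descent parser is usually faster.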
        
       | fjfaase wrote:
       | Earlier this year, I wrote a small interpreting parser in
       | JavaScript, which takes a grammar from a string and parses some
       | input from a string according to the grammar into an AST. It is
       | embedded in the page
       | https://fransfaase.github.io/ParserWorkshop/Online_inter_par...
        
       ___________________________________________________________________
       (page generated 2021-05-24 23:01 UTC)