[HN Gopher] From string to AST: parsing (2019)
___________________________________________________________________
From string to AST: parsing (2019)
Author : sph
Score : 92 points
Date : 2024-11-22 12:35 UTC (10 hours ago)
(HTM) web link (kubuszok.com)
(TXT) w3m dump (kubuszok.com)
| delta_p_delta_x wrote:
| > All is set and done
|
| For the record, the phrase is 'when all is _said_ and done '[1].
|
| [1]: https://en.wiktionary.org/wiki/when_all_is_said_and_done
| detourdog wrote:
| I thought it was a nice pun:)
| KPGv2 wrote:
| Yeah I was extremely triggered.
| ckok wrote:
| I think this makes it sound a lot more difficult than it has to
| be, with the formal theory.
|
| When it's really one of the most simple things if you divide it
| in parts and look at it from a tokenizer (string to list of
| tokens) and parser on top. Where the tokenizer can usually be
| very simple: a loop, large switch on the current character, where
| a choice is made on "what can this be", and making it into a
| formal token or error. Then a simple recursive parser that can
| almost be a 1 to 1 copy of the (E)BNF.
| jrop wrote:
| I love writing parsers like this. Add in Pratt Parsing for
| operator precedence and writing parsers can be really easy.
| detourdog wrote:
| I got the impression the author was trying to add higher level
| reasoning to the chosen term for string to AST parsing.
|
| I felt that they were pointing out how the cognitive load of
| understanding is effected by word choice.
| antononcube wrote:
| It seems you are describing how functional parsers (aka parser
| combinators) work.
|
| (BTW, there is a "Parser combinators" section in the featured
| post/article.)
| marcosdumay wrote:
| The bad news about those is that it's easy to mindlessly
| create a parser that runs on exponential time.
|
| The good news is that this happens in the grammar definition.
| So once you define your language well, you don't have to
| watch for it anymore.
| antononcube wrote:
| Insightful!
|
| Do you know of any "large scale" research on this? I.e.
| analysis of multiple related projects and/or of "real life
| stories."
|
| (I agree regardless.)
| marcosdumay wrote:
| I don't know about any real-world study. But there are
| people complaining about it from time to time, and it's
| quite obvious from the theory.
| DemocracyFTW2 wrote:
| IMHO it gets even better when you can use regular expressions
| and write a 'modal' parser where each mode is responsible for a
| certain sub-grammar, like string literals. JavaScript added the
| sticky flag (y) to make this even simpler.
| joz1-k wrote:
| I had exactly the same feeling as you after reading the
| article. And interestingly, all production parsers for all
| major languages are hand-written recursive descent parsers. On
| the other hand, if you inspect the actual code for these
| production parsers (even for newer languages like Swift, Scala,
| Kotlin, or Rust), the complexity and amount of code is still
| quite staggering.
| taeric wrote:
| Isn't a large portion of the code to get friendlier messages
| for the user?
| antononcube wrote:
| Very long write-up! If you read it and like the content, you
| might consider ditching Python and start using Raku a lot and
| often.
| marxisttemp wrote:
| I somewhat regularly stop to marvel that one of the greatest
| anarchist thinkers of our time is also responsible for
| foundational theories in linguistics that also correspond
| intimately with the foundational theories of computing. God bless
| Noam.
| revskill wrote:
| Theory without examples is like traveling without a memory.
| KPGv2 wrote:
| This was a really great read! I'm wrote the tree sitter grammar
| for the Unison programming language, and discovered I really like
| the work involved in pattern matching that writing tokenizers and
| parsers comes down to. It also gives you an in-depth
| understanding of how the language works that you've writing a
| parser for, and how tooling works.
|
| Like if you have an AST with the ability to map onto code that is
| displayed in your IDE, the algorithm for an IDE to refactor a
| variable name is to traverse up the AST until you get to the
| variable's declaration and then traverse all sibling trees,
| changing each matching name, but stopping a traversal whenever
| you encounter a new binding with same name. Code folding is to
| identify the categories of node that are "foldable" and then you
| hide every child of that node. Etc. It's all tree traversal
| algorithms.
|
| It gives you a deep appreciation for how powerful the tooling can
| be thanks to proper parsing.
| mdaniel wrote:
| > quadruple G=(N,S,P,S), where T is a finite set of nonterminal
| symbols, S a finite set of terminal symbols, P is a set of
| production rules and S is a start symbol.
|
| I think "T" is supposed to be "N" in that sentence[1], based
| solely upon the further use of "N" nomenclature in subsequent
| paragraphs
|
| 1: _he said, 5 years too late into a forum just discussing the
| article_
| atan2 wrote:
| It's nice to review some of this theory after a week of coding my
| own interpreter. I have been studying about compilers at
| pikuma.com the whole week and reading this article after coding a
| parser is a great way of reviewing what I've implemented.
___________________________________________________________________
(page generated 2024-11-22 23:00 UTC)