[HN Gopher] From string to AST: parsing (2019)
       ___________________________________________________________________
        
       From string to AST: parsing (2019)
        
       Author : sph
       Score  : 92 points
       Date   : 2024-11-22 12:35 UTC (10 hours ago)
        
 (HTM) web link (kubuszok.com)
 (TXT) w3m dump (kubuszok.com)
        
       | delta_p_delta_x wrote:
       | > All is set and done
       | 
       | For the record, the phrase is 'when all is _said_ and done '[1].
       | 
       | [1]: https://en.wiktionary.org/wiki/when_all_is_said_and_done
        
         | detourdog wrote:
         | I thought it was a nice pun:)
        
         | KPGv2 wrote:
         | Yeah I was extremely triggered.
        
       | ckok wrote:
       | I think this makes it sound a lot more difficult than it has to
       | be, with the formal theory.
       | 
       | When it's really one of the most simple things if you divide it
       | in parts and look at it from a tokenizer (string to list of
       | tokens) and parser on top. Where the tokenizer can usually be
       | very simple: a loop, large switch on the current character, where
       | a choice is made on "what can this be", and making it into a
       | formal token or error. Then a simple recursive parser that can
       | almost be a 1 to 1 copy of the (E)BNF.
        
         | jrop wrote:
         | I love writing parsers like this. Add in Pratt Parsing for
         | operator precedence and writing parsers can be really easy.
        
         | detourdog wrote:
         | I got the impression the author was trying to add higher level
         | reasoning to the chosen term for string to AST parsing.
         | 
         | I felt that they were pointing out how the cognitive load of
         | understanding is effected by word choice.
        
         | antononcube wrote:
         | It seems you are describing how functional parsers (aka parser
         | combinators) work.
         | 
         | (BTW, there is a "Parser combinators" section in the featured
         | post/article.)
        
           | marcosdumay wrote:
           | The bad news about those is that it's easy to mindlessly
           | create a parser that runs on exponential time.
           | 
           | The good news is that this happens in the grammar definition.
           | So once you define your language well, you don't have to
           | watch for it anymore.
        
             | antononcube wrote:
             | Insightful!
             | 
             | Do you know of any "large scale" research on this? I.e.
             | analysis of multiple related projects and/or of "real life
             | stories."
             | 
             | (I agree regardless.)
        
               | marcosdumay wrote:
               | I don't know about any real-world study. But there are
               | people complaining about it from time to time, and it's
               | quite obvious from the theory.
        
         | DemocracyFTW2 wrote:
         | IMHO it gets even better when you can use regular expressions
         | and write a 'modal' parser where each mode is responsible for a
         | certain sub-grammar, like string literals. JavaScript added the
         | sticky flag (y) to make this even simpler.
        
         | joz1-k wrote:
         | I had exactly the same feeling as you after reading the
         | article. And interestingly, all production parsers for all
         | major languages are hand-written recursive descent parsers. On
         | the other hand, if you inspect the actual code for these
         | production parsers (even for newer languages like Swift, Scala,
         | Kotlin, or Rust), the complexity and amount of code is still
         | quite staggering.
        
           | taeric wrote:
           | Isn't a large portion of the code to get friendlier messages
           | for the user?
        
       | antononcube wrote:
       | Very long write-up! If you read it and like the content, you
       | might consider ditching Python and start using Raku a lot and
       | often.
        
       | marxisttemp wrote:
       | I somewhat regularly stop to marvel that one of the greatest
       | anarchist thinkers of our time is also responsible for
       | foundational theories in linguistics that also correspond
       | intimately with the foundational theories of computing. God bless
       | Noam.
        
       | revskill wrote:
       | Theory without examples is like traveling without a memory.
        
       | KPGv2 wrote:
       | This was a really great read! I'm wrote the tree sitter grammar
       | for the Unison programming language, and discovered I really like
       | the work involved in pattern matching that writing tokenizers and
       | parsers comes down to. It also gives you an in-depth
       | understanding of how the language works that you've writing a
       | parser for, and how tooling works.
       | 
       | Like if you have an AST with the ability to map onto code that is
       | displayed in your IDE, the algorithm for an IDE to refactor a
       | variable name is to traverse up the AST until you get to the
       | variable's declaration and then traverse all sibling trees,
       | changing each matching name, but stopping a traversal whenever
       | you encounter a new binding with same name. Code folding is to
       | identify the categories of node that are "foldable" and then you
       | hide every child of that node. Etc. It's all tree traversal
       | algorithms.
       | 
       | It gives you a deep appreciation for how powerful the tooling can
       | be thanks to proper parsing.
        
       | mdaniel wrote:
       | > quadruple G=(N,S,P,S), where T is a finite set of nonterminal
       | symbols, S a finite set of terminal symbols, P is a set of
       | production rules and S is a start symbol.
       | 
       | I think "T" is supposed to be "N" in that sentence[1], based
       | solely upon the further use of "N" nomenclature in subsequent
       | paragraphs
       | 
       | 1: _he said, 5 years too late into a forum just discussing the
       | article_
        
       | atan2 wrote:
       | It's nice to review some of this theory after a week of coding my
       | own interpreter. I have been studying about compilers at
       | pikuma.com the whole week and reading this article after coding a
       | parser is a great way of reviewing what I've implemented.
        
       ___________________________________________________________________
       (page generated 2024-11-22 23:00 UTC)