[HN Gopher] Learn Python ASTs, by building your own linter
       ___________________________________________________________________
        
       Learn Python ASTs, by building your own linter
        
       Author : tusharsadhwani
       Score  : 145 points
       Date   : 2021-12-29 13:04 UTC (9 hours ago)
        
 (HTM) web link (sadh.life)
 (TXT) w3m dump (sadh.life)
        
       | mgdlbp wrote:
       | Quite a few languages can access their own ASTs, but I don't know
       | of any other than C# (and VB.NET; Roslyn is the compiler for
       | both) where the API is so deeply integrated, and hence so useful.
       | 
       | The Roslyn SDK exposes its syntax tree, symbol table, and
       | semantic model, with the primary use being for custom code
       | analysis. I was surprised how easily I made a linter ('analyzer')
       | for a personal style preference, along with a 'code fix' (the
       | lightbulb suggestion that appears in Visual Studio), just by
       | following the quick-start tutorial. The resulting .NET assembly
       | integrated impressively with MSBuild and Visual Studio, my custom
       | analyzer being indistinguishable in UX from the built-in ones.
       | Seeing the actual syntax tree, especially where the compiler had
       | recovered from syntax errors, was also a great learning
       | experience for getting a feel for how the compiler treats errors.
       | 
       | It seems to now be fairly common for .NET projects to develop
       | their own analyzers to enforce specialized best-practices; I
       | wonder if other languages have similar customs?
       | 
       | https://docs.microsoft.com/en-us/dotnet/csharp/roslyn-sdk/
        
         | tusharsadhwani wrote:
         | That's amazing. I'm interested in C# now.
        
       | benhoff wrote:
       | I've often thought it would be cool to use ASTs, and perhaps code
       | embeddings generated by machine learning, as a tool to help
       | students improve.
       | 
       | If you've ever taught an intro-level Python course, it quickly
       | becomes apparent how often the same mistakes come up, and where
       | you didn't spend enough time. As a student, this is frustrating
       | because the correction comes too late; that's why having someone
       | knowledgeable looking over your shoulder can speed up your
       | learning.
       | 
       | The challenge I see with ASTs is that a parser only produces one
       | for valid code. So if someone makes a syntax error, it becomes a
       | whole new ball game. I'd glanced at tree-sitter to see if it
       | could fix some of these issues, but I think it's a more
       | fundamental problem than that.
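For the syntax-error case raised here, the standard `ast` module gives you no tree at all. You only get the `SyntaxError` that `ast.parse` raises, though it does carry the line, column, and message. Here is a minimal sketch of turning that into feedback a student can act on. The helper name `explain_syntax_error` is hypothetical and does not come from the article:

```python
import ast
from typing import Optional


def explain_syntax_error(source: str) -> Optional[str]:
    """Return a readable message for the first syntax error, or None if valid."""
    try:
        ast.parse(source)
    except SyntaxError as err:
        # SyntaxError records where the parser gave up and the offending line.
        line_text = (err.text or "").rstrip("\n")
        return f"line {err.lineno}, column {err.offset}: {err.msg}\n    {line_text}"
    return None


print(explain_syntax_error("x = 1\ny = = 2\n"))
```

This still reports only the first error. Anything after it stays invisible without some form of error recovery.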
        
         | tusharsadhwani wrote:
         | tree-sitter can definitely help with this problem, but so can
         | regular AST parsers. The idea is the same: add code or grammar
         | rules that parse the "invalid" input, mark it as invalid, and
         | resume parsing valid code as soon as possible.
         | 
         | Existing code editors like VS Code do exactly this to get better
         | syntax highlighting of incomplete code.
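Real error recovery in tree-sitter or an editor is far more sophisticated. Still, the "mark it as invalid and keep going" idea can be approximated crudely with the standard `ast` module: blank out whichever line the parser rejects, then try again. This is a hypothetical helper for illustration only, not how tree-sitter or VS Code actually recover:

```python
import ast
from typing import List, Optional, Tuple


def parse_with_recovery(
    source: str, max_attempts: int = 10
) -> Tuple[Optional[ast.Module], List[int]]:
    """Parse `source`, blanking out lines that cause syntax errors.

    Returns the AST of whatever still parses (or None if recovery failed)
    and the 1-based line numbers that were skipped.
    """
    lines = source.splitlines()
    skipped: List[int] = []
    for _ in range(max_attempts):
        try:
            return ast.parse("\n".join(lines)), skipped
        except SyntaxError as err:
            lineno = err.lineno
            # Give up if the error points nowhere useful, or at an empty line
            # (blanking it again would change nothing).
            if lineno is None or not 1 <= lineno <= len(lines) or not lines[lineno - 1].strip():
                return None, skipped
            # Blank the line instead of deleting it, so line numbers in the
            # recovered AST still match the original source.
            lines[lineno - 1] = ""
            skipped.append(lineno)
    return None, skipped


tree, skipped = parse_with_recovery("x = 1\ny = = 2\nz = 3\n")
print(skipped, [node.lineno for node in tree.body])
```

Blanking lines rather than deleting them keeps the `lineno` values in the recovered tree aligned with the original file. The heuristic can fail when the real mistake sits on an earlier line than the one the parser reports.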
        
         | hsbauauvhabzb wrote:
         | Wouldn't that be impossible? The structure of Python is finite,
         | and invalid deviations are infinite. Sure, any language's parser
         | could be more helpful, but it can't take trash and turn it into
         | gold.
        
       | stevekemp wrote:
       | That's a pretty awesome read, and the approach is pretty
       | flexible.
       | 
       | I've written simple code using the AST-visitor approach to
       | enforce some common standards on code within our company. Simple
       | things like ensuring that when we use Troposphere to generate AWS
       | CloudFormation templates, we always set up some specific values.
       | (For example, I wrote a checker to ensure that every time an ECR
       | repository is created we enable ScanOnPush, or every time we
       | declare a security group it has a comment "[cloudformation]
       | ..." with it, so that manual edits stand out.)
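A check like the ScanOnPush one can be sketched with `ast.NodeVisitor`. This minimal sketch assumes the templates build repositories with a call ending in `Repository(...)` and pass the scan setting through an `ImageScanningConfiguration=` keyword, mirroring the CloudFormation property name. The class and helper names are hypothetical. The sketch only checks that the keyword is present, not what value it has:

```python
import ast
from typing import List


def call_name(node: ast.Call) -> str:
    """Return the last part of the called name: `ecr.Repository` -> `Repository`."""
    func = node.func
    if isinstance(func, ast.Attribute):
        return func.attr
    if isinstance(func, ast.Name):
        return func.id
    return ""


class RequiredKeywordChecker(ast.NodeVisitor):
    """Flag calls to `callee` that do not pass the keyword argument `required`."""

    def __init__(self, callee: str, required: str) -> None:
        self.callee = callee
        self.required = required
        self.errors: List[str] = []

    def visit_Call(self, node: ast.Call) -> None:
        if call_name(node) == self.callee:
            passed = {kw.arg for kw in node.keywords}  # kw.arg is None for **kwargs
            if self.required not in passed:
                self.errors.append(
                    f"line {node.lineno}: {self.callee}() is missing {self.required}="
                )
        self.generic_visit(node)  # keep walking into nested calls


template = 'r = ecr.Repository("R", ImageScanningConfiguration=cfg)\nq = Repository("Q")\n'
checker = RequiredKeywordChecker("Repository", "ImageScanningConfiguration")
checker.visit(ast.parse(template))
print(checker.errors)
```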
        
         | tusharsadhwani wrote:
         | Thanks!
         | 
         | A lot of what ASTs let you do so flexibly is lost on people
         | simply because they aren't aware of it. Many other developers
         | would try to do this with string or regex matching, and that
         | often leads to painful experiences.
        
           | stevekemp wrote:
           | Agreed. Simple checks like these are trivial:
           | 
           | A call to function "Foo" must always have an argument
           | matching the regexp "/blah/"; otherwise, raise an error.
           | 
           | And they're so lightweight you can add them to any
           | CI/CD/automation steps in your repository. Once you have a few
           | checks like that, or ones validating naming standards, you can
           | roll them up into a simple "linter".
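The "Foo must have an argument matching /blah/" rule translates almost word for word into a visitor. This minimal sketch assumes Python 3.8+, where string literals are `ast.Constant` nodes, and it counts only literal string arguments, positional or keyword. `Foo` and `blah` are the placeholders from the comment; `lint` and `main` are hypothetical names:

```python
import ast
import re
from typing import List

PATTERN = re.compile(r"blah")


class FooArgumentChecker(ast.NodeVisitor):
    """Require every call to Foo(...) to have a string argument matching PATTERN."""

    def __init__(self) -> None:
        self.errors: List[str] = []

    def visit_Call(self, node: ast.Call) -> None:
        if isinstance(node.func, ast.Name) and node.func.id == "Foo":
            # Look at positional and keyword argument values alike.
            values = list(node.args) + [kw.value for kw in node.keywords]
            if not any(
                isinstance(v, ast.Constant)
                and isinstance(v.value, str)
                and PATTERN.search(v.value)
                for v in values
            ):
                self.errors.append(
                    f"line {node.lineno}: Foo() needs an argument matching /blah/"
                )
        self.generic_visit(node)  # also check calls nested inside this one


def lint(source: str) -> List[str]:
    checker = FooArgumentChecker()
    checker.visit(ast.parse(source))
    return checker.errors


def main(paths: List[str]) -> int:
    """Lint each file and return a CI-friendly exit code (1 if anything failed)."""
    failed = False
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for message in lint(f.read()):
                print(f"{path}: {message}")
                failed = True
    return 1 if failed else 0
```

In a CI step you would call it as `raise SystemExit(main(sys.argv[1:]))`, so that any violation fails the build.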
        
       | ambrose2 wrote:
       | This was a really nice read! The best part was learning that
       | there's no need to actually parse tokens when building a Python
       | linter (well, with perhaps an exception or two), because you can
       | leverage the already-parsed AST or CST.
        
         | tusharsadhwani wrote:
         | True! Although there are some lints that would require you to
         | look at tokens, such as checking for single vs. double quotes,
         | or the number of spaces used for indentation.
         | 
         | However, Python has a built-in tokenize module for that as well.
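A quote-style lint is a good example. It has to run on the token stream rather than the AST, because the AST does not record which quote character was used. Here is a minimal sketch using the standard `tokenize` module; the function name is hypothetical. On Python 3.12+, f-strings are emitted as `FSTRING_START`/`FSTRING_MIDDLE`/`FSTRING_END` tokens rather than `STRING`, so this version silently skips them there:

```python
import io
import tokenize
from typing import List


def single_quoted_strings(source: str) -> List[int]:
    """Return the line numbers of string literals written with single quotes."""
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING:
            # Strip prefixes such as r, b, u, f or rb to reach the opening quote.
            body = tok.string.lstrip("rRbBuUfF")
            if body.startswith("'"):
                found.append(tok.start[0])
    return found


print(single_quoted_strings("a = 'x'\nb = \"y\"\nc = r'z'\n"))
```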
        
       | jmac01 wrote:
       | > "So what is an AST?"
       | 
       | I had to google... because it doesn't actually say that it stands
       | for Abstract Syntax Tree haha
       | 
       | Would be nice to highlight what AST stands for in the first
       | sentence of that section! :D
        
         | fintler wrote:
         | This article is mostly just talking about Python-specific
         | ASTs.
         | 
         | Reading this article might be confusing to someone who's trying
         | to learn what an AST is. ASTs are not unique to Python, they're
         | just a common data structure used in compiler design.
         | 
         | ASTs are used by compilers like this:
         | 
         | 1) A compiler takes source code and splits it into little
         | pieces called tokens (e.g., a number, an equals sign, a
         | variable type, etc.) using a little program called a "lexer".
         | 
         | 2) Then, those tokens are processed by a "parser", a little
         | program that takes the tokens from the lexer and, following a
         | description of the programming language (e.g. a context-free
         | grammar written in Backus-Naur Form), outputs an AST.
         | 
         | 3) Finally, the AST nodes are walked and code is generated:
         | machine code, bytecode, or some other intermediate
         | representation.
         | 
         | This article hooks into the AST inside the Python "compiler"
         | between steps 2 and 3 to do some analysis on it, instead of
         | converting it into something that can be executed (e.g. machine
         | code or some other IR). That is very useful, but probably not a
         | good introduction to compilers.
         | 
         | If you're new to compilers, I suggest staying away from the
         | Python "ast" module until you're comfortable with general
         | compiler design. Maybe start by playing around with something
         | like PLY instead: create a simple little language yourself and
         | write a compiler for it:
         | 
         | <https://www.dabeaz.com/ply/ply.html#ply_nn2>
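Python's standard library happens to expose each of the three steps above, which can make them easier to see. A small sketch follows. Note that CPython's step 3 produces bytecode for its own virtual machine rather than native machine code. The `tokenize` module also gives a Python-level view of the token stream, while CPython's parser has its own tokenizer written in C:

```python
import ast
import dis
import io
import tokenize

source = "x = 1 + 2\n"

# Step 1: lexing, i.e. turning text into tokens.
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))

# Step 2: parsing, i.e. building an AST (ast.parse runs lexing and parsing).
tree = ast.parse(source)
print(ast.dump(tree))

# Step 3: code generation; CPython compiles the AST to bytecode.
code = compile(tree, filename="<example>", mode="exec")
dis.dis(code)
```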
        
           | tusharsadhwani wrote:
           | I'll agree, it's not a good introduction to compilers, but it
           | isn't meant to be.
           | 
           | PLY on the other hand is an amazing resource, thanks for
           | linking it here.
        
         | tusharsadhwani wrote:
         | My bad xD; to my credit, it's mentioned later in the article.
         | But you're right, I should add that at the beginning.
        
       | apurtbapurt wrote:
       | I maintain some code that relies on the Python AST for finding
       | and packaging modules with appropriate class signatures when
       | building customer-specific distributions. It works really well
       | most of the time, and it is a lot easier to maintain than 50+
       | separate wheel definitions.
       | 
       | The one big drawback is that the AST for even trivial code
       | patterns has a history of changing between Python versions.
       | This makes it more annoying than usual to support multiple
       | versions at the same time. Luckily, 3.9 and 3.10 haven't brought
       | any changes that impacted my codebase, as far as I've noticed.
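The commenter's code isn't shown. The general idea, discovering classes without importing (and therefore executing) anything, can be sketched like this. Everything here is a hypothetical illustration. It matches bases by name only, so aliases and indirect inheritance are not resolved, and `Plugin` in the usage line is a made-up base class:

```python
import ast
from typing import Dict, List


def find_subclasses(source: str, base_name: str) -> Dict[str, List[str]]:
    """Map each class that lists `base_name` as a direct base to its public methods.

    Works on source text only: nothing is imported or executed.
    """
    found: Dict[str, List[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        # `Plugin` is an ast.Name, `plugins.Plugin` is an ast.Attribute.
        base_names = [
            b.attr if isinstance(b, ast.Attribute) else getattr(b, "id", None)
            for b in node.bases
        ]
        if base_name in base_names:
            found[node.name] = [
                item.name
                for item in node.body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
                and not item.name.startswith("_")
            ]
    return found


src = "class A(plugins.Plugin):\n    def run(self): pass\n    def _x(self): pass\nclass B(object):\n    pass\n"
print(find_subclasses(src, "Plugin"))
```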
        
         | tusharsadhwani wrote:
         | The only major changes that I'm aware of since Python 3 have
         | been the change to keyword arguments in 3.5, and the deprecation
         | of Index and introduction of Constant more recently. Those are
         | notable changes, but relatively small and manageable imo. What
         | challenges have you faced?
        
           | masklinn wrote:
           | > the deprecation of Index and introduction of Constant more
           | recently.
           | 
           | The introduction of Constant also deprecated everything it
           | replaced (Str, Num, Bytes, and NameConstant).
           | 
           | There's also the introduction of f-strings (represented as
           | JoinedStr in the AST), and various nodes being duplicated for
           | their async versions (AsyncFunctionDef, AsyncFor, AsyncWith).
           | 
           | Probably more relevant to automatically discovering
           | signatures would be the addition of positional-only arguments
           | to the `arguments` object.
           | 
           | But messing with the AST is definitely a lot more stable than
           | messing with the bytecode.
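One common way to cope with these version differences is to write small compatibility helpers. This minimal sketch assumes Python 3.8+, so string literals are already `ast.Constant`, and it covers only the `Index` change; the helper names are hypothetical:

```python
import ast
from typing import Optional


def subscript_index(node: ast.Subscript) -> ast.AST:
    """Return the expression inside x[...] on both Python 3.8 and 3.9+."""
    inner = node.slice
    # Up to 3.8 a plain subscript is wrapped in ast.Index; from 3.9 the
    # expression is stored directly. getattr() keeps this working even if a
    # future version drops the deprecated Index class entirely.
    index_cls = getattr(ast, "Index", None)
    if index_cls is not None and isinstance(inner, index_cls):
        return inner.value
    return inner


def string_literal(node: ast.AST) -> Optional[str]:
    """Return the value of a str literal node, or None for anything else."""
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    return None


sub = ast.parse("d['key']").body[0].value
print(string_literal(subscript_index(sub)))
```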
        
       | wbkang wrote:
       | This is a cool post, thank you. I knew about ASTs but didn't know
       | how to build them easily for Python, so the second half was very
       | useful for me.
        
         | tusharsadhwani wrote:
         | I really wasn't expecting anyone to read all of it; I was
         | afraid people would find it either too trivial or too complex,
         | depending on their skill level. So that's great to hear.
        
       | popotamonga wrote:
       | In general, for me at least, the best way to learn about
       | something is to work on its 'internals'. For instance, when React
       | came out I couldn't wrap my head around it, so I started my own JS
       | framework, and it ended up almost exactly like React (then I
       | dumped it, as it was just a learning exercise).
        
         | agumonkey wrote:
         | I forget who coined the saying (Feynman or someone else), but
         | I'm definitely in the camp that needs to build something to
         | feel at home with it.
        
           | alansammarone wrote:
           | I believe you are referring to "What I cannot create, I do not
           | understand", which is indeed by Feynman.
        
             | agumonkey wrote:
             | Most probably
        
         | sarupbanskota wrote:
         | You'll enjoy https://codecrafters.io
        
         | nefitty wrote:
         | I'm trying to become a top 0.01% JS user, and creating linters,
         | flavors, etc. is my plan as well. I've read through and
         | annotated the React codebase, but it didn't stick very well. I
         | would have done better to create my own framework! I keep
         | having to relearn that lesson... I can gain a lot of knowledge
         | about a thing through reading, but truly knowing the thing
         | requires some practical application.
         | 
         | A tangent, but related to that: if anyone reading has ideas on
         | how to apply a traditional computer science curriculum in
         | practice, I would love to hear them. I can think of toy CPU
         | emulators, system architecture diagramming, language
         | creation... but I'm not sure if there's one thing I can build
         | that would say, "I understand computer science."
        
       ___________________________________________________________________
       (page generated 2021-12-29 23:01 UTC)