[HN Gopher] Refactoring Python with Tree-sitter and Jedi
       ___________________________________________________________________
        
       Refactoring Python with Tree-sitter and Jedi
        
       Author : todsacerdoti
       Score  : 136 points
       Date   : 2024-09-24 15:02 UTC (4 days ago)
        
 (HTM) web link (jackevans.bearblog.dev)
 (TXT) w3m dump (jackevans.bearblog.dev)
        
       | nfrankel wrote:
       | I wonder if the author has ever heard of something called an IDE?
        
         | ErikBjare wrote:
         | I think this particular case would be difficult to refactor
         | even in an IDE like PyCharm, which afaik is the best at
         | refactoring Python (might be outdated).
        
         | lispisok wrote:
         | yes but how does the IDE do it?
        
         | rustyminnow wrote:
         | What's an IDE and how does it refactor hundreds of semantically
         | unrelated identifiers in one go?
        
           | cstrahan wrote:
           | I'm not a Python developer, but...
           | 
           | I believe the idea is that those identifiers _are_
           | semantically related: that fixture decorator inspects the
           | formal parameter names so that it can pass the appropriate
           | arguments to each test when the tests are run. A sufficiently
           | smart IDE and/or language server would thus know that these
           | identifiers are related, and performing a rename on one
           | instance would thus rename all of the others.
           | 
           | And maybe you were being facetious, but an IDE is an
           | "Integrated Development Environment".
           | 
           | Edit: Yep. Took all of 60 seconds to find what I'm looking
           | for, as I type this from my phone while sitting in my throne
           | room: https://docs.pytest.org/en/6.2.x/fixture.html
           | 
           | See the "Fixtures can request other fixtures" section, which
           | describes the scenario from TFA.
           | 
           | And this post describes the PyCharm support for refactoring
           | fixtures:
           | https://www.jetbrains.com/guide/pytest/tutorials/visual_pyte...
        
         | fiddlerwoaroof wrote:
         | IDEs are great if your refactorings fit in the predefined
         | refactorings
        
         | Jackevansevo wrote:
         | Author here, I'm not aware of any IDE that can do this specific
         | refactor
        
           | morningsam wrote:
           | PyCharm understands pytest fixtures and if this is really
           | just about a single fixture called "database", it takes 3
           | seconds to do this refactoring by just renaming it.
        
         | 1-more wrote:
         | Write instructions on how to do this in any IDE.
        
           | morningsam wrote:
           | In PyCharm: Move cursor on any occurrence or definition of
           | "database" fixture, press the "Rename" hotkey (Shift+F6),
           | delete old name and type new name, press Enter key to
           | confirm.
        
           | cstrahan wrote:
           | The fine folks at JetBrains already have done just that:
           | 
           | https://www.jetbrains.com/guide/pytest/tutorials/visual_pyte...
        
       | morgante wrote:
       | Nice (simple) introduction to the tree sitter APIs.
       | 
       | If you're looking for a higher level interface, GritQL[0] is
       | built on top of tree-sitter and could handle the same refactor
       | with this query:
       | 
       |     language python
       | 
       |     `def $_($_): $_` as $func where $func <: contains `database` => `db`
       | 
       | [0] https://github.com/getgrit/gritql
        
       | seanhunter wrote:
       | Tree-sitter is really powerful, but it's worth people learning a
       | few methods they prefer to use because there are going to be
       | situations where one method works better than another. Things I
       | have found useful in the past include
       | 
       | - perl -pi -e 's/foo/bar/g' *files
       | 
       | "-pi" means "in place edit" so it will change the files in place.
       | If you have a purely mechanical change like he's doing here it's
       | a very reasonable choice. If you're not as much of a cowboy as I
       | am, you can specify a suffix and it will back the files up, so
       | something like
       | 
       | perl -p -i.bak -e 's/db/database/g' *py
       | 
       | For example then all your original '.py' files will be copied to
       | '.py.bak' and the new renamed versions will be '.py'
       | 
       | For vim users (I know emacs has the same thing but I don't
       | remember the exact invocation because it has been >20years since
       | I used emacs as my main editor) it's worth knowing the "global"
       | command. So you can execute a particular command only on lines
       | that match some regex. So say you want to delete all the lines
       | which mention cheese
       | 
       | :%g/cheese/d
       | 
       | Say you want to replace "db" with "database" but only on lines
       | which start with "def"
       | 
       | :%g/^def/s/db/database/
       | 
       | OK cool. Now if you go 'vim *py' you can do ":argdo
       | g/^def/s/db/database/ | update" and it will perform that global
       | command across all the files in the arg list and save the ones
       | which have changed.
        
         | _whiteCaps_ wrote:
         | I'd reach for argdo as well - but I don't think this covers his
         | use case of:
         | 
         | > every instance of a pytest fixture
         | 
         | Although it's probably good enough for 99% of the use cases,
         | and any extra accidental renames could be reverted when you
         | look at the diff.
         | 
         | Maybe it could be covered with a multi line regex using `\_.`
        
         | Jackevansevo wrote:
         | Author here: I'm super familiar with this kind of find and
         | replace syntax inside vim or with sed. Usually it works great!
         | 
         | But in this specific situation it was tricky to handle
         | situations with things spanning over multiple lines +
         | preventing accidental renames.
        
           | seanhunter wrote:
           | I realise that and like the article. I was trying to convey
           | in my response that devs should have these things in their
           | toolkit not that you "did the wrong thing"[1] somehow by
           | using treesitter for this.
           | 
           | [1] like that's even possible in this situation
        
           | tmoertel wrote:
           | For those tricky situations, there's "sledgehammer and
           | review" and the second-order git-diff trick:
           | 
           | https://blog.moertel.com/posts/2013-02-18-git-second-order-d...
        
         | aulin wrote:
         | About the cowboy comment, that's what version control is for.
         | Just modify in place and then stage hunk by hunk with magit or
         | git add -p.
        
       | avianlyric wrote:
       | Interesting use of treesitter. But I'm a little surprised that
       | treesitter's built-in query language wasn't used.
       | 
       | There's no need to manually iterate through the tree and use if
       | statements to select nodes. Instead you can just write a couple
       | of simple queries (and even use treesitter's web UI to test the
       | queries), and have treesitter just provide all the nodes for
       | you.
       | 
       | https://tree-sitter.github.io/tree-sitter/using-parsers#patt...
        
         | hetspookjee wrote:
         | Having no experience with treesitter I find the query language
         | rather hard to parse. From a practical point of view, when
         | experimenting with the library, I'm not surprised the author
         | went with this nested for-loop approach.
        
           | jonathanyc wrote:
           | The query language is definitely underdocumented. In case it
           | helps you, what helped me was realizing it's basically a
           | funky pattern language, a la the match pattern sublanguages
           | in OCaml/Haskell/Rust.
           | 
           | But the syntax for variable binding is idiosyncratic and the
           | opposite of normal pattern languages. Writing "x" doesn't
           | bind the thing at the position to the variable x; instead,
           | you have to write e.g. foo @x to bind x to the child of type
           | foo. Insanely, some Scheme dialects use @ with the exact
           | opposite semantics!! There's also a bizarre # syntax for
           | conditionals and statements.
           | 
           | Honestly there isn't really an excuse for how weird they made
           | the pattern syntax given that people have spent decades
           | working on pattern matching for everything from XML to
           | objects (even respecting abstraction!). I've slowly been
           | souring on treesitter in general, but paraphrasing
           | Stroustrup: there are things people complain about, and then
           | there are things nobody uses.
        
             | ckolkey wrote:
             | It's just a Scheme dialect. A bit odd, but not crazy.
        
               | jonathanyc wrote:
               | Not really. It uses S-expressions but Scheme pattern
               | matching is totally different. The most common Scheme
               | pattern matching syntax is basically the same as pattern
               | matching in any other language: x means "bind the value
               | at this position to x", not "the child node of type".
               | See: https://www.gnu.org/software/guile/manual/html_node/
               | Pattern-... or syntax-rules.
               | 
               | It's as much a Scheme dialect as WASM's S-expression form
               | is a Scheme dialect.
               | 
               | Treesitter's query syntax is slightly understandable in
               | the sense that having x match a node among siblings of
               | type x works well for extracting values out of sibling
               | lists. Most conventional pattern syntaxes struggle with
               | this, e.g. how do you match the string "foo" inside of a
               | list of strings in OCaml or Rust without leaving the
               | match expression and resorting to a loop?
               | 
               | But you could imagine a syntax-rules like use of ellipses
               | .... There's also a more powerful pattern syntax someone
               | worked on for implementing Scheme-like macros in non-S-
               | expression based languages whose name escapes me right
               | now.
        
       | 147 wrote:
       | I've always wanted to do mechanical refactors and recently ran
       | into the problem the author ran into where tree-sitter can't
       | write back the AST as source. Is there an alternative that is
       | able to do this for most programming languages?
        
       | desbo wrote:
       | Would've been easy with fastmod:
       | https://github.com/facebookincubator/fastmod
        
         | westurner wrote:
         | > _I do wish tree-sitter had a mechanism to directly manipulate
         | the AST. I was unable to simply rename/delete nodes and then
         | write the AST back to disk. Instead I had to use Jedi or
         | manually edit the source (and then deal with nasty off-set re-
         | parsing logic)._
         | 
         | Or libCST: https://github.com/Instagram/LibCST docs:
         | https://libcst.readthedocs.io/en/latest/ :
         | 
         | > _LibCST parses Python 3.0 -> 3.12 source code as a CST tree
         | that keeps all formatting details (comments, whitespaces,
         | parentheses, etc). It's useful for building automated
         | refactoring (codemod) applications and linters._
         | 
         | libcst_transformer.py:
         | https://gist.github.com/sangwoo-joh/26e9007ebc2de256b0b3deed... :
         | 
         | > _example code for renaming variables using libcst_ [w/
         | Visitors and Transformers]
         | 
         | Refactoring because it doesn't pass formal verification:
         | https://deal.readthedocs.io/basic/verification.html#backgrou...
         | :
         | 
         | > _2021. deal-solver. We released a tool that converts Python
         | code (including deal contracts) into Z3 theorems that can be
         | formally verified_
        
           | westurner wrote:
           | Vim python-mode:
           | https://github.com/python-mode/python-mode/blob/e01c27e8c17b... :
           | 
           | > Pymode can rename everything: classes, functions, modules,
           | packages, methods, variables and keyword arguments.
           | 
           | > Keymap for rename method/function/class/variables under
           | cursor
           | 
           |     let g:pymode_rope_rename_bind = '<C-c>rr'
           | 
           | python-rope/ropevim also has mappings for refactorings like
           | renaming a variable:
           | https://github.com/python-rope/ropevim#keybinding :
           | 
           |     C-c r r   :RopeRename
           |     C-c f     find occurrences
           | 
           | https://github.com/python-rope/ropevim#finding-occurrences
           | 
           | Their README now recommends pylsp-rope:
           | 
           | > _If you are using ropevim, consider using pylsp-rope in
           | Vim_
           | 
           | python-rope/pylsp-rope:
           | https://github.com/python-rope/pylsp-rope :
           | 
           | > Finding Occurrences: _The find occurrences command (_ C-c f
           | _by default) can be used to find the occurrences of a python
           | name. If unsure option is yes, it will also show unsure
           | occurrences; unsure occurrences are indicated with a ? mark
           | in the end. Note that ropevim uses the quickfix feature of
           | vim for marking occurrence locations._ [...]
           | 
           | > Rename: _When Rename is triggered, rename the symbol under
           | the cursor. If the symbol under the cursor points to a
           | module/package, it will move that module/package files_
           | 
           | SpaceVim > Available Layers > lang#python > LSP key Bindings:
           | https://spacevim.org/layers/lang/python/#lsp-key-bindings :
           | SPC l e  rename symbol
           | 
           | Vscode Python variable renaming:
           | 
           | Vscode tips and tricks > Multi cursor selection:
           | https://code.visualstudio.com/docs/getstarted/tips-and-trick... :
           | 
           | > _You can add additional cursors to all occurrences of the
           | current selection with Ctrl+Shift+L._ [And then rename the
           | occurrences in the local file]
           | 
           | https://code.visualstudio.com/docs/editor/refactoring#_renam... :
           | 
           | > Rename symbol: _Renaming is a common operation related to
           | refactoring source code, and VS Code has a separate Rename
           | Symbol command (F2). Some languages support renaming a symbol
           | across files. Press F2, type the new desired name, and press
           | Enter. All instances of the symbol across all files will be
           | renamed_
        
             | morningsam wrote:
             | But does Rope understand pytest fixtures? I doubt it, but
             | would be happy to be proven wrong.
        
       | ruined wrote:
       | what are some other tools like jedi? it would be cool to have a
       | list of the favored tool for each language, or a meta-tool.
       | 
       | there's tsmod at least https://github.com/WolkSoftware/tsmod
       | 
       | i've heard of fastmod, codemod but never used them.
        
         | rty32 wrote:
         | In the JavaScript world, jscodeshift and its upstream tool
         | recast are frequently used. I believe you could do the same
         | thing with esbuild and some Rust based tools, but these two are
         | probably the most popular.
        
       | pbreit wrote:
       | I'm wondering if this would be fairly easy to do with AI?
        
         | gloflo wrote:
         | What kind of "AI"? LLM-based hype would probably miss random
         | ones.
        
           | xrd wrote:
           | Check out the gritql example from morgante. That does a lot
           | of cool things and is what you are looking for.
        
       | alexpovel wrote:
       | These sorts of cases are why I wrote srgn [0]. It's based on
       | tree-sitter too. Calling it as
       | 
       |     cat file.py | srgn --py def --py identifiers 'database' 'db'
       | 
       | will _replace_ all mentions of `database` inside identifiers
       | inside (only!) function definitions (`def`) with `db`.
       | 
       | An input like
       | 
       |     import database
       |     import pytest
       | 
       |     @pytest.fixture()
       |     def test_a(database):
       |         return database
       | 
       |     def test_b(database):
       |         return database
       | 
       |     database = "database"
       | 
       |     class database:
       |         pass
       | 
       | is turned into
       | 
       |     import database
       |     import pytest
       | 
       |     @pytest.fixture()
       |     def test_a(db):
       |         return db
       | 
       |     def test_b(db):
       |         return db
       | 
       |     database = "database"
       | 
       |     class database:
       |         pass
       | 
       | which seems roughly like what the author is after. Mentions of
       | "database" _outside_ function definitions are not modified. That
       | sort of logic I always found hard to replicate in basic GNU-like
       | tools. If run without stdin, the above command runs recursively,
       | in-place (careful with that one!).
       | 
       | Note: I just wrote this, and version 0.13.2 is required for the
       | above to work.
       | 
       | [0]: https://github.com/alexpovel/srgn
        
         | Jackevansevo wrote:
         | This is super cool! I wish I'd known about this.
        
       | caeruleus wrote:
       | There is a Python library/tool called Bowler
       | (https://pybowler.io/docs/basics-intro) that allows selecting and
       | transforming elements on a concrete syntax tree. From my limited
       | experience with it, I guess it would have been a nice fit for
       | this refactoring.
        
         | carlmr wrote:
         | I was going to suggest libCST, it works really well and is much
         | less of a hassle to set up than this.
         | 
         | https://github.com/Instagram/LibCST
        
           | caeruleus wrote:
           | Great suggestion! Bowler seems to be abandoned actually. Its
           | README mentions wanting to rewrite on top of LibCST though
           | (https://github.com/facebookincubator/Bowler?tab=readme-ov-fi...).
        
       | _jayhack_ wrote:
       | Interesting refactor!
       | 
       | This is trivial with codegen.com. Syntax below:
       | 
       |     # Iterate through all files in the codebase
       |     for file in codebase.files:
       |         # Check for functions with the pytest.fixture decorator
       |         for function in file.functions:
       |             if any(d.name == "fixture" for d in function.decorators):
       |                 # Rename the 'db' parameter to 'database'
       |                 db_param = function.get_parameter("db")
       |                 if db_param:
       |                     db_param.set_name("database")
       |                     # Log the modification
       |                     print(f"Modified {function.name}")
       | 
       | Live example: https://www.codegen.sh/codemod/4697/public/diff
        
         | poincaredisk wrote:
         | Consider indenting your code block, it's unreadable as it is
         | now.
        
           | _jayhack_ wrote:
           | Good call, thank you
        
         | jesus_meza wrote:
         | That's pretty sick. Super readable with python :)
         | 
         | Is each file getting parsed individually with tree-sitter or
         | how is the codebase object constructed?
        
           | _jayhack_ wrote:
           | We do advanced static analysis to provide programmatic access
           | to the type system, etc., based on tree-sitter and in-house
           | tech.
           | 
           | This enables APIs such as `function.call_sites`,
           | `symbol.usages`, `class.parent_classes`, and more!
        
             | pksunkara wrote:
             | Where can I learn more about this? You guys don't seem to
             | have any docs available.
        
       | otteromkram wrote:
       | Everyone's tossing in the name of other third-party packages, but
       | have you explored the language section from Python's standard
       | library?
       | 
       | https://docs.python.org/3/library/language.html
        
         | benrutter wrote:
         | I was thinking too as I read that AST could be swapped in for
         | tree sitter and I think it'd work more or less the same (not
         | sure it'd have an advantage though, unless you preferred using
         | standard library tools where possible)
        
       | ievans wrote:
       | I wrote up a Semgrep rule as a comparison to add! (also tree-
       | sitter based, `pip install Semgrep`,
       | https://github.com/semgrep/semgrep, or play with live editor
       | link: https://semgrep.dev/playground/s/nJ4rY)
       | 
       |     pattern: |-
       |       def $FUNC(..., database, ...):
       |         $...BODY
       |     fix: |-
       |       def $FUNC(..., db, ...):
       |         $...BODY
        
       | 29athrowaway wrote:
       | Use a query expression instead.
        
       | nemoniac wrote:
       | There are several straightforward ways to do this without needing
       | Tree-sitter or Jedi.
       | 
       | Here are two approaches in Emacs.
       | 
       | https://emacs.stackexchange.com/a/69571
       | 
       | https://rigsomelight.com/2010/02/14/emacs-interactively-find...
        
         | IanCal wrote:
         | Is that the same? That looks like just a text based
         | replacement.
        
       | mhw wrote:
       | I've been looking at codemod tools recently, just as a way to
       | extend my editing toolbox. I came across
       | https://ast-grep.github.io/, which looks like it might address
       | part of this problem. My initial test case was to locate all
       | calls to a method
       | where a specific argument was 'true', and it handled that well -
       | that's the kind of thing an IDE seems to struggle with. I'm not
       | yet sure whether it could handle renaming a variable though.
       | 
       | I guess what I'm looking for is something that
       | 
       | * can carry out the kind of refactorings usually reserved for an
       | IDE
       | 
       | * has a concise language for representing the refactorings so new
       | ones can be built quite easily
       | 
       | * can apply the same refactoring in multiple places, with some
       | kind of search language (addressing the task of renaming a test
       | parameter in multiple methods)
       | 
       | * ideally does this across multiple programming languages
        
       ___________________________________________________________________
       (page generated 2024-09-28 23:01 UTC)