[HN Gopher] Refactoring Python with Tree-sitter and Jedi
       ___________________________________________________________________
        
       Refactoring Python with Tree-sitter and Jedi
        
       Author : todsacerdoti
       Score  : 59 points
       Date   : 2024-09-24 15:02 UTC (3 days ago)
        
 (HTM) web link (jackevans.bearblog.dev)
 (TXT) w3m dump (jackevans.bearblog.dev)
        
       | nfrankel wrote:
       | I wonder if the author has ever heard of something called an IDE?
        
         | ErikBjare wrote:
         | I think this particular case would be difficult to refactor
         | even in an IDE like PyCharm, which afaik is the best at
         | refactoring Python (might be outdated).
        
         | lispisok wrote:
         | yes but how does the IDE do it?
        
         | rustyminnow wrote:
         | What's an IDE and how does it refactor hundreds of semantically
         | unrelated identifiers in one go?
        
         | fiddlerwoaroof wrote:
         | IDEs are great if your refactorings fit in the predefined
         | refactorings
        
         | Jackevansevo wrote:
         | Author here, I'm not aware of any IDE that can do this specific
         | refactor
        
           | morningsam wrote:
           | PyCharm understands pytest fixtures and if this is really
           | just about a single fixture called "database", it takes 3
           | seconds to do this refactoring by just renaming it.
        
         | 1-more wrote:
         | Write instructions on how to do this in any IDE.
        
           | morningsam wrote:
           | In PyCharm: Move cursor on any occurrence or definition of
           | "database" fixture, press the "Rename" hotkey (Shift+F6),
           | delete old name and type new name, press Enter key to
           | confirm.
        
       | morgante wrote:
       | Nice (simple) introduction to the tree sitter APIs.
       | 
       | If you're looking for a higher level interface, GritQL[0] is
       | built on top of tree-sitter and could handle the same refactor
       | with this query:
       | 
       |     language python
       | 
       |     `def $_($_): $_` as $func where $func <: contains `database` => `db`
       | 
       | [0] https://github.com/getgrit/gritql
        
       | seanhunter wrote:
       | Tree-sitter is really powerful, but it's worth people learning a
       | few methods they prefer to use because there are going to be
       | situations where one method works better than another. Things I
       | have found useful in the past include
       | 
       | - perl -pi -e 's/foo/bar/g' *files
       | 
       | "-pi" means "in place edit" so it will change the files in place.
       | If you have a purely mechanical change like he's doing here it's
       | a very reasonable choice. If you're not as much of a cowboy as I
       | am, you can specify a suffix and it will back the files up, so
       | something like
       | 
       | perl -p -i.bak -e 's/db/database/g' *py
       | 
       | For example then all your original '.py' files will be copied to
       | '.py.bak' and the new renamed versions will be '.py'
       | 
       | For vim users (I know emacs has the same thing but I don't
       | remember the exact invocation because it has been >20years since
       | I used emacs as my main editor) it's worth knowing the "global"
       | command. So you can execute a particular command only on lines
       | that match some regex. So say you want to delete all the lines
       | which mention cheese
       | 
       | :%g/cheese/d
       | 
       | Say you want to replace "db" with "database" but only on lines
       | which start with "def"
       | 
       | :%g/^def/s/db/database/
       | 
       | OK cool. Now if you go 'vim *py' you can do ":argdo
       | g/^def/s/db/database/ | update" and it will perform that global
       | command across all the files in the arg list and save the ones
       | which have changed.
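
For anyone who would rather script this than run it in an editor, here is a
minimal Python sketch of the same idea as the `g/^def/s/db/database/` command
above. The helper name `rename_on_def_lines` is made up for illustration. It is
an approximation, not an exact port: it matches whole words only and replaces
every match on a `def` line, where the vim command replaces the first match.

```python
import re


def rename_on_def_lines(source: str, old: str, new: str) -> str:
    """Replace whole-word `old` with `new`, but only on lines starting with 'def'."""
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    out = []
    for line in source.splitlines(keepends=True):
        # Same filter as vim's g/^def/: the line must begin with "def".
        if line.startswith("def"):
            line = pattern.sub(new, line)
        out.append(line)
    return "".join(out)


example = (
    "db = connect()\n"
    "def test_a(db):\n"
    "    return db\n"
)
result = rename_on_def_lines(example, "db", "database")
print(result)
```

Only the signature line changes here. The `return db` in the body is left
alone, which is exactly the kind of gap a purely line-based rename leaves
behind.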
        
         | _whiteCaps_ wrote:
         | I'd reach for argdo as well - but I don't think this covers his
         | use case of:
         | 
         | > every instance of a pytest fixture
         | 
         | Although it's probably good enough for 99% of the use cases,
         | and any extra accidental renames could be reverted when you
         | look at the diff.
         | 
         | Maybe it could be covered with a multi line regex using `\_.`
        
         | Jackevansevo wrote:
         | Author here: I'm super familiar with this kind of find and
         | replace syntax inside vim or with sed. Usually it works great!
         | 
         | But in this specific situation it was tricky to handle
         | situations with things spanning over multiple lines +
         | preventing accidental renames.
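
The multi-line problem is easy to reproduce: once a signature is split across
lines, no single line both starts with `def` and contains the parameter. Below
is a rough sketch of a syntax-aware check. It uses Python's standard `ast`
module rather than tree-sitter or Jedi, so it is not what the article does, and
the helper name `functions_taking` is invented for this example.

```python
import ast

SOURCE = '''\
def test_users(
    client,
    database,
):
    return database.query("users")


def helper(path):
    database = open(path)  # local variable, not the fixture
    return database
'''


def functions_taking(source: str, param: str) -> list[tuple[str, int]]:
    """Return (function name, line number) for each function with a parameter named `param`."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            names = [a.arg for a in args.posonlyargs + args.args + args.kwonlyargs]
            if param in names:
                found.append((node.name, node.lineno))
    return found


matches = functions_taking(SOURCE, "database")
print(matches)
```

The multi-line `test_users` signature is found, while `helper`, which only uses
`database` as a local variable, is skipped.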
        
           | seanhunter wrote:
           | I realise that and like the article. I was trying to convey
           | in my response that devs should have these things in their
           | toolkit not that you "did the wrong thing"[1] somehow by
           | using treesitter for this.
           | 
           | [1] like that's even possible in this situation
        
       | avianlyric wrote:
       | Interesting use of tree-sitter. But I'm a little surprised that
       | tree-sitter's built-in query language wasn't used.
       | 
       | There's no need to manually iterate through the tree and use if
       | statements to select nodes. Instead you can just write a couple
       | of simple queries (and even use tree-sitter's web UI to test the
       | queries), and have tree-sitter provide all the nodes for
       | you.
       | 
       | https://tree-sitter.github.io/tree-sitter/using-parsers#patt...
        
         | hetspookjee wrote:
         | Having no experience with tree-sitter, I find the query language
         | rather hard to parse. From a practical point of view, for
         | someone experimenting with the library, I'm not surprised the
         | author went with this nested for-loop approach.
        
       | 147 wrote:
       | I've always wanted to do mechanical refactors and recently ran
       | into the problem the author ran into where tree-sitter can't
       | write back the AST as source. Is there an alternative that is
       | able to do this for most programming languages?
        
       | desbo wrote:
       | Would've been easy with fastmod:
       | https://github.com/facebookincubator/fastmod
        
         | westurner wrote:
         | > _I do wish tree-sitter had a mechanism to directly manipulate
         | the AST. I was unable to simply rename/delete nodes and then
         | write the AST back to disk. Instead I had to use Jedi or
         | manually edit the source (and then deal with nasty offset
         | re-parsing logic)._
         | 
         | Or libCST: https://github.com/Instagram/LibCST docs:
         | https://libcst.readthedocs.io/en/latest/ :
         | 
         | > _LibCST parses Python 3.0 -> 3.12 source code as a CST tree
         | that keeps all formatting details (comments, whitespaces,
         | parentheses, etc). It's useful for building automated
         | refactoring (codemod) applications and linters._
         | 
         | libcst_transformer.py:
         | https://gist.github.com/sangwoo-joh/26e9007ebc2de256b0b3deed... :
         | 
         | > _example code for renaming variables using libcst_ [w/
         | Visitors and Transformers]
         | 
         | Refactoring because it doesn't pass formal verification:
         | https://deal.readthedocs.io/basic/verification.html#backgrou...
         | :
         | 
         | > _2021. deal-solver. We released a tool that converts Python
         | code (including deal contracts) into Z3 theorems that can be
         | formally verified_
        
           | westurner wrote:
           | Vim python-mode:
           | https://github.com/python-mode/python-mode/blob/e01c27e8c17b... :
           | 
           | > Pymode can rename everything: classes, functions, modules,
           | packages, methods, variables and keyword arguments.
           | 
           | > Keymap for rename method/function/class/variables under
           | cursor
           | 
           |     let g:pymode_rope_rename_bind = '<C-c>rr'
           | 
           | python-rope/ropevim also has mappings for refactorings like
           | renaming a variable:
           | https://github.com/python-rope/ropevim#keybinding :
           | 
           |     C-c r r   :RopeRename
           |     C-c f     find occurrences
           | 
           | https://github.com/python-rope/ropevim#finding-occurrences
           | 
           | Their README now recommends pylsp-rope:
           | 
           | > _If you are using ropevim, consider using pylsp-rope in
           | Vim_
           | 
           | python-rope/pylsp-rope:
           | https://github.com/python-rope/pylsp-rope :
           | 
           | > Finding Occurrences: _The find occurrences command (C-c f
           | by default) can be used to find the occurrences of a python
           | name. If unsure option is yes, it will also show unsure
           | occurrences; unsure occurrences are indicated with a ? mark
           | in the end. Note that ropevim uses the quickfix feature of
           | vim for marking occurrence locations._ [...]
           | 
           | > Rename: _When Rename is triggered, rename the symbol under
           | the cursor. If the symbol under the cursor points to a
           | module/package, it will move that module/package files_
           | 
           | SpaceVim > Available Layers > lang#python > LSP key Bindings:
           | https://spacevim.org/layers/lang/python/#lsp-key-bindings :
           | SPC l e  rename symbol
           | 
           | Vscode Python variable renaming:
           | 
           | Vscode tips and tricks > Multi cursor selection:
           | https://code.visualstudio.com/docs/getstarted/tips-and-trick... :
           | 
           | > _You can add additional cursors to all occurrences of the
           | current selection with Ctrl+Shift+L._ [And then rename the
           | occurrences in the local file]
           | 
           | https://code.visualstudio.com/docs/editor/refactoring#_renam... :
           | 
           | > Rename symbol: _Renaming is a common operation related to
           | refactoring source code, and VS Code has a separate Rename
           | Symbol command (F2). Some languages support renaming a symbol
           | across files. Press F2, type the new desired name, and press
           | Enter. All instances of the symbol across all files will be
           | renamed_
        
       | ruined wrote:
       | what are some other tools like jedi? it would be cool to have a
       | list of the favored tool for each language, or a meta-tool.
       | 
       | there's tsmod at least https://github.com/WolkSoftware/tsmod
       | 
       | i've heard of fastmod, codemod but never used them.
        
         | rty32 wrote:
         | In the JavaScript world, jscodeshift and its upstream tool
         | recast are frequently used. I believe you could do the same
         | thing with esbuild and some Rust based tools, but these two are
         | probably the most popular.
        
       | pbreit wrote:
       | I'm wondering if this would be fairly easy to do with AI?
        
         | gloflo wrote:
         | What kind of "AI"? LLM-based hype would probably miss random
         | ones.
        
       | alexpovel wrote:
       | These sorts of cases are why I wrote srgn [0]. It's based on
       | tree-sitter too. Calling it as
       | 
       |     cat file.py | srgn --py def --py identifiers 'database' 'db'
       | 
       | will _replace_ all mentions of `database` inside identifiers
       | inside (only!) function definitions (`def`) with `db`.
       | 
       | An input like
       | 
       |     import database
       |     import pytest
       | 
       |     @pytest.fixture()
       |     def test_a(database):
       |         return database
       | 
       |     def test_b(database):
       |         return database
       | 
       |     database = "database"
       | 
       |     class database:
       |         pass
       | 
       | is turned into
       | 
       |     import database
       |     import pytest
       | 
       |     @pytest.fixture()
       |     def test_a(db):
       |         return db
       | 
       |     def test_b(db):
       |         return db
       | 
       |     database = "database"
       | 
       |     class database:
       |         pass
       | 
       | which seems roughly like what the author is after. Mentions of
       | "database" _outside_ function definitions are not modified. That
       | sort of logic I always found hard to replicate in basic GNU-like
       | tools. If run without stdin, the above command runs recursively,
       | in-place (careful with that one!).
       | 
       | Note: I just wrote this, and version 0.13.2 is required for the
       | above to work.
       | 
       | [0]: https://github.com/alexpovel/srgn
        
         | Jackevansevo wrote:
         | This is super cool! I wish I'd known about this.
        
       | caeruleus wrote:
       | There is a Python library/tool called Bowler
       | (https://pybowler.io/docs/basics-intro) that allows selecting and
       | transforming elements on a concrete syntax tree. From my limited
       | experience with it, I guess it would have been a nice fit for
       | this refactoring.
        
       | _jayhack_ wrote:
       | Interesting refactor!
       | 
       | This is trivial with codegen.com. Syntax below:
       | 
       |     # Iterate through all files in the codebase
       |     for file in codebase.files:
       |         # Check for functions with the pytest.fixture decorator
       |         for function in file.functions:
       |             if any(d.name == "fixture" for d in function.decorators):
       |                 # Rename the 'db' parameter to 'database'
       |                 db_param = function.get_parameter("db")
       |                 if db_param:
       |                     db_param.set_name("database")
       |                     # Log the modification
       |                     print(f"Modified {function.name}")
       | 
       | Live example: https://www.codegen.sh/codemod/4697/public/diff
        
         | poincaredisk wrote:
         | Consider indenting your code block, it's unreadable as it is
         | now.
        
         | jesus_meza wrote:
         | That's pretty sick. Super readable with python :)
         | 
         | Is each file getting parsed individually with tree-sitter or
         | how is the codebase object constructed?
        
       ___________________________________________________________________
       (page generated 2024-09-27 23:00 UTC)