[HN Gopher] Refactoring Python with Tree-sitter and Jedi
___________________________________________________________________
Refactoring Python with Tree-sitter and Jedi
Author : todsacerdoti
Score : 59 points
Date : 2024-09-24 15:02 UTC (3 days ago)
(HTM) web link (jackevans.bearblog.dev)
| nfrankel wrote:
| I wonder if the author has ever heard of something called an IDE?
| ErikBjare wrote:
| I think this particular case would be difficult to refactor
| even in an IDE like PyCharm, which afaik is the best at
| refactoring Python (might be outdated).
| lispisok wrote:
| yes but how does the IDE do it?
| rustyminnow wrote:
| What's an IDE and how does it refactor hundreds of semantically
| unrelated identifiers in one go?
| fiddlerwoaroof wrote:
| IDEs are great if your refactorings fit in the predefined
| refactorings
| Jackevansevo wrote:
| Author here, I'm not aware of any IDE that can do this specific
| refactor
| morningsam wrote:
| PyCharm understands pytest fixtures and if this is really
| just about a single fixture called "database", it takes 3
| seconds to do this refactoring by just renaming it.
| 1-more wrote:
| Write instructions on how to do this in any IDE.
| morningsam wrote:
| In PyCharm: Move cursor on any occurrence or definition of
| "database" fixture, press the "Rename" hotkey (Shift+F6),
| delete old name and type new name, press Enter key to
| confirm.
| morgante wrote:
| Nice (simple) introduction to the tree sitter APIs.
|
| If you're looking for a higher level interface, GritQL[0] is
| built on top of tree-sitter and could handle the same refactor
| with this query:
|
|     language python
|
|     `def $_($_): $_` as $func where $func <: contains `database` => `db`
|
| [0] https://github.com/getgrit/gritql
| seanhunter wrote:
| Tree-sitter is really powerful, but it's worth people learning a
| few methods they prefer to use because there are going to be
| situations where one method works better than another. Things I
| have found useful in the past include
|
| - perl -pi -e 's/foo/bar/g' *files
|
| "-pi" combines "-p" (loop over the input and print each line)
| with "-i" (edit in place), so it will change the files in place.
| If you have a purely mechanical change like he's doing here it's
| a very reasonable choice. If you're not as much of a cowboy as I
| am, you can specify a suffix and it will back the files up, so
| something like
|
| perl -p -i.bak -e 's/db/database/g' *py
|
| For example, all your original '.py' files will then be copied to
| '.py.bak' and the new renamed versions will be '.py'.
|
| For vim users (I know emacs has the same thing but I don't
| remember the exact invocation because it has been >20 years since
| I used emacs as my main editor) it's worth knowing the "global"
| command. So you can execute a particular command only on lines
| that match some regex. So say you want to delete all the lines
| which mention cheese
|
| :%g/cheese/d
|
| Say you want to replace "db" with "database" but only on lines
| which start with "def"
|
| :%g/^def/s/db/database/
|
| OK cool. Now if you go 'vim *py' you can do ":argdo
| g/^def/s/db/database/ | update" and it will perform that global
| command across all the files in the arg list and save the ones
| which have changed.
| _whiteCaps_ wrote:
| I'd reach for argdo as well - but I don't think this covers his
| use case of:
|
| > every instance of a pytest fixture
|
| Although it's probably good enough for 99% of the use cases,
| and any extra accidental renames could be reverted when you
| look at the diff.
|
| Maybe it could be covered with a multi line regex using `\_.`
| Jackevansevo wrote:
| Author here: I'm super familiar with this kind of find and
| replace syntax inside vim or with sed. Usually it works great!
|
| But in this specific situation it was tricky to handle
| situations with things spanning over multiple lines +
| preventing accidental renames.
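|
| A rough sketch of why a syntax tree helps here, using the
| standard library's ast module as a stand-in (not the article's
| tree-sitter + Jedi approach). The helper find_fixture_params is
| hypothetical; it reports parameters named "database" even when
| the signature spans several lines, and ignores other uses of
| the name such as a module-level variable.

```python
import ast


def find_fixture_params(source, name="database"):
    """Return (function name, line, column) for each parameter called `name`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            # Positional-only, regular and keyword-only parameters.
            for arg in args.posonlyargs + args.args + args.kwonlyargs:
                if arg.arg == name:
                    hits.append((node.name, arg.lineno, arg.col_offset))
    return hits


source = '''\
database = "database"

def test_a(
    client,
    database,
):
    return database
'''

print(find_fixture_params(source))  # → [('test_a', 5, 4)]
```

| ast only locates the nodes; it drops comments and formatting,
| so the rename itself still has to be applied to the source text.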
| seanhunter wrote:
| I realise that and like the article. I was trying to convey
| in my response that devs should have these things in their
| toolkit not that you "did the wrong thing"[1] somehow by
| using treesitter for this.
|
| [1] like that's even possible in this situation
| avianlyric wrote:
| Interesting use of tree-sitter. But I'm a little surprised that
| tree-sitter's built-in query language wasn't used.
|
| There's no need to manually iterate through the tree and use if
| statements to select nodes. Instead you can just write a couple
| of simple queries (and even use tree-sitter's web UI to test the
| queries), and have tree-sitter provide all the nodes for
| you.
|
| https://tree-sitter.github.io/tree-sitter/using-parsers#patt...
| hetspookjee wrote:
| Having no experience with tree-sitter, I find the query language
| rather hard to parse. From a practical point of view, for someone
| experimenting with the library, I'm not surprised the author
| went with this nested for-loop approach.
| 147 wrote:
| I've always wanted to do mechanical refactors and recently ran
| into the problem the author ran into where tree-sitter can't
| write back the AST as source. Is there an alternative that is
| able to do this for most programming languages?
| desbo wrote:
| Would've been easy with fastmod:
| https://github.com/facebookincubator/fastmod
| westurner wrote:
| > _I do wish tree-sitter had a mechanism to directly manipulate
| the AST. I was unable to simply rename/delete nodes and then
| write the AST back to disk. Instead I had to use Jedi or
| manually edit the source (and then deal with nasty off-set
| re-parsing logic)._
|
| Or LibCST: https://github.com/Instagram/LibCST docs:
| https://libcst.readthedocs.io/en/latest/ :
|
| > _LibCST parses Python 3.0 -> 3.12 source code as a CST tree
| that keeps all formatting details (comments, whitespaces,
| parentheses, etc). It's useful for building automated
| refactoring (codemod) applications and linters._
|
| libcst_transformer.py:
| https://gist.github.com/sangwoo-joh/26e9007ebc2de256b0b3deed... :
|
| > _example code for renaming variables using libcst_ [w/
| Visitors and Transformers]
|
| Refactoring because it doesn't pass formal verification:
| https://deal.readthedocs.io/basic/verification.html#backgrou... :
|
| > _2021. deal-solver. We released a tool that converts Python
| code (including deal contracts) into Z3 theorems that can be
| formally verified_
| westurner wrote:
| Vim python-mode:
| https://github.com/python-mode/python-mode/blob/e01c27e8c17b... :
|
| > Pymode can rename everything: classes, functions, modules,
| packages, methods, variables and keyword arguments.
|
| > Keymap for rename method/function/class/variables under
| cursor
|
|     let g:pymode_rope_rename_bind = '<C-c>rr'
|
| python-rope/ropevim also has mappings for refactorings like
| renaming a variable:
| https://github.com/python-rope/ropevim#keybinding :
|
|     C-c r r    :RopeRename
|     C-c f      find occurrences
|
| https://github.com/python-rope/ropevim#finding-occurrences
|
| Their README now recommends pylsp-rope:
|
| > _If you are using ropevim, consider using pylsp-rope in
| Vim_
|
| python-rope/pylsp-rope: https://github.com/python-rope/pylsp-rope :
|
| > Finding Occurrences: _The find occurrences command (_ C-c f
| _by default) can be used to find the occurrences of a python
| name. If unsure option is yes, it will also show unsure
| occurrences; unsure occurrences are indicated with a ? mark
| in the end. Note that ropevim uses the quickfix feature of
| vim for marking occurrence locations._ [...]
|
| > Rename: _When Rename is triggered, rename the symbol under
| the cursor. If the symbol under the cursor points to a
| module/package, it will move that module/package files_
|
| SpaceVim > Available Layers > lang#python > LSP key Bindings:
| https://spacevim.org/layers/lang/python/#lsp-key-bindings :
|
|     SPC l e    rename symbol
|
| Vscode Python variable renaming:
|
| Vscode tips and tricks > Multi cursor selection:
| https://code.visualstudio.com/docs/getstarted/tips-and-trick... :
|
| > _You can add additional cursors to all occurrences of the
| current selection with Ctrl+Shift+L._ [And then rename the
| occurrences in the local file]
|
| https://code.visualstudio.com/docs/editor/refactoring#_renam... :
|
| > Rename symbol: _Renaming is a common operation related to
| refactoring source code, and VS Code has a separate Rename
| Symbol command (F2). Some languages support renaming a symbol
| across files. Press F2, type the new desired name, and press
| Enter. All instances of the symbol across all files will be
| renamed_
| ruined wrote:
| what are some other tools like jedi? it would be cool to have a
| list of the favored tool for each language, or a meta-tool.
|
| there's tsmod at least https://github.com/WolkSoftware/tsmod
|
| i've heard of fastmod, codemod but never used them.
| rty32 wrote:
| In the JavaScript world, jscodeshift and its upstream tool
| recast are frequently used. I believe you could do the same
| thing with esbuild and some Rust based tools, but these two are
| probably the most popular.
| pbreit wrote:
| I'm wondering if this would be fairly easy to do with AI?
| gloflo wrote:
| What kind of "AI"? LLM-based hype would probably miss random
| ones.
| alexpovel wrote:
| These sorts of cases are why I wrote srgn [0]. It's based on
| tree-sitter too. Calling it as
|
|     cat file.py | srgn --py def --py identifiers 'database' 'db'
|
| will _replace_ all mentions of `database` inside identifiers
| inside (only!) function definitions (`def`) with `db`.
|
| An input like
|
|     import database
|     import pytest
|
|     @pytest.fixture()
|     def test_a(database):
|         return database
|
|     def test_b(database):
|         return database
|
|     database = "database"
|
|     class database:
|         pass
|
| is turned into
|
|     import database
|     import pytest
|
|     @pytest.fixture()
|     def test_a(db):
|         return db
|
|     def test_b(db):
|         return db
|
|     database = "database"
|
|     class database:
|         pass
|
| which seems roughly like what the author is after. Mentions of
| "database" _outside_ function definitions are not modified. That
| sort of logic I always found hard to replicate in basic GNU-like
| tools. If run without stdin, the above command runs recursively,
| in-place (careful with that one!).
|
| Note: I just wrote this, and version 0.13.2 is required for the
| above to work.
|
| [0]: https://github.com/alexpovel/srgn
| Jackevansevo wrote:
| This is super cool! I wish I'd known about this.
| caeruleus wrote:
| There is a Python library/tool called Bowler
| (https://pybowler.io/docs/basics-intro) that allows selecting and
| transforming elements on a concrete syntax tree. From my limited
| experience with it, I guess it would have been a nice fit for
| this refactoring.
| _jayhack_ wrote:
| Interesting refactor!
|
| This is trivial with codegen.com. Syntax below:
|
|     # Iterate through all files in the codebase
|     for file in codebase.files:
|         # Check for functions with the pytest.fixture decorator
|         for function in file.functions:
|             if any(d.name == "fixture" for d in function.decorators):
|                 # Rename the 'db' parameter to 'database'
|                 db_param = function.get_parameter("db")
|                 if db_param:
|                     db_param.set_name("database")
|                     # Log the modification
|                     print(f"Modified {function.name}")
|
| Live example: https://www.codegen.sh/codemod/4697/public/diff
| poincaredisk wrote:
| Consider indenting your code block, it's unreadable as it is
| now.
| jesus_meza wrote:
| That's pretty sick. Super readable with python :)
|
| Is each file getting parsed individually with tree-sitter or
| how is the codebase object constructed?
___________________________________________________________________
(page generated 2024-09-27 23:00 UTC)