https://jackevans.bearblog.dev/refactoring-python-with-tree-sitter-jedi/

Jack's blog

Home Blog Atom

Refactoring Python with  Tree-sitter & Jedi

24 Sep, 2024

I was toying around with a refactor the other day that would have
taken me ages by hand as it involved 100s of files.

I wanted to rename every instance of a pytest fixture from database
-> db across my entire repo (silly I know). Unfortunately this isn't
something my editor of choice can magically refactor.

---------------------------------------------------------------------

Here's how my test files looked before:

@pytest.fixture()
def test_a(database): ...

def test_b(database): ...

def test_c(database, x): ...

def test_d(x, database): ...

def test_e(x, database, y): ...

After the refactor, this is how they look:

@pytest.fixture()
def test_a(db): ...

def test_b(db): ...

def test_c(db, x): ...

def test_d(x, db): ...

def test_e(x, db, y): ...

After struggling to achieve what I wanted with the tools I'd
typically reach for (grep + sed) I decided to try something a bit
fancier.

Parsing nodes with Tree-Sitter

The first thing to do is to find all row/column of each database
identifier:

from pathlib import Path

import tree_sitter_python as tspython
from tree_sitter import Language, Parser

PY_LANGUAGE = Language(tspython.language())

parser = Parser(PY_LANGUAGE)


def parse_func(node):
    for child in node.children:
        if child.type == "parameters":
            for sub_child in child.children:
                if sub_child.type == "identifier" and sub_child.text == b"database":
                    yield sub_child.start_point


def parse_file(path):
    tree = parser.parse(path.read_bytes())
    for child in tree.root_node.children:
        if child.type == "function_definition":
            yield from parse_func(child)


def process_file(path):
    for match in parse_file(path):
        print(match)

This prints the location of all the instances of def test_ functions.

Point(row=7, column=11)
Point(row=10, column=11)
Point(row=13, column=14)
Point(row=16, column=14)

Handling decorated functions

The above code doesn't include support decorated functions, for
example:

@pytest.fixture()
def test_a(database): ...

Decorators requires a bit more effort to handle correctly:

def parse_file(path):
    tree = parser.parse(path.read_bytes())
    for child in tree.root_node.children:
        if child.type == "function_definition":
            yield from parse_func(child)
        elif child.type == "decorated_definition":
            for sub_child in child.children:
                if sub_child.type == "function_definition":
                    yield from parse_func(sub_child)

Renaming with Jedi

Now for each row/col I can use Jedi to rename the identifier:

def process_file(path):
    for match in parse_file(path):
        script = Script(code=path.read_text(), path=str(path))
        result = script.rename(line=match.row + 1, column=match.column, new_name="db")
        result.apply()

Conclusion

Ironically I ended up not merging this change, but was a fun learning
exercise. I found both jedi and tree-sitter relatively easy to learn,
I'll certainly be keeping them in my toolbelt for situations where
grep + sed don't quite cut it.

I do wish tree-sitter had a mechanism to directly manipulate the AST.
I was unable to simply rename/delete nodes and then write the AST
back to disk. Instead I had to use Jedi or manually edit the source
(and then deal with nasty off-set re-parsing logic).

Note The astute amongst you will notice that this script does a lot
of re-parsing. I could probably optimise this further, but for a
quick project wide refactor I found this to be plenty fast enough.

Here's a video of it in action:

#python

[eLsppAybJxQbdUfAmjxR] [                    ] 4
Powered by Bear ?**?