[HN Gopher] Show HN: Leaping - Debug Python tests instantly with...
___________________________________________________________________
Show HN: Leaping - Debug Python tests instantly with an LLM
debugger
Hi HN! We're Adrien and Kanav. We met at our previous job, where we
spent about a third of our lives combating a constant firehose of
bugs. In the hope of reducing this pain for others in the future,
we're working on automating debugging. We're currently building a
platform that ingests logs and then automatically reproduces,
root-causes and ultimately fixes production bugs as they happen. You
can see some of our work on this here -
https://news.ycombinator.com/item?id=39528087

As we were building the root-cause phase of our automated debugger,
we realized that we had developed something that resembled an
omniscient debugger. Like an omniscient debugger, it keeps track of
variable assignments over time, but you can interact with it at a
higher level than a conventional debugger, using natural language.
We ended up sticking it in a pytest plugin and have been using it
ourselves for development over the past few weeks.

Using this pytest plugin, you're able to reason at a much higher
level than with conventional debuggers and can ask questions like:

- Why did function x get hit?
- Why was variable y set to this value?
- What changes can I make to this code to make this test pass?

Here's a brief demo of this in action:
https://www.loom.com/share/94ebe34097a343c39876d7109f2a1428

To achieve this, we first instrument the test using sys.settrace
(or, on Python 3.12 and later, the far better sys.monitoring!) to
keep a history of all the functions that were called, along with the
calling line numbers. We then re-run the test and use AST parsing to
find all the variable assignments and keep track of those changes
over time. We also use AST parsing to obtain the source code for
these functions. We then neatly format and pass all this context to
GPT. (A rough sketch of this approach is included below.)

We'd love it if you checked the pytest plugin out - we welcome all
feedback :). If you want to chat bugs, our emails are also always
open - kanav@leaping.io
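
Purely as an illustration of the approach described in the post, and
not Leaping's actual code, here is a minimal sketch of the
sys.settrace and ast bookkeeping; the names tracer, find_assignments,
helper and test_example are made up for the example:

    import ast
    import inspect
    import sys

    call_history = []  # (function name, line it was called from)

    def tracer(frame, event, arg):
        # sys.settrace callback: record each function call together
        # with the line number in the caller that triggered it.
        if event == "call" and frame.f_back is not None:
            call_history.append((frame.f_code.co_name, frame.f_back.f_lineno))
        return tracer

    def find_assignments(func):
        # Walk the function's AST and collect the names bound by
        # simple assignments, with the line each binding occurs on.
        tree = ast.parse(inspect.getsource(func))
        found = []
        for node in ast.walk(tree):
            if isinstance(node, ast.Assign):
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        found.append((target.id, node.lineno))
        return found

    def helper(x):
        doubled = x * 2
        return doubled

    def test_example():
        value = helper(3)
        assert value == 6

    sys.settrace(tracer)  # on Python 3.12+, sys.monitoring is the cheaper option
    test_example()
    sys.settrace(None)

    print(call_history)                    # [('test_example', ...), ('helper', ...)]
    print(find_assignments(test_example))  # [('value', 2)]
    print(find_assignments(helper))        # [('doubled', 2)]

In the plugin itself this kind of data would presumably be formatted
into a prompt for GPT rather than printed, but the shape of the
collected context is the same idea.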
Author : kvptkr
Score : 79 points
Date : 2024-03-22 14:52 UTC (8 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| biddit wrote:
| Nice work! I watched the demo and can see how it will generate
| fixes for you, which you then copy and paste into the editor.
| Perhaps you could consider automating this process like Aider[1]
| does, whereby you force the LLM to generate a git diff for the
| fix and automatically commit it.
|
| 1. https://github.com/paul-gauthier/aider
| kvptkr wrote:
| Haha thanks! Yeah, I think that's definitely a logical next
| step. We do something similar for the larger bug resolution
| platform we've been working on, so it shouldn't be too hard to
| port over!
| bavell wrote:
| I've come across Aider before and was trying to remember the
| name just the other day - thanks!!
| drcongo wrote:
| This sounds great, but breaks pytest on the projects I've tried
| it on - both Django projects, both in different ways. I'm
| interested enough that I'll be keeping an eye on progress though!
| kvptkr wrote:
| Oof, I'm sorry to hear that - I don't think we had any Django
| projects in the set of projects we were testing this out on. I
| just filed an issue here and will hopefully fix it asap -
| https://github.com/leapingio/leaping/issues/2
| danShumway wrote:
| > To achieve this, we first instrument the test using
| sys.settrace (or, on versions of python >3.12, the far better
| sys.monitoring!) to keep a history of all the functions that were
| called, along with the calling line numbers. We then re-run the
| test and use AST parsing to find all the variable assignments and
| keep track of those changes over time. We also use AST parsing to
| obtain the source code for these functions.
|
| I don't want to be negative on someone's Show HN post, but it
| seems like getting all of this and showing it to the user would
| be way more helpful than showing it to the LLM?
|
| My standard sometimes when I'm thinking about this kind of stuff
| is "would I want this if the LLM was swapped out for an actual
| human?" So would I want a service that gets all this useful
| information, then hands it off to a Python coder (even a very
| good Python coder) with no other real context about the overall
| project, and then I had to ask them why my test broke instead of
| being able to look at the info myself? I don't think I'd want
| that. I've worked with co-workers who I really respect; I still
| don't want to do remote debugging with them over Slack, I want to
| be able to see the data myself.
|
| Going through a middleperson to find out which code paths my code
| has hit will nearly always be slower than just showing me the
| full list of every code path my code just hit. Of course I want
| filtering and search and all of that, but I want those as ways of
| filtering the data, not ways of controlling access to the data.
|
| It feels like you've made something really useful -- an
| omniscient debugger that tracks state changes over time -- and
| then you've hooked it up to something that would make it
| considerably less useful. I've done debugging with state
| libraries like Redux where I can track changes to data over time,
| it makes debugging _way easier_. It's great, it changes my
| relationship to how I think about code. So it's genuinely super
| cool to be able to use something like that in other situations.
| But at no point have I ever thought while using a state tracking
| tool, "I wish I had to have a conversation with this thing in
| order to get access to the timeline."
|
| Again, I don't want to be too negative. AI is all the hotness so
| I guess if you can pump all of that data into an LLM there's no
| reason not to since it'll generate more attention for the
| project. But it might not be a bad idea to also allow straight
| querying of the data passed to the LLM and data export that could
| be used to build more visual, user-controlled tools.
|
| Just my opinion, feel free to disregard.
| skydhash wrote:
| Not wanting to be negative either, but I've used debuggers like
| gdb, the ones in JetBrains's IDEs, and Xcode's, and every time
| it's not a lack of information that's stopping me from solving the
| issue. Coding Common Lisp with Sly and Emacs, or Smalltalk with
| Pharo, is much more entertaining than chatting with an LLM. Coding
| with a good debugger is very close to that (even the one inside
| the browser for JavaScript). I think we can design better tools
| than hooking everything up to an LLM that requires 128GB of RAM to
| run locally.
| kvptkr wrote:
| Interesting! I think you're right in saying that the middleman
| you're talking about has to be really good for something like
| this to actually be useful, especially for people very
| comfortable with debugging tools + their codebase. If I
| understand correctly, you're saying that the most productive
| tool for you would be one that can present you with more
| relevant data, in a structured way (Redux, etc). At first, we
| actually did think of making a nice IDE with all that data, but
| found out it kind of already exists - https://pytrace.com/, and
| we found it to be more cumbersome to use than anything! Our
| belief is that these tools can help, but there's an irreducible
| amount of reasoning that needs to happen, which takes time and
| effort, and we think a tool like this might be able to reduce
| that by offloading the reasoning to LLMs. I guess what I'm
| saying is that there's a cap on how much useful information a
| tool can give a developer to enable them to reason better, and
| I'm really interested in seeing if it's possible to reduce how
| much reasoning is needed in the first place. Curious to hear
| what you think - thanks for the thoughtful comment!
| pedrovhb wrote:
| I thought of something similar recently, but with a different
| approach - rather than settrace, it would use a subclass of
| bdb.Bdb (the standard library base debugger, on top of which Pdb
| is built) to actually have the LLM run a real debugging session.
| It'd place breakpoints (or open postmortem sessions after an
| uncaught exception) to drop into a repl which allows going up/down
| the frame stack at a given execution point, listing local state
| for frames, running code in the repl to try out hypotheses or
| understand the cause of an exception, looking at methods available
| for the objects in scope, etc. This is similar to what you'd get
| by running the `%debug` magic in IPython after an uncaught
| exception in a cell (try it out).
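|
| (Purely as a rough sketch of the bdb.Bdb idea above, not actual
| code from this thread: a subclass mostly needs to override
| user_line and hand each frame to whatever is making the decisions,
| an LLM or otherwise. The names LLMDebugger, decide and buggy are
| made up for the example.)
|
|     import bdb
|
|     class LLMDebugger(bdb.Bdb):
|         # Minimal bdb.Bdb subclass: pause on every traced line and
|         # let an external decision function inspect the frame and
|         # choose how to proceed.
|         def __init__(self, decide):
|             super().__init__()
|             self.decide = decide
|
|         def user_line(self, frame):
|             # Called each time the debugger stops on a line.
|             if self.decide(frame) == "step":
|                 self.set_step()
|             else:
|                 self.set_continue()
|
|     def decide(frame):
|         # Stand-in for the model: dump the location and locals,
|         # then keep single-stepping.
|         print(frame.f_code.co_name, frame.f_lineno, dict(frame.f_locals))
|         return "step"
|
|     def buggy():
|         items = [1, 2, 3]
|         total = sum(items)
|         return total * 2
|
|     LLMDebugger(decide).runcall(buggy)
|
| For the uncaught-exception case, pdb.post_mortem() inside an
| except block gives the kind of post-mortem repl described above.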
|
| The quick LLM input/repl output loop is more suitable for local
| models though, where you can control the hidden state cache, have
| lower latency, and enforce a grammar to ensure it doesn't go off
| the rails or stray from the commands implemented for interacting
| with the debugger - which afaik you can't do with services like
| OpenAI's. This is something I'd like to see more of - having
| low-level control of a model gives qualitatively different ways of
| using it which I haven't seen people explore that much.
| kvptkr wrote:
| So interestingly enough, we first tried letting GPT interact
| with pdb, through just a set of directed prompts, but we found
| that it kept hallucinating commands, not responding with the
| correct syntax and really struggling with line numbers. That's
| why we pivoted to just gathering, upfront, all the relevant data
| GPT could need and letting it synthesize that data into a single
| root cause.
|
| I think we're going to explore the local model approach though
| - you raise some really great points about having more granular
| control over the state of the model.
| pedrovhb wrote:
| Interesting! Did you try the function calling API? I feel you
| with the line number troubles, it's hard to get something
| consistent there. Using diffs with GPT-4 isn't much better in
| my experience; I didn't extensively test that, but from what
| I did test, it rarely produced syntactically valid diffs that could
| just be sent to `patch`. One approach I started playing with
| was using tree-sitter to add markers to code and let the LLM
| specify marker ranges for deletion/insertion/replacement, but
| alas, I got distracted before fully going through with it.
|
| In any case, I'll keep an eye on the project, good luck! Let
| me know if you ever need an extra set of hands, I find this
| stuff pretty interesting to think about :)
| janpf wrote:
| I actually coded something very close to this and it worked
| surprisingly well: https://github.com/janpf/debuggAIr
| kvptkr wrote:
| Ooh, interesting - starred and going to dig into this later
| today!
| stuaxo wrote:
| I've done a manual version of this with chatgpt.
|
| I had ipdb open and told it to request any variables it should
| look at, suggest what to do next, and say what it would expect. It
| was quite good, but took a lot of persuading; an LLM more tuned to
| this would be better.
| brumar wrote:
| In the Reddit discussion, one user pointed out that email
| addresses were inadvertently collected.
| https://www.reddit.com/r/programming/s/lBfxL7f2KM
| adrienphila wrote:
| It was removed here:
| https://github.com/leapingio/leaping/commit/e42d5198abe48875...
___________________________________________________________________
(page generated 2024-03-22 23:00 UTC)