[HN Gopher] RustAssistant: Using LLMs to Fix Compilation Errors ...
___________________________________________________________________
RustAssistant: Using LLMs to Fix Compilation Errors in Rust Code
Author : mmastrac
Score : 127 points
Date : 2025-04-30 21:56 UTC (2 days ago)
(HTM) web link (www.microsoft.com)
(TXT) w3m dump (www.microsoft.com)
| noodletheworld wrote:
| Hot take: this is the future.
|
| Strongly typed languages have a fundamentally superior iteration
| strategy for coding agents.
|
| The rust compiler, particularly, will often give extremely
| specific "how to fix" advice... but in general I see this as a
| future trend with rust and, increasingly, other languages.
|
| Fundamentally, being able to assert "this code compiles" (and
| iterate until it does) before returning "completed task" is
| superior for agents to dynamic languages where the only possible
| verification is runtime.
|
| (And at best the agent can assert "i guess it looks ok")
| justanything wrote:
| would this mean that LLMs would be able to generate code more
| easily for strongly typed languages?
| littlestymaar wrote:
| In an agentic scenario (when they can actually run the
| compiler by themselves) yes.
| hu3 wrote:
| Yep.
|
| I just tell LLM to create and run unit tests after applying
| changes.
|
| When tests fail, LLM can use the error message to fix code.
| Be it compilation error of code or logic error in unit
| tests.
| greenavocado wrote:
| I find llms extremely susceptible to spinning in a circle
| effectively halting in these situations
| hu3 wrote:
| True. That's why my instructions file tells them to try to
| fix once and stop if it fails again.
| pseudony wrote:
| I actually don't think it's that cut and dried. I expect
| that Rust especially (due to lifetimes) will stump LLMs -
| fixing locally triggers a need for a refactor elsewhere.
|
| I actually think a language like Clojure (very functional,
| very compositional, with a focus on local, stand-alone
| functions that manipulate base data structures (list, set,
| map) rather than specialist types (~classes)) would do well.
|
| That said, atm I get WAY more issues in OCaml suggestions
| from Claude than for Python. Training is king - the LLM
| cannot reason, so types are not as big a help as one might
| think.
| littlestymaar wrote:
| > fixing locally triggers a need for refactor elsewhere.
|
| Yes, but such refactors are most of the time very mechanical,
| and there's no reason to believe the agent won't be able to
| do it.
|
| > the LLM cannot reason so types are not as big a help as one
| might think.
|
| You are missing the point: the person you are responding to
| expects it to be superior in an _agentic_ scenario, where the
| LLM can try its code and see the compiler output, rather than
| in a pure text-generation situation where the LLM can only
| assess the code from a bird's-eye view.
| inciampati wrote:
| Mechanical repairs, and often indicative of mistakes about
| lifetimes. So it's just part of the game.
| Capricorn2481 wrote:
| No, I think others are missing the point. An "Agentic
| scenario" is not dissimilar from passing code manually to
| an AI, it just does it by itself. And if you've tried to
| use AI for Rust, you would understand why this is not
| reliable.
|
| An LLM can read compiler output, but how it corrects the
| code is, ultimately, a semantic guess. It can look at the
| names of types, it can use its training to guess where new
| code should go based on types, but it's not able to
| _actually use_ those types while making changes. It would
| use them in the same way it would use comments, to inform
| what code it should output. It makes a guess, checks the
| compiler output, makes another guess, etc. This may lead to
| code that compiles, but not code that should be committed
| by any means. And Rust is not what I'd call a "flexible
| language," where lots of different coding styles are
| acceptable in a codebase. You can easily end up with
| brittle code.
|
| So you don't get much benefit from types, but you do have
| the overhead of semantic complexity. This is a huge problem
| for a language like Rust, which is one of the most
| semantically complicated languages. The best languages are
| going to be ones that are semantically simple but popular,
| like Golang. Although I do think Clojure's support is
| impressive given how little code there is compared to other
| languages.
| noodletheworld wrote:
| > so types are not as big a help as one might think.
|
| Yes, they are.
|
| An _agent_ can combine the compiler type system and iterate.
|
| That is impossible using clojure.
|
| The reason you have problems with OCaml is that the tooling
| you're using is too shit to support iterating until the
| compiler passes before returning the results to you.
|
| ...not because the tooling doesn't exist. Not because the
| tooling doesn't work.
|
| --> because you are not using it.
|
| Sure, rust ownership makes it hard for LLMs. Faaair point;
| but ultimately, why would a coding agent ever suggest code to
| you that doesn't compile?
|
| Either: a) the agent tooling is poor or b) it is impossible
| to verify if the code compiles.
|
| One of those is a solvable problem.
|
| One is not.
|
| (Yes, what many current agents do is run test suites; but
| dynamically generating valid tests is tricky; checking if
| code compiles is not tricky.)
| diggan wrote:
| > An agent can combine the compiler type system and
| iterate.
|
| > That is impossible using clojure.
|
| It might be impossible to use the compiler type system, but
| in Clojure you have much more powerful tools for actually
| working with your program as it runs; one would think this
| would be a much better setup for an LLM that aims to
| implement something.
|
| Instead of just relying on the static types based on text,
| the LLM could actually inspect the live data as the program
| runs.
|
| Besides, the LLM could also replace individual
| functions/variables in a running program, without having to
| restart.
|
| The more I think about it, the more obvious it becomes how
| well suited Clojure would be for an LLM iteratively
| building an actual working program, compared to other static
| approaches like using Rust.
| michalsustr wrote:
| I understand the point, however I think explicit types
| are still superior, due to the abundance of data in the
| training phase. It seems to me too computationally
| hard to incorporate a REPL-like interactive interface in
| the GPU training loop. Since it's processing large
| amounts of data you want to keep it simple, without back-
| and-forth with CPUs that would kill performance. And if
| you can't do it at training time, it's hard to expect the
| LLM to do well at inference time.
|
| Well, if you could run clojure purely on gpu/inside the
| neural net, that might be interesting!
| diggan wrote:
| Why would it be more expensive to include a REPL-like
| experience, compared to running the whole of the Rust
| compiler, in the GPU training loop?
|
| Not that I argued you should do that (I don't think
| either makes much sense; the point was about inference
| time, not training), but if you apply that to one side of
| the argument (for Clojure, a REPL), don't you think you
| should also apply it to the other side (for Rust, a
| compiler) for a fair comparison?
| michalsustr wrote:
| I agree. I am under the impression that unlike Rust,
| there aren't explicit types required in Clojure. (I don't
| know clojure)
|
| So there are examples online, with rust code and types
| and compiler errors, and how to fix them. But for
| clojure, the type information is missing and you need to
| get it from repl.
| michalsustr wrote:
| If you want to build an LLM specific to Clojure, it could
| probably be engineered to add the types as traces for
| training via a synthetic dataset, and to provide them from
| the REPL at inference time. Sounds like an awfully large
| amount of work for a non-mainstream language.
| diggan wrote:
| > So there are examples online, with rust code and types
| and compiler errors, and how to fix them. But for
| clojure, the type information is missing and you need to
| get it from repl.
|
| Right, my point is that instead of the LLM relying on
| static types and text, with Clojure the LLM could
| actually inspect the live application. So instead of
| trying to "understand" that variable A contains 123,
| it'll do "<execute>(println A)</execute>" and whatever,
| and then see the result for itself.
|
| Haven't thought deeply about it, but my intuition tells
| me the more (accurate and fresh) relevant data you can
| give the LLM for solving problems, the better. So having
| the actual live data available is better than trying to
| figure out what the data would be based on static types
| and manually following the flow.
| diggan wrote:
| On the other hand, using "it compiles" as a heuristic for "it
| does what I want" seems to be missing the goal of why you're
| coding what you're coding in the first place. I'd much rather
| set up one E2E test with how I want the thing to work, then let
| the magical robot figure out how to get there while also being
| able to run the test and see if they're there yet or not.
| jillesvangurp wrote:
| I'm waiting for someone to figure out that coding is
| essentially a sequence of refactoring steps where each step is
| a code transformation that transforms it from one valid state
| to another. Equipping refactoring IDEs with an MCP facade would
| give direct access to that as well as feedback on compilation
| state and lots of other information. That makes it a lot easier
| to do structured transformations of entire code bases without
| having to feed the entire code base as a context and then hope
| the LLM hallucinates together the right tokens and uses
| reasoning to figure out if it might be correct. They are
| actually pretty good at doing that but it doesn't scale very
| well currently and gets expensive quickly (in time and tokens).
|
| This stuff is indeed inherently harder for dynamic languages.
| But it's been standard for (some) statically compiled languages
| like Java, Kotlin, C#, Scala, etc. for most of this century. I
| was using refactoring IDEs for Java as early as 2002.
| pjmlp wrote:
| Which based many of their tools on what Xerox PARC has done
| with their Smalltalk, Mesa (XDE), Mesa/Cedar, Interlisp-D
| environments.
|
| This kind of processing is possible in dynamic languages,
| when using an image-based system, as it also contains
| metadata that somehow takes the role of static types.
|
| From the previous list only Mesa and Cedar are statically
| typed.
| _QrE wrote:
| It's not really that much harder, if at all, for dynamic
| languages, because you can use type hints in some cases (e.g.
| Python), or a different language (TypeScript) in the case of
| JavaScript; there are plenty of tools that'll tell you if
| you're not respecting those type hints, and you can feed the
| output to the LLM.
|
| But yeah, if we get better & faster models, then hopefully we
| might get to a point where we can let the LLM manage its own
| context itself, and then we can see what it can do with large
| codebases.
| igouy wrote:
| Smalltalk Refactoring Browser! (Where do you think Java IDEs
| got the idea from?)
|
| "A very large Smalltalk application was developed at Cargill
| to support the operation of grain elevators and the
| associated commodity trading activities. The Smalltalk client
| application has 385 windows and over 5,000 classes. About
| 2,000 classes in this application interacted with an early
| (circa 1993) data access framework. The framework dynamically
| performed a mapping of object attributes to data table
| columns.
|
| Analysis showed that although dynamic look up consumed 40% of
| the client execution time, it was unnecessary.
|
| A new data layer interface was developed that required the
| business class to provide the object attribute to column
| mapping in an explicitly coded method. Testing showed that
| this interface was orders of magnitude faster. The issue was
| how to change the 2,100 business class users of the data
| layer.
|
| A large application under development cannot freeze code
| while a transformation of an interface is constructed and
| tested. We had to construct and test the transformations in a
| parallel branch of the code repository from the main
| development stream. When the transformation was fully tested,
| then it was applied to the main code stream in a single
| operation.
|
| Less than 35 bugs were found in the 17,100 changes. All of
| the bugs were quickly resolved in a three-week period.
|
| If the changes were done manually we estimate that it would
| have taken 8,500 hours, compared with 235 hours to develop
| the transformation rules.
|
| The task was completed in 3% of the expected time by using
| Rewrite Rules. This is an improvement by a factor of 36."
|
| from "Transformation of an application data layer" Will Loew-
| Blosser OOPSLA 2002
|
| https://dl.acm.org/doi/10.1145/604251.604258
| pjmlp wrote:
| > Smalltalk Refactoring Browser! (Where do you think Java
| IDEs got the idea from?)
|
| Eclipse still has the navigation browser from Visual Age
| for Smalltalk. :)
| cardanome wrote:
| Not really. Even humans regularly get lifetimes wrong.
|
| As someone not super experienced in Rust, my workflow was often
| very very compiler-error-driven. I would type a bit, see what
| it says, change it, and so on. Maybe someone more experienced
| can write whole chunks single-pass that compile on the first
| try, but that would far exceed anything generative AI will be
| able to do in the next few years.
|
| The problem here is that iteration with AI is slow and
| expensive at the moment.
|
| If anything you want to use a language with automatic garbage
| collection, as it removes mental overhead for both generative
| AI and humans. Also you want to have a more boilerplate-heavy
| language, because they are easier to reason about, while the
| boilerplate doesn't matter when the AI does the work.
|
| I haven't tried it but I suspect golang should work very well.
| The language is very stable so older training data still works
| fine. Projects are very uniform, there isn't much variation in
| coding style, so easy to grok for AI.
|
| Also probably Java but I suspect it might get confused with the
| different versions and all the magic certain frameworks use.
| greenavocado wrote:
| All LLMs still massively struggle with resource lifetimes
| irrespective of the language
| hu3 wrote:
| IMO they struggle a whole lot more with low level/manual
| lifetimes like C, C++ and Rust.
| sega_sai wrote:
| I think this is a great point! I.e. for humans it's
| easier to write loosely typed Python-like code, as you
| skip a lot of boilerplate, but for AI the boilerplate
| is probably useful, because it reinforces which variable
| is of which type, and obviously it's easier to detect
| errors early on at compilation time.
|
| I actually wonder if that will force languages like Python to
| create more strictly enforced type modes, as boilerplate is
| much less of an issue now.
| pjmlp wrote:
| Hot take, this is a transition step, like the -S switch back
| when Assembly developers didn't believe compilers could output
| code as good as themselves.
|
| Eventually a few decades later, optimising backends made hand
| written Assembly a niche use case.
|
| Eventually AI based programming tools will be able to generate
| executables directly. And like it happened with -S, we might
| require generation into a classical programming language to
| validate what the AI compiler backend is doing, until it gets
| good enough and only those arguing on AI Compiler Explorer
| will care.
| secondcoming wrote:
| It's probably pointless writing run of the mill assembly
| these days, but SIMD has seen a resurgence in low-level
| coding, at least until compilers get better at generating it.
| I don't think I'd fully trust LLM generated SIMD code as if
| it was flawed it'd be a nightmare to debug.
| pjmlp wrote:
| Well, that won't stop folks trying though.
|
| "Nova: Generative Language Models for Assembly Code with
| Hierarchical Attention and Contrastive Learning"
|
| https://arxiv.org/html/2311.13721v3
| imtringued wrote:
| This won't be a thing and for very obvious reasons.
|
| Programming languages solve the specification problem, (which
| happens to be equivalent to "The Control Problem"). If you
| want the computer to behave in a certain way, you will have
| to provide a complete specification of the behavior. The more
| loose and informal that specification is, the more blanks
| have to be filled in, the more you are letting the AI make
| decisions for you.
|
| You tell your robotic chef to make a pizza, and it does, but
| it turns out it decided to make a vegan pizza. You yell at
| the robot for making a mistake and it sure gets that you
| don't want a vegan pizza, so it decides to add canned tuna.
| Except, turns out you don't like tuna either. You yell at the
| robot again and again until it gets it. Every single time
| you're telling the AI that it made a mistake, you're actually
| providing a negative specification of what not to do. In the
| extreme case you will have to give the AI an exhaustive list
| of your preferences and dislikes, in other words, a complete
| specification.
|
| By directly producing executables, you have reduced the
| number of knobs and levers that can be used to steer the AI
| and made it that much harder to provide a specification of
| what the application is supposed to do. In other words,
| you're assuming that the model in itself is already a
| complete specification and your prompt is just retrieving the
| already existing specification.
| pjmlp wrote:
| That was the argument from many Assembly developers against
| FORTRAN compilers, if you dive into literature of the time.
|
| Also this is already happening in low code SaaS products
| where integrations have now AI on their workflows.
|
| For example, https://www.sitecore.com/products/xm-cloud/ai-
| workflow
|
| Which in a way are like high level interpreters, and
| eventually we will have compilers as well.
|
| Not saying it'll happen tomorrow, but it will come.
| inciampati wrote:
| I've found this to be very true. I don't think this is a hot
| take. It's the big take.
|
| Now I code almost all tools that aren't shell scripting in
| rust. I'm only using dynamic languages when forced to by
| platform or dependencies. I'm looking at you, pytorch.
| slashdev wrote:
| I've been saying this for years on X. I think static languages
| are winning in general now, having gained much of the
| ergonomics of dynamic languages without sacrificing anything.
|
| But AI thrives with a tight feedback loop, and that works
| best with static languages. A Python linter (or even mypy)
| just isn't as good as the Rust compiler.
|
| The future will be dominated by static languages.
|
| I say this as a long-time dynamic-languages and Python
| proponent who started seeing the light back when Go was first
| released.
| flohofwoe wrote:
| So Microsoft programmers will become code monkeys that stumble
| from one compiler error to the next without any idea what they
| are actually doing, got it ;)
|
| (it's also a poor look for Rust's ergonomics tbh, but that's not
| a new issue)
| jeffreygoesto wrote:
| Yupp. And they brag about bangin' on it without any
| understanding until it magically compiles.
| jumploops wrote:
| I'm curious how this performs against Claude Code/Codex.
|
| The "RustAssistant Algorithm" looks to be a simple LLM
| workflow[0], and their testing was limited to GPT-4 and GPT-3.5.
|
| In my experience (building a simple Rust service using OpenAI's
| o1), the LLM will happily fix compilation issues but will also
| inadvertently change some out-of-context functionality to make
| everything "just work."
|
| The most common issues I experienced were subtle changes to
| ownership, especially when using non-standard or frequently
| updated crates, which caused performance degradations in the test
| cases.
|
| Therefore I wouldn't really trust GPT-4 (and certainly not 3.5)
| to modify my code, even if just to fix compilation errors,
| without some additional reasoning steps or oversight.
|
| [0] https://www.anthropic.com/engineering/building-effective-
| age...
| woadwarrior01 wrote:
| I tried Claude Code with a small-ish C++ codebase recently and
| found it to be quite lacking. It kept making a lot of silly
| syntax errors and going around in circles. Spent about $20 in
| credits without it getting anywhere close to being able to
| solve the task I was trying to guide it through. OTOH, I know a
| lot of people who swear by it. But they all seem to be Python
| or Front-end developers.
| Wheaties466 wrote:
| Do we really know why LLMs seem to score the highest with
| python related coding tasks? I would think there are equally
| good examples of javascript/c++/java code to train from but I
| always see python with the highest scores.
| triyambakam wrote:
| Could be related to how flexible Python is. Pretty easy to
| write bad and "working" Python code
| greenavocado wrote:
| May I ask what you tried? I have had strong successes with
| C++ generation
| woadwarrior01 wrote:
| It was a bit esoteric, but not terribly so, some metal-cpp
| based code for a macOS app.
| mfld wrote:
| I find that Claude code works well to fix rust compile errors
| in most cases. Interestingly, the paper didn't compare against
| agentic coding tools at all, which of course will be more easy
| to use and more generally applicable.
| rgoulter wrote:
| At a glance, this seems really neat. -- I reckon one thing LLMs
| have been useful to help with is "the things I'd copy-paste from
| stack overflow". A loop of "let's fix each error" reminds me of
| that.
|
| I'd also give +1 to "LLMs as force multiplier". -- If you know
| what you're doing & understand what's going on, it seems very
| useful to have an LLM-supported tool able to help automatically
| resolve compilation errors. -- But if you don't know what you're
| doing, I'd worry perhaps the LLM will help you implement code
| that's written with poor taste.
|
| I can imagine LLMs could also help explain errors on demand. --
| "You're trying to do this, you can't do that because..., instead,
| what you should do is...".
| k_bx wrote:
| So far the best way to fix Rust for me was to use OpenAI's CODEX
| tool. Rust libraries change APIs often and evolve quickly, but
| luckily all the code is available under ~/.cargo/registry, so it
| can go and read the actual library code. Very useful!
| NoboruWataya wrote:
| Anecdotally, ChatGPT (I use the free tier) does not seem to be
| very good at Rust. For any problem with any complexity it will
| very often suggest solutions which violate the borrowing rules.
| When the error is pointed out to it, it will acknowledge the
| error and suggest a revised solution with either the same or a
| different borrowing issue. And repeat.
|
| A 74% success rate may be an impressive improvement over the SOTA
| for LLMs, but frankly a tool designed to fix your errors being
| wrong, at best, 1 in 4 times seems like it would be rather
| frustrating.
| danielbln wrote:
| Free tier ChatGPT (so probably gpt-4o) is quite a bit behind
| the SOTA, especially compared to agentic workflows (LLM that
| autonomously perform actions, run tests, read/write/edit files,
| validate output etc.).
|
| Gemini 2.5 Pro is a much stronger model, as is Claude 3.7 and
| presumably GPT-4.1 (via API).
| greenavocado wrote:
| Gemini 2.5 pro is far ahead of even Claude
|
| Chart:
|
| https://raw.githubusercontent.com/KCORES/kcores-llm-
| arena/re...
|
| Description of the challenges:
|
| https://github.com/KCORES/kcores-llm-arena
| MathiasPius wrote:
| I suspect this might be helpful for minor integration challenges
| or library upgrades like others have mentioned, but in my
| experience, the vast majority of Rust compilation issues fall
| into one of two buckets:
|
| 1. Typos, oversights (like when adding new enum variants), etc.
| All things which in most cases are solved with a single keystroke
| using non-LLM LSPs.
|
| 2. Wrong assumptions (on my part) about lifetimes, ownership, or
| overall architecture. All problems which I very much doubt an LLM
| will be able to reason about, because the problems usually lie in
| my understanding or modelling of the problem domain, not anything
| to do with the code itself.
| pveierland wrote:
| Anecdotally, Gemini 2.5 Pro has been yielding good results lately
| for Rust. It's been able to one-shot pretty intricate proc macros
| that required multiple supporting functions (~200 LoC).
|
| Strong typing is super helpful when using AI: if you're
| properly grounded, understand the interface well, and are
| specifying against that interface, then the mental burden of
| understanding the output and integrating it with the rest of
| the system is much lower than when large amounts of new
| structure are created without well-defined, understood bounds.
| goeiedaggoeie wrote:
| I find that these are all pretty bad with more advanced code
| still, especially once FFI comes into play. Small chunks are
| ok, but even when working from a specification (think some
| ISO standard for video) and working on something simple (e.g.
| a small gstreamer Rust plugin), it is still not quite there.
| C(++), same story.
|
| All round however, 10 years ago I would have taken this
| assistance!
| danielbln wrote:
| And 5 years ago this would have been firmly science fiction.
| mountainriver wrote:
| Agree, I've been one-shotting entire features into my rust code
| base with 2.5
|
| It's been very fun!
| faitswulff wrote:
| What coding assistant do you use?
| mountainriver wrote:
| Cursor
| CryZe wrote:
| I'd love to see VSCode integrate all the LSP information into
| Copilot. That seems to be the natural evolution of this idea.
| vaylian wrote:
| > These unique Rust features also pose a steep learning curve for
| programmers.
|
| This is a common misunderstanding of what a learning curve is:
|
| https://en.wikipedia.org/wiki/Learning_curve#%22Steep_learni...
| manmal wrote:
| Maybe this is the right thread to ask: I've read that Elixir is a
| bit under supported by many LLMs. Whereas Ruby/Rails and Python
| work very well. Are there any recommendations for models that
| seem particularly useful for Elixir?
| arrowsmith wrote:
| Claude is the best for Elixir in my experience, although you
| still need to hold its hand quite a lot (cursor rules etc).
|
| None of the models are updated for Phoenix 1.8 either, which
| has been very frustrating.
| manmal wrote:
| Thank you!
| petesergeant wrote:
| Every coding assistant or LLM I've used generally makes a real
| hash of TypeScript's types, so I'm a little skeptical, but also:
|
| > RustAssistant is able to achieve an impressive peak accuracy of
| roughly 74% on real-world compilation errors in popular open-
| source Rust repositories.
|
| 74% feels like it would be just the right amount that people
| would keep hitting "retry" without thinking about the error at
| all. I've found LLMs great for throwing together simple scripts
| in languages I just don't know or can't lookup the syntax for,
| but I'm still struggling to get serious work out of them in
| languages I know well where I'm trying to do anything vaguely
| complicated.
|
| Worse, they often produce plausible code that does something in a
| weird or suboptimal way. Tests that don't actually really test
| anything, or more subtle but actual bugs in logic, that you
| wouldn't write yourself, but need to be very on the ball to catch
| in code you're reviewing.
| jcgrillo wrote:
| 74% feels way too low to be useful, which aligns with my
| limited experience trying to get any value from LLMs for
| software engineering. It's just too frustrating making the
| machine guess and check its way to the answer you already know.
| delduca wrote:
| > unlike unsafe languages like C/C++
|
| The world is unsafe!
| rs186 wrote:
| Many of the examples seem very easy -- I suspect that without
| LLMs, simple Google searches would lead you to a stackoverflow
| question that asks the same thing. I wonder how this performs
| in bigger, more complex codebases.
|
| Also, my personal experience with LLMs fixing compilation errors
| is: when it works, it works great. But when it doesn't, it's so
| clueless and lost that it's a complete waste of time to employ
| LLM in the first place -- you are much better off debugging the
| code yourself using old fashioned method.
| lolinder wrote:
| Yep. This is true for all languages that I've tried, but it's
| _particularly_ true in Rust. The model will get into a loop
| where it gets further and further away from the intended
| behavior while trying to fix borrow checker errors, then
| eventually (if you're lucky) gives up and hands the mess back
| over to you.
|
| Which at least with Cursor's implementation means that it by
| default gives you the last iteration of its attempt to fix the
| problem, which when this happens is almost always _way_ worse
| than its first attempt.
| greenavocado wrote:
| That's why you need to implement logical residual connections
| to keep the results focused over successive prompts (like
| ResNets do)
| _bin_ wrote:
| Absolutely; I re-try using LLMs to debug this every so often
| and they just aren't capable of "fixing" anything borrow
| checker-related. They spit out some slop amalgamation of
| Rc/Arc/even UnsafeCell. They don't understand futures being
| Send + Sync. They don't understand lifetimes. The other path
| is it sometimes loops between two or three broken "fixes"
| that still don't compile.
|
| "Sure! Let me...." (writes the most ungodly garbage Rust
| known to man)
|
| Now, I certainly hope I'm wrong. It would be nice to enjoy
| similar benefits to guys doing more python/typescript work. I
| just doubt it's that good.
| steveklabnik wrote:
| This is pretty contrary to my experience, for whatever it's
| worth. I wonder what the differences are.
| OJFord wrote:
| > It would be nice to enjoy similar benefits to guys doing
| more python/typescript work.
|
| No need to be envious: it doesn't give me compilation
| errors in python, but that ain't because it always gives
| correct code!
|
| (It _can_ be helpful too, but I get a lot of hallucinated
| APIs /arguments/etc.)
| cmrdporcupine wrote:
| Yeah borrow checker errors in Rust fed to an LLM inevitably
| lead to the thing just going in circles. You correct it, it
| offers something else that doesn't work, and when notified of
| that it gives you some variant of the original problem.
|
| Usually when you come to an LLM with an error like this it's
| because you tried something that made sense to you and it
| didn't. Turns out those things usually "make sense" to an
| LLM, too, and they don't step through and reason through the
| "logic", they just vibe off of it, and the pattern they come
| back to you with is usually just some variant of the original
| pattern that led you to the problem in the first place.
| nicce wrote:
| There have been cases when o1/o3 has helped me to solve some
| issues that I could not solve with stackoverflow or Rust forum.
|
| LLM was able to connect the dots of some more complex and rarer
| Rust features and my requirements. I did not know that they
| could be used like that. One case was, for example, about
| complex usage of generic associated types (GATs).
|
| When it comes to lifetime issues, trying to solve them with
| LLMs is usually a waste of time.
| seanw444 wrote:
| I rarely touch LLMs (I think they're very overhyped and
| usually unnecessary), but I did find one use case that was
| handy recently. I was writing some Nim code that interfaced
| with Cairo, and one Cairo feature wasn't really documented at
| all, and the design pattern was one I was not familiar with.
|
| Asking it to write Nim code to solve my problem resulted in a
| mess of compilation errors, and it kept looping through
| multiple broken approaches when I pointed out what was
| flawed.
|
| Finally, I just decided to ask it to explain the C design
| pattern I was unfamiliar with, and it was capable of bridging
| the gap enough for me to be able to write the correct Nim
| code. That was pretty cool. There was no documentation
| anywhere for what I needed, and nobody else had ever
| encountered that problem before. That said, I could have just
| gone to the Nim forum with a snippet and asked for help, and
| I would have solved the problem with a fraction of the
| electricity usage.
| rvz wrote:
| > Also, my personal experience with LLMs fixing compilation
| errors is: when it works, it works great. But when it doesn't,
| it's so clueless and lost that it's a complete waste of time to
| employ LLM in the first place -- you are much better off
| debugging the code yourself using old fashioned method.
|
| Or just 'learning the Rust syntax' and standard library?
|
| As you said, LLMs are unpredictable in their output and can
| generate functions that don't exist and incorrect code as you
| use more advanced features, wasting more time than they save
| if you don't know the language well enough.
|
| I guess those coming from dynamically typed languages have a
| very hard time getting used to strongly typed languages and
| then struggle with the basic syntax of, say, Rust or C++.
|
| Looking at this AI hype with vibe-coding/debugging and LLMs,
| it just favours throwing code at the wall with little
| understanding of what it does after it compiles.
|
| This is why many candidates _won't ever_ do Leetcode with
| Rust in a real interview.
| mountainriver wrote:
| LLMs have made me at least twice as fast at writing rust code.
| I now think that more people should be writing rust as it's
| been made fairly simple to do.
|
| And yes there are some errors it gets stuck in a loop on. It's
| not often and generally just switching to another LLM in cursor
| will fix it.
| onlyrealcuzzo wrote:
| > But when it doesn't, it's so clueless and lost that it's a
| complete waste of time to employ LLM in the first place -- you
| are much better off debugging the code yourself using old
| fashioned method.
|
| So why not automatically try it, see if it fixes automatically,
| and if not then actually debug it yourself?
| themusicgod1 wrote:
| "very easy" if you have access to the correct dependencies,
| which outside of microsoft's walled garden is not guaranteed
| at all, and to a free LLM (https://elevenfreedoms.org/),
| which isn't guaranteed either
|
| all of this looks _very_ different when you have to patch in
| rust dependencies by hand outside of github.
| infogulch wrote:
| I wonder if the reason why LLMs are not very good at debugging
| is that there isn't much published code in this intermediate
| state with obvious compilation errors.
| qiine wrote:
| huh, aren't stackoverflow questions a big source? ;p
| pjmlp wrote:
| With limited bandwidth, so will check later, it would be great if
| it could do code suggestions for affine types related errors, or
| explain what is wrong, this would help a lot regarding Rust's
| adoption.
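| ("Affine types" here being Rust's move semantics; the
| canonical diagnostic in that family is E0382. A minimal
| illustration of my own, not from the article:)

```rust
// A non-Copy value may be used at most once: `consume` takes
// ownership of its argument.
fn consume(s: String) -> usize {
    s.len()
}

// What the compiler rejects (and what such a tool would explain):
//
//   let s = String::from("hello");
//   let n = consume(s);   // ownership moves into `consume`
//   println!("{s}");      // error[E0382]: borrow of moved value: `s`
```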
| croemer wrote:
| Paper is a bit pointless if one can't use the tool.
|
| The paper links to a GitHub repo with nothing but a
| three-sentence README and no activity for 9 months, reading:
|
| > We are in the process of open-sourcing the implementation of
| RustAssistant. Watch this space for updates.
| croemer wrote:
| Was the paper really written 2 years ago?
|
| The paper states "We exclude error codes that are no longer
| relevant in the latest version of the Rust compiler (1.67.1)".
|
| A quick search shows that Rust 1.68.0 was released in March 2023:
| https://releases.rs/docs/1.68.0/
|
| Update: looks like it really is 2 years old. "We evaluate both
| GPT-3.5-turbo (which we call as GPT-3.5) and GPT-4"
| meltyness wrote:
| Yeah, the problem LLMs will have with Rust is the adherence to
| the type system, and the type system's capability to perform
| type inference. It essentially demands coherent processing
| memory, similar to the issues LLMs have performing arithmetic
| while working with limited total features.
|
| https://leetcode.com/problems/zigzag-grid-traversal-with-ski...
|
| Here's an example of an ostensibly simple problem that I've
| solved (pretty much adversarially) with a type like:
| StepBy<Cloned<FlatMap<Chunks<Vec<i32>>,
| FnMut(&[i32]) -> Chain<Iter<i32>, Rev<Iter<i32>>>>>>
|
| So this (pretty much) maximally dips into the type system to
| solve the problem, and as a result any comprehension the LLM
| must develop mechanistically about the type system is
| redundant.
|
| It's a pretty wicked problem the degree to which the type
| system is used to solve problems, and the degree to which
| imperative code solves problems that, except for hopes and
| wishes, which portions map from purpose to execution will
| likely remain incomprehensible.
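| For reference, a sketch of that problem in plainer iterator
| style (my own code, using owned rows rather than the
| zero-allocation Chain/Rev chain above; it assumes the linked
| problem asks for a zigzag row walk keeping every other
| visited cell):

```rust
// Zigzag-walk the grid (even rows left-to-right, odd rows
// right-to-left) and keep every other visited cell.
fn zigzag_skip(grid: &[Vec<i32>]) -> Vec<i32> {
    grid.iter()
        .enumerate()
        .flat_map(|(i, row)| {
            if i % 2 == 0 {
                row.clone()
            } else {
                row.iter().rev().cloned().collect()
            }
        })
        .step_by(2)
        .collect()
}
```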
| chaosprint wrote:
| I am the creator and maintainer of several Rust projects:
|
| https://github.com/chaosprint/glicol
|
| https://github.com/chaosprint/asak
|
| For LLMs, even the latest Gemini 2.5 Pro and Claude 3.7
| Thinking, it is difficult to produce code that compiles on
| the first try.
|
| I think the main challenges are:
|
| 1. Their training material lags behind. Most Rust projects
| are not yet 1.0 and their APIs are constantly changing, which
| is the source of most compilation errors.
|
| 2. Trying to do too much at one time increases the probability of
| errors.
|
| 3. The agent does not follow human work habits very well:
| going to docs.rs to read the latest documentation and look at
| examples, or, after making a mistake, searching resources
| such as GitHub.
|
| Maybe this is where Cursor rules and MCP can help. But at
| present, they are far behind.
___________________________________________________________________
(page generated 2025-05-02 23:01 UTC)