[HN Gopher] DARPA suggests turning old C code automatically into...
___________________________________________________________________
DARPA suggests turning old C code automatically into Rust - using
AI, of course
Author : jdblair
Score : 32 points
Date : 2024-08-03 19:44 UTC (3 hours ago)
(HTM) web link (www.theregister.com)
(TXT) w3m dump (www.theregister.com)
| jdblair wrote:
| I'm really surprised this can work at all in any automated way.
| You can't just make a line-by-line transcription of a typical C
| program into Rust. Pointers and aliasing are ubiquitous in C
| programs, concepts that Rust explicitly restricts. You have to
| rethink many typical constructs at a high level to rewrite a C
| program in Rust, unless you wrap the whole thing in "unsafe."
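|
| A minimal sketch of what I mean (illustrative only, hypothetical
| code): safe Rust rejects the two simultaneous mutable borrows
| that are routine in C, so a line-by-line translation has to fall
| back to raw pointers inside "unsafe":
|
|     fn main() {
|         let mut x = 0i32;
|         // let a = &mut x;
|         // let b = &mut x;  // error[E0499]: cannot borrow `x`
|         //                  // as mutable more than once at a time
|
|         // What a mechanical translation ends up with instead:
|         let p: *mut i32 = std::ptr::addr_of_mut!(x);
|         let q: *mut i32 = std::ptr::addr_of_mut!(x);
|         unsafe {
|             *p += 1; // two aliasing writes, no checks at all
|             *q += 1;
|         }
|         println!("{x}"); // prints 2
|     }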
| alex_suzuki wrote:
| I wonder about this as well, especially in code bases that make
| heavy use of macros.
| ip26 wrote:
| For a naive newcomer - could you go line by line, wrap the
| whole thing in "unsafe", compile to an identical binary, and
| then slowly peel away the "unsafe" while continuing to validate
| equivalence?
|
| That would at least get you to as much Rust as possible, and
| then let engineers tackle rethinking just those concepts.
| rolph wrote:
| You need to create a transpiler philosophy.
|
| Transform C to ASM, then ASM to Rust.
|
| What you need to avoid is incompatibilities between different
| high level languages; with a low level intermediary you
| aren't stuck attempting to convert one high level hardware
| abstraction directly to another high level hardware
| abstraction.
| j-krieger wrote:
| There are "warts" with unsafe Rust that would make this feat
| very difficult. Aliasing rules still apply.
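|
| For example (a hypothetical sketch, checkable with Miri): even
| inside an "unsafe" block, materializing two live &mut references
| to the same data is undefined behaviour under Rust's aliasing
| rules, so wrapping the C semantics in "unsafe" is not a free
| pass:
|
|     fn main() {
|         let mut x = 0i32;
|         let p: *mut i32 = &mut x;
|         unsafe {
|             let a = &mut *p;
|             let b = &mut *p; // creating `b` invalidates `a`
|             *b += 1;
|             *a += 1;         // UB: Miri flags this use of `a`
|         }
|     }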
| jcranmer wrote:
| Converting C to legal (unsafe) Rust is quite possible; there
| is indeed already a tool that does this
| (https://github.com/immunant/c2rust).
|
| The problem you run into is that the conversion is so
| pedantically correct that the resulting code is useless. The
| result retains all of the problems that the C code has, and
| is so far from idiomatic Rust that it's easier to toss the
| code and start from scratch. Progressively lifting unsafe
| Rust to safe Rust is a tall order, and the project I
| mentioned _had_ a tool to do that... which is now abandoned
| and unmaintained.
|
| At the end of the day, the chief issue with converting to
| safe Rust is not just that you have to copy semantics over,
| but you also have to recover a lot of high-level
| preconditions. Turning pointers into slices is perhaps the
| _easiest_ task of the lot; given the very strict mutability
| rules in Rust, you also have to work out when and where to
| insert things like Cell or Rc or Mutex or what have you, as
| well as building out lifetime analysis. And chances are the
| original code doesn't get all these rules right, which is
| why there are bugs in the first place.
|
| Solving that problem is the goal of this DARPA proposal, or
| perhaps more accurately, determining how feasible it is to
| solve that problem automatically. Personally, I think the
| better answer is to have a semi-automated approach, where
| users provide as input the final Rust struct layouts (and
| possibly parts of the API, to fix lifetime issues), and the
| tool automates the drudgery of getting the same logic ported
| to that mapping.
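|
| To make the precondition-recovery point concrete, here is an
| illustrative sketch (hypothetical functions, not real c2rust
| output): the mechanical translation keeps a raw pointer plus a
| caller-supplied length, while the lifted version encodes the
| recovered precondition (the pointer really refers to n valid
| ints) in a slice type:
|
|     // Roughly what a mechanical translation of
|     // `int sum(const int *p, size_t n)` looks like:
|     unsafe fn sum_raw(p: *const i32, n: usize) -> i32 {
|         let mut total = 0;
|         for i in 0..n {
|             total += *p.add(i); // no bounds or lifetime checks
|         }
|         total
|     }
|
|     // The lifted version bakes the precondition into the type:
|     fn sum(xs: &[i32]) -> i32 {
|         xs.iter().sum()
|     }
|
|     fn main() {
|         let data = [1, 2, 3, 4];
|         let a = unsafe { sum_raw(data.as_ptr(), data.len()) };
|         assert_eq!(a, sum(&data));
|     }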
| Animats wrote:
| Right. Used c2rust once. Been there, done that. The Rust
| code that comes out is _awful_. Does the same thing as the
| C code, bugs and all. You don't get Rust subscript check
| errors, you get segfaults from unsafe Rust code. What comes
| out is hopeless for manual "refactoring".
|
| The hardest part may be Rust's affine type rules. Reference
| use in Rust is totally different from pointer use in C/C++.
| Object parenting relationships are hard to express in Rust.
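|
| The parenting problem in particular tends to force restructuring.
| A hedged sketch of the usual safe-Rust workaround (hypothetical
| Parent/Child types): Rc for ownership downward, Weak for the
| back-pointer, RefCell for mutation, instead of the plain
| back-pointer a C struct would carry:
|
|     use std::cell::RefCell;
|     use std::rc::{Rc, Weak};
|
|     struct Parent {
|         children: RefCell<Vec<Rc<Child>>>,
|     }
|
|     struct Child {
|         parent: Weak<Parent>, // a plain Rc here would leak cycles
|     }
|
|     fn main() {
|         let p = Rc::new(Parent { children: RefCell::new(vec![]) });
|         let c = Rc::new(Child { parent: Rc::downgrade(&p) });
|         p.children.borrow_mut().push(c);
|         let back = p.children.borrow()[0].parent.upgrade().is_some();
|         println!("children: {}, child can reach parent: {back}",
|                  p.children.borrow().len());
|     }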
| alkonaut wrote:
| A line-by-line translation doesn't require much "AI" either. You
| could probably make a rough translation into some (mostly unsafe)
| Rust.
|
| Presumably the AI actually needs to figure out lifetimes and so
| on to be genuinely useful and produce valid programs, which would
| be impressive if it works.
| morgante wrote:
| Line by line is infeasible, which is precisely why you need to
| use AI to make larger semantic inferences.
|
| You also don't have to one-shot translate everything. One of
| the valuable things about the Rust compiler is it gives lots of
| specific information that you can feed back into an LLM to
| iterate.
|
| I've been working on similar problems for my startup (grit.io)
| and think C -> Rust is definitely tractable in the near term.
| Definitely not _easy_ but certainly solvable.
| stogot wrote:
| What about converting to an AST, then asking the AI to convert
| that to Rust? Would that work?
| Someone wrote:
| That's probably the route they would take, but the C AST
| won't have ownership attributes. You'd have to discover
| those yourself.
|
| ASTs also don't have much info on threading (that's more or
| less limited to "the program starts a thread with entry
| point _foo_ at some time" and "_foo_ waits for another
| thread to finish").
| Someone wrote:
| > Pointers and aliasing are ubiquitous in c programs
|
| If we ignore multi-threaded programs, is long-term aliasing
| actually ubiquitous in C programs? For many programs, I would
| expect most of it to happen within the scope of a single
| function (and, within it, across function calls, but there
| borrowing will solve this, won't it?)
|
| If so, I would try to tackle that as one sub-problem (you
| have to start somewhere), and detecting how data gets shared
| between threads as another. For the latter, I expect that many
| programs will have some implicit ownership rule such as "thread
| T1 puts stuff in queue Q where thread T2 will pick it up" that
| can be translated as "putting it in queue transfers ownership".
|
| Detecting such rules may not be easy, but doesn't look
| completely out of reach to me, either, and that would be good
| enough for a research project.
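|
| A small sketch of that "putting it in the queue transfers
| ownership" rule as Rust's std channels already express it
| (illustrative only, not part of the DARPA proposal):
|
|     use std::sync::mpsc;
|     use std::thread;
|
|     fn main() {
|         let (tx, rx) = mpsc::channel();
|
|         // T1: sending a value moves it; the producer can no
|         // longer touch it afterwards, which is the implicit C
|         // convention made explicit by the type system.
|         let t1 = thread::spawn(move || {
|             for i in 0..3 {
|                 let msg = vec![i; 4];
|                 tx.send(msg).unwrap(); // ownership moves here
|             }
|         });
|
|         // T2: receives the values and now owns them.
|         for msg in rx {
|             println!("{msg:?}");
|         }
|         t1.join().unwrap();
|     }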
| rekttrader wrote:
| Talked about here: https://news.ycombinator.com/item?id=41110269
| verdverm wrote:
| Russ Cox gave a GopherCon talk on the effort to automatically
| convert the Go compiler from C to Go (in the early days). Lots of
| interesting IRL issues / solutions in there.
|
| https://www.youtube.com/watch?v=QIE5nV5fDwA
|
| IIRC, they were able to transpile 90%+ (without AI) and manually
| did the rest.
| poikroequ wrote:
| > Those involved with the oversight of C and C++ have pushed
| back, arguing that proper adherence to ISO standards and diligent
| application of testing tools can achieve comparable results
| without reinventing everything in Rust.
|
| If you stick to extremely stringent coding practices and
| incorporate third party static verification tools that require
| riddling your code with proprietary situations, then sure, you
| can achieve comparable results with C/C++.
|
| Or you can just use Rust.
| dtx1 wrote:
| It's quite hilarious to see the pushback Rust gets from the
| C/C++ community. Obviously their decades of hard work and
| experience with those languages are overriding their
| reasoning circuits. Who in their right mind would defend a
| language that has such major and obvious design flaws if a
| genuine alternative is there?
| constantcrying wrote:
| Many of the most widely used languages have obvious major
| design flaws. (JavaScript is one obvious candidate, Python is
| another. How did a language which has no built-in floating
| point type become the number one language for numerical
| analysis?)
|
| The real question is what tradeoffs you are making and what
| you are gaining. Rust makes certain memory safety guarantees
| about the program at compile time, but at the same time it
| disallows perfectly safe constructions that are possible in
| C++ as well.
| brigadier132 wrote:
| I think DARPA is making the right decision about choosing
| Rust as the language for low level systems programming. For
| national security related matters you'd definitely want the
| certainty Rust brings.
|
| The reason I personally chose Rust as my go-to language for
| low level programming is that despite learning systems
| programming in college I pretty much never used it outside
| of school. Meaning I didn't have any of that knowledge that
| C and C++ programmers had built up over years of
| experience. So I decided that instead of having to deal
| with the unknown skill deficiencies in writing concurrent
| software and memory management I'd rather just have a
| compiler scream at me. I don't regret the decision.
|
| Also, I remember writing an async TCP implementation in
| college with C++ using Boost. Rust tooling is just so far
| ahead of that.
| dtx1 wrote:
| > JavaScript is one obvious candidate
|
| I don't see anyone defending JavaScript. In fact a whole
| lot of people are using typescript now because JavaScript
| is just so bad.
|
| As for Python, that's a good point. I guess it's just
| because it's easy to use and all the numerical stuff is
| done with C bindings anyway?
|
| But the C++ situation is genuinely different. There's a
| reason governments are now calling upon developers to just
| let it die already[0]. That design flaw is so bad it's
| causing genuine harm.
|
| [0] https://www.cisa.gov/news-events/news/urgent-need-
| memory-saf...
| poikroequ wrote:
| proprietary annotations*
|
| Sorry, autocorrect, I typed this on my phone.
| tonetegeatinst wrote:
| I am a total C shill... I'll admit it. I'm just starting to learn
| it, and the only reason I picked it was that I wanted a low level
| language that most systems run.
|
| That said, while I can acknowledge the benefits of memory safety,
| I would personally choose Zig over Rust.
|
| All things considered, I know you can do some safety checks for C
| using the compiler, and that helps reduce the odds of memory
| issues.
|
| Idk what Rust's main libraries look like (are they even called
| that?) But I know C's standard libraries have made learning stuff
| easier. Is there any way to know if your libraries in Rust are
| using unsafe code? Will that just spit out compile time errors?
| gosub100 wrote:
| slight tangent, but I think it would be amazing if AI could write
| device drivers. Full-featured GPU drivers for, say, OpenBSD. What
| does it need? Probably the state machines of the GPUs, how to
| enable various modes, how to feed data in/out of the device at
| intended speed, how to load shaders.
|
| Why can't AI learn to do that? Its reward could be getting past
| the initialization and getting to the default state of the
| driver. It could be trained on hundreds of GPU drivers, not only
| for the minutiae of how to load values into the control
| registers, but also for the bigger picture of what it actually
| means.
| constantcrying wrote:
| >Why can't AI learn to do that?
|
| Because it isn't magic.
|
| >It could be trained on hundreds of GPU drivers
|
| Do you know what an AI trained on hundreds of books looks like?
| Even with millions of books it cannot write a coherent
| chapter, much less an entire book.
|
| This is a genuinely terrible idea. It is exactly the thing AI
| is _bad_ at: a high degree of accuracy over long stretches of
| output.
| sumanthvepa wrote:
| I program in C++ and am very happy to do so. Modern C++ is very
| safe and actually fun to program in. It gives me enormous
| expressivity, extraordinary performance and safety when I need
| it. I'm not building space shuttles, I'm building 3D experiences,
| so I'm not terribly concerned about crashes. But even for me,
| I've not run into a memory corruption bug in recent memory (10-15
| years).
|
| Bash C/C++ all you want. I'm happy to keep using it to my
| advantage.
| tomrod wrote:
| What is the learning curve for newbies to avoid critical
| segfaults? If you still have to walk a tightrope to get code
| across the board, wouldn't everyone benefit from a plank with
| guardrails instead?
|
| I'm not dissing C or C++ in any way. I've used them. But I
| recognize there are some major footguns that aren't easy to
| avoid, causing a much longer learning curve than necessary to
| get things built. Rust at least seems determined to address
| them, good or bad!
| sumanthvepa wrote:
| To a first approximation, avoid using raw pointers. They
| should almost never be needed in application code. Use C++'s
| standard library facilities for smart pointers and containers
| instead. They are masterpieces of engineering, and work
| extremely well.
| offbynull wrote:
| I program in modern C++ as well (C++23). I disagree with both
| "very safe" and "fun". Even with 23 there are an innumerable
| number of footguns throughout both the language and the
| standard library. Debugging code is also a mess. Good luck
| getting anything done without paying for an IDE, and even then
| it can be a struggle.
| sumanthvepa wrote:
| Of all the languages I use, C/C++ have the least need for paid
| tools.
|
| I use emacs (and vim), make, and Boost's b2 build system for
| most of my programming. Although on Windows, Visual Studio is
| a joy to use. On Linux I use gdb. Works fine. I also use
| static analysers and valgrind. But I come from a tradition of
| Unix and living on the command line.
|
| I've tried CLion, because I pay for IntelliJ IDEA for other
| programming (I also have to write JavaScript and Python). But
| while it's nice, there is nothing there that I couldn't do
| without.
|
| If you stick to C++ standard libraries, Boost, and turn on
| _all_ warnings, and are reasonably competent, you won't
| encounter any bugs that are so serious that your program
| crashes inexplicably.
| ks2048 wrote:
| If we have smart AIs to write code, find bugs, and write tests -
| doesn't that mean we can ditch the "safe" languages and go back
| to C?
|
| That's mostly a joke. But AI-hardened-C seems like it could be
| much better than current-human-only-C.
| rectang wrote:
| It's not any more of a joke than the hee-haw nonsense that
| using an LLM to translate working C code into something else
| will yield a result with fewer bugs.
| constantcrying wrote:
| Why would AI be competent at finding bugs? Most non-trivial
| bugs I find are about unexpected interactions between distinct
| pieces of code. Seems totally unfeasible for an LLM to be good
| at.
| andrewstuart wrote:
| My experience of AI as a coding assistant:
|
| for Python - awesome
|
| for golang - awesome
|
| for JavaScript - awesome
|
| for Zig - not awesome, AI doesn't get it, maybe training data set
| too small
|
| for Rust - terrible - AI really doesn't get how it works,
| especially the hard bits
| bsder wrote:
| If you can cobble your program together by copy/pasting Stack
| Overflow snippets, an AI tends to be useful.
|
| Your list reflects that.
| ChrisArchitect wrote:
| Related:
|
| _Translating All C to Rust (TRACTOR)_
|
| https://news.ycombinator.com/item?id=41110269
| nielsbot wrote:
| Wonder if they could also/instead change the compiler similarly
| to Apple:
|
| https://support.apple.com/guide/security/memory-safe-iboot-i...
| TZubiri wrote:
| If you translate the code from C to Rust automatically, isn't C
| still the source code, just transpiled to Rust as an intermediate
| before asm?
| rectang wrote:
| Indeed. Think of the AI stage as "llmcc", or maybe "lsdcc" if
| you want to emphasize the hallucination problem.
| constantcrying wrote:
| I don't see this working. There are abstractions in C which are
| not replicable in Rust without major changes.
|
| In C, having two separate data structures which carry an
| identical pointer and write to it is a common occurrence.
| This cannot be trivially replicated in Rust and will need some
| reasonably clever intervention.
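|
| A hypothetical sketch of the kind of intervention needed (made-up
| Writer/Logger types): where the C version stores the same int
| pointer in two structs and writes through either, safe Rust has
| to make the shared mutation explicit, e.g. with Rc<RefCell<_>>
| (or Arc<Mutex<_>> across threads), a structural change rather
| than a line-by-line rewrite:
|
|     use std::cell::RefCell;
|     use std::rc::Rc;
|
|     struct Writer { shared: Rc<RefCell<i32>> }
|     struct Logger { shared: Rc<RefCell<i32>> }
|
|     fn main() {
|         let value = Rc::new(RefCell::new(0));
|         let w = Writer { shared: Rc::clone(&value) };
|         let l = Logger { shared: Rc::clone(&value) };
|
|         *w.shared.borrow_mut() += 1;
|         *l.shared.borrow_mut() += 10;
|         println!("{}", *value.borrow()); // prints 11
|     }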
| skywhopper wrote:
| This is a terrible idea. In order to get rid of one specific
| class of bugs, you want to risk introducing logic errors and
| performance issues and make the code harder to maintain.
|
| Not to mention that this quote is incredibly scary. This is
| someone we are trusting to make this decision?
|
| "You can go to any of the LLM websites, start chatting with one
| of the AI chatbots, and all you need to say is 'here's some C
| code, please translate it to safe idiomatic Rust code,' cut,
| paste, and something comes out, and it's often very good, but not
| always," said Dan Wallach, DARPA program manager for TRACTOR, in
| a statement.
___________________________________________________________________
(page generated 2024-08-03 23:00 UTC)