[HN Gopher] DARPA suggests turning old C code automatically into...
       ___________________________________________________________________
        
       DARPA suggests turning old C code automatically into Rust - using
       AI, of course
        
       Author : jdblair
       Score  : 32 points
       Date   : 2024-08-03 19:44 UTC (3 hours ago)
        
 (HTM) web link (www.theregister.com)
 (TXT) w3m dump (www.theregister.com)
        
       | jdblair wrote:
       | I'm really surprised this can work at all in any automated way.
       | You can't just make a line-by-line transcription of a typical c
       | program into rust. Pointers and aliasing are ubiquitous in c
       | programs, concepts that rust explicitly prevents. You have to
       | rethink many typical constructs at a high level to rewrite a c
       | program in rust, unless you wrap the whole thing in "unsafe."
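
A minimal sketch of the aliasing point above, not taken from the article. The function names (`bump`, `bump_twice_safe`, `bump_twice_raw`) are made up for illustration. It shows that safe Rust accepts mutable borrows that do not overlap, rejects two live `&mut` references to the same value, and only allows the C-style "two pointers, both writing" pattern through raw pointers inside `unsafe`.

```rust
// Increment through a unique mutable reference.
fn bump(counter: &mut i32) {
    *counter += 1;
}

// Safe: the two mutable borrows never overlap in time.
fn bump_twice_safe(value: &mut i32) {
    bump(value);
    bump(value);
}

// C-style: two raw pointers to the same object, both used for writing.
// Raw pointers may alias, but dereferencing them requires `unsafe`.
fn bump_twice_raw(value: &mut i32) {
    let p: *mut i32 = value;
    let q: *mut i32 = p;
    unsafe {
        *p += 1;
        *q += 1;
    }
}

fn main() {
    let mut n = 0;

    // The direct transcription of the C idiom does not compile:
    // let a = &mut n;
    // let b = &mut n; // error[E0499]: cannot borrow `n` as mutable more than once at a time
    // *a += 1;
    // *b += 1;

    bump_twice_safe(&mut n);
    bump_twice_raw(&mut n);
    assert_eq!(n, 4);
}
```

The commented-out lines are the part a line-by-line translator would have to rethink or push into `unsafe`.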
        
         | alex_suzuki wrote:
         | I wonder about this as well, especially in code bases that make
         | heavy use of macros.
        
         | ip26 wrote:
         | For a naive newcomer - could you go line by line, wrap the
         | whole thing in "unsafe", compile to an identical binary, and
         | then slowly peel away the "unsafe" while continuing to validate
         | equivalence?
         | 
         | That would at least get you to as much rust as possible, and
         | then let engineers tackle rethinking just those concepts.
        
           | rolph wrote:
           | you need to create a transpiler philosophy.
           | 
           | transform CtoASM, then ASMtoRust.
           | 
           | what you need to avoid is incompatibilities between different
           | high level languages with a low level intermediary so you
           | aren't stuck attempting to convert high level hardware
           | abstraction directly to another high level hardware
           | abstraction.
        
           | j-krieger wrote:
           | There are "warts" with unsafe Rust that would make this feat
           | very difficult. Aliasing rules still apply.
        
           | jcranmer wrote:
           | Converting C to legal (unsafe) Rust is quite possible; there
           | is indeed already a tool that does this
           | (https://github.com/immunant/c2rust).
           | 
           | The problem you run into is that the conversion is so
           | pedantically correct that the resulting code is useless. The
           | result retains all of the problems that the C code has, and
           | is so far from idiomatic Rust that it's easier to toss the
           | code and start from scratch. Progressive lifting on unsafe
           | Rust to safe Rust is a very difficult order, and the tool I
           | mentioned _had_ a tool to do that... which is now abandoned
           | and unmaintained.
           | 
           | At the end of the day, the chief issue with converting to
           | safe Rust is not just that you have to copy semantics over,
           | but you also have to recover a lot of high-level
           | preconditions. Turning pointers into slices is perhaps the
           | _easiest_ task of the lot; given the very strict mutability
           | rules in Rust, you also have to work out when and where to
           | insert things like Cell or Rc or Mutex or what have you, as
           | well as building out lifetime analysis. And chances are the
           | original code doesn't get all these rules right, which is
           | why there are bugs in the first place.
           | 
           | Solving that problem is the goal of this DARPA proposal, or
           | perhaps more accurately, determining how feasible it is to
           | solve that problem automatically. Personally, I think the
           | better answer is to have a semi-automated approach, where
           | users provide as input the final Rust struct layouts (and
           | possibly parts of the API, to fix lifetime issues), and the
           | tool automates the drudgery of getting the same logic ported
           | to that mapping.
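
To make the "pointers into slices" step concrete, here is a hand-written sketch. It is not actual c2rust output; `sum_raw` and `sum_slice` are hypothetical names. The first function keeps the C shape (pointer plus length, no bounds checks), the second is what a lifting pass would ideally recover.

```rust
// Roughly the shape of a mechanical translation: a raw pointer and a
// separate length. The caller must guarantee that `data` points to at
// least `len` readable i32 values.
unsafe fn sum_raw(data: *const i32, len: usize) -> i32 {
    let mut total = 0;
    let mut i = 0;
    while i < len {
        total += unsafe { *data.add(i) };
        i += 1;
    }
    total
}

// The lifted version: the pointer/length pair becomes a slice, so the
// length travels with the data and no `unsafe` is needed.
fn sum_slice(data: &[i32]) -> i32 {
    data.iter().sum()
}

fn main() {
    let values = [1, 2, 3, 4];
    let raw = unsafe { sum_raw(values.as_ptr(), values.len()) };
    // Both versions agree on valid input; only the raw one can read out
    // of bounds if the caller passes a wrong length.
    assert_eq!(raw, sum_slice(&values));
}
```

This is the easy case. The example has no shared mutation and no ownership question, which is where `Cell`, `Rc`, `Mutex` and lifetime analysis come in.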
        
             | Animats wrote:
             | Right. Used c2rust once. Been there, done that. The Rust
             | code that comes out is _awful_. Does the same thing as the
             | C code, bugs and all. You don't get Rust subscript check
             | errors, you get segfaults from unsafe Rust code. What comes
             | out is hopeless for manual "refactoring".
             | 
             | The hardest part may be Rust's affine type rules. Reference
             | use in Rust is totally different than pointers in C/C++.
             | Object parenting relationships are hard to express in Rust.
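
One common way to express a parent/child relationship in safe Rust is `Rc` for the owning direction and `Weak` for the back-pointer. The sketch below assumes a single-threaded tree; the `Node` type and helper names are invented for illustration. In C this would be a plain `struct node *parent` field.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    name: String,
    // Back-pointer: Weak, so parent and child do not keep each other alive.
    parent: RefCell<Weak<Node>>,
    // Owning direction: the parent holds strong references to its children.
    children: RefCell<Vec<Rc<Node>>>,
}

fn new_node(name: &str) -> Rc<Node> {
    Rc::new(Node {
        name: name.to_string(),
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    })
}

fn add_child(parent: &Rc<Node>, child: Rc<Node>) {
    *child.parent.borrow_mut() = Rc::downgrade(parent);
    parent.children.borrow_mut().push(child);
}

// Returns None if the node has no parent or the parent was already dropped.
fn parent_name(node: &Node) -> Option<String> {
    let parent = node.parent.borrow().upgrade();
    parent.map(|p| p.name.clone())
}

fn main() {
    let root = new_node("root");
    let leaf = new_node("leaf");
    add_child(&root, Rc::clone(&leaf));
    assert_eq!(parent_name(&leaf), Some("root".to_string()));
    assert_eq!(parent_name(&root), None);
}
```

A translator has to decide which direction owns and which is weak. That information is not written down anywhere in the C source.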
        
         | alkonaut wrote:
         | A line-by-line translation doesn't require much "AI" either. You could
         | probably make a rough translation in some (mostly unsafe) Rust.
         | 
         | I assume the AI needs to figure out lifetimes and so on to be
         | actually useful and produce valid programs, which would be
         | impressive if it does.
        
         | morgante wrote:
         | Line by line is infeasible, which is precisely why you need to
         | use AI to make larger semantic inferences.
         | 
         | You also don't have to one-shot translate everything. One of
         | the valuable things about the Rust compiler is it gives lots of
         | specific information that you can feed back into an LLM to
         | iterate.
         | 
         | I've been working on similar problems for my startup (grit.io)
         | and think C -> Rust is definitely tractable in the near term.
         | Definitely not _easy_ but certainly solvable.
        
           | stogot wrote:
           | What about converting to an AST, then asking the AI to convert
           | that to Rust? Would that work?
        
             | Someone wrote:
             | That's probably the route they would take, but the C AST
             | won't have ownership attributes. You'd have to discover
             | those yourself.
             | 
             | ASTs also don't have much info on threading (that's more or
             | less limited to "the program starts a thread with entry
             | point _foo_ at some time", "Foo waits for another thread to
             | finish")
        
         | Someone wrote:
         | > Pointers and aliasing are ubiquitous in c programs
         | 
         | If we ignore multi-threaded programs, is long term aliasing
         | actually ubiquitous in C programs? For many programs, I would
         | expect most of it to happen within the scope of a single
         | function (and within it, across function calls, but there,
         | borrowing will solve this, won't it?)
         | 
         | If so, I would try tackling that as one sub-problem (you
         | have to start somewhere), and detecting how data gets shared
         | between threads as another. For the latter, I expect that many
         | programs will have some implicit ownership rule such as "thread
         | T1 puts stuff in queue Q where thread T2 will pick it up" that
         | can be translated as "putting it in queue transfers ownership".
         | 
         | Detecting such rules may not be easy, but doesn't look
         | completely out of reach for me, either, and that would be good
         | enough for a research project.
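
The "putting it in queue transfers ownership" rule maps fairly directly onto a standard-library channel. A minimal sketch, assuming one producer and one consumer; `run_pipeline` is a made-up name and the "work" is just taking string lengths.

```rust
use std::sync::mpsc;
use std::thread;

fn run_pipeline(jobs: Vec<String>) -> Vec<usize> {
    let (tx, rx) = mpsc::channel::<String>();

    // T1: sending moves each String into the queue. After `send`, the
    // producer can no longer touch `job`; the compiler enforces this.
    let producer = thread::spawn(move || {
        for job in jobs {
            tx.send(job).expect("receiver hung up");
        }
        // `tx` is dropped here, which closes the channel.
    });

    // T2: receiving hands ownership of each String to the consumer.
    let consumer = thread::spawn(move || {
        let lengths: Vec<usize> = rx.iter().map(|job| job.len()).collect();
        lengths
    });

    producer.join().expect("producer panicked");
    consumer.join().expect("consumer panicked")
}

fn main() {
    let lengths = run_pipeline(vec!["ab".to_string(), "cde".to_string()]);
    assert_eq!(lengths, vec![2, 3]);
}
```

The hard part for a translator is recognising that a hand-rolled C queue follows this convention in the first place.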
        
       | rekttrader wrote:
       | Talked about here: https://news.ycombinator.com/item?id=41110269
        
       | verdverm wrote:
       | Russ Cox gave a GopherCon talk on the effort to automatically
       | convert the Go compiler from C to Go (in the early days). Lots of
       | interesting IRL issues / solutions in there.
       | 
       | https://www.youtube.com/watch?v=QIE5nV5fDwA
       | 
       | iirc, they were able to transpile 90%+ (without AI) and manually
       | did the rest
        
       | poikroequ wrote:
       | > Those involved with the oversight of C and C++ have pushed
       | back, arguing that proper adherence to ISO standards and diligent
       | application of testing tools can achieve comparable results
       | without reinventing everything in Rust.
       | 
       | If you stick to extremely stringent coding practices and
       | incorporate third party static verification tools that require
       | riddling your code with proprietary situations, then sure, you
       | can achieve comparable results with C/C++.
       | 
       | Or you can just use Rust.
        
         | dtx1 wrote:
         | It's quite hilarious to see the pushback Rust gets from the
         | C/C++ community. Obviously their decades of hard work and
         | experience with those languages are overriding their
         | reasoning circuits. Who in their right mind would defend a
         | language that has such major and obvious design flaws if a
         | genuine alternative is there.
        
           | constantcrying wrote:
           | Many of the most widely used languages have obvious major
           | design flaws. (JavaScript is one obvious candidate, python is
           | another. How did a language which has no built-in floating
           | point type become the number one language for numerical
           | analysis?)
           | 
           | The real question is what tradeoffs you are making and what
           | you are gaining. Rust makes certain memory safety guarantees
           | about the program at compile time, but at the same time it
           | disallows perfectly safe constructions, which can exist in
           | C++, as well.
        
             | brigadier132 wrote:
             | I think DARPA is making the right decision about choosing
             | Rust as the language for low level systems programming. For
             | national security related matters you'd definitely want the
             | certainty Rust brings.
             | 
             | The reason I personally chose Rust as my go to language for
             | low level programming is that despite learning systems
             | programming in college I pretty much never used it outside
             | of school. Meaning I didn't have any of that knowledge that
             | c and c++ programmers had built up over years of
             | experience. So I decided that instead of having to deal
             | with the unknown skill deficiencies in writing concurrent
             | software and memory management I'd rather just have a
             | compiler scream at me. I don't regret the decision.
             | 
             | Also, I remember writing an async TCP implementation in
             | college with c++ using boost. Rust tooling is just so far
             | ahead of that.
        
             | dtx1 wrote:
             | > JavaScript is one obvious candidate
             | 
             | I don't see anyone defending JavaScript. In fact a whole
             | lot of people are using typescript now because JavaScript
             | is just so bad.
             | 
             | As for python, that's a good point. I guess it's just
             | because it's easy to use and all the numerical stuff is
             | done with c-bindings anyway?
             | 
             | But the C++ Situation is genuinely different. There's a
             | reason governments are now calling upon developers to just
             | let it die already[0]. That design flaw is so bad it's
             | causing genuine harm.
             | 
             | [0] https://www.cisa.gov/news-events/news/urgent-need-
             | memory-saf...
        
         | poikroequ wrote:
         | proprietary annotations*
         | 
         | Sorry, autocorrect, I typed this on my phone.
        
       | tonetegeatinst wrote:
       | I am a total C shill....I'll admit it. I'm just starting to learn
       | it and the only reason I picked it was I wanted a low level
       | language that most systems run.
       | 
       | That said, while I can acknowledge the benefits of memory safety,
       | I would personally choose zig over rust.
       | 
       | All things considered I know you can do some safety check for C
       | using the compiler, and that helps reduce the odds of memory
       | issues.
       | 
       | Idk what rust main libraries look like (are they even called
       | that?) But I know C's standard libraries have made learning stuff
       | easier. Is there any way to know if your libraries in rust are
       | using unsafe code? Will that just spit out compile time errors?
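
On the last question: as far as I know, the compiler does not report `unsafe` inside dependencies. What it does offer is the `unsafe_code` lint for your own code, sketched below with a hypothetical module named `checked`. Auditing dependencies is usually done with third-party tooling (cargo-geiger is one such tool), which is outside this sketch.

```rust
// Any `unsafe` block or `unsafe fn` inside this module is a compile error.
#[forbid(unsafe_code)]
mod checked {
    // Returns the first byte, or None for an empty slice.
    pub fn first(values: &[u8]) -> Option<u8> {
        values.first().copied()
        // Writing `unsafe { *values.as_ptr() }` here would be rejected
        // by the lint instead of compiling.
    }
}

fn main() {
    assert_eq!(checked::first(&[7, 8]), Some(7));
    assert_eq!(checked::first(&[]), None);
}
```

Many crates put the crate-level form, `#![forbid(unsafe_code)]`, at the top of `lib.rs`. That covers the crate itself, not the crates it depends on.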
        
       | gosub100 wrote:
       | slight tangent, but I think it would be amazing if AI could write
       | device drivers. Full-featured GPU drivers for, say, OpenBSD. What
       | does it need? Probably the state machines of the GPUs, how to
       | enable various modes, how to feed data in/out of the device at
       | intended speed, how to load shaders.
       | 
       | Why can't AI learn to do that? Its reward could be getting past
       | the initialization and getting to the default state of the
       | driver. It could be trained on hundreds of GPU drivers, not only
       | for the minutiae of how to load values into the control
       | registers, but the bigger picture of what it actually means.
        
         | constantcrying wrote:
         | >Why can't AI learn to do that?
         | 
         | Because it isn't magic.
         | 
         | >It could be trained on hundreds of GPU drivers
         | 
         | Do you know what an AI trained on hundreds of books looks like?
         | Even with millions of books it can not write a coherent
         | chapter, much less an entire book.
         | 
         | This is a genuinely terrible idea. It is exactly the thing AI
         | is _bad_ at, high degree of accuracy over long stretches of
         | output.
        
       | sumanthvepa wrote:
       | I program in C++ and am very happy to do so. Modern C++ is very
       | safe and actually fun to program in. It gives me enormous
       | expressivity, extraordinary performance and safety when I need
       | it. I'm not building space shuttles, I'm building 3D experiences,
       | so I'm not terribly concerned about crashes. But even for me,
       | I've not run into a memory corruption bug in recent memory (10-15
       | years.)
       | 
       | Bash C/C++ all you want. I'm happy to keep using it to my
       | advantage.
        
         | tomrod wrote:
         | What is the learning curve for newbies to avoid critical
         | segfaults? If you still have to walk a tightrope to get code
         | across the board, wouldn't all benefit from a plankway with
         | guardrails instead?
         | 
         | I'm not dissing C or C++ in any way. I've used it. But I
         | recognize there are some major footguns that aren't easy to
         | avoid, causing a much longer learning curve than necessary to
         | get things built. Rust at least seems determined to address
         | them, good or bad!
        
           | sumanthvepa wrote:
           | To a first approximation, avoid using raw pointers. They
           | should almost never be needed in application code. Use C++'s
           | standard library facilities for smart pointers and containers
           | instead. They are masterpieces of engineering, and work
           | extremely well.
        
         | offbynull wrote:
         | I program in modern C++ as well (C++23). I disagree with both
         | "very safe" and "fun". Even with 23 there are an innumerable
         | number of footguns throughout both the language and the
         | standard library. Debugging code is also a mess. Good luck
         | getting anything done without paying for an IDE, and even then
         | it can be a struggle.
        
           | sumanthvepa wrote:
           | Of all the languages I use C/C++ have the least need for paid
           | tools.
           | 
           | I use emacs(and vim), make and Boost's b2 build system for
           | most of my programming. Although on Windows, Visual Studio is
           | a joy to use. On Linux I use gdb. Works fine. I also use
           | static analysers and valgrind. But I come from a tradition of
           | Unix and living on the command line.
           | 
           | I've tried CLion, because I pay for IntelliJ IDEA for other
           | programming (I also have to write Javascript, and Python) But
           | while it's nice, there is nothing there that I couldn't do
           | without.
           | 
           | If you stick to C++ standard libraries, Boost, and turn on
           | _all_ warnings, and are reasonably competent, you won't
           | encounter any bugs that are so serious that your program
           | crashes inexplicably.
        
       | ks2048 wrote:
       | If we have smart AIs to write code, find bugs, and write tests -
       | doesn't that mean we can ditch the "safe" languages and go back
       | to C?
       | 
       | That's mostly a joke. But AI-hardened-C seems like it could be
       | much better than current-human-only-C.
        
         | rectang wrote:
         | It's not any more of a joke than the hee-haw nonsense that
         | using an LLM to translate working C code into something else
         | will yield a result with fewer bugs.
        
         | constantcrying wrote:
         | Why would AI be competent at finding bugs? Most non-trivial
         | bugs I find are about unexpected interactions between distinct
         | pieces of code. That seems totally infeasible for an LLM to be
         | good at.
        
       | andrewstuart wrote:
       | My experience of AI as a coding assistant:
       | 
       | for Python - awesome
       | 
       | for golang - awesome
       | 
       | for JavaScript - awesome
       | 
       | for Zig - not awesome, AI doesn't get it, maybe training data set
       | too small
       | 
       | for Rust - terrible - AI really doesn't get how it works,
       | especially the hard bits
        
         | bsder wrote:
         | If you can cobble your program together by copy/pasting Stack
         | Overflow snippets, an AI tends to be useful.
         | 
         | Your list reflects that.
        
       | ChrisArchitect wrote:
       | Related:
       | 
       |  _Translating All C to Rust (TRACTOR)_
       | 
       | https://news.ycombinator.com/item?id=41110269
        
       | nielsbot wrote:
       | Wonder if they could also/instead change the compiler similarly
       | to Apple:
       | 
       | https://support.apple.com/guide/security/memory-safe-iboot-i...
        
       | TZubiri wrote:
       | If you translate the code from C to Rust automatically, isn't C
       | still the source code? Just transpiled to Rust as an intermediate
       | before asm?
        
         | rectang wrote:
         | Indeed. Think of the AI stage as "llmcc", or maybe "lsdcc" if
         | you want to emphasize the hallucination problem.
        
       | constantcrying wrote:
       | I don't see this working. There are abstractions in C which are
       | not replicable in Rust, without major changes.
       | 
       | In C, having two separate data structures which carry an
       | identical pointer and are writing to it is a common occurrence.
       | This can not be trivially replicated in rust and will need some
       | reasonably clever intervention.
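
For the single-threaded case, the usual "intervention" is shared ownership plus interior mutability. A minimal sketch, assuming two made-up structs (`Logger`, `Worker`) that both write to the same counter; in C each would simply hold an `int *`.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Both structs carry a handle to the same counter and both write to it.
struct Logger {
    counter: Rc<RefCell<u32>>,
}

struct Worker {
    counter: Rc<RefCell<u32>>,
}

impl Logger {
    fn record(&self) {
        *self.counter.borrow_mut() += 1;
    }
}

impl Worker {
    fn record(&self) {
        *self.counter.borrow_mut() += 10;
    }
}

fn shared_total() -> u32 {
    let counter = Rc::new(RefCell::new(0));
    let logger = Logger { counter: Rc::clone(&counter) };
    let worker = Worker { counter: Rc::clone(&counter) };

    logger.record();
    worker.record();
    logger.record();

    // RefCell checks the borrow rules at run time: two overlapping
    // `borrow_mut` calls would panic instead of silently aliasing.
    let total = *counter.borrow();
    total
}

fn main() {
    assert_eq!(shared_total(), 12);
}
```

Across threads the same shape becomes `Arc<Mutex<_>>`. Choosing between these is the kind of decision an automated translator would have to infer from how the C code is actually used.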
        
       | skywhopper wrote:
       | This is a terrible idea. In order to get rid of one specific
       | class of bugs, you want to risk introducing logic errors and
       | performance issues and make the code harder to maintain.
       | 
       | Not to mention that this quote is incredibly scary. This is
       | someone we are trusting to make this decision?
       | 
       | "You can go to any of the LLM websites, start chatting with one
       | of the AI chatbots, and all you need to say is 'here's some C
       | code, please translate it to safe idiomatic Rust code,' cut,
       | paste, and something comes out, and it's often very good, but not
       | always," said Dan Wallach, DARPA program manager for TRACTOR, in
       | a statement.
        
       ___________________________________________________________________
       (page generated 2024-08-03 23:00 UTC)