[HN Gopher] Compiling C to Safe Rust, Formalized
       ___________________________________________________________________
        
       Compiling C to Safe Rust, Formalized
        
       Author : love2read
       Score  : 251 points
       Date   : 2024-12-20 23:30 UTC (18 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | ljlolel wrote:
       | I wonder how well O3 can do just compiling C to rust in one shot
        
         | saagarjha wrote:
         | Probably pretty bad.
        
         | ojosilva wrote:
         | Funny, I came here to say just the opposite, that I'm glad
         | algorithmic computing is still a thing in research and that not
         | everything is AI.
         | 
         | Ironically, AI is able to produce research-grade algorithms and
         | will probably become an authority on the subject, helping take
         | more traditional CS to the next level.
        
           | Garlef wrote:
           | I think it would make sense to evaluate if the the 'surgical'
           | rewrites mentioned in the article can be carried out by or
           | assisted by an LLM based process.
        
           | sunshowers wrote:
           | There's a lot of code in the world where correctness is a
           | requirement. :)
           | 
           | I agree with the sibling -- I think LLMs may be able to help
           | automate some parts of it, but humans are still 95% of it. At
           | least for now.
        
       | Alifatisk wrote:
       | c2rust.com, but it uses things like libc::c_int
        
         | love2read wrote:
         | C2Rust is mentioned in the second paragraph of the related work
         | section.
        
           | rat87 wrote:
           | How is c2rust doing these days? For practical codebases?
        
       | wffurr wrote:
       | Note that this is done for "existing formally verified C
       | codebases" which is a lot different from typical systems C code
       | which is not formally verified.
        
         | tonetegeatinst wrote:
         | What is the main difference? Can compiler flags force
         | compliance?
        
           | nickpsecurity wrote:
           | Formal verification often requires simplified code in a
           | restrictive style. You might not even be able to use C
           | features or structures that have the performance you want.
           | How theorem provers and brains work are also different enough
           | that making something easy for one often makes it harder for
           | the other.
           | 
           | You can also see this effect in the article on the history of
           | Coverity's analyzer. Real-world code was horrible to deal
           | with vs the academic examples they started with.
           | 
           | https://cacm.acm.org/research/a-few-billion-lines-of-code-
           | la...
        
           | LPisGood wrote:
           | My understanding is that formal verification is a tough goal
           | to achieve and that it usually requires designing the program
           | or the language to be a specific way.
           | 
           | The problem with transpiling C to rust is that unsafe and
           | unverified behavior can be a key property of the behavior
           | resulting program, so there isn't an obvious way to spit out
           | a sort of rustified (CRustified?) binary that matches the
           | behavior of the C program.
        
             | thunkingdeep wrote:
             | Typical term is "Oxidized". I think they feel clever when
             | they do the RiiR thing and say that.
        
           | immibis wrote:
           | "Formally verified" means someone has written a correct
           | mathematical proof that the code has no bugs (the proof is
           | checked by a computer program to make sure it is correct).
           | This is a very high bar.
           | 
           | I'm not sure what it has to do with translating the code to
           | Rust.
        
             | swiftcoder wrote:
             | > I'm not sure what it has to do with translating the code
             | to Rust.
             | 
             | Formally verified C programs typically have had all their
             | undefined behaviour already stamped out during the
             | verification process. That makes mapping it over to Rust a
             | lot simpler.
        
               | tjoff wrote:
               | Translating undefined behavior is the easy part.
        
               | swiftcoder wrote:
               | Maybe, but only if you know the specifics of the
               | environment in which it is executing (i.e. which
               | compiler/architecture-specific behaviours the code
               | actually relies on)
        
               | Filligree wrote:
               | Correctly written C programs don't have undefined
               | behaviour, so a translator can operate assuming there is
               | none.
        
             | DonaldPShimoda wrote:
             | In this case specifically, two separate aspects are being
             | referred to with regard to "formal verification".
             | 
             | The first is that the translation function (from C to Rust)
             | has itself been formally verified. In other words: if you
             | give it a C program that obeys certain necessary pre-
             | conditions, it will give you a safe Rust program. I believe
             | the title of the paper is primarily using "Formalized" with
             | regard to this characteristic.
             | 
             | The second is that the set of programs the authors evaluate
             | their tool on are C programs that have themselves been
             | formally verified. I only barely skimmed the introduction
             | and didn't see it addressed directly there, but I would
             | assume that this is because the pre-conditions necessary
             | for their translation to work are most easily (only?) met
             | by formally verified C programs, where of course the
             | verification has been rendered with respect to properties
             | that would be relevant here (e.g., memory use).
        
         | safercplusplus wrote:
         | And even then, not completely reliably it seems (from Section
         | 2.2):
         | 
         | > The coercions introduced by conversion rules can however lead
         | to subtle semantic differences
         | 
         | The example they give is this C code:                   1
         | uint8_t x[1] = { 0 };         2 uint8_t *y = x;         3 *y =
         | 1;         4 assert(*x == 1); /* SUCCESS */
         | 
         | getting translated to this (safe) Rust code:
         | 1 let x: [u8; 1] = [0; 1];         2 let mut y: Box<[u8]> =
         | Box::new(x);         3 y[0] = 1;         4 assert!(x[0] == 1)
         | /* failure */
         | 
         | So the pointer (iterator) targeting an existing (stack-
         | allocated) array declared on line 2 gets translated to an
         | owning pointer/Box) targeting a (heap-allocated) new copy of
         | the array. So if the original code was somehow counting on the
         | fact that the pointer iterator was actually targeting the array
         | it was assigned to, the translated code may (quietly) not
         | behave correctly.
         | 
         | For comparison, the scpptool (my project) auto-translation (to
         | a memory safe subset of C++) feature would translate it to
         | something like:                   1
         | mse::lh::TNativeArrayReplacement<uint8_t, 1> x = { 0 };
         | 2 mse::lh::TNativeArrayReplacement<uint8_t, 1>::iterator y = x;
         | // implicit conversion from array to iterator         3 *y = 1;
         | 4 assert(*x == 1); /* SUCCESS */ // dereferencing of array
         | supported for compatibility
         | 
         | or if y is subsequently retargeted at another type of array,
         | then line 2 may end up as something like:                   2
         | mse::TAnyRandomAccessIterator<uint8_t> y = x; // implicit
         | conversion from array to iterator
         | 
         | So the OP project may only be converting C code that is already
         | amenable to being converted to safe Rust. But given the
         | challenge of the problem, I can respect the accomplishment and
         | see some potential utility in it.
         | 
         | edit: added translation for line 2 in an alternate hypothetical
         | situation.
        
         | jokoon wrote:
         | What is formally verified C? Why isn't there more of it?
        
           | PhilipRoman wrote:
           | Because it takes a lot of effort. Google Frama-C. On the flip
           | side, it can express not just memory safety constraints, but
           | also correctness proofs.
           | 
           | In this case it's not about Frama-C or similar tools though,
           | see your sibling comments about the caveats.
        
             | zozbot234 wrote:
             | Arguably, generating verified C from a low-level focused
             | subset of F* (Low*, used in the HACL project) is close
             | enough to count as a "similar tool".
        
         | light_hue_1 wrote:
         | Oh, this has so so many more caveats! It's borderline false
         | advertising.
         | 
         | First of all they never translated any C! At all. Zero lines.
         | 
         | They took code written in F* and modified its C compiler to
         | emit Rust. They never had to deal with any actual C code of any
         | complexity, aside from the most trivial code that might
         | theoretically be emitted by a toy compiler (but wasn't even
         | that!). They just pretended they were dealing with C (or a
         | heavily restricted Mini-C).
         | 
         | Even if we accept this, there are plenty of arbitrary
         | restrictions on the C it can even support in principle:
         | 
         | > If the original C program further relies on x, our
         | translation will error out, and will ask the programmer to fix
         | their source code.
         | 
         | That's saying you want C to already be written in a style that
         | makes the Rust borrow checker happy! They're avoiding the
         | actual hard parts of the problem.
         | 
         | > These rules present simplified versions of what we support.
         | Our implementation features several peephole optimizations ...
         | 
         | If I may translate from the language of academia. "We present
         | beautiful rules in figure 4. But in reality, our implementation
         | relies on a large number of hacks."
         | 
         | But it gets worse!
         | 
         | > For overlap cases that can be distinguished statically (as
         | above), we emit a compile-time error; otherwise, the Rust code
         | will panic at runtime.
         | 
         | The resulting code is not guaranteed to be correct! For
         | example, aliasing in C can cause crashes in the resulting Rust
         | code. See comment at the top of page 10. We're going from a
         | formally verified C program to a "It may crash now" Rust
         | program!? That's nuts!
         | 
         | > We apply our methodology to existing formally verified C
         | codebases, namely, the HACL* cryptographic library ...
         | 
         | This is such a blatant lie that I had to return to it. It's
         | designed to catch an unwary reviewer. We often talk about HACL
         | being in verified C because it gets compiled to that. But it's
         | not a C library, it's written in a totally different language.
         | You cannot confuse the two.
         | 
         | I'm not a reviewer for their paper. But if I was, I would
         | strongly fight for rejection.
         | 
         | The fact that they only handle formally verified C is so
         | astronomically far away from being their main problem.
         | 
         | An honest title would be "Compiling a subset of F* to partially
         | Safe Rust, Partially Formalized"
        
           | lou1306 wrote:
           | > An honest title would be "Compiling a subset of F* to
           | partially Safe Rust, Partially Formalized"
           | 
           | Sadly that is the name of the game in conference publishing.
           | Make a big claim in the title and hope the reviewer does not
           | read the fine print.
           | 
           | > If I may translate from the language of academia. "We
           | present beautiful rules in figure 4. But in reality, our
           | implementation relies on a large number of hacks."
           | 
           | Er, this I can understand. Every conference paper only
           | presents you with a simplified view of the contributions.
           | There is no way to describe every corner case within the page
           | limits of a conference paper (specifically, I bet this was
           | submitted to POPL 2025).
        
             | light_hue_1 wrote:
             | > Er, this I can understand. Every conference paper only
             | presents you with a simplified view of the contributions.
             | 
             | In my group I don't allow hacks. Most other good papers I
             | can think of don't do that. It takes new students a while
             | to get used to not maximizing performance at all costs. And
             | if a hack must exist, then it's explained, not as a hack,
             | but as part of the method.
             | 
             | Don't you hate it when you implement some beautiful idea
             | from a paper only to discover that it hardly bares any
             | relationship to the actual method that works?
             | 
             | > There is no way to describe every corner case within the
             | page limits of a conference paper (specifically, I bet this
             | was submitted to POPL 2025).
             | 
             | You're probably right. But POPL allows for unlimited pages
             | in an attached appendix. They can and should describe the
             | full method. Particularly if they're going to play the game
             | of saying that they formalized this.
             | 
             | > Sadly that is the name of the game in conference
             | publishing. Make a big claim in the title and hope the
             | reviewer does not read the fine print.
             | 
             | It's a form of namespace squatting. Do the worst job
             | possible to claim a title that you couldn't execute on, so
             | that when people figure out that problem, they will be
             | forced to cite your non-working solution. I loathe this
             | approach to publishing.
             | 
             | We should punish people for doing. Reject for lack of
             | honesty and your paper can't be published in related
             | conference for X years.
        
               | lou1306 wrote:
               | Well in this case (and if it was POPL, but it's a pretty
               | safe bet considering the format and the timing) it looks
               | like reviewers have indeed rejected it. And I completely
               | agree, it is namespace squatting. Sadly every once in a
               | while it does work (and very effectively), so there is
               | little incentive for the community to punish it.
               | 
               | Sorry if my previous comment came off as dismissive, it's
               | just that I'm getting increasingly disillusioned with the
               | state of things in this space.
        
         | akkad33 wrote:
         | Is Rust formally verified? Not that I know of
        
           | PartiallyTyped wrote:
           | You can always run model checkers like Kani, though even that
           | is limited.
        
             | medo-bear wrote:
             | So no?
        
               | johnisgood wrote:
               | The answer is that it is not.
               | 
               | It frustrates me more than it should, I admit, that
               | people always mention Rust when they talk about safety,
               | but never Ada / SPARK. You want formal verification? Use
               | Ada / SPARK. It has been battle-tested. It has been used
               | for critical systems for a really long time now.
               | 
               | (And a compiler being formally verified vs. being able to
               | write formally verified code means two different things.)
        
               | medo-bear wrote:
               | I think a disclaimer like this should be written with
               | every Rust application, like health warnings on cigarette
               | packets
        
               | johnisgood wrote:
               | At this point, I think that would be better, yes, just
               | because people think Rust is "fully" safe, which is just
               | incorrect. I think the problem was the Rust hype and
               | repeated statements of it being very safe, so we have
               | some undoing to do.
               | 
               | For example if someone on GitHub sees that the project is
               | written in Rust, they are automatically going to assume
               | it is safe, incorrectly so. I do not blame them though.
        
               | keybored wrote:
               | You presumably extend this to every virtual machine or
               | interpreter for every language which is implemented in an
               | unsafe language. When that language claims to be safe
               | (like all such languages claim to be).
               | 
               | That seems excessive and tedious.
        
               | medo-bear wrote:
               | The point, I think, was that "safety" presumptions about
               | Rust are often exaggerated or poorly misunderstood due to
               | hype. That could certainly lead to problems
        
               | woodruffw wrote:
               | I don't think Rust's actual safety properties aren't
               | overhyped, although they may be subject to
               | misunderstanding about their exact extent.
               | 
               | Concretely: spatial and temporal memory safety are good
               | things, and Rust achieves both. It's not unique in this
               | regard, nor is it unique in not having a formal
               | definition.
        
               | keybored wrote:
               | > The point, I think, was that "safety" presumptions
               | about Rust are often exaggerated or poorly misunderstood
               | due to hype. That could certainly lead to problems
               | 
               | Then the point is hypocritical.
               | 
               | Runtimes for safe programming languages have been
               | implemented in unsafe languages since the dawn of safe
               | programming languages, basically.
               | 
               | EDIT: I see now that you are the cigarette warning guy.
               | In that case I don't understand what this coy "I think"
               | speculation is about when you made such a bizarre
               | proclamation on this topic.
        
               | chillingeffect wrote:
               | Rust is to safe as Tesla is to autopilot.
        
               | medo-bear wrote:
               | As Tesla is to Tesla
        
           | gpm wrote:
           | No, but small pieces of it are, and there's active work to
           | extend it
           | 
           | The concept of the borrow checker has been on a simplified
           | version of rust https://people.mpi-
           | sws.org/~dreyer/papers/rustbelt/paper.pdf - work has
           | continued in this area steadily (e.g. see tree borrows)
           | 
           | There's a variety of tools that take rust code and translate
           | it to something a proof system understands, and then checks
           | that it matches a specification. AWS is leading a project to
           | use these to verify the standard library:
           | https://aws.amazon.com/blogs/opensource/verify-the-safety-
           | of...
        
           | antiquark wrote:
           | Rust doesn't have a specification or standard yet, which
           | would make it difficult to formally verify.
           | 
           | https://stackoverflow.com/questions/75743030/is-there-a-
           | spec...
        
             | gpm wrote:
             | It does: https://github.com/ferrocene/specification
             | 
             | It also strikes me as extraordinarily unlikely that any
             | formal verification effort will use the existing
             | specification, and not build their own (using their own
             | formal language) as they go.
        
         | medo-bear wrote:
         | What is the benefit of compiling formally correct code to Rust?
         | It seems that all the possible benefits are already there (if
         | not more)
        
           | gpm wrote:
           | I suppose hypothetically putting it in an easier language to
           | make changes in. Though it's hard to imagine it being easier
           | to make changes to transpiled code than the original.
           | 
           | Alternatively this might be seen as a stepping stone to
           | translating non-formally-verified C to rust, which I
           | understand the US government has expressed a fair bit of
           | interest in.
        
             | medo-bear wrote:
             | Good luck with both of that. Otherwise people whose dayjob
             | is rewritting everything in Rust will soon be out of a job
        
         | cushychicken wrote:
         | Woof. Major unspoken caveat in the title!
        
       | pizlonator wrote:
       | Compiling a tiny subset of C, that is. It might be so tiny as to
       | be useless in practice.
       | 
       | I have low hopes for this kind of approach; it's sure to hit the
       | limits of what's possible with static analysis of C code. Also,
       | choosing Rust as the target makes the problem unnecessarily hard
       | because Rust's ownership model is so foreign to how real C
       | programs work.
        
         | whatisyourwork wrote:
         | It can be good as an interface language. Good for bindings.
        
         | pornel wrote:
         | Rust's ownership model is close enough for translating C. It's
         | just more explicit and strongly typed, so the translation needs
         | to figure out what a more free-form C code is trying to do, and
         | map that to Rust's idioms.
         | 
         | For example, C's buffers obviously have lengths, but in C the
         | length isn't explicitly tied to a pointer, so the translator
         | has to deduce how the C program tracks the length to convert
         | that into a slice. It's non-trivial even if the length is an
         | explicit variable, and even trickier if it's calculated or
         | changes representations (e.g. sometimes used in the form of
         | one-past-the-end pointer).
         | 
         | Other C patterns like `bool should_free_this_pointer` can be
         | translated to Rust's enum of `Owned`/`Borrowed`, but again it
         | requires deducing which allocation is tied to which boolean,
         | and what's the true safe scope of the borrowed variant.
        
           | pizlonator wrote:
           | Rust's ownership model forbids things like doubly linked
           | lists, which C programs use a lot.
           | 
           | That's just one example of how C code is nowhere near meeting
           | Rust's requirements. There are lots of others.
        
             | orf wrote:
             | > Rust's ownership model forbids things like doubly linked
             | lists, which C programs use a lot.
             | 
             | It's literally in the standard library
             | 
             | https://doc.rust-
             | lang.org/std/collections/struct.LinkedList....
        
               | singron wrote:
               | This implementation uses unsafe. You can write a linked
               | list in safe rust (e.g. using Rc), but it probably
               | wouldn't resemble the one you write in C.
               | 
               | In practice, a little unsafe is usually fine. I only
               | bring it up since the article is about translating to
               | safe rust.
        
               | orf wrote:
               | Safe rust isn't "rust code with absolutely 0 unsafe
               | blocks in any possible code path, ever". Rc uses unsafe
               | code every time you construct one, for example.
               | 
               | Unsafe blocks are an escape hatch where you promise that
               | some invariants the compiler cannot verify are in fact
               | true. If the translated code were to use that collection,
               | via its safe interfaces, it would still be "safe rust".
               | 
               | More generally: it's incorrect to say that the rust
               | ownership model forbids X when it ships with an
               | implementation of X, regardless of if and how it uses
               | "unsafe" - especially if "unsafe" is a feature of the
               | ownership model that helps implement it.
        
               | andrewflnr wrote:
               | No one here is confused about what unsafe means. The
               | point is, they're not implemented by following Rust's
               | ownership model, because Rust's ownership model does in
               | fact forbid that kind of thing.
               | 
               | You can nitpick the meaning of "forbids", but as far as
               | the current context is concerned, if you translate code
               | that _implements_ a doubly linked list (as opposed to
               | using one from a library) into Rust, it 's not going to
               | work without unsafe. Or an index-based graph or
               | something.
        
               | oneshtein wrote:
               | It's easy to implement doubly linked lists in safe Rust.
               | Just ensure that every element has one OWNER, to avoid
               | <<use after free>> bugs, or use a garbage collector, like
               | a reference counter.
               | 
               | Unlike C++ or Rust, C has no references, only pointers,
               | so developer must release memory manually at some
               | arbitrary point. This is the problem and source of bugs.
        
               | saghm wrote:
               | While I might agree that it's easy if you use a reference
               | counter, this is not going to be as performant as the
               | typical linked list written in C, which is why the
               | standard library uses unsafe for its implementation of
               | stuff like this. If it were "easy" to just write correct
               | `unsafe`, then it would be easy to do it in C as well.
               | 
               | Note that the converse to this isn't necessarily true!
               | People I trust way more to write unsafe Rust code than me
               | than me have argued that unsafe Rust can be harder than
               | writing C in some ways due to having to uphold certain
               | invariants that don't come up in C. While there are a
               | number of blog posts on the topic that anyone interested
               | can probably find fairly easily by googling "unsafe Rust
               | harder than C", I'll break my usual rule of strongly
               | preferring articles to video content to link a talk from
               | youtube because the speaker is one of those people I
               | mention who I'd trust more than me to write unsafe code
               | and I remember seeing him give this talk at the meetup:
               | https://www.youtube.com/watch?v=QAz-maaH0KM
        
               | bonzini wrote:
               | > unsafe Rust can be harder than writing C in some ways
               | due to having to uphold certain invariants that don't
               | come up in C.
               | 
               | Yes, this is absolutely correct and on top of this you
               | sometimes have to employ tricks to make the compiler
               | infer the right lifetime or type for the abstraction
               | you're providing. On the other hand, again thanks to the
               | abstraction power of Rust compared to C, you can test the
               | resulting code way more easily using for example Miri.
        
               | oconnor663 wrote:
               | More important than whether you use a little unsafe or a
               | lot, is whether you can find a clean boundary above which
               | everything can be safe. Something like a hash function or
               | a block cipher can be piles and piles of assembly under
               | the covers, but since the API is bytes-in-bytes-out, the
               | safety concerns are minimal. On the other hand, memory-
               | mapping a file is just one FFI function call, but the
               | uncontrollable mutability of the whole thing tends to
               | poison everything above it with unsafety.
        
               | imtringued wrote:
               | I don't really see it as a big "owning" of Rust that a
               | complex pointer heavy structure with runtime defined
               | ownership cannot be checked statically. Almost every
               | language that people use doubly linked lists in has a GC,
               | making the discussion kind of meaningless.
               | 
               | So C and C++ are the exceptions to the rule, but how do
               | they make it easy to write doubly linked lists?
               | Obviously, the key assumption is that that the developer
               | makes sure that node->next->prev = node->prev->next =
               | node (Ignoring nullptr).
               | 
               | With this restriction, you can safely write a doubly
               | linked list even without reference counting.
               | 
               | However, this isn't true on the pointer level. The prev
               | pointers could be pointing at the elements in a
               | completely random order. For example tail->prev = head,
               | head->prev = second_last and so on. So that going
               | backwards from the tail is actually going forwards again!
               | 
               | Then there is also the problem of having a pointer from
               | the outside of the linked list pointing directly at a
               | node. You would need a weak pointer, because another
               | pointer could have requested deletion from the linked
               | list, while you're still holding a reference.
               | 
               | If you wanted to support this generic datastructure,
               | rather than the doubly linked list you have in your head,
               | then you would need reference counting in C/C++ as well!
               | 
               | What this tells you, is that Rust isn't restrictive
               | enough to enforce these memory safe contracts. Anyone
               | with access to the individual nodes could break the
               | contract and make the code unsafe.
        
               | quuxplusone wrote:
               | But it's not in C's standard library. So the exercise
               | isn't merely to auto-translate one language's standard
               | library to another language's standard library (say,
               | replacing C++ std::list with Rust LinkedList) -- which
               | would already be very hard. The exercise here is to auto-
               | identify-and-refactor idioms open-coded in one language,
               | into idioms suited for the other language's already-
               | written standard library.
               | 
               | Imagine refactoring your average C program to use GLib
               | for all (all!) of its data structures. Now imagine doing
               | that, but _also_ translating it into Rust at the same
               | time.
        
               | Animats wrote:
               | > The exercise here is to auto-identify-and-refactor
               | idioms open-coded in one language, into idioms suited for
               | the other language's already-written standard library.
               | 
               | That's what LLMs are for - idiom translation. You can't
               | trust them to do it right, though.
               | 
               |  _[Pan et al . 2024] find that while GPT-4 generates code
               | that is more idiomatic than C2Rust, only 61% of it is
               | correct (i.e., compiles and produces the expected
               | result), compared to 95% for C2Rust._
               | 
               | This problem needs both AI-type methods to help with the
               | idioms and formal methods to insure that the guessed
               | idioms correctly capture the semantics.
               | 
               | A big advance in this project is that they can usually
               | translate C pointer arithmetic into Rust slices. That's
               | progress on of one of the hardest parts of the problem.
               | C2Rust did not do that. That system just generates unsafe
               | raw pointer arithmetic, yielding ugly Rust code that
               | replicates C pointer semantics using function calls.
               | 
               | DARPA is funding research in this area under the TRACTOR
               | program. Program awards in April 2025, so this is just
               | getting started. It's encouraging to see so much progress
               | already. This looks do-able.
        
               | saghm wrote:
               | Oh god, I can't even imagine trying to have formally-
               | verified LLM-generated code. It's not surprising that
               | even incremental progress for that would require quite a
               | lot of ingenuity.
        
               | fuhsnn wrote:
               | >That's what LLMs are for - idiom translation. You can't
               | trust them to do it right, though.
               | 
               | Optimizing C compilers also happened to be good at idiom
               | recognition, and we can probably trust them a little
               | more. The OP paper does mention future plan to use clang
               | as well: >We have plans for a libclang-based frontend
               | that consume actual C syntax.
               | 
               | If such transformation can be done at IR level it might
               | be more efficient to be to C-IR > idiom transform to
               | Rust-IR > run safe-checks in Rust-IR > continue
               | compilation in C-IR or Rust-IR or combining both for
               | better optimization properties.
        
               | swiftcoder wrote:
               | I'm definitely bullish on this angle of compiling C down
               | to LLVM assembly, and then "decompiling" it back to Rust
               | (with some reference to the original C to reconstruct
               | high-level idioms like for loops)
        
               | immibis wrote:
               | Actually, LLMs are for generating humorous nonsense.
               | Putting them in charge of the world economy was not
               | intended, but we did it anyway.
        
               | dhosek wrote:
               | Given that in my (small, employer-mandated) explorations
               | with Copilot autocompletions it's offered incorrect
               | suggestions about a third of the time and seems to like
               | to also suggest deprecated APIs, I'm skeptical about the
               | current generation's ability to be useful at even this
               | small task.
        
               | glouwbug wrote:
               | Sure but it takes two copilots to fly a plane
        
               | LeFantome wrote:
               | Have you seen O3?
               | 
               | If your experience with something less than half as good
               | as state-of-the-art is that it worked 66% of the time, I
               | am not sure why you would be so dismissive about the
               | future potential.
        
               | CodesInChaos wrote:
               | Why does C2Rust produce so much incorrect code? Getting
               | 5% wrong sounds terrible, for a 1:1 translation to unsafe
               | Rust. What does it mis-translate?
               | 
               | https://dl.acm.org/doi/pdf/10.1145/3597503.3639226
               | 
               | > As for C2Rust, the 5% unsuccessful translations were
               | due to compilation errors, the majority of them caused by
               | unused imports.
               | 
               | I'm rather confused by what that's supposed to mean,
               | since unused imports cause warnings, not errors in Rust.
        
               | pizlonator wrote:
               | Good luck inferring how to use that from some C
               | programmer's deranged custom linked list.
               | 
               | C programmers don't do linked lists by using libraries,
               | they hand roll them, and often they are more complex than
               | "just" a linked list. Lots of complex stuff out there.
        
             | oneshtein wrote:
             | Rus's ownership model doesn't forbid doubly linked lists.
             | It forbids doubly owned lists, or, in other words, <<use
             | after free>> bug.
        
           | bloppe wrote:
           | Is this sarcastic? There's a reason why the lifetime checker
           | is so annoying to people with a lot of C experience. You
           | absolutely cannot just use your familiar C coding styles in
           | Rust.
        
             | orf wrote:
             | You've misread the comment.
             | 
             | The _ownership model_ is close enough, but the way that
             | model is _expressed_ by the developer is completely
             | arbitrary (and thus completely nuts).
        
           | jerf wrote:
           | That's a classic example of an argument that looks really
           | good from the 30,000 foot view, but when you're actually on
           | the ground... no, basically none of that beautiful idea can
           | actually be manifested into reality.
        
           | smolder wrote:
           | It's not that simple. In fact it's impossible in some cases
           | if you don't sprinkle unsafe everywhere and defeat the
           | purpose. Rusts restrictions are so that it can be statically
           | analyzed to guarantee safety. The superset of all allowable C
           | program behaviors includes lots of things that are impossible
           | to guarantee the safety of through static analysis.
           | 
           | Formally verified C involves sticking to a strict subset of
           | the capabilities of C that is verifiable, much like Rust
           | enforces, so it makes sense that programs meeting that
           | standard could be translated.
        
         | titzer wrote:
         | Meh, you know people are just going to throw LLMs at it and
         | they'll be fine with it hallucinating correctish code by the
         | ton-load. But I agree that they are going to have tough time
         | making idiomatic Rust from random C. Like I said, correct-ish.
        
           | pizlonator wrote:
           | Great way to introduce novel security vulnerabilities!
           | 
           | If that's the Rust way, then I'm all for it. Will make it
           | easier for Fil-C to have some epic kill shots.
        
       | jtrueb wrote:
       | Interesting how higher optimization levels didn't really help
       | speed up rust in the O level comparison
        
         | vlovich123 wrote:
         | As they say it's likely that the code they're outputting is
         | pessimizing rustc's ability. Namely it sounds like they're
         | inlining code as part of the conversion
        
           | jtrueb wrote:
           | Yes, I'm just saying how it kicks in basically immediately
           | (O1).
        
       | nickpsecurity wrote:
       | The thing I wonder about is why we would do this. The technology
       | to _really_ convert industrial-grade apps from C to Rust could
       | probably bullet proof the C apps more easily. They'd just have to
       | do some analyses that fed into existing tooling, like static
       | analyzers and test generators.
       | 
       | Similarly, it they might generate safe wrappers that let teams
       | write new code in Rust side by side with the field-proven C. New
       | code has the full benefits, old code is proven safe, and the
       | interfaces are safer.
       | 
       | A full on translator might be an ideal option. We'd want one
       | language for the codebase in the future. Push-button safety with
       | low, false positives for existing C and C++ is still the greatest
       | need, though. Maybe auto-correcting bad structure right in the C,
       | too, like Google's compiler tool and ForAllSecure's Mayhem do.
        
         | DonaldPShimoda wrote:
         | > The technology to really convert industrial-grade apps from C
         | to Rust could probably bullet proof the C apps more easily.
         | 
         | No, some C programs cannot be made safe. This can be due to
         | dependency on undefined or unspecified behaviors, or it can be
         | because introducing proper safety checks would limit the domain
         | of possible inputs too much to be useful, or other things.
         | 
         | Translating to a safe language can maintain the expressive
         | capabilities of the inputs while statically guaranteeing
         | correct operation at run-time. It is objectively better in
         | these cases.
         | 
         | > field-proven C
         | 
         | I don't think this exists, as the numerous critical
         | vulnerabilities over the years have shown. All we have is C
         | that seems to work pretty well often enough to be useful.
         | 
         | > old code is proven safe
         | 
         | Old code is _assumed to be_ safe _due to luck_ , actually.
         | "Prove" has a specific meaning (especially on a post for a
         | paper about proving things), and the overwhelming majority of C
         | code is not proven to any rigorous mathematical standard. In
         | contrast, the Rust type system has been mathematically proven
         | to be correct.
         | 
         | > A full on translator might be an ideal option.
         | 
         | It depends on what you're willing to give up. If you don't mind
         | losing performance, limiting your domain of inputs or range of
         | outputs, giving up code legibility, and so on, then sure, this
         | can probably be done to some extent. But when you start wanting
         | your translator to be both sound and complete over all of these
         | concerns, you run into problems.
        
           | uecker wrote:
           | > No, some C programs cannot be made safe. This can be due to
           | dependency on undefined or unspecified behaviors, or it can
           | be because introducing proper safety checks would limit the
           | domain of possible inputs too much to be useful, or other
           | things.
           | 
           | You can certainly replace code using undefined behavior in C
           | code by using defined constructs.
           | 
           | > I don't think this exists, as the numerous critical
           | vulnerabilities over the years have shown. All we have is C
           | that seems to work pretty well often enough to be useful.
           | 
           | I think this highly misleading. Some of the most reliable
           | programs I know are written in C and Rust projects will also
           | have critical vulnerabilities. Most vulnerabilities are not
           | actually related to memory safety and the use of unsafe Rust
           | will also lead to memory safety issues in Rust code. So I see
           | some advantage to Rust but to me it is obviously overhyped.
        
           | _flux wrote:
           | > In contrast, the Rust type system has been mathematically
           | proven to be correct.
           | 
           | Is this the case? E.g. the issue "Prove the Rust type system
           | sound" https://github.com/rust-lang/rust/issues/9883 is
           | closed with comment "This will be an open issue forever.
           | Closing." in 2016: https://github.com/rust-
           | lang/rust/issues/9883#issuecomment-2... .
           | 
           | At least nowadays (since 2022) we do have a language
           | specification for Rust: https://ferrous-systems.com/blog/the-
           | ferrocene-language-spec...
        
             | aw1621107 wrote:
             | The closest thing is probably RustBelt [0], which proved
             | the soundness of a subset of Rust that included
             | borrowing/lifetimes. This was later extended to include
             | relaxed memory accesses [1].
             | 
             | Neither of these papers include the trait system,
             | unfortunately, and I'm not aware of another line of
             | research that does (yet?).
             | 
             | [0]: https://dl.acm.org/doi/10.1145/3158154
             | 
             | [1]: https://plv.mpi-sws.org/rustbelt/rbrlx/paper.pdf
        
           | nickpsecurity wrote:
           | There's tools that prove the safety of C code. They include
           | RV-Match, Astree Analzer, and Frama-C. CompCert certifies the
           | compilation. There's tons of tools to find bugs by model
           | checking, test generation, etc. Rust was nowhere near C last
           | I checked in terms of tooling available.
           | 
           | So, back to my original comment, the investment in making a
           | Rust to C transpiler would make a C to safer C transpiler
           | that fed that code into the verification tooling. A small
           | transpiler would be easier to write with huge, immediate
           | gains.
           | 
           | On field proven, there's two angles: spec and code. Many
           | C/C++ apps didn't become usable at all until years of
           | iteration based on feedback from field use. An alternative in
           | Rust might have incorrect specs which. From there, a number
           | have had much code review, static analysis, and testing. They
           | have few to no known bugs at any point. So, there are
           | definitely C/C++ applications out there that are field-proven
           | with a decent level of safety.
           | 
           | (Note: Any transpiler might need to be bug for bug and weird
           | behavior for weird behavior compatible to match the implicit
           | spec in the C/C++ code.)
           | 
           | You're right about the average Rust vs C code, like type
           | system vs correctness. I'd prefer new code be written in
           | safer languages, which I added to my proposal.
           | 
           | If apples to apples on correctness, you'd have to compare C
           | written for verification with Rust. Quite a few C projects
           | have been proven to be correct. Since you're using a
           | compiler, I'll add that tools like Softbound+CETS make C code
           | safe automatically.
           | 
           | I do think Rust brings the cost and effort down quite a lot,
           | though. It's tooling is usually more mature or at least free
           | than most for verifying C.
        
       | zoom6628 wrote:
       | I wonder how this compares to the zig-to-C translate function.
       | 
       | Zig seems to be awesome at creating mixed environs of zig for new
       | code and C for old, and translating or interop, plus being a C
       | compiler.
       | 
       | There must be some very good reasons why Linux kernel maintainers
       | aren't looking to zig as a C replacement rather than Rust.
       | 
       | I don't know enough to even speculate so would appreciate those
       | with more knowledge and experiencing weighing in.
        
         | capitol_ wrote:
         | Maybe because zig isn't memory safe.
        
         | ChristianJacobs wrote:
         | > looking to zig as a C replacement rather than Rust
         | 
         | Rust isn't a "replacement for C", but an addition to it. It's a
         | tool that Torvalds et. al. has recognised the value of and thus
         | it's been allowed in the kernel. The majority of the kernel
         | code will still be written in C.
         | 
         | I'm no kernel maintainer, but I can speculate that two of the
         | main reasons for Rust over Zig are the compile time guarantees
         | that the language provides being better as well as the rate of
         | adoption. There is a lot of work done by many leading companies
         | in the industry to provide Rust native code or maintained Rust
         | bindings for their APIs. Windows devs are re-writing parts of
         | _their_ kernel in Rust. There's a "movement" going on that has
         | been going on for a while. I only hope it doesn't stop.
         | 
         | Maybe the maintainers feel like Zig doesn't give them enough
         | over C to be worth the change? Many of them are still opposed
         | to Rust as well.
        
           | DonaldPShimoda wrote:
           | > Rust isn't a "replacement for C"
           | 
           | Hmm I think to clarify I would say that Rust _is_ intended as
           | a replacement for C in general, but that this isn't how the
           | Linux kernel developers are choosing to use it.
           | 
           | As for why the kernel developers would choose Rust, I would
           | think another one of the primary benefits is that the type
           | system guarantees the absence of a wide class of memory-
           | related errors that are prevalent in C, and this type system
           | (as well as those of its predecessors) has been subjected to
           | significant scrutiny by the academic community over the last
           | couple of decades to help iron out problems. I suspect this
           | is also part of why Rust has a relatively large and
           | passionate community compared to other C alternatives.
        
             | burakemir wrote:
             | Agreed. The large and passionate community may have
             | multiple factors but "things actually work" is probably a
             | factor.
             | 
             | It is hard to get a full picture of how academic research
             | influenced Rust and vice versa. Two examples:
             | 
             | - The use of linearity for tracking ownership in types has
             | been known to academics but had never found its way into a
             | mainstream language.
             | 
             | - researchers in programming language semantics pick Rust
             | as a target of formalization, which was only possible
             | because of design choices around type system. They were
             | able to apply techniques that resulted from decades of
             | trying to get a certified C. They have formalized parts of
             | the standard library, including unsafe Rust, and found and
             | fixed bugs.
             | 
             | So it seems fair to say that academic research on safety
             | for C has contributed much to what makes Rust work today,
             | and in ways that are not possible for C and C++ because
             | these languages do not offer static guarantees where types
             | Transport information about exclusive access to some part
             | of memory.
        
           | josefx wrote:
           | > It's a tool that Torvalds et. al. has recognised the value
           | of and thus it's been allowed in the kernel.
           | 
           | Has there actually been a successfull contribution to the
           | mainline kernel? The last two big projects I heard of (ext2 /
           | Apple drivers) seemed to have issues getting their code
           | accepted.
        
             | shakna wrote:
             | rnull is in the kernel. And I believe one of the generic
             | Realtek drivers is Rust as well.
        
         | spiffyk wrote:
         | Zig is nowhere near mature enough to be considered for the
         | kernel yet. There are breaking changes to it regularly still -
         | which is a good thing for Zig now, but not so good for huge,
         | long-lived codebases like Linux. Also compiler bugs happen.
         | 
         | Saying this as someone who generally likes Zig's direction.
        
         | 3836293648 wrote:
         | Zig isn't 1.0 and has zero backcompat guarantees. It's also
         | barely used anywhere and hasn't proven its value, even if parts
         | of it may seem promising
        
         | devjab wrote:
         | As I understand it most kernel maintainers aren't looking to
         | replace C with anything.
         | 
         | Zig has much better interoperability with C than Rust, but it's
         | not memory safe or stable. I think we'll see quite a lot of Zig
         | adoption in the C world, but I don't think it's in direct
         | competition with Rust as such. In my region of the world nobody
         | is adopting Rust, the C++ people are remaining in C++. There
         | was some interest in Rust originally but it never really caught
         | on in any company I know of. Likely for the same reason Go has
         | become huge in younger companies but will not really make its
         | way into companies which are traditionally Java/C# because even
         | if it made sense technically (and it probably doesn't) it's a
         | huge change management task. Zig is seeing traction for
         | programs without the need for dynamic memory allocation, but
         | not much beyond that.
        
         | Ar-Curunir wrote:
         | Rust is stable and used by a number of big players, while Zig
         | is not stable, and as a result hasn't seen widespread adoption
         | yet
        
       | mbana wrote:
       | Can something like `C2Rust` then use this to generate formally
       | correct code?
       | 
       | Also, is much of the authors did manual or was it run through
       | something to produce the Rust code? If so, where is the code that
       | generates Rust, I do not see any links to any source repos.
        
         | zozbot234 wrote:
         | > If so, where is the code that generates Rust, I do not see
         | any links to any source repos.
         | 
         | The paper states that these developments will be released under
         | open source licenses after the review process is completed,
         | i.e. most likely, after the paper is formally published.
        
       | amenghra wrote:
       | In 2002, a group of researchers presented a paper on Cyclone, a
       | safe dialect of C [1]. While (manually) porting code from C to
       | Cyclone, they found safety bugs in the C code.
       | 
       | These kinds of manual or automated conversation from C to <safer
       | language> therefore have potential not only for increasing
       | adoption of safer languages but also for uncovering existing
       | bugs.
       | 
       | [1] https://www.researchgate.net/profile/James-
       | Cheney-2/publicat...
        
       | alkonaut wrote:
       | If you used a naive translation to Rust, wouldn't you get parts
       | that are safe and parts that are unsafe? So your manual job would
       | need to be only verifying safety in the unsafe regions (same as
       | when writing rust to begin with)?
       | 
       | Seems it would be a win even if the unsafe portion is quite
       | large. Obviously not of it's 90% of the end result.
        
         | CodesInChaos wrote:
         | A naive translation would produce rust code which is almost
         | entirely unsafe (using raw pointers instead of references
         | everywhere). Translating to references is difficult, since C
         | code isn't written with the restrictions of the Rust alias
         | model / borrow-checker in mind.
        
       | p0w3n3d wrote:
       | I wonder, if a C library is working (i.e. is not formally proven
       | to be not having problems, but works in most ways) why shouldn't
       | we translate it using rust unsafe? I would say there is a value
       | in it as rust lacks of libraries generally. And this would not be
       | different from using a dll/so that was written in c and can be
       | unsafe in some circumstances after all
        
       | dmezzetti wrote:
       | Interesting concept. But for a working system in C, why do we
       | need to "convert" it to Rust. Seems like an effort where juice
       | isn't worth the squeeze. Probably will create more problems than
       | we're fixing.
        
       | kelnos wrote:
       | Ugh. They didn't compile any C to Rust. They modified the F*-to-C
       | compiler to emit Rust instead. So they compiled F* to safe Rust.
       | And they couldn't even do that 100% reliably; some valid F*
       | constructs couldn't be translated into Rust properly. They could
       | either translate it into Rust code that wouldn't compile, or
       | translate it into similar-looking Rust code that would compile,
       | but would produce incorrect results.
       | 
       | Flagged, this is just a lie of a title.
        
       | pizza234 wrote:
       | I've ported some projects to Rust (including C, where I've used
       | C2Rust as first step), and I've drawn some conclusions.
       | 
       | 1. Converting a C program to Rust, even if it includes unsafe
       | code, often uncovers bugs quickly thanks to Rust's stringent
       | constraints (bounds checking, strict signatures, etc.).
       | 
       | 2. automated C to Rust conversion is IMO something that will
       | never be solved entirely, because the design of C program is
       | fundamentally different from Rust; such conversions require a
       | significant redesign to be made safe (of course, not all C
       | programs are the same).
       | 
       | 3. in some cases, it's plain impossible to port a program from C
       | to Rust while preserving the exact semantics, because unsafety
       | can be inherent in the design.
       | 
       | That said, tooling is essential to porting, and as tools continue
       | to evolve, the process will become more streamlined.
        
       ___________________________________________________________________
       (page generated 2024-12-21 18:00 UTC)