[HN Gopher] Compiling C to Safe Rust, Formalized
___________________________________________________________________
Compiling C to Safe Rust, Formalized
Author : love2read
Score : 251 points
Date : 2024-12-20 23:30 UTC (18 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| ljlolel wrote:
| I wonder how well O3 can do just compiling C to rust in one shot
| saagarjha wrote:
| Probably pretty bad.
| ojosilva wrote:
| Funny, I came here to say just the opposite, that I'm glad
| algorithmic computing is still a thing in research and that not
| everything is AI.
|
| Ironically, AI is able to produce research-grade algorithms and
| will probably become an authority on the subject, helping take
| more traditional CS to the next level.
| Garlef wrote:
| I think it would make sense to evaluate if the the 'surgical'
| rewrites mentioned in the article can be carried out by or
| assisted by an LLM based process.
| sunshowers wrote:
| There's a lot of code in the world where correctness is a
| requirement. :)
|
| I agree with the sibling -- I think LLMs may be able to help
| automate some parts of it, but humans are still 95% of it. At
| least for now.
| Alifatisk wrote:
| c2rust.com, but it uses things like libc::c_int
| love2read wrote:
| C2Rust is mentioned in the second paragraph of the related work
| section.
| rat87 wrote:
| How is c2rust doing these days? For practical codebases?
| wffurr wrote:
| Note that this is done for "existing formally verified C
| codebases" which is a lot different from typical systems C code
| which is not formally verified.
| tonetegeatinst wrote:
| What is the main difference? Can compiler flags force
| compliance?
| nickpsecurity wrote:
| Formal verification often requires simplified code in a
| restrictive style. You might not even be able to use C
| features or structures that have the performance you want.
| How theorem provers and brains work are also different enough
| that making something easy for one often makes it harder for
| the other.
|
| You can also see this effect in the article on the history of
| Coverity's analyzer. Real-world code was horrible to deal
| with vs the academic examples they started with.
|
| https://cacm.acm.org/research/a-few-billion-lines-of-code-
| la...
| LPisGood wrote:
| My understanding is that formal verification is a tough goal
| to achieve and that it usually requires designing the program
| or the language to be a specific way.
|
| The problem with transpiling C to rust is that unsafe and
| unverified behavior can be a key property of the behavior
| resulting program, so there isn't an obvious way to spit out
| a sort of rustified (CRustified?) binary that matches the
| behavior of the C program.
| thunkingdeep wrote:
| Typical term is "Oxidized". I think they feel clever when
| they do the RiiR thing and say that.
| immibis wrote:
| "Formally verified" means someone has written a correct
| mathematical proof that the code has no bugs (the proof is
| checked by a computer program to make sure it is correct).
| This is a very high bar.
|
| I'm not sure what it has to do with translating the code to
| Rust.
| swiftcoder wrote:
| > I'm not sure what it has to do with translating the code
| to Rust.
|
| Formally verified C programs typically have had all their
| undefined behaviour already stamped out during the
| verification process. That makes mapping it over to Rust a
| lot simpler.
| tjoff wrote:
| Translating undefined behavior is the easy part.
| swiftcoder wrote:
| Maybe, but only if you know the specifics of the
| environment in which it is executing (i.e. which
| compiler/architecture-specific behaviours the code
| actually relies on)
| Filligree wrote:
| Correctly written C programs don't have undefined
| behaviour, so a translator can operate assuming there is
| none.
| DonaldPShimoda wrote:
| In this case specifically, two separate aspects are being
| referred to with regard to "formal verification".
|
| The first is that the translation function (from C to Rust)
| has itself been formally verified. In other words: if you
| give it a C program that obeys certain necessary pre-
| conditions, it will give you a safe Rust program. I believe
| the title of the paper is primarily using "Formalized" with
| regard to this characteristic.
|
| The second is that the set of programs the authors evaluate
| their tool on are C programs that have themselves been
| formally verified. I only barely skimmed the introduction
| and didn't see it addressed directly there, but I would
| assume that this is because the pre-conditions necessary
| for their translation to work are most easily (only?) met
| by formally verified C programs, where of course the
| verification has been rendered with respect to properties
| that would be relevant here (e.g., memory use).
| safercplusplus wrote:
| And even then, not completely reliably it seems (from Section
| 2.2):
|
| > The coercions introduced by conversion rules can however lead
| to subtle semantic differences
|
| The example they give is this C code: 1
| uint8_t x[1] = { 0 }; 2 uint8_t *y = x; 3 *y =
| 1; 4 assert(*x == 1); /* SUCCESS */
|
| getting translated to this (safe) Rust code:
| 1 let x: [u8; 1] = [0; 1]; 2 let mut y: Box<[u8]> =
| Box::new(x); 3 y[0] = 1; 4 assert!(x[0] == 1)
| /* failure */
|
| So the pointer (iterator) targeting an existing (stack-
| allocated) array declared on line 2 gets translated to an
| owning pointer/Box) targeting a (heap-allocated) new copy of
| the array. So if the original code was somehow counting on the
| fact that the pointer iterator was actually targeting the array
| it was assigned to, the translated code may (quietly) not
| behave correctly.
|
| For comparison, the scpptool (my project) auto-translation (to
| a memory safe subset of C++) feature would translate it to
| something like: 1
| mse::lh::TNativeArrayReplacement<uint8_t, 1> x = { 0 };
| 2 mse::lh::TNativeArrayReplacement<uint8_t, 1>::iterator y = x;
| // implicit conversion from array to iterator 3 *y = 1;
| 4 assert(*x == 1); /* SUCCESS */ // dereferencing of array
| supported for compatibility
|
| or if y is subsequently retargeted at another type of array,
| then line 2 may end up as something like: 2
| mse::TAnyRandomAccessIterator<uint8_t> y = x; // implicit
| conversion from array to iterator
|
| So the OP project may only be converting C code that is already
| amenable to being converted to safe Rust. But given the
| challenge of the problem, I can respect the accomplishment and
| see some potential utility in it.
|
| edit: added translation for line 2 in an alternate hypothetical
| situation.
| jokoon wrote:
| What is formally verified C? Why isn't there more of it?
| PhilipRoman wrote:
| Because it takes a lot of effort. Google Frama-C. On the flip
| side, it can express not just memory safety constraints, but
| also correctness proofs.
|
| In this case it's not about Frama-C or similar tools though,
| see your sibling comments about the caveats.
| zozbot234 wrote:
| Arguably, generating verified C from a low-level focused
| subset of F* (Low*, used in the HACL project) is close
| enough to count as a "similar tool".
| light_hue_1 wrote:
| Oh, this has so so many more caveats! It's borderline false
| advertising.
|
| First of all they never translated any C! At all. Zero lines.
|
| They took code written in F* and modified its C compiler to
| emit Rust. They never had to deal with any actual C code of any
| complexity, aside from the most trivial code that might
| theoretically be emitted by a toy compiler (but wasn't even
| that!). They just pretended they were dealing with C (or a
| heavily restricted Mini-C).
|
| Even if we accept this, there are plenty of arbitrary
| restrictions on the C it can even support in principle:
|
| > If the original C program further relies on x, our
| translation will error out, and will ask the programmer to fix
| their source code.
|
| That's saying you want C to already be written in a style that
| makes the Rust borrow checker happy! They're avoiding the
| actual hard parts of the problem.
|
| > These rules present simplified versions of what we support.
| Our implementation features several peephole optimizations ...
|
| If I may translate from the language of academia. "We present
| beautiful rules in figure 4. But in reality, our implementation
| relies on a large number of hacks."
|
| But it gets worse!
|
| > For overlap cases that can be distinguished statically (as
| above), we emit a compile-time error; otherwise, the Rust code
| will panic at runtime.
|
| The resulting code is not guaranteed to be correct! For
| example, aliasing in C can cause crashes in the resulting Rust
| code. See comment at the top of page 10. We're going from a
| formally verified C program to a "It may crash now" Rust
| program!? That's nuts!
|
| > We apply our methodology to existing formally verified C
| codebases, namely, the HACL* cryptographic library ...
|
| This is such a blatant lie that I had to return to it. It's
| designed to catch an unwary reviewer. We often talk about HACL
| being in verified C because it gets compiled to that. But it's
| not a C library, it's written in a totally different language.
| You cannot confuse the two.
|
| I'm not a reviewer for their paper. But if I was, I would
| strongly fight for rejection.
|
| The fact that they only handle formally verified C is so
| astronomically far away from being their main problem.
|
| An honest title would be "Compiling a subset of F* to partially
| Safe Rust, Partially Formalized"
| lou1306 wrote:
| > An honest title would be "Compiling a subset of F* to
| partially Safe Rust, Partially Formalized"
|
| Sadly that is the name of the game in conference publishing.
| Make a big claim in the title and hope the reviewer does not
| read the fine print.
|
| > If I may translate from the language of academia. "We
| present beautiful rules in figure 4. But in reality, our
| implementation relies on a large number of hacks."
|
| Er, this I can understand. Every conference paper only
| presents you with a simplified view of the contributions.
| There is no way to describe every corner case within the page
| limits of a conference paper (specifically, I bet this was
| submitted to POPL 2025).
| light_hue_1 wrote:
| > Er, this I can understand. Every conference paper only
| presents you with a simplified view of the contributions.
|
| In my group I don't allow hacks. Most other good papers I
| can think of don't do that. It takes new students a while
| to get used to not maximizing performance at all costs. And
| if a hack must exist, then it's explained, not as a hack,
| but as part of the method.
|
| Don't you hate it when you implement some beautiful idea
| from a paper only to discover that it hardly bares any
| relationship to the actual method that works?
|
| > There is no way to describe every corner case within the
| page limits of a conference paper (specifically, I bet this
| was submitted to POPL 2025).
|
| You're probably right. But POPL allows for unlimited pages
| in an attached appendix. They can and should describe the
| full method. Particularly if they're going to play the game
| of saying that they formalized this.
|
| > Sadly that is the name of the game in conference
| publishing. Make a big claim in the title and hope the
| reviewer does not read the fine print.
|
| It's a form of namespace squatting. Do the worst job
| possible to claim a title that you couldn't execute on, so
| that when people figure out that problem, they will be
| forced to cite your non-working solution. I loathe this
| approach to publishing.
|
| We should punish people for doing. Reject for lack of
| honesty and your paper can't be published in related
| conference for X years.
| lou1306 wrote:
| Well in this case (and if it was POPL, but it's a pretty
| safe bet considering the format and the timing) it looks
| like reviewers have indeed rejected it. And I completely
| agree, it is namespace squatting. Sadly every once in a
| while it does work (and very effectively), so there is
| little incentive for the community to punish it.
|
| Sorry if my previous comment came off as dismissive, it's
| just that I'm getting increasingly disillusioned with the
| state of things in this space.
| akkad33 wrote:
| Is Rust formally verified? Not that I know of
| PartiallyTyped wrote:
| You can always run model checkers like Kani, though even that
| is limited.
| medo-bear wrote:
| So no?
| johnisgood wrote:
| The answer is that it is not.
|
| It frustrates me more than it should, I admit, that
| people always mention Rust when they talk about safety,
| but never Ada / SPARK. You want formal verification? Use
| Ada / SPARK. It has been battle-tested. It has been used
| for critical systems for a really long time now.
|
| (And a compiler being formally verified vs. being able to
| write formally verified code means two different things.)
| medo-bear wrote:
| I think a disclaimer like this should be written with
| every Rust application, like health warnings on cigarette
| packets
| johnisgood wrote:
| At this point, I think that would be better, yes, just
| because people think Rust is "fully" safe, which is just
| incorrect. I think the problem was the Rust hype and
| repeated statements of it being very safe, so we have
| some undoing to do.
|
| For example if someone on GitHub sees that the project is
| written in Rust, they are automatically going to assume
| it is safe, incorrectly so. I do not blame them though.
| keybored wrote:
| You presumably extend this to every virtual machine or
| interpreter for every language which is implemented in an
| unsafe language. When that language claims to be safe
| (like all such languages claim to be).
|
| That seems excessive and tedious.
| medo-bear wrote:
| The point, I think, was that "safety" presumptions about
| Rust are often exaggerated or poorly misunderstood due to
| hype. That could certainly lead to problems
| woodruffw wrote:
| I don't think Rust's actual safety properties aren't
| overhyped, although they may be subject to
| misunderstanding about their exact extent.
|
| Concretely: spatial and temporal memory safety are good
| things, and Rust achieves both. It's not unique in this
| regard, nor is it unique in not having a formal
| definition.
| keybored wrote:
| > The point, I think, was that "safety" presumptions
| about Rust are often exaggerated or poorly misunderstood
| due to hype. That could certainly lead to problems
|
| Then the point is hypocritical.
|
| Runtimes for safe programming languages have been
| implemented in unsafe languages since the dawn of safe
| programming languages, basically.
|
| EDIT: I see now that you are the cigarette warning guy.
| In that case I don't understand what this coy "I think"
| speculation is about when you made such a bizarre
| proclamation on this topic.
| chillingeffect wrote:
| Rust is to safe as Tesla is to autopilot.
| medo-bear wrote:
| As Tesla is to Tesla
| gpm wrote:
| No, but small pieces of it are, and there's active work to
| extend it
|
| The concept of the borrow checker has been on a simplified
| version of rust https://people.mpi-
| sws.org/~dreyer/papers/rustbelt/paper.pdf - work has
| continued in this area steadily (e.g. see tree borrows)
|
| There's a variety of tools that take rust code and translate
| it to something a proof system understands, and then checks
| that it matches a specification. AWS is leading a project to
| use these to verify the standard library:
| https://aws.amazon.com/blogs/opensource/verify-the-safety-
| of...
| antiquark wrote:
| Rust doesn't have a specification or standard yet, which
| would make it difficult to formally verify.
|
| https://stackoverflow.com/questions/75743030/is-there-a-
| spec...
| gpm wrote:
| It does: https://github.com/ferrocene/specification
|
| It also strikes me as extraordinarily unlikely that any
| formal verification effort will use the existing
| specification, and not build their own (using their own
| formal language) as they go.
| medo-bear wrote:
| What is the benefit of compiling formally correct code to Rust?
| It seems that all the possible benefits are already there (if
| not more)
| gpm wrote:
| I suppose hypothetically putting it in an easier language to
| make changes in. Though it's hard to imagine it being easier
| to make changes to transpiled code than the original.
|
| Alternatively this might be seen as a stepping stone to
| translating non-formally-verified C to rust, which I
| understand the US government has expressed a fair bit of
| interest in.
| medo-bear wrote:
| Good luck with both of that. Otherwise people whose dayjob
| is rewritting everything in Rust will soon be out of a job
| cushychicken wrote:
| Woof. Major unspoken caveat in the title!
| pizlonator wrote:
| Compiling a tiny subset of C, that is. It might be so tiny as to
| be useless in practice.
|
| I have low hopes for this kind of approach; it's sure to hit the
| limits of what's possible with static analysis of C code. Also,
| choosing Rust as the target makes the problem unnecessarily hard
| because Rust's ownership model is so foreign to how real C
| programs work.
| whatisyourwork wrote:
| It can be good as an interface language. Good for bindings.
| pornel wrote:
| Rust's ownership model is close enough for translating C. It's
| just more explicit and strongly typed, so the translation needs
| to figure out what a more free-form C code is trying to do, and
| map that to Rust's idioms.
|
| For example, C's buffers obviously have lengths, but in C the
| length isn't explicitly tied to a pointer, so the translator
| has to deduce how the C program tracks the length to convert
| that into a slice. It's non-trivial even if the length is an
| explicit variable, and even trickier if it's calculated or
| changes representations (e.g. sometimes used in the form of
| one-past-the-end pointer).
|
| Other C patterns like `bool should_free_this_pointer` can be
| translated to Rust's enum of `Owned`/`Borrowed`, but again it
| requires deducing which allocation is tied to which boolean,
| and what's the true safe scope of the borrowed variant.
| pizlonator wrote:
| Rust's ownership model forbids things like doubly linked
| lists, which C programs use a lot.
|
| That's just one example of how C code is nowhere near meeting
| Rust's requirements. There are lots of others.
| orf wrote:
| > Rust's ownership model forbids things like doubly linked
| lists, which C programs use a lot.
|
| It's literally in the standard library
|
| https://doc.rust-
| lang.org/std/collections/struct.LinkedList....
| singron wrote:
| This implementation uses unsafe. You can write a linked
| list in safe rust (e.g. using Rc), but it probably
| wouldn't resemble the one you write in C.
|
| In practice, a little unsafe is usually fine. I only
| bring it up since the article is about translating to
| safe rust.
| orf wrote:
| Safe rust isn't "rust code with absolutely 0 unsafe
| blocks in any possible code path, ever". Rc uses unsafe
| code every time you construct one, for example.
|
| Unsafe blocks are an escape hatch where you promise that
| some invariants the compiler cannot verify are in fact
| true. If the translated code were to use that collection,
| via its safe interfaces, it would still be "safe rust".
|
| More generally: it's incorrect to say that the rust
| ownership model forbids X when it ships with an
| implementation of X, regardless of if and how it uses
| "unsafe" - especially if "unsafe" is a feature of the
| ownership model that helps implement it.
| andrewflnr wrote:
| No one here is confused about what unsafe means. The
| point is, they're not implemented by following Rust's
| ownership model, because Rust's ownership model does in
| fact forbid that kind of thing.
|
| You can nitpick the meaning of "forbids", but as far as
| the current context is concerned, if you translate code
| that _implements_ a doubly linked list (as opposed to
| using one from a library) into Rust, it 's not going to
| work without unsafe. Or an index-based graph or
| something.
| oneshtein wrote:
| It's easy to implement doubly linked lists in safe Rust.
| Just ensure that every element has one OWNER, to avoid
| <<use after free>> bugs, or use a garbage collector, like
| a reference counter.
|
| Unlike C++ or Rust, C has no references, only pointers,
| so developer must release memory manually at some
| arbitrary point. This is the problem and source of bugs.
| saghm wrote:
| While I might agree that it's easy if you use a reference
| counter, this is not going to be as performant as the
| typical linked list written in C, which is why the
| standard library uses unsafe for its implementation of
| stuff like this. If it were "easy" to just write correct
| `unsafe`, then it would be easy to do it in C as well.
|
| Note that the converse to this isn't necessarily true!
| People I trust way more to write unsafe Rust code than me
| than me have argued that unsafe Rust can be harder than
| writing C in some ways due to having to uphold certain
| invariants that don't come up in C. While there are a
| number of blog posts on the topic that anyone interested
| can probably find fairly easily by googling "unsafe Rust
| harder than C", I'll break my usual rule of strongly
| preferring articles to video content to link a talk from
| youtube because the speaker is one of those people I
| mention who I'd trust more than me to write unsafe code
| and I remember seeing him give this talk at the meetup:
| https://www.youtube.com/watch?v=QAz-maaH0KM
| bonzini wrote:
| > unsafe Rust can be harder than writing C in some ways
| due to having to uphold certain invariants that don't
| come up in C.
|
| Yes, this is absolutely correct and on top of this you
| sometimes have to employ tricks to make the compiler
| infer the right lifetime or type for the abstraction
| you're providing. On the other hand, again thanks to the
| abstraction power of Rust compared to C, you can test the
| resulting code way more easily using for example Miri.
| oconnor663 wrote:
| More important than whether you use a little unsafe or a
| lot, is whether you can find a clean boundary above which
| everything can be safe. Something like a hash function or
| a block cipher can be piles and piles of assembly under
| the covers, but since the API is bytes-in-bytes-out, the
| safety concerns are minimal. On the other hand, memory-
| mapping a file is just one FFI function call, but the
| uncontrollable mutability of the whole thing tends to
| poison everything above it with unsafety.
| imtringued wrote:
| I don't really see it as a big "owning" of Rust that a
| complex pointer heavy structure with runtime defined
| ownership cannot be checked statically. Almost every
| language that people use doubly linked lists in has a GC,
| making the discussion kind of meaningless.
|
| So C and C++ are the exceptions to the rule, but how do
| they make it easy to write doubly linked lists?
| Obviously, the key assumption is that that the developer
| makes sure that node->next->prev = node->prev->next =
| node (Ignoring nullptr).
|
| With this restriction, you can safely write a doubly
| linked list even without reference counting.
|
| However, this isn't true on the pointer level. The prev
| pointers could be pointing at the elements in a
| completely random order. For example tail->prev = head,
| head->prev = second_last and so on. So that going
| backwards from the tail is actually going forwards again!
|
| Then there is also the problem of having a pointer from
| the outside of the linked list pointing directly at a
| node. You would need a weak pointer, because another
| pointer could have requested deletion from the linked
| list, while you're still holding a reference.
|
| If you wanted to support this generic datastructure,
| rather than the doubly linked list you have in your head,
| then you would need reference counting in C/C++ as well!
|
| What this tells you, is that Rust isn't restrictive
| enough to enforce these memory safe contracts. Anyone
| with access to the individual nodes could break the
| contract and make the code unsafe.
| quuxplusone wrote:
| But it's not in C's standard library. So the exercise
| isn't merely to auto-translate one language's standard
| library to another language's standard library (say,
| replacing C++ std::list with Rust LinkedList) -- which
| would already be very hard. The exercise here is to auto-
| identify-and-refactor idioms open-coded in one language,
| into idioms suited for the other language's already-
| written standard library.
|
| Imagine refactoring your average C program to use GLib
| for all (all!) of its data structures. Now imagine doing
| that, but _also_ translating it into Rust at the same
| time.
| Animats wrote:
| > The exercise here is to auto-identify-and-refactor
| idioms open-coded in one language, into idioms suited for
| the other language's already-written standard library.
|
| That's what LLMs are for - idiom translation. You can't
| trust them to do it right, though.
|
| _[Pan et al . 2024] find that while GPT-4 generates code
| that is more idiomatic than C2Rust, only 61% of it is
| correct (i.e., compiles and produces the expected
| result), compared to 95% for C2Rust._
|
| This problem needs both AI-type methods to help with the
| idioms and formal methods to insure that the guessed
| idioms correctly capture the semantics.
|
| A big advance in this project is that they can usually
| translate C pointer arithmetic into Rust slices. That's
| progress on of one of the hardest parts of the problem.
| C2Rust did not do that. That system just generates unsafe
| raw pointer arithmetic, yielding ugly Rust code that
| replicates C pointer semantics using function calls.
|
| DARPA is funding research in this area under the TRACTOR
| program. Program awards in April 2025, so this is just
| getting started. It's encouraging to see so much progress
| already. This looks do-able.
| saghm wrote:
| Oh god, I can't even imagine trying to have formally-
| verified LLM-generated code. It's not surprising that
| even incremental progress for that would require quite a
| lot of ingenuity.
| fuhsnn wrote:
| >That's what LLMs are for - idiom translation. You can't
| trust them to do it right, though.
|
| Optimizing C compilers also happened to be good at idiom
| recognition, and we can probably trust them a little
| more. The OP paper does mention future plan to use clang
| as well: >We have plans for a libclang-based frontend
| that consume actual C syntax.
|
| If such transformation can be done at IR level it might
| be more efficient to be to C-IR > idiom transform to
| Rust-IR > run safe-checks in Rust-IR > continue
| compilation in C-IR or Rust-IR or combining both for
| better optimization properties.
| swiftcoder wrote:
| I'm definitely bullish on this angle of compiling C down
| to LLVM assembly, and then "decompiling" it back to Rust
| (with some reference to the original C to reconstruct
| high-level idioms like for loops)
| immibis wrote:
| Actually, LLMs are for generating humorous nonsense.
| Putting them in charge of the world economy was not
| intended, but we did it anyway.
| dhosek wrote:
| Given that in my (small, employer-mandated) explorations
| with Copilot autocompletions it's offered incorrect
| suggestions about a third of the time and seems to like
| to also suggest deprecated APIs, I'm skeptical about the
| current generation's ability to be useful at even this
| small task.
| glouwbug wrote:
| Sure but it takes two copilots to fly a plane
| LeFantome wrote:
| Have you seen O3?
|
| If your experience with something less than half as good
| as state-of-the-art is that it worked 66% of the time, I
| am not sure why you would be so dismissive about the
| future potential.
| CodesInChaos wrote:
| Why does C2Rust produce so much incorrect code? Getting
| 5% wrong sounds terrible, for a 1:1 translation to unsafe
| Rust. What does it mis-translate?
|
| https://dl.acm.org/doi/pdf/10.1145/3597503.3639226
|
| > As for C2Rust, the 5% unsuccessful translations were
| due to compilation errors, the majority of them caused by
| unused imports.
|
| I'm rather confused by what that's supposed to mean,
| since unused imports cause warnings, not errors in Rust.
| pizlonator wrote:
| Good luck inferring how to use that from some C
| programmer's deranged custom linked list.
|
| C programmers don't do linked lists by using libraries,
| they hand roll them, and often they are more complex than
| "just" a linked list. Lots of complex stuff out there.
| oneshtein wrote:
| Rus's ownership model doesn't forbid doubly linked lists.
| It forbids doubly owned lists, or, in other words, <<use
| after free>> bug.
| bloppe wrote:
| Is this sarcastic? There's a reason why the lifetime checker
| is so annoying to people with a lot of C experience. You
| absolutely cannot just use your familiar C coding styles in
| Rust.
| orf wrote:
| You've misread the comment.
|
| The _ownership model_ is close enough, but the way that
| model is _expressed_ by the developer is completely
| arbitrary (and thus completely nuts).
| jerf wrote:
| That's a classic example of an argument that looks really
| good from the 30,000 foot view, but when you're actually on
| the ground... no, basically none of that beautiful idea can
| actually be manifested into reality.
| smolder wrote:
| It's not that simple. In fact it's impossible in some cases
| if you don't sprinkle unsafe everywhere and defeat the
| purpose. Rusts restrictions are so that it can be statically
| analyzed to guarantee safety. The superset of all allowable C
| program behaviors includes lots of things that are impossible
| to guarantee the safety of through static analysis.
|
| Formally verified C involves sticking to a strict subset of
| the capabilities of C that is verifiable, much like Rust
| enforces, so it makes sense that programs meeting that
| standard could be translated.
| titzer wrote:
| Meh, you know people are just going to throw LLMs at it and
| they'll be fine with it hallucinating correctish code by the
| ton-load. But I agree that they are going to have tough time
| making idiomatic Rust from random C. Like I said, correct-ish.
| pizlonator wrote:
| Great way to introduce novel security vulnerabilities!
|
| If that's the Rust way, then I'm all for it. Will make it
| easier for Fil-C to have some epic kill shots.
| jtrueb wrote:
| Interesting how higher optimization levels didn't really help
| speed up rust in the O level comparison
| vlovich123 wrote:
| As they say it's likely that the code they're outputting is
| pessimizing rustc's ability. Namely it sounds like they're
| inlining code as part of the conversion
| jtrueb wrote:
| Yes, I'm just saying how it kicks in basically immediately
| (O1).
| nickpsecurity wrote:
| The thing I wonder about is why we would do this. The technology
| to _really_ convert industrial-grade apps from C to Rust could
| probably bullet proof the C apps more easily. They'd just have to
| do some analyses that fed into existing tooling, like static
| analyzers and test generators.
|
| Similarly, it they might generate safe wrappers that let teams
| write new code in Rust side by side with the field-proven C. New
| code has the full benefits, old code is proven safe, and the
| interfaces are safer.
|
| A full on translator might be an ideal option. We'd want one
| language for the codebase in the future. Push-button safety with
| low, false positives for existing C and C++ is still the greatest
| need, though. Maybe auto-correcting bad structure right in the C,
| too, like Google's compiler tool and ForAllSecure's Mayhem do.
| DonaldPShimoda wrote:
| > The technology to really convert industrial-grade apps from C
| to Rust could probably bullet proof the C apps more easily.
|
| No, some C programs cannot be made safe. This can be due to
| dependency on undefined or unspecified behaviors, or it can be
| because introducing proper safety checks would limit the domain
| of possible inputs too much to be useful, or other things.
|
| Translating to a safe language can maintain the expressive
| capabilities of the inputs while statically guaranteeing
| correct operation at run-time. It is objectively better in
| these cases.
|
| > field-proven C
|
| I don't think this exists, as the numerous critical
| vulnerabilities over the years have shown. All we have is C
| that seems to work pretty well often enough to be useful.
|
| > old code is proven safe
|
| Old code is _assumed to be_ safe _due to luck_ , actually.
| "Prove" has a specific meaning (especially on a post for a
| paper about proving things), and the overwhelming majority of C
| code is not proven to any rigorous mathematical standard. In
| contrast, the Rust type system has been mathematically proven
| to be correct.
|
| > A full on translator might be an ideal option.
|
| It depends on what you're willing to give up. If you don't mind
| losing performance, limiting your domain of inputs or range of
| outputs, giving up code legibility, and so on, then sure, this
| can probably be done to some extent. But when you start wanting
| your translator to be both sound and complete over all of these
| concerns, you run into problems.
| uecker wrote:
| > No, some C programs cannot be made safe. This can be due to
| dependency on undefined or unspecified behaviors, or it can
| be because introducing proper safety checks would limit the
| domain of possible inputs too much to be useful, or other
| things.
|
| You can certainly replace code using undefined behavior in C
| code by using defined constructs.
|
| > I don't think this exists, as the numerous critical
| vulnerabilities over the years have shown. All we have is C
| that seems to work pretty well often enough to be useful.
|
| I think this highly misleading. Some of the most reliable
| programs I know are written in C and Rust projects will also
| have critical vulnerabilities. Most vulnerabilities are not
| actually related to memory safety and the use of unsafe Rust
| will also lead to memory safety issues in Rust code. So I see
| some advantage to Rust but to me it is obviously overhyped.
| _flux wrote:
| > In contrast, the Rust type system has been mathematically
| proven to be correct.
|
| Is this the case? E.g. the issue "Prove the Rust type system
| sound" https://github.com/rust-lang/rust/issues/9883 is
| closed with comment "This will be an open issue forever.
| Closing." in 2016: https://github.com/rust-
| lang/rust/issues/9883#issuecomment-2... .
|
| At least nowadays (since 2022) we do have a language
| specification for Rust: https://ferrous-systems.com/blog/the-
| ferrocene-language-spec...
| aw1621107 wrote:
| The closest thing is probably RustBelt [0], which proved
| the soundness of a subset of Rust that included
| borrowing/lifetimes. This was later extended to include
| relaxed memory accesses [1].
|
| Neither of these papers include the trait system,
| unfortunately, and I'm not aware of another line of
| research that does (yet?).
|
| [0]: https://dl.acm.org/doi/10.1145/3158154
|
| [1]: https://plv.mpi-sws.org/rustbelt/rbrlx/paper.pdf
| nickpsecurity wrote:
| There's tools that prove the safety of C code. They include
| RV-Match, Astree Analzer, and Frama-C. CompCert certifies the
| compilation. There's tons of tools to find bugs by model
| checking, test generation, etc. Rust was nowhere near C last
| I checked in terms of tooling available.
|
| So, back to my original comment, the investment in making a
| Rust to C transpiler would make a C to safer C transpiler
| that fed that code into the verification tooling. A small
| transpiler would be easier to write with huge, immediate
| gains.
|
| On field proven, there's two angles: spec and code. Many
| C/C++ apps didn't become usable at all until years of
| iteration based on feedback from field use. An alternative in
| Rust might have incorrect specs which. From there, a number
| have had much code review, static analysis, and testing. They
| have few to no known bugs at any point. So, there are
| definitely C/C++ applications out there that are field-proven
| with a decent level of safety.
|
| (Note: Any transpiler might need to be bug for bug and weird
| behavior for weird behavior compatible to match the implicit
| spec in the C/C++ code.)
|
| You're right about the average Rust vs C code, like type
| system vs correctness. I'd prefer new code be written in
| safer languages, which I added to my proposal.
|
| If apples to apples on correctness, you'd have to compare C
| written for verification with Rust. Quite a few C projects
| have been proven to be correct. Since you're using a
| compiler, I'll add that tools like Softbound+CETS make C code
| safe automatically.
|
| I do think Rust brings the cost and effort down quite a lot,
| though. It's tooling is usually more mature or at least free
| than most for verifying C.
| zoom6628 wrote:
| I wonder how this compares to the zig-to-C translate function.
|
| Zig seems to be awesome at creating mixed environs of zig for new
| code and C for old, and translating or interop, plus being a C
| compiler.
|
| There must be some very good reasons why Linux kernel maintainers
| aren't looking to zig as a C replacement rather than Rust.
|
| I don't know enough to even speculate so would appreciate those
| with more knowledge and experiencing weighing in.
| capitol_ wrote:
| Maybe because zig isn't memory safe.
| ChristianJacobs wrote:
| > looking to zig as a C replacement rather than Rust
|
| Rust isn't a "replacement for C", but an addition to it. It's a
| tool that Torvalds et. al. has recognised the value of and thus
| it's been allowed in the kernel. The majority of the kernel
| code will still be written in C.
|
| I'm no kernel maintainer, but I can speculate that two of the
| main reasons for Rust over Zig are the compile time guarantees
| that the language provides being better as well as the rate of
| adoption. There is a lot of work done by many leading companies
| in the industry to provide Rust native code or maintained Rust
| bindings for their APIs. Windows devs are re-writing parts of
| _their_ kernel in Rust. There's a "movement" going on that has
| been going on for a while. I only hope it doesn't stop.
|
| Maybe the maintainers feel like Zig doesn't give them enough
| over C to be worth the change? Many of them are still opposed
| to Rust as well.
| DonaldPShimoda wrote:
| > Rust isn't a "replacement for C"
|
| Hmm I think to clarify I would say that Rust _is_ intended as
| a replacement for C in general, but that this isn't how the
| Linux kernel developers are choosing to use it.
|
| As for why the kernel developers would choose Rust, I would
| think another one of the primary benefits is that the type
| system guarantees the absence of a wide class of memory-
| related errors that are prevalent in C, and this type system
| (as well as those of its predecessors) has been subjected to
| significant scrutiny by the academic community over the last
| couple of decades to help iron out problems. I suspect this
| is also part of why Rust has a relatively large and
| passionate community compared to other C alternatives.
| burakemir wrote:
| Agreed. The large and passionate community may have
| multiple factors but "things actually work" is probably a
| factor.
|
| It is hard to get a full picture of how academic research
| influenced Rust and vice versa. Two examples:
|
| - The use of linearity for tracking ownership in types has
| been known to academics but had never found its way into a
| mainstream language.
|
| - researchers in programming language semantics pick Rust
| as a target of formalization, which was only possible
| because of design choices around type system. They were
| able to apply techniques that resulted from decades of
| trying to get a certified C. They have formalized parts of
| the standard library, including unsafe Rust, and found and
| fixed bugs.
|
| So it seems fair to say that academic research on safety
| for C has contributed much to what makes Rust work today,
| and in ways that are not possible for C and C++ because
| these languages do not offer static guarantees where types
| Transport information about exclusive access to some part
| of memory.
| josefx wrote:
| > It's a tool that Torvalds et. al. has recognised the value
| of and thus it's been allowed in the kernel.
|
| Has there actually been a successfull contribution to the
| mainline kernel? The last two big projects I heard of (ext2 /
| Apple drivers) seemed to have issues getting their code
| accepted.
| shakna wrote:
| rnull is in the kernel. And I believe one of the generic
| Realtek drivers is Rust as well.
| spiffyk wrote:
| Zig is nowhere near mature enough to be considered for the
| kernel yet. There are breaking changes to it regularly still -
| which is a good thing for Zig now, but not so good for huge,
| long-lived codebases like Linux. Also compiler bugs happen.
|
| Saying this as someone who generally likes Zig's direction.
| 3836293648 wrote:
| Zig isn't 1.0 and has zero backcompat guarantees. It's also
| barely used anywhere and hasn't proven its value, even if parts
| of it may seem promising
| devjab wrote:
| As I understand it most kernel maintainers aren't looking to
| replace C with anything.
|
| Zig has much better interoperability with C than Rust, but it's
| not memory safe or stable. I think we'll see quite a lot of Zig
| adoption in the C world, but I don't think it's in direct
| competition with Rust as such. In my region of the world nobody
| is adopting Rust, the C++ people are remaining in C++. There
| was some interest in Rust originally but it never really caught
| on in any company I know of. Likely for the same reason Go has
| become huge in younger companies but will not really make its
| way into companies which are traditionally Java/C# because even
| if it made sense technically (and it probably doesn't) it's a
| huge change management task. Zig is seeing traction for
| programs without the need for dynamic memory allocation, but
| not much beyond that.
| Ar-Curunir wrote:
| Rust is stable and used by a number of big players, while Zig
| is not stable, and as a result hasn't seen widespread adoption
| yet
| mbana wrote:
| Can something like `C2Rust` then use this to generate formally
| correct code?
|
| Also, is much of the authors did manual or was it run through
| something to produce the Rust code? If so, where is the code that
| generates Rust, I do not see any links to any source repos.
| zozbot234 wrote:
| > If so, where is the code that generates Rust, I do not see
| any links to any source repos.
|
| The paper states that these developments will be released under
| open source licenses after the review process is completed,
| i.e. most likely, after the paper is formally published.
| amenghra wrote:
| In 2002, a group of researchers presented a paper on Cyclone, a
| safe dialect of C [1]. While (manually) porting code from C to
| Cyclone, they found safety bugs in the C code.
|
| These kinds of manual or automated conversation from C to <safer
| language> therefore have potential not only for increasing
| adoption of safer languages but also for uncovering existing
| bugs.
|
| [1] https://www.researchgate.net/profile/James-
| Cheney-2/publicat...
| alkonaut wrote:
| If you used a naive translation to Rust, wouldn't you get parts
| that are safe and parts that are unsafe? So your manual job would
| need to be only verifying safety in the unsafe regions (same as
| when writing rust to begin with)?
|
| Seems it would be a win even if the unsafe portion is quite
| large. Obviously not of it's 90% of the end result.
| CodesInChaos wrote:
| A naive translation would produce rust code which is almost
| entirely unsafe (using raw pointers instead of references
| everywhere). Translating to references is difficult, since C
| code isn't written with the restrictions of the Rust alias
| model / borrow-checker in mind.
| p0w3n3d wrote:
| I wonder, if a C library is working (i.e. is not formally proven
| to be not having problems, but works in most ways) why shouldn't
| we translate it using rust unsafe? I would say there is a value
| in it as rust lacks of libraries generally. And this would not be
| different from using a dll/so that was written in c and can be
| unsafe in some circumstances after all
| dmezzetti wrote:
| Interesting concept. But for a working system in C, why do we
| need to "convert" it to Rust. Seems like an effort where juice
| isn't worth the squeeze. Probably will create more problems than
| we're fixing.
| kelnos wrote:
| Ugh. They didn't compile any C to Rust. They modified the F*-to-C
| compiler to emit Rust instead. So they compiled F* to safe Rust.
| And they couldn't even do that 100% reliably; some valid F*
| constructs couldn't be translated into Rust properly. They could
| either translate it into Rust code that wouldn't compile, or
| translate it into similar-looking Rust code that would compile,
| but would produce incorrect results.
|
| Flagged, this is just a lie of a title.
| pizza234 wrote:
| I've ported some projects to Rust (including C, where I've used
| C2Rust as first step), and I've drawn some conclusions.
|
| 1. Converting a C program to Rust, even if it includes unsafe
| code, often uncovers bugs quickly thanks to Rust's stringent
| constraints (bounds checking, strict signatures, etc.).
|
| 2. automated C to Rust conversion is IMO something that will
| never be solved entirely, because the design of C program is
| fundamentally different from Rust; such conversions require a
| significant redesign to be made safe (of course, not all C
| programs are the same).
|
| 3. in some cases, it's plain impossible to port a program from C
| to Rust while preserving the exact semantics, because unsafety
| can be inherent in the design.
|
| That said, tooling is essential to porting, and as tools continue
| to evolve, the process will become more streamlined.
___________________________________________________________________
(page generated 2024-12-21 18:00 UTC)