[HN Gopher] mrustc: In-progress alternative Rust compiler (to C)
___________________________________________________________________
mrustc: In-progress alternative Rust compiler (to C)
Author : my123
Score : 59 points
Date : 2021-02-28 19:35 UTC (3 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| xiphias2 wrote:
| It seems strange to me that anybody would reimplement a Rust compiler
| in C++ instead of Rust, but people are free to do what they want
| :)
| monocasa wrote:
| Part of the point is allowing bootstrapping a rust toolchain
| from source on systems that don't have rust yet.
| dataflow wrote:
| Can't you just cross compile? Wouldn't you have to do that
| anyway for some parts of the system? What's the point of
| bootstrapping?
| monocasa wrote:
| When I say "systems", I should have been a little more
| specific.
|
| There's a bunch of corporate/.gov envs that don't allow
| binaries to enter their system. All new code has to be
| compiled from source, on the target system (or at least on
| that side of the process firewall, it might be an airgapped
| cluster that can share code). They have C and C++ compilers
| that they've been compiling from source since the dawn of
| time, but they don't have a rust compiler. This gives them
| a mechanism to begin using rust within their process.
| josephg wrote:
| Couldn't they run mrustc on the rust source code on
| another machine to generate C code for rustc, then copy
| the C code to their internal systems and compile that
| into a working rust compiler?
|
| Or would that not count as "compiling from source"?
| danappelxx wrote:
| I believe that's exactly what they were suggesting :)
| xiphias2 wrote:
| I see, thanks for the explanation, now the project makes
| a lot of sense :)
|
| I guess you didn't want to advertise the real reason for
| the project.
| Blikkentrekker wrote:
| How did they get the original _C_ compiler there then
| without allowing binaries? Did they originally write one
| in assembler?
|
| I remember reading an interesting thought experiment that
| investigated what would happen if Dennis Ritchie had put
| malicious code in the first _C_ compiler, designed to detect
| when the compiler was compiling a compiler and then copy the
| malicious code into it.
|
| It concluded, tracing the history, that if this had
| happened, _GCC_, _Clang_, and the compilers of many other
| programming languages would carry that malicious code
| without it showing up in their source code, with no one
| the wiser.
| monocasa wrote:
| Epoch creation is covered by a totally different set of
| process rules, for the reasons you've stated. These rules
| are mainly intended to make it much harder for someone to
| gain access to the system if they weren't there when the
| system initially was blocked off from the world.
| dataflow wrote:
| Wow I see. Could they not use the initial Rust compilers
| (and however those were bootstrapped? I have no idea how)
| to get a Rust compiler going? I'm guessing it's just too
| painful?
| shakow wrote:
| > I'm guessing it's just too painful?
|
| Indeed. Although it is being worked on, the whole
| bootstrap process of rustc is currently a major hassle,
| requiring one to start from the oldest OCaml versions of the
| compiler and work up to the most recent ones.
| maccam94 wrote:
| The earliest versions of the rust compiler were written
| in OCaml, which is also unlikely to have been built in a
| trusted way by these organizations. mrustc exists to be a
| way to bootstrap trusted builds of rustc.
| spijdar wrote:
| Bootstrapping rust is really, really painful as is. The
| rust compiler itself tends to require a _recent_ version
| of the rust compiler to work. So to bootstrap from the
| original versions of rustc (which were written in OCaml)
| you'd have to incrementally step up really slowly.
|
| Being able to target a pretty recent version of Rust with
| a compiler written in C++ would be so, so useful for these
| purposes.
| Blikkentrekker wrote:
| Certainly that could be automated quite easily.
|
| I would assume that on such compilation farms that these
| systems generally use, this could be done very quickly.
|
| Perhaps there would be an interest for the _Rustc_ team
| to provide a recursive automated setup that is capable of
| compiling the latest _Rustc_ from _OCaml_, and finally
| from _C_ as well since one must follow a similar process
| with _OCaml_.
| LegionMammal978 wrote:
| The automation for this is actually trickier than it
| would first appear: especially with the earlier
| snapshots, a number of patches are needed to compile the
| LLVM interfaces written in C++. Also, due to linking
| issues, some snapshots require a stage2 compiler, while
| others can only use a stage1 compiler. I agree that it
| would be trivial once you have the chain in place, but
| that chain can only be built through trial and error.
| Blikkentrekker wrote:
| And no one has done this before?
|
| Methinks that building this chain by trial and error is
| rather trivial compared to the actual work in building a
| compiler.
| sanxiyn wrote:
| We are discussing mrustc here. The reason it hasn't been
| done yet is that nearly everyone interested in this
| agrees mrustc is a better bootstrapping path than
| starting from the last OCaml version.
| tedunangst wrote:
| And when you port to a new CPU architecture you're going
| to backport those changes to 200 obsolete versions as
| well?
| monocasa wrote:
| Yeah, rustc dogfoods new features, so it's _very_
| painful. It'd probably be close to 100 or so compilers to
| go back to the OCaml that it started from.
| LegionMammal978 wrote:
| I've actually been trying this for real. Right now, I'm
| at the 82-compiler mark, including the ~6 LLVM versions
| needed. For a sense of scale, the most recent compiler in
| this chain is the 2011-12-07 snapshot, which is still
| older than Rust 0.1 (2012-01-20). In most cases,
| additional compilers are needed only to resolve syntax
| errors, but there are occasional LLVM-related segfaults
| that create most of my headaches.
| wh33zle wrote:
| Is your work publicly available somewhere?
| LegionMammal978 wrote:
| The scripts are still pretty messy right now. I'll
| probably clean them up a bit and upload them once I reach
| 0.1.
| skybrian wrote:
| But there are no binaries being copied. They allow
| hand-written C to be copied, but not machine-generated C?
| monocasa wrote:
| Without mrustc you'd need to trust the rustc binary, or
| build around a hundred versions of rust to go back to a
| rustc that itself didn't depend on rust.
| CameronNemo wrote:
| Do all these entities operate independently? Is there no
| internal web of trust they could use to distribute
| rebuilt binaries?
| monocasa wrote:
| Yeah, they operate independently. No new binaries,
| period.
| __d wrote:
| Which makes sense: compartmentalise. Otherwise, a failure
| of one domain would compromise all the others.
|
| I hope there's some automated assistance for scanning
| inbound source code too. Imagine reviewing _everything_,
| line by line, millions of them.
| dan-robertson wrote:
| One way to trust your computer system more is to have very
| few binary dependencies. I.e., the goal is to have some source
| files and hopefully if you start with (eg) a trusted C
| compiler and linker you can work your way up to the whole
| system with deterministic results. This is supposed to be a
| way to prove that the compiler wasn't doing anything funky
| like injecting some vulnerability into the code it compiled
| and injecting that vulnerability injection into any
| compiler it compiled.
| rowanG077 wrote:
| I thought this was already possible. The first Rust compiler
| is written in OCaml. It should be possible to do
| bootstrapping with that.
| axelf4 wrote:
| Of course, but AFAIU this massively shortens the chain.
| Newer versions of rustc up the minimum compiler version
| required for building all the time.
| faho wrote:
| Unfortunately rustc uses rust features almost immediately
| (I think even features introduced in the immediately
| preceding version?), so you get a bootstrap chain, and if
| you start with the compiler written in OCaml you now need
| to build every single rustc version ever released.
|
| This takes very long and is error-prone, so having a
| compiler that can build even an intermediate version is
| already a big help.
|
| See e.g. https://guix.gnu.org/blog/2018/bootstrapping-rust/
| kiwidrew wrote:
| It's unfortunate that Rust has adopted development
| policies that are so hostile towards bootstrapping and
| cross-compiling rustc. The fact that most of the people
| cheerleading for Rust the ecosystem don't even see this
| as a problem is enough to convince me to steer clear of
| anything involving Rust the language.
|
| You don't have a real language until there is a stable
| specification and multiple independent implementations.
| Until then it's just an experimental toy.
| monocasa wrote:
| You can't build GCC with MSVC or vice versa.
|
| And people do see the pain here as a problem, that's why
| the project in this very thread exists. I think most in
| the Rust community see mrustc and the pressure that an
| independent implementation brings process-wise as a very
| good thing for the language.
| pjmlp wrote:
| You mean just like LLVM?
| coolreader18 wrote:
| Oh, nice! I didn't realize that development was still happening
| for that, I thought it had stopped around Rust 1.20. Glad to see
| that it's still being developed, stuff like this will definitely
| be important for bootstrapping/"trusting trust".
| lmkg wrote:
| The interesting thing to me is that this does _not_ fully validate
| Rust code, e.g. it does not check borrow safety. Which would
| normally make no damn sense for a Rust compiler, but actually
| makes sense for what _this particular_ project is trying to
| achieve.
|
| This is not a compiler for _developing_ Rust programs. This is a
| compiler to let platforms unsupported by Rust/LLVM build Rust
| programs from source. So it takes a Rust program which already
| exists, _assumes_ it can be compiled by rustc, and translates it
| to C so it can be handed off to your local compiler. Skipping the
| borrow checker probably lets this execute in environments which
| are more constrained than what rustc can fit into.
|
| Of course there's a question of how much of Rust's safety survives
| translation into C. Certainly some. E.g. the emitted program
| would include bounds checks where not specifically elided in
| Rust, and references are statically guaranteed to be non-null
| before being lowered into pointers. But there could be gaps,
| especially in cases where Rust semantics mismatch with C in ways
| that LLVM hasn't exposed yet.
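The two guarantees mentioned above can be seen on the Rust side with a
minimal sketch. This only illustrates the source-level semantics; it says
nothing about the C that mrustc actually emits. The function names
`checked_read` and `option_ref_is_pointer_sized` are made up for
illustration.

```rust
use std::mem::size_of;

/// Bounds-checked read: an out-of-range index yields `None` instead of
/// reading past the end of the slice. (Plain `data[index]` performs the
/// same check but panics on failure.)
fn checked_read(data: &[u8], index: usize) -> Option<u8> {
    data.get(index).copied()
}

/// References are never null, so the compiler can use the null pointer
/// value to represent `None`. As a result `Option<&T>` is the same size
/// as a plain reference.
fn option_ref_is_pointer_sized() -> bool {
    size_of::<Option<&u8>>() == size_of::<&u8>()
}
```

A translation to C has to keep the check behind `checked_read` as explicit
code, since C array indexing performs no check of its own.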
| rmdashrfstar wrote:
| > But there could be gaps, especially in cases where Rust
| semantics mismatch with C in ways that LLVM hasn't exposed yet.
|
| Could you elaborate on what potential mismatches there
| currently are between the semantics of Rust and C? Where
| particularly do you think there might be issues/dragons hiding
| in the semantics of a translation?
| gpm wrote:
| Integer overflows (signed overflows are defined to wrap or
| panic in Rust, but are undefined behavior in C).
|
| Float-to-integer conversions (overflows are defined to
| saturate in Rust, but are undefined behavior in C).
|
| Maybe things like how pointer casts are treated in correct
| (but unsafe) rust vs C. Or similarly in transmute. Generally
| it wouldn't surprise me if rust was standardizing a memory
| model that was subtly different from C.
|
| I'm sure the list goes on, but "that sort of thing".
| tedunangst wrote:
| Pretty sure rust wraps on overflow, not saturates, but it's
| not hard to get that behavior from most C compilers if
| desired.
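The two behaviors being discussed apply to different operations, which a
small sketch may make clearer. It assumes Rust 1.45 or later, where
float-to-integer `as` casts saturate; the helper names are made up for
illustration. Plain `+` on integers is not shown because it panics on
overflow in debug builds and wraps in release builds by default.

```rust
/// Integer addition with explicit wrapping: overflow wraps around in
/// two's complement rather than being undefined as in C.
fn wrapping_sum(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

/// Integer addition with an explicit overflow check: returns `None`
/// when the result does not fit in an `i32`.
fn checked_sum(a: i32, b: i32) -> Option<i32> {
    a.checked_add(b)
}

/// Float-to-integer cast with `as`: values outside the `i32` range
/// saturate to `i32::MIN` or `i32::MAX`, and NaN becomes 0.
fn float_to_i32(x: f32) -> i32 {
    x as i32
}
```

So `wrapping_sum(i32::MAX, 1)` gives `i32::MIN`, while
`float_to_i32(1e10)` gives `i32::MAX`. A C backend would need explicit
range checks or compiler flags to reproduce either result.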
| Blikkentrekker wrote:
| > _This is not a compiler for developing Rust programs. This is
| a compiler to let platforms unsupported by Rust/LLVM build
| Rust programs from source. So it takes a Rust program which
| already exists, assumes it can be compiled by rustc, and
| translates it to C so it can be handed off to your local
| compiler. Skipping the borrow checker probably lets this
| execute in environments which are more constrained than what
| rustc can fit into._
|
| I would assume that borrow checking uses far less memory than
| the actual compilation after the compiler has proven the
| program to be in concordance with the borrow rules.
|
| It seems more like something they did not bother to put
| much effort into, as it wasn't necessary, than a
| legitimate way to reduce a compiler's memory footprint.
| ris wrote:
| To me the question it raises is - does the rust borrow checker
| need to be part of the compiler or could it be separated into a
| standalone component? Is the borrow checker
| architecture-agnostic?
| twic wrote:
| It is architecture-agnostic.
|
| As currently implemented in rustc, it operates on an
| intermediate representation that has already had name
| resolution and some simplification done. As such, I think it
| would be rather hard to separate from the rest of the
| compiler.
| viraptor wrote:
| For the first one, it definitely could be split. You can already
| run that part without code generation:
| https://doc.rust-lang.org/cargo/commands/cargo-check.html
___________________________________________________________________
(page generated 2021-02-28 23:01 UTC)