[HN Gopher] mrustc: In-progress alternative Rust compiler (to C)
       ___________________________________________________________________
        
       mrustc: In-progress alternative Rust compiler (to C)
        
       Author : my123
       Score  : 59 points
       Date   : 2021-02-28 19:35 UTC (3 hours ago)
        
 (HTM) web link (github.com)
        
       | xiphias2 wrote:
       | It's strange to me that anybody would reimplement a Rust compiler
       | in C++ instead of Rust, but people are free to do what they want
       | :)
        
         | monocasa wrote:
         | Part of the point is allowing bootstrapping a rust toolchain
         | from source on systems that don't have rust yet.
        
           | dataflow wrote:
           | Can't you just cross compile? Wouldn't you have to do that
           | anyway for some parts of the system? What's the point of
           | bootstrapping?
        
             | monocasa wrote:
             | When I say "systems", I should have been a little more
             | specific.
             | 
             | There's a bunch of corporate/.gov envs that don't allow
             | binaries to enter their system. All new code has to be
             | compiled from source, on the target system (or at least on
             | that side of the process firewall, it might be an airgapped
             | cluster that can share code). They have C and C++ compilers
             | that they've been compiling from source since the dawn of
             | time, but they don't have a rust compiler. This gives them
             | a mechanism to begin using rust within their process.
        
               | josephg wrote:
               | Couldn't they run mrustc on the rust source code on
               | another machine to generate C code for rustc, then copy
               | the C code to their internal systems and compile that
               | into a working rust compiler?
               | 
               | Or would that not count as "compiling from source"?
        
               | danappelxx wrote:
               | I believe that's exactly what they were suggesting :)
        
               | xiphias2 wrote:
               | I see, thanks for the explanation, now the project makes
               | a lot of sense :)
               | 
               | I guess you didn't want to advertise the real reason for
               | the project.
        
               | Blikkentrekker wrote:
               | How did they get the original _C_ compiler there then
               | without allowing binaries? Did they originally write one
               | in assembler?
               | 
               | I remember reading an interesting thought experiment that
               | traced what would have happened if Dennis Ritchie had put
               | malicious code in the first _C_ compiler, designed to
               | detect when the compiler was compiling a compiler and then
               | copy itself into the output.
               | 
               | It concluded that, had this happened, _GCC_, _Clang_, and
               | the compilers of many other programming languages would
               | carry that malicious code, without it showing up in any
               | source code and with no one the wiser.
        
               | monocasa wrote:
               | Epoch creation is covered by a totally different set of
               | process rules, for the reasons you've stated. These rules
               | are mainly intended to make it much harder for someone to
               | gain access to the system if they weren't there when the
               | system initially was blocked off from the world.
        
               | dataflow wrote:
               | Wow I see. Could they not use the initial Rust compilers
               | (and however those were bootstrapped? I have no idea how)
               | to get a Rust compiler going? I'm guessing it's just too
               | painful?
        
               | shakow wrote:
               | > I'm guessing it's just too painful?
               | 
               | Indeed. Although it is being worked on, the whole
               | bootstrap process of rustc is currently a major hassle:
               | you have to start from the oldest OCaml version of the
               | compiler and work up to the most recent one.
        
               | maccam94 wrote:
               | The earliest versions of the rust compiler were written
               | in OCaml, which is also unlikely to have been built in a
               | trusted way by these organizations. mrustc exists to be a
               | way to bootstrap trusted builds of rustc.
        
               | spijdar wrote:
               | Bootstrapping rust is really, really painful as is. The
               | rust compiler itself tends to require a _recent_ version
               | of the rust compiler to work. So to bootstrap from the
               | original versions of rustc (which were written in OCaml)
               | you'd have to incrementally step up really slowly.
               | 
               | Being able to target a pretty recent version of Rust with
               | a compiler written in C would be so, so useful for these
               | purposes.
        
               | Blikkentrekker wrote:
               | Certainly that could be automated quite easily.
               | 
               | I would assume that on such compilation farms that these
               | systems generally use, this could be done very quickly.
               | 
               | Perhaps there would be an interest for the _Rustc_ team
               | to provide a recursive automated setup that is capable of
               | compiling the latest _Rustc_ from _OCaml_, and finally
               | from _C_ as well since one must follow a similar process
               | with _OCaml_.
        
               | LegionMammal978 wrote:
               | The automation for this is actually trickier than it
               | would first appear: especially with the earlier
               | snapshots, a number of patches are needed to compile the
               | LLVM interfaces written in C++. Also, due to linking
               | issues, some snapshots require a stage2 compiler, while
               | others can only use a stage1 compiler. I agree that it
               | would be trivial once you have the chain in place, but
               | that chain can only be built through trial and error.
        
               | Blikkentrekker wrote:
               | And no one has done this before?
               | 
               | Methinks that building this chain by trial and error is
               | rather trivial compared to the actual work in building a
               | compiler.
        
               | sanxiyn wrote:
               | We are discussing mrustc here. The reason it hasn't been
               | done yet is that nearly everyone interested in this
               | agrees mrustc is a better bootstrapping path than
               | starting from the last OCaml version.
        
               | tedunangst wrote:
               | And when you port to a new CPU architecture you're going
               | to backport those changes to 200 obsolete versions as
               | well?
        
               | monocasa wrote:
               | Yeah, rustc dogfoods new features, so it's _very_
               | painful. It'd probably be close to 100 or so compilers to
               | go back to the ocaml that it started from.
        
               | LegionMammal978 wrote:
               | I've actually been trying this for real. Right now, I'm
               | at the 82-compiler mark, including the ~6 LLVM versions
               | needed. For a sense of scale, the most recent compiler in
               | this chain is the 2011-12-07 snapshot, which is still
               | older than Rust 0.1 (2012-01-20). In most cases,
               | additional compilers are needed only to resolve syntax
               | errors, but there are occasional LLVM-related segfaults
               | that create most of my headaches.
        
               | wh33zle wrote:
               | Is your work publicly available somewhere?
        
               | LegionMammal978 wrote:
               | The scripts are still pretty messy right now. I'll
               | probably clean them up a bit and upload them once I reach
               | 0.1.
        
               | skybrian wrote:
               | But there are no binaries being copied. They allow hand-
               | written C to be copied, but not machine-generated C?
        
               | monocasa wrote:
               | Without mrustc you'd need to trust the rustc binary, or
               | build around a hundred versions of rust to go back to a
               | rustc that itself didn't depend on rust.
        
               | CameronNemo wrote:
               | Do all these entities operate independently? Is there no
               | internal web of trust they could use to distribute
               | rebuilt binaries?
        
               | monocasa wrote:
               | Yeah, they operate independently. No new binaries,
               | period.
        
               | __d wrote:
               | Which makes sense: compartmentalise. Otherwise, a failure
               | of one domain would compromise all the others.
               | 
               | I hope there's some automated assistance for scanning
               | inbound source code too. Imagine reviewing _everything_,
               | line by line, millions of them.
        
               | [deleted]
        
             | dan-robertson wrote:
             | One way to trust your computer system more is to have very
             | few binary dependencies. Ie the goal is to have some source
             | files and hopefully if you start with (eg) a trusted C
             | compiler and linker you can work your way up to the whole
             | system with deterministic results. This is supposed to be a
             | way to prove that the compiler wasn't doing anything funky
             | like injecting some vulnerability into the code it compiled
             | and injecting that vulnerability injection into any
             | compiler it compiled.
        
           | rowanG077 wrote:
           | I thought this was already possible. The first Rust compiler
           | is written in OCaml. It should be possible to do
           | bootstrapping with that.
        
             | axelf4 wrote:
             | Of course, but AFAIU this massively shortens the chain.
             | Newer versions of rustc up the minimum compiler version
             | required for building all the time.
        
             | faho wrote:
             | Unfortunately rustc uses rust features almost immediately
             | (I think even features introduced in the immediately
             | preceding version?), so you get a bootstrap chain, and if
             | you start with the compiler written in OCaml you now need
             | to build every single rustc version ever released.
             | 
             | This takes very long and is error-prone, so having a
             | compiler that can build even an intermediate version is
             | already a big help.
             | 
             | See e.g. https://guix.gnu.org/blog/2018/bootstrapping-rust/
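A rough sketch of why the entry point of the chain matters, under a simplifying assumption that is not stated in the thread: every 1.N release is built by 1.(N-1). The function name and the version numbers below are purely illustrative and do not describe what mrustc actually supports.

```rust
/// Lists the rustc releases that must be built, in order, to get from an
/// entry compiler to a target compiler.
/// ASSUMPTION (illustrative model only): release 1.N is built by 1.(N-1).
fn bootstrap_chain(entry_minor: u32, target_minor: u32) -> Vec<String> {
    (entry_minor..=target_minor)
        .map(|minor| format!("1.{}", minor))
        .collect()
}

fn main() {
    // Hypothetical numbers: an alternative compiler that can build 1.29
    // directly, and a target of 1.32.
    let chain = bootstrap_chain(29, 32);
    println!("{} builds: {}", chain.len(), chain.join(" -> "));
}
```

Under this model, every release that the entry compiler can skip removes one full compiler build from the chain.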
        
               | kiwidrew wrote:
               | It's unfortunate that Rust has adopted development
               | policies that are so hostile towards bootstrapping and
               | cross-compiling rustc. The fact that most of the people
               | cheerleading for Rust the ecosystem don't even see this
               | as a problem is enough to convince me to steer clear of
               | anything involving Rust the language.
               | 
               | You don't have a real language until there is a stable
               | specification and multiple independent implementations.
               | Until then it's just an experimental toy.
        
               | monocasa wrote:
               | You can't build GCC on MSVC or vice versa.
               | 
               | And people do see the pain here as a problem, that's why
               | the project in this very thread exists. I think most in
               | the Rust community see mrustc and the pressure that an
               | independent implementation brings process-wise as a very
               | good thing for the language.
        
         | pjmlp wrote:
         | You mean just like LLVM?
        
       | coolreader18 wrote:
       | Oh, nice! I didn't realize that development was still happening
       | for that, I thought it had stopped around Rust 1.20. Glad to see
       | that it's still being developed, stuff like this will definitely
       | be important for bootstrapping/"trusting trust".
        
       | lmkg wrote:
       | The interesting thing to me is that this does _not_ fully
       | validate Rust code, e.g. it does not check borrow safety. That
       | would normally make no damn sense for a Rust compiler, but it
       | actually makes sense for what _this particular_ project is
       | trying to achieve.
       | 
       | This is not a compiler for _developing_ Rust programs. This is a
       | compiler to let platforms unsupported by Rust/LLVM build Rust
       | programs from source. So it takes a Rust program which already
       | exists, _assumes_ it can be compiled by rustc, and translates it
       | to C so it can be handed off to your local compiler. Skipping the
       | borrow checker probably lets this execute in environments which
       | are more constrained than what rustc can fit into.
       | 
       | Of course there's a question of how much of Rust's safety survives
       | translation into C. Certainly some. E.g. the emitted program
       | would include bounds checks where not specifically elided in
       | Rust, and references are statically guaranteed to be non-null
       | before being lowered into pointers. But there could be gaps,
       | especially in cases where Rust semantics mismatch with C in ways
       | that LLVM hasn't exposed yet.
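The two guarantees named above can be seen from the Rust side in a minimal sketch. It shows only the source-level behaviour, not the C that mrustc emits; the helper names are made up for illustration.

```rust
use std::mem::size_of;

/// Bounds-checked lookup: returns None instead of reading past the end.
/// (Hypothetical helper name, for illustration only.)
fn checked_get(data: &[u8], index: usize) -> Option<u8> {
    data.get(index).copied()
}

/// References can never be null, so `Option<&u8>` can use the null pointer
/// to represent `None` and stays exactly pointer-sized.
fn option_ref_is_pointer_sized() -> bool {
    size_of::<Option<&u8>>() == size_of::<*const u8>()
}

fn main() {
    let data = [10u8, 20, 30];
    println!("{:?}", checked_get(&data, 1)); // in bounds
    println!("{:?}", checked_get(&data, 7)); // out of bounds, no memory read
    println!("{}", option_ref_is_pointer_sized());
}
```

Any translation to C has to keep the bounds check as explicit generated code, since C itself will not add one.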
        
         | rmdashrfstar wrote:
         | > But there could be gaps, especially in cases where Rust
         | semantics mismatch with C in ways that LLVM hasn't exposed yet.
         | 
         | Could you elaborate on what potential mismatches there
         | currently are between the semantics of Rust and C? Where
         | particularly do you think there might be issues/dragons hiding
         | in the semantics of a translation?
        
           | gpm wrote:
           | Integer overflows (signed overflow is defined to wrap or
           | panic in Rust, but is undefined behavior in C).
           | 
           | Float-to-integer conversions (overflow is defined to
           | saturate in Rust, but is undefined behavior in C).
           | 
           | Maybe things like how pointer casts are treated in correct
           | (but unsafe) rust vs C. Or similarly in transmute. Generally
           | it wouldn't surprise me if rust was standardizing a memory
           | model that was subtly different from C.
           | 
           | I'm sure the list goes on, but "that sort of thing".
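A short sketch of the first two points, using only standard-library operations. The wrapper function names are invented for illustration. Saturating `as` casts from float to integer are the behaviour of current stable Rust; a translator targeting C would have to emit equivalent checks itself, and how mrustc handles this is not shown here.

```rust
/// Explicitly wrapping addition: well defined for every input.
fn add_wrapping(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

/// Checked addition: reports overflow as None instead of panicking.
/// (A plain `a + b` panics in debug builds and wraps in release builds.)
fn add_checked(a: i32, b: i32) -> Option<i32> {
    a.checked_add(b)
}

/// Float-to-integer `as` cast: saturates at the target type's bounds,
/// and maps NaN to 0.
fn float_to_u8(x: f32) -> u8 {
    x as u8
}

fn main() {
    println!("{}", add_wrapping(i32::MAX, 1) == i32::MIN);
    println!("{:?}", add_checked(i32::MAX, 1));
    println!("{} {} {}", float_to_u8(300.0), float_to_u8(-1.0), float_to_u8(f32::NAN));
}
```

In C, the corresponding signed addition and out-of-range float conversion are both undefined behavior, which is the mismatch being described.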
        
             | tedunangst wrote:
             | Pretty sure rust wraps on overflow, not saturates, but it's
             | not hard to get that behavior from most C compilers if
             | desired.
        
           | [deleted]
        
         | Blikkentrekker wrote:
         | > _This is not a compiler for developing Rust programs. This is
         | a compiler to let platforms unsupported by Rust/LLVM build
         | rust programs from source. So it takes a Rust program which
         | already exists, assumes it can be compiled by rustc, and
         | translates it to C so it can be handed off to your local
         | compiler. Skipping the borrow checker probably lets this
         | execute in environments which are more constrained than what
         | rustc can fit into._
         | 
         | I would assume that borrow checking uses far less memory than
         | the actual compilation after the compiler has proven the
         | program to be in concordance with the borrow rules.
         | 
         | It more so seems that it is something they did not bother to
         | put too much effort into, as it wasn't necessary, than a
         | legitimate way to reduce a compiler's memory footprint.
        
         | ris wrote:
         | To me the question it raises is - does the rust borrow checker
         | need to be part of the compiler or could it be separated into a
         | standalone component? Is the borrow checker architecture-
         | agnostic?
        
           | twic wrote:
           | It is architecture-agnostic.
           | 
           | As currently implemented in rustc, it operates on an
           | intermediate representation that has already had name
           | resolution and some simplification done. As such, i think it
           | would be rather hard to separate from the rest of the
           | compiler.
        
           | viraptor wrote:
           | For the first one, definitely could be split. You can already
           | run that part without code generation:
           | https://doc.rust-lang.org/cargo/commands/cargo-check.html
        
       ___________________________________________________________________
       (page generated 2021-02-28 23:01 UTC)