[HN Gopher] Porting Rust's Std to Rustix
       ___________________________________________________________________
        
       Porting Rust's Std to Rustix
        
       Author : jpgvm
       Score  : 204 points
       Date   : 2022-01-04 10:36 UTC (12 hours ago)
        
 (HTM) web link (blog.sunfishcode.online)
 (TXT) w3m dump (blog.sunfishcode.online)
        
       | codeflo wrote:
       | As a believer in removing unnecessary layers of abstraction, this
       | looks interesting. Is there a technical reason Rustix doesn't
       | support Windows, or is that just not a priority at the moment?
       | 
       | (For a bit of background info: It might seem counterintuitive,
       | but on Windows, a "native" application "should" actually call the
       | programming language agnostic Kernel32.dll functions directly. At
       | least that's the documented, stable way of doing things. Instead,
       | Rust's std currently goes through libc, which is also fine, but
       | on Windows is an abstraction built on top of the Windows APIs,
       | and is historically a lot less stable. This can cause code bloat
       | and distribution hassles.
       | 
       | This is a bit different from Linux, where the abstractions are
       | layered the other way round: the OS (POSIX) spec mostly assumes
       | an existing libc. As a result, on Linux, going libc-less is a bit
       | harder in my limited experience -- though certainly possible if
       | you know what you're doing.)
        
         | sanxiyn wrote:
         | > Instead, Rust's std currently goes through libc
         | 
         | Are you sure about this? Rust's std's File::open calls
         | CreateFile on Windows, not libc open. Isn't CreateFile the
         | correct Kernel32.dll API to call?
         | 
         | https://github.com/rust-lang/rust/blob/master/library/std/sr...
        
           | codeflo wrote:
           | It is. I haven't checked the complete source code. What I
           | know for sure is that a program using Rust's std still
           | requires the C runtime to link and run, and from that
           | perspective, it doesn't really matter if some or even many
           | parts of the std don't actually use it.
        
             | sanxiyn wrote:
             | I see. I wonder where Rust's std is using libc on Windows.
             | I know for a fact that the filesystem portion of std doesn't
             | call libc at all on Windows.
        
               | ChrisSD wrote:
               | Rust needs the C runtime for startup/shutdown code (and
               | vcruntime for panics). It also needs basic memory
               | functions such as memcpy and stack probes. There is no
               | way round this except rewriting those in Rust.
        
               | sanxiyn wrote:
               | Rust already ships with its own memcpy, so it should be
               | possible:
               | https://github.com/rust-lang/compiler-builtins/blob/master/s...
        
               | ectopod wrote:
               | The startup/shutdown code is for the C runtime. If you
               | avoid the C runtime you don't need it.
        
               | ChrisSD wrote:
               | At a minimum Rust would have to run the C initializers as
               | these are used even in pure Rust code. It might also make
               | use of the security cookie but I'm uncertain about that.
               | Also the C runtime is needed by the SEH handling code (in
               | vcruntime) so Rust would have to replace that before it
               | can replace the C startup/shutdown.
               | 
               | To be clear, this is feasible but it'll require someone
               | knowledgeable to put in the work of rewriting this in
               | Rust.
        
               | steveklabnik wrote:
               | https://crates.io/crates/r0 exists (not everything you're
               | talking about but like, there's some of this stuff
               | around, in some contexts. We'll see if the whole pile of
               | stuff needed ever gets ported or not, I'm guessing yes
               | but on a long timeframe.)
        
         | samhw wrote:
         | Is there a reason "native" and "should" are in scare quotes? I
         | think "should" is pretty well defined - after all, we've got an
         | RFC for it!
         | 
         | Edit: I'm not altogether sure why this is being downvoted, but
         | in case it wasn't clear: this is a sincere question and not
         | some kind of pedantic rhetorical question. (I don't really mind
         | whether people upvote or downvote the comment, except to the
         | extent that it's a signal that I perhaps wasn't clear in what I
         | was asking.)
        
           | codeflo wrote:
           | I didn't downvote, in fact, I'm a bit amused because the
           | reason I used quotes is exactly because I'm not using any
           | official definition! There's no consensus definition of
           | "native", and I'm using "should" purely in the colloquial
           | sense of "something I'd like to see".
        
           | CodesInChaos wrote:
           | There isn't anything objectively wrong with using libc as an
           | abstraction on Windows, especially since Win10 ships the
           | "Universal C Runtime" out of the box (earlier you had to
           | deploy a separate crt for each Visual Studio version).
           | 
           | Personally I'm also in favour of cutting out libc and
           | invoking the windows API directly. But others might argue
           | that using it as a shared abstraction that's very similar on
           | Windows, Linux, OSX, BSD, is worth it.
           | 
           | And "native" is ill defined. It's just one more abstraction
           | layer in the libc -> Win32-API -> NT-API -> Kernel chain.
           | Libc isn't comparable to a java runtime or electron, which is
           | what people usually mean when they talk about applications
           | not being "native".
        
             | pjmlp wrote:
             | Kind of, it is still a language runtime, especially on
             | non-UNIX OSes.
             | 
             | The recent thread about cutting down the startup of C
             | applications on Linux proves the point that it isn't a
             | zero-cost library.
        
         | ylyn wrote:
         | > This is a bit different from Linux, where the abstractions
         | are layered the other way round: the OS (POSIX) spec mostly
         | assumes an existing libc.
         | 
         | FWIW the POSIX spec assumes libc, but on Linux specifically the
         | syscall interface is stable.
        
           | sanxiyn wrote:
           | Note that the macOS syscall interface is explicitly unstable,
           | so Rust needs the POSIX/libc path anyway.
        
             | masklinn wrote:
             | Also on Solaris/Illumos/SmartOS/..., and on OpenBSD (where
             | it's more or less verboten entirely since system-call-
             | origin verification).
        
           | codeflo wrote:
           | Thanks, that's a great correction/clarification.
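To make the distinction in this sub-thread concrete, here is a minimal
sketch (not taken from rustix or std) that issues `getpid` directly on
x86_64 Linux, where the syscall number and register convention are part
of the stable kernel ABI, and falls back to std everywhere else, which
on Unix-like systems goes through libc. The function name `raw_getpid`
is made up for illustration.

```rust
use std::process;

/// Hypothetical helper: ask the kernel for our PID without touching libc.
/// Only compiled on x86_64 Linux, where the syscall ABI is stable.
#[cfg(all(target_os = "linux", target_arch = "x86_64"))]
fn raw_getpid() -> u32 {
    use std::arch::asm;

    const SYS_GETPID: i64 = 39; // syscall number for getpid on x86_64 Linux
    let ret: i64;
    // SAFETY: getpid takes no arguments, cannot fail, and touches no memory.
    unsafe {
        asm!(
            "syscall",
            inlateout("rax") SYS_GETPID => ret,
            out("rcx") _, // overwritten by the `syscall` instruction
            out("r11") _, // overwritten by the `syscall` instruction
            options(nostack, preserves_flags),
        );
    }
    ret as u32
}

/// Everywhere else, take the portable path through std.
#[cfg(not(all(target_os = "linux", target_arch = "x86_64")))]
fn raw_getpid() -> u32 {
    process::id()
}

fn main() {
    // Both paths should agree on the process ID.
    assert_eq!(raw_getpid(), process::id());
    println!("pid = {}", raw_getpid());
}
```

The `cfg` split mirrors the argument above: the raw path is only an
option where the kernel promises a stable interface, so a portable
library still needs a libc-backed path for the other targets.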
        
       | samhw wrote:
       | This looks fantastic. In fact, in the best possible way, it looks
       | _boring_. I'm amazed that this wasn't always how it was
       | implemented. I hope this can be upstreamed into std.
        
         | sanxiyn wrote:
         | It was mostly to save time. You need the libc path to support
         | macOS anyway, and by using libc you can share most code between
         | macOS and Linux (and the BSDs). Once that was done, I think a
         | Linux-only syscall path was not justified at the time.
         | 
         | Now I think Rust has enough resources to try this.
        
           | samhw wrote:
           | Interesting, thanks for the detail! That makes perfect sense
           | as a rationale. I'm glad the Rust team now has enough time to
           | refine these things - thanks for all your work :)
        
       | algesten wrote:
       | If this was merged into Rust, I assume it means the use of
       | `unsafe` in std would be "pushed down" one level into Rustix
       | instead.
       | 
       | Wonder if that means it would be easier to verify the correctness
       | of `unsafe` use inside Rust itself?
        
         | ansible wrote:
         | The goal is to get this merged _into_ Rust's std library;
         | they wouldn't write std on top of Rustix.
         | 
         | > _Wonder if that means it would be easier to verify the
         | correctness of `unsafe` use inside Rust itself?_
         | 
         | If you mean usage of unsafe inside std (which the Rust compiler
         | does depend upon), that is one of the explicit goals of Rustix.
         | The usage of unsafe will be much more narrow overall, mostly
         | surrounding the system calls themselves.
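The "narrower unsafe" point can be sketched as follows, assuming x86_64
Linux for the raw path. This is an illustration, not rustix's actual
code: `sys_write` and `write_all` are hypothetical names. The only
`unsafe` block wraps the `syscall` instruction itself, and taking a
`BorrowedFd` rather than a bare integer means the descriptor is known
to be open for the duration of the call.

```rust
use std::fs::{self, File};
use std::io;
use std::os::fd::{AsFd, BorrowedFd};

/// Hypothetical helper: one raw `write` system call on x86_64 Linux.
/// This is the only place in the sketch that needs `unsafe`.
#[cfg(all(target_os = "linux", target_arch = "x86_64"))]
fn sys_write(fd: BorrowedFd<'_>, buf: &[u8]) -> io::Result<usize> {
    use std::arch::asm;
    use std::os::fd::AsRawFd;

    const SYS_WRITE: isize = 1; // syscall number for write on x86_64 Linux
    let raw_fd = fd.as_raw_fd() as isize;
    let ret: isize;
    // SAFETY: `fd` stays open for the lifetime of the borrow, and the kernel
    // only reads `buf.len()` bytes starting at `buf.as_ptr()`.
    unsafe {
        asm!(
            "syscall",
            inlateout("rax") SYS_WRITE => ret,
            in("rdi") raw_fd,
            in("rsi") buf.as_ptr(),
            in("rdx") buf.len(),
            out("rcx") _,
            out("r11") _,
            options(nostack, preserves_flags),
        );
    }
    if ret < 0 {
        // The kernel reports failure as a negated errno value.
        Err(io::Error::from_raw_os_error((-ret) as i32))
    } else {
        Ok(ret as usize)
    }
}

/// Fallback for other Unix targets: let std (and therefore libc) do the write.
#[cfg(not(all(target_os = "linux", target_arch = "x86_64")))]
fn sys_write(fd: BorrowedFd<'_>, buf: &[u8]) -> io::Result<usize> {
    use std::io::Write;
    File::from(fd.try_clone_to_owned()?).write(buf)
}

/// Safe wrapper: keeps calling `sys_write` until the whole buffer is written.
fn write_all(fd: BorrowedFd<'_>, mut buf: &[u8]) -> io::Result<()> {
    while !buf.is_empty() {
        match sys_write(fd, buf) {
            Ok(0) => return Err(io::ErrorKind::WriteZero.into()),
            Ok(n) => buf = &buf[n..],
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let name = format!("sys_write_demo_{}.txt", std::process::id());
    let path = std::env::temp_dir().join(name);
    let file = File::create(&path)?;
    write_all(file.as_fd(), b"hello from a raw syscall\n")?;
    drop(file);
    print!("{}", fs::read_to_string(&path)?);
    fs::remove_file(&path)
}
```

Callers of `write_all` never write `unsafe` themselves, so an audit
only has to look at the few lines around the `asm!` invocation.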
        
           | Arnavion wrote:
           | The port of libstd linked in the article still uses rustix as
           | a separate crate.
           | 
           | In any case, I hope it remains separate. Third-party no_std
           | code would benefit from it.
        
             | ansible wrote:
             | I am under the impression that no-std code is primarily the
             | concern of embedded systems that might not have much of an
             | OS [1] at all, never mind a suite of system calls to Linux.
             | 
             | [1] Like a typical RTOS, that may provide some basic
             | communication primitives, thread creation, and a hardware
             | abstraction layer (HAL).
        
               | roblabla wrote:
               | That is not necessarily the case. Embedded is a big user,
               | but there are other reasons to use no_std, such as
               | explicitly wanting to avoid heap allocations, or wanting
               | greater control over the native APIs being called.
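As one illustration of the "avoid heap allocations" motivation, the
sketch below formats text into a fixed-size stack buffer using only
items from `core`, so the same type would also work in a `#![no_std]`
crate. `StackBuf` is a hypothetical name invented for this example; the
demo `main` uses std only to print the result.

```rust
use core::fmt::{self, Write};

/// Hypothetical fixed-capacity text buffer that lives entirely on the stack.
struct StackBuf<const N: usize> {
    bytes: [u8; N],
    len: usize,
}

impl<const N: usize> StackBuf<N> {
    const fn new() -> Self {
        Self { bytes: [0; N], len: 0 }
    }

    fn as_str(&self) -> &str {
        // Only whole `&str` values are ever copied in, so this prefix is
        // valid UTF-8; fall back to "" rather than panic just in case.
        core::str::from_utf8(&self.bytes[..self.len]).unwrap_or("")
    }
}

impl<const N: usize> Write for StackBuf<N> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        // Reject input that does not fit instead of allocating more room.
        let end = self.len.checked_add(s.len()).ok_or(fmt::Error)?;
        if end > N {
            return Err(fmt::Error);
        }
        self.bytes[self.len..end].copy_from_slice(s.as_bytes());
        self.len = end;
        Ok(())
    }
}

fn main() {
    let mut buf = StackBuf::<32>::new();
    write!(buf, "fd={} len={}", 3, 128).unwrap();
    println!("{}", buf.as_str());
}
```

Running out of space surfaces as a `fmt::Error` instead of a hidden
allocation, which is the kind of control the comment above describes.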
        
           | algesten wrote:
           | Yeah, that's what I mean. Sounds good!
        
       | Icathian wrote:
       | This looks like such a fascinating project. I would love to
       | contribute to this sort of low-level work, but frankly I don't
       | know enough yet. I'm going to have to dig into that repo a bunch.
       | 
       | Anyone have any personal favorite books or articles that would be
       | good for getting up to speed on this kind of thing?
        
         | antonok wrote:
         | I recommend checking out this issue [1] if you want to get your
         | feet wet. Attempting to build a new Rust program and
         | documenting what prevented it from working is a great way to
         | understand how everything is implemented. Pretty satisfying if
         | you can get a program working, too.
         | 
         | [1] https://github.com/sunfishcode/mustang/issues/22
        
           | Icathian wrote:
           | This is an excellent place to start. Thank you!
        
         | sanxiyn wrote:
         | For this project, you'd want The Linux Programming Interface.
         | Information here: https://man7.org/tlpi/
        
           | Icathian wrote:
           | I have a copy I grabbed for an OS class. I should take that
           | back out. Much appreciated!
        
       | styfle wrote:
       | Does this mean we could have a single binary for x86_64 linux
       | instead of two (glibc and musl)?
       | 
       | x86_64-unknown-linux-gnu
       | 
       | x86_64-unknown-linux-musl
        
         | CodesInChaos wrote:
         | I expect 4 options:
         | 
         | * glibc linked dynamically
         | 
         | * musl linked dynamically
         | 
         | * musl linked statically
         | 
         | * direct syscalls from rust
         | 
         | and possibly additional options choosing if math functions,
         | memcpy, etc. should use an implementation shipped with rust or
         | one shipped with libc.
         | 
         | The big question is if we'll get to a point where almost all
         | rust applications don't use a libc at all.
        
           | masklinn wrote:
           | > * musl linked dynamically
           | 
           | > * musl linked statically
           | 
           | This is handled through the orthogonal "crt-static" target-
           | feature.
           | 
           | IIRC each supported CRT has a preference, but for some that
           | can be overridden (musl would be one of those, possibly one
           | of few of those since statically linking glibc is generally
           | recommended against, and so's pretty much every non-linux
           | libc, when that's an option at all).
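For reference, the `crt-static` target feature mentioned above is
visible to Rust code through `cfg`, so a program can report which
linkage was requested when it was built. A minimal sketch follows; the
function name `crt_linkage` is made up for illustration. The feature is
toggled with `-C target-feature=+crt-static` (or `-crt-static`) passed
to rustc, and targets that do not support the switch ignore it.

```rust
/// Hypothetical helper: reports which C runtime linkage was requested
/// for this build via the `crt-static` target feature.
fn crt_linkage() -> &'static str {
    if cfg!(target_feature = "crt-static") {
        "static"
    } else {
        "dynamic"
    }
}

fn main() {
    println!("C runtime linkage requested at build time: {}", crt_linkage());
}
```

The check is resolved at compile time, so it reflects how the binary
was built rather than anything detected at run time.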
        
         | sanxiyn wrote:
         | No, it means there will be three binaries instead of two.
         | 
         | Note that statically linked musl binaries run fine on glibc
         | Linux, so you can just distribute musl binaries. That works
         | right now. In the future, you will just distribute Rustix
         | binaries, which will hopefully be smaller than musl binaries
         | (because they don't need to support the C API).
        
       | heythere22 wrote:
       | Just a quick reminder, using syscalls without libc is something
       | that will cause trouble with OpenBSD. From [1]: "The eventual
       | goal would be to disallow system calls from anywhere but the
       | region mapped for libc"
       | 
       | [1] https://lwn.net/Articles/806776/
        
         | colonwqbang wrote:
         | I think Linux is about the only system where this makes
         | sense. I don't know of another system where the syscall ABI is
         | rigidly defined and stable like it is on Linux. On most other
         | systems the syscall ABI tends to be "call this C function in
         | that shared library".
         | 
         | However, most computers in the world run Linux nowadays.
        
           | ink_13 wrote:
           | However good you think Linux is on this front, the BSDs,
           | particularly OpenBSD, are better.
        
             | FpUser wrote:
             | There is a saying: the best camera is the one you have with
             | you when needed.
             | 
             | For programming - whatever you use and are comfortable with
             | is the best. One need not be concerned with what others
             | like / use as long as it does not impede one's own work.
        
             | WJW wrote:
             | That is irrelevant in the context of the `linux-raw`
             | backend of this project, which does not target the BSDs.
        
       | staticassertion wrote:
       | > This project promotes several other goals as well, such as
       | promoting I/O safety concepts and APIs, helping test some of the
       | infrastructure used by cap-std, and helping set the stage for
       | future projects related to sandboxing, WASI, nameless, and other
       | areas.
       | 
       | I'd be interested in hearing more about this. I have a lot of
       | thoughts about sandboxing and Rust, including both build and
       | runtime sandboxing. Would be cool to understand if others are
       | working on this and chat about it. I'm familiar with cap-std, but
       | curious about any other initiatives or where the discussions are
       | happening.
        
       | asplake wrote:
       | Possibly a dumb question, but is this potentially useful to
       | other languages?
        
         | kibwen wrote:
         | It looks like the goal of this project is to allow Rust code to
         | avoid having to use FFI to call C code. In order to be useful
         | to non-Rust languages, this project would either have to expose
         | a C-compatible interface for FFI (which somewhat defeats the
         | point of trying to avoid C), or else the other language would
         | need to natively support Rust FFI (which would be a large
         | amount of ongoing labor, since Rust doesn't have a stable ABI).
        
         | clhodapp wrote:
         | I believe that it _would_ open the door to writing a Linux-only
         | libc in Rust, which could be cool
        
           | nicoburns wrote:
           | There is in fact already a library that does that
           | https://github.com/redox-os/relibc
        
       ___________________________________________________________________
       (page generated 2022-01-04 23:01 UTC)