[HN Gopher] A small Rust binary indeed (2022)
       ___________________________________________________________________
        
       A small Rust binary indeed (2022)
        
       Author : estebank
       Score  : 68 points
       Date   : 2024-02-18 17:32 UTC (5 hours ago)
        
 (HTM) web link (darkcoding.net)
 (TXT) w3m dump (darkcoding.net)
        
       | blovescoffee wrote:
       | Halfway through the article and we have
       | 
       |     unsafe {
       |         asm!(
       |             "mov edi, 42",
       |             "mov eax, 60",
       |             "syscall",
       |             options(nostack, noreturn)
       |         )
       |         // nostack prevents `asm!` from push/pop rax
       |         // noreturn prevents it putting a 'ret' at the end
       |         //  but it does put a ud2 (undefined instruction) instead
       |     }
       | 
       | and
       | 
       | > We will need to tell the C compiler that we're providing our
       | own entry point, telling it not to include its own start files.
       | 
       | So it's a Rust program but it's just calling inline assembly and
       | using a C compiler?
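
For readers unfamiliar with the convention in the quoted snippet: on x86_64 Linux the syscall number goes in `eax`/`rax` (60 is `exit`) and the first argument in `edi`/`rdi` (here the exit code 42). The sketch below applies the same convention to a syscall that returns, `getpid` (number 39), so it can run inside an ordinary std program. It is not from the article; `raw_getpid` is a hypothetical name, and the fallback branch exists only so the sketch compiles on other targets.

```rust
/// Call the Linux `getpid` syscall directly via inline assembly.
/// Assumes x86_64 Linux; `raw_getpid` is a hypothetical helper name.
#[cfg(all(target_os = "linux", target_arch = "x86_64"))]
fn raw_getpid() -> u32 {
    let ret: i64;
    unsafe {
        std::arch::asm!(
            "syscall",
            // rax carries the syscall number in and the result out.
            inlateout("rax") 39i64 => ret,
            // The `syscall` instruction itself clobbers rcx and r11.
            lateout("rcx") _,
            lateout("r11") _,
            // nostack: the asm does not push to or pop from the stack.
            options(nostack),
        );
    }
    ret as u32
}

/// Fallback for other targets so the sketch still compiles there.
#[cfg(not(all(target_os = "linux", target_arch = "x86_64")))]
fn raw_getpid() -> u32 {
    std::process::id()
}
```

Unlike the `exit` call in the quoted snippet, this one returns, so `noreturn` is not used and the clobbered registers have to be declared.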
        
         | remexre wrote:
         | Rust uses the C compiler as a linker, because this is often the
         | only way to ensure all the libraries needed by the system
         | toolchain are included. (Compare to the CCLD variable in
         | autotools -- it refers to the command to use the C compiler as
         | a linker, and exists for this very reason.)
         | 
         | This isn't only libc -- it also includes libgcc (or compiler-
         | rt, depending on your system toolchain), which, despite the
         | name, may still be called "behind your back" by the LLVM
         | toolchain.
         | 
         | > So it's a Rust program but it's just calling inline assembly
         | and using a C compiler?
         | 
         | Yeah, I think this article is more in the tradition of [0] (but
         | trying hard not to drop rustc) than being completely practical
         | advice on making the binary you ship to users smaller.
         | 
         | [0]:
         | http://www.muppetlabs.com/~breadbox/software/tiny/teensy.htm...
        
           | andrewaylett wrote:
           | Well, not quite --
           | https://github.com/grahamking/demeter-deploy/blob/master/see...
           | is the Rust version of a program that used to be written
           | entirely in assembly, and it seems that it ends up being the
           | same size. There are a few bits of asm in amongst the Rust,
           | but it's still _definitely_ a Rust program.
           | 
           | 800 lines of ASM file reduced to 600 lines of Rust, including
           | comments and constants in both cases. He might be pushing the
           | limits, and everything's unsafe Rust, but unsafe Rust is
           | still safer than raw assembly.
        
             | vacuity wrote:
             | > unsafe Rust is still safer than raw assembly
             | 
             | I don't think I would go that far. Assembly doesn't have
             | undefined behavior, and especially not with the strict
             | constraints around references as in Rust. The safe/unsafe
             | dichotomy in Rust is better than only using C or C++ when
             | there are concise, robust encapsulations around broken
             | invariants.
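
A minimal sketch of the kind of encapsulation meant here, assuming nothing beyond the standard library: the `unsafe` block relies on an invariant (the indices are in bounds), and the safe function around it establishes that invariant before the block runs. `first_and_last` is a hypothetical example name, not something from the article.

```rust
/// Return the first and last elements of a slice, or `None` if it is empty.
/// The signature is safe; callers cannot violate the invariant the
/// `unsafe` block depends on.
fn first_and_last(xs: &[u32]) -> Option<(u32, u32)> {
    if xs.is_empty() {
        return None;
    }
    // SAFETY: the slice is non-empty, so indices 0 and len - 1 are in bounds.
    unsafe {
        Some((
            *xs.get_unchecked(0),
            *xs.get_unchecked(xs.len() - 1),
        ))
    }
}
```

If the emptiness check were dropped, the function would still compile but would have undefined behavior on an empty slice, which is the risk the comment describes for unsafe code with unclear invariants.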
        
               | lmm wrote:
               | > Assembly doesn't have undefined behavior
               | 
               | Certainly some assembly languages do.
        
               | vacuity wrote:
               | Which ones? I assume at least the 1:1 machine code kind
               | doesn't, and you mean something more like bytecode, but
               | it'd be interesting if I'm wrong on that count.
        
               | vardump wrote:
               | > I don't think I would go that far. Assembly doesn't
               | have undefined behavior
               | 
               | As someone who has written a fair amount of assembler
               | over the years... Yes, it doesn't have undefined
               | behavior, but it also lacks practically all guard rails
               | and safeties.
               | 
               | The smallest error and you might do things like
               | completely messing up your call stack - just need to
               | forget one "POP" or mess up with stack pointer
               | adjustment. Or for example a computed jump in the middle
               | of an instruction.
               | 
               | You can create bugs that are almost impossible to figure
               | out from a crash dump, bugs that even something as low
               | level as C will effectively protect you from.
        
               | vacuity wrote:
               | I wonder if those issues can't be somewhat mitigated with
               | a linter or interactive emulator. In any case, I think
               | assembly is more uniformly difficult (and not portable!),
               | while unsafe Rust generally feels less painful but you
               | might have no idea which invariants you need to enforce
               | unless you're very knowledgeable. Definitely don't write
               | a whole application in either!
        
         | estebank wrote:
         | Note that that step sheds libc entirely (so the binary needs to
         | provide the minimal things that libc does for your platform,
         | namely that assembly you mention, and you'd have to do the same
         | for a C binary that did that) and gets rid of 3kb (16kb ->
         | 13kb), but changing the linker flags to avoid page-aligning the
         | binary brings it down to _400 bytes_. I would have loved if the
         | author had tried that on the libc version too, just for
         | comparison's sake.
         | 
         | In a lot of conversations around Rust binary sizes some people
         | extrapolate from the "Hello, World!" size difference as if the
         | additional cost on top of a bare C binary was linear, when in
         | reality it is (approximately) a constant cost. That on top of
         | completely disregarding that the "bloat" _is_ doing something
         | (panic machinery, string formatting, DWARF symbol storage,
         | DWARF symbol parsing, etc.).
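
A rough sketch of why page alignment can dominate the size of a tiny binary, under a simplified model in which every segment in the file is padded up to the next page boundary. The segment sizes below are made up for illustration and are not the article's actual layout; `align_up` and `padded_size` are hypothetical helper names.

```rust
/// Round `size` up to the next multiple of `align`.
/// `align` must be a power of two (1 means no alignment).
fn align_up(size: u64, align: u64) -> u64 {
    assert!(align.is_power_of_two());
    (size + align - 1) & !(align - 1)
}

/// Total size of a file whose segments are each padded to `align` bytes.
/// This is a simplified model, not how any particular linker lays out files.
fn padded_size(segments: &[u64], align: u64) -> u64 {
    segments.iter().map(|&s| align_up(s, align)).sum()
}
```

With three hypothetical segments of 120, 200 and 80 bytes, `padded_size(&[120, 200, 80], 4096)` gives 12288 bytes, while `padded_size(&[120, 200, 80], 1)` gives 400 bytes: in this model almost all of the file is padding, and the actual content is unchanged.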
        
           | tremon wrote:
           | It's definitely not a constant cost, presumably due to the
           | link-time optimization that rustc does. I've had binaries go
           | from 800kB to 6MB simply by switching from getopts to the
           | clap crate, for example.
        
             | pornel wrote:
             | Binary using clap with all the bells and whistles, even
             | without LTO, is 900KB _after strip_.
             | 
             | The standard library has 4MB of debug info baked in, which
             | due to its special integration with Cargo is always added,
             | even when you explicitly configure `debug=false`. This is
             | what usually surprises people and makes Rust executables
             | seem huge.
        
           | LtWorf wrote:
           | So it doesn't strip unneeded stuff?
        
       | cryo wrote:
       | Interesting, would be cool to see that applied to a real world
       | rust program.
       | 
       | Today I got rid of libc on the Windows version of a commandline
       | tool to flash firmware via USB, which freed 7 kB of the .exe
       | size.
       | 
       | The original version was done in C++ plus Qt and was ca. 3.5 MB
       | (.exe and dependencies).
       | 
       | The optimized C version is 14 kB compressed with upx.
       | 
       | FYI Code: https://github.com/dresden-elektronik/gcfflasher
        
       | Klasiaster wrote:
       | One can also create small binaries with
       | https://github.com/sunfishcode/origin (e.g.,
       | https://github.com/sunfishcode/origin/blob/main/example-crat...
       | is in that ~400 bytes range) and select features as wanted
       | without having to reimplement everything. Also see
       | https://github.com/sunfishcode/origin-studio and
       | https://github.com/sunfishcode/mustang - and of course
       | https://github.com/sunfishcode/eyra
        
       | r0rshrk wrote:
       | So, the way to make your Rust binary small is to make error
       | handling more difficult, or to rewrite it in assembly?
        
       | abathologist wrote:
       | I am not much concerned with hyper optimizations, but I was
       | curious how OCaml would fare with the initial, simple steps,
       | before things get crazy. But I opted for a more complex program:
       |
       |     (* t.ml *)
       |     let () = print_endline "Hello, World!"
       | 
       | Then just doing a standard compilation and a strip:
       |
       |     $ ocamlopt -o t t.ml && ls -l -h t | cut -d " " -f5
       |     1.5M
       |     $ strip t && ls -l -h t | cut -d " " -f5
       |     356K
       |     $ ./t
       |     Hello, World!
       | 
       | I may be overlooking something, and would be interested to learn
       | what if so, but I was surprised we got a result smaller than the
       | rust binary in the first instance.
        
         | estebank wrote:
         | It is interesting that before stripping the size of the Rust
         | version is bigger, but after only stripping the size of the
         | OCaml version is bigger. It'd be nice to try and see what the
         | "extra" info that Rust ships by default is.
        
           | abathologist wrote:
           | Yeah, I thought the same!
        
       ___________________________________________________________________
       (page generated 2024-02-18 23:01 UTC)