[HN Gopher] Zlib-rs is faster than C
___________________________________________________________________
Zlib-rs is faster than C
Author : dochtman
Score : 140 points
Date : 2025-03-16 19:35 UTC (3 hours ago)
(HTM) web link (trifectatech.org)
(TXT) w3m dump (trifectatech.org)
| IshKebab wrote:
| It's _barely_ faster. I would say it's more accurate to say it's
| as fast as C, which is still a great achievement.
| throwaway48476 wrote:
| But it is faster. The closer to theoretical maximum the smaller
| the gains become.
| mananaysiempre wrote:
| Zlib-ng is between a couple and multiple times away from the
| state of the art[1], it's just that nobody has yet done the
| (hard) work of adjusting libdeflate[2] to a richer API than
| "complete buffer in, complete buffer out".
|
| [1] https://github.com/zlib-ng/zlib-ng/issues/1486
|
| [2] https://github.com/ebiggers/libdeflate
| qweqwe14 wrote:
| "Barely" or not is completely irrelevant. The fact is that it's
| measurably faster than the C implementation with the more
| common parameters. So the point that you're trying to make
| isn't clear tbh.
|
| Also I'm pretty sure that the C implementation had more man
| hours put into it than the Rust one.
| bee_rider wrote:
| I think that would be really hard to measure. In particular,
| for this sort of very optimized code, we'd want to separate
| out the time spent designing the algorithms (which the Rust
| version benefits from as well). Actually I don't think that
| is possible at all (how will we separate out time spent
| coding experiments in C, then learning from them).
|
| Fortunately these "which language is best" SLOC measuring
| contests are just frivolous little things that only silly
| people take seriously.
| ajross wrote:
| It's... basically written in C. I'm no expert on zlib/deflate
| or related algorithms, but digging around
| https://github.com/trifectatechfoundation/zlib-rs/ almost every
| block with meaningful logic is marked unsafe. There's raw
| allocation management, raw slicing of arrays, etc... This code
| looks and smells like C, and very much not like rust. I don't
| know that this is a direct transcription of the C code, but if
| you were to try something like that this is sort of what it
| would look like.
|
| I think there's lots of value in wrapping a raw/unsafe
| implementation with a rust API, but that's not _quite_ what
| most people think of when writing code "in rust".
| hermanradtke wrote:
| > basically written in C
|
| Unsafe Rust still has to conform to many of Rust's rules. It
| is meaningfully different than C.
| est31 wrote:
| It has also way less tooling available than C to analyze
| its safety.
| nindalf wrote:
| The number of tools matters less than the quality of the
| tools. Rust's inherent guarantees + miri + software
| verification tools mean that in practice Rust code, even
| with unsafe, ends up being higher quality.
| ajross wrote:
| Are there examples you're thinking about? The only good
| ones I can think of are bits about undefined behavior
| semantics, which frankly are very well covered in modern C
| code via tools like ubsan, etc...
| sedatk wrote:
| This comment summarizes the difference of unsafe Rust
| quite well. Basically, mostly safe Rust, but with few
| exceptions, fewer than one would imagine:
| https://news.ycombinator.com/item?id=43382176
| steveklabnik wrote:
| They're just fundamentally different languages. There's
| semantics that exist in all four of these quadrants:
|
| * defined in C, undefined in Rust
|
| * undefined in C, undefined in Rust
|
| * defined in Rust, undefined in C
|
| * defined in Rust, defined in C
| xxs wrote:
| I mentioned it under another comment - and while I consider
| myself versed enough in deflate - comparing the library to
| zlib-ng is quite weird as the latter is generally hand
| written assembly. To beat it would take some oddity in
| the test itself.
| oneshtein wrote:
| Cannot understand your complaint. It is written in Rust, but for
| you it looks like C. So what?
| Alifatisk wrote:
| So, it is basically like it was written in C.
| ajross wrote:
| It doesn't exploit (and in fact deliberately evades) Rust's
| signature memory safety features. The impression from the
| headline is "Rust is as fast as C now!", but in fact the
| subset of the language that has been shown to be as fast as
| C is the subset that is basically _isomorphic_ to C.
|
| The impression a naive reader might take is that
| idiomatic/safe/best-practices Rust has now closed the
| performance gap. But clearly that's not happening here.
| sedatk wrote:
| Rust's many memory safety features (including the borrow
| checker) are still enabled in unsafe Rust blocks.
|
| For more information:
| https://news.ycombinator.com/item?id=43382176
| johnisgood wrote:
| It does actually seem like what a C -> Rust transpiler would
| spit out.
| gf000 wrote:
| C is not assembly, nor is it portable assembly at all in this
| century, so your phrasing is very off.
|
| C code will go through a huge number of transformations by
| the compiler, and unless you are a compiler expert you will
| have no idea how the resulting code looks. It's not targeting
| the PDP-11 anymore.
| johnisgood wrote:
| "faster than C" almost always boils down to different designs,
| implementations, algorithms, etc.
|
| Perhaps it is faster than already-existing implementations, sure,
| but not "faster than C", and it is odd to make such claims.
| oneshtein wrote:
| ... because by "C" we mean handwritten inline assembler.
|
| Typical realworld C code uses \0 terminated strings and
| strlen() with O(len^2) complexity.
| qweqwe14 wrote:
| The fact that it's faster than the C implementation that surely
| had more time and effort put into it doesn't look good for C
| here.
| johnisgood wrote:
| It says absolutely nothing about the programming language
| though.
| acdha wrote:
| Doesn't it say something if Rust programmers routinely feel
| more comfortable making aggressive optimizations and have
| more time to do so? We maintain code for longer than the
| time taken to write the first version and not having to pay
| as much ongoing overhead cost is worth something.
| vkou wrote:
| I think you'll find that if you re-write an application,
| feature-for-feature, _without_ changing its language, the re-
| written version will be faster.
| renewiltord wrote:
| This is known as the Second System Effect: where Great
| Rewrites always succeed in making a more performant thing.
| xxs wrote:
| zlib-ng is pretty much assembly - with a bit of C. There is
| this quote: _but was not entirely fair because our rust
| implementation could assume that certain SIMD capabilities
| would be available, while zlib-ng had to check for them at
| runtime_
|
| zlib-ng can be compiled to whatever target arch is necessary,
| and the original post doesn't mention how it was compiled and
| what architecture and so on.
|
| It's another case not to trust micro benchmarks
| tdiff wrote:
| Nevertheless Russinovich actually says something along the lines
| of "simple rewriting in rust made some of our code 5-15% faster
| (without deliberate optimizations)":
| https://www.youtube.com/watch?v=1VgptLwP588&t=351s
| pinkmuffinere wrote:
| I'm sure I'm missing context, and presumably there are other
| benefits, but 5-15% improvement is such a small step to
| justify rewriting codebases.
|
| I also wonder how much of an improvement you'd get by just
| asking for a "simple rewrite" in the existing language. I
| suspect there are often performance improvements to be had
| with simple changes in the existing language
| tdiff wrote:
| I agree that simple rewriting could have given some if not
| all perf benefits, but can it be the case that rust forces
| us to structure code in a way that is for some reason more
| performant in some cases?
|
| 5-15% is a big deal for low-level foundational code,
| especially if you get it along with some other guarantees,
| which may be of greater importance.
| turtletontine wrote:
| Far better justification for a rewrite like this is if it
| eases maintenance, or simplifies
| building/testing/distribution. Taking an experienced and
| committed team of C developers with a mature code base, and
| retraining them to rewrite their project in Rust for its
| own sake is pretty absurd. But if you have a team that's
| more comfortable in Rust, then doing so could make a lot of
| sense - and, yes, make it easier to ensure the product is
| secure and memory-safe.
| johnisgood wrote:
| > if you have a team that's more comfortable in
|
| As is the case with any language, of course; it is not an
| argument in favor of (nor against) Rust.
| sedatk wrote:
| > 5-15% improvement is such a small step to justify
| rewriting codebases
|
| They hadn't expected any perf improvements at all. Quite
| the opposite, in fact. They were surprised that they saw
| perf improvements right away.
| kgeist wrote:
| I heard that aliasing in C prevents the compiler from
| optimizing aggressively. I can believe Rust's compiler can
| optimize more aggressively if there's no aliasing problem.
| layer8 wrote:
| C has the _restrict_ type qualifier to express non-aliasing,
| hence it shouldn't be a fundamental impediment.
| gf000 wrote:
| Which is so underused that the whole compiler feature was
| buggy as hell, and was only recently fixed because
| compiling Rust where it is the norm exposed it.
| layer8 wrote:
| If anything, this should be "zlib-rs is faster than zlib-ng",
| but not "$library is faster than $programming_language".
| chjj wrote:
| It should be, but you'll never convince the rust people of
| that. It's always a competition with them.
| kahlonel wrote:
| You mean the implementation is faster than the one in C. Because
| nothing is "faster than C".
| arlort wrote:
| Tachyons?
| einpoklum wrote:
| Maybe if you reverse the beam polarity and route them through
| the main deflector array.
| layer8 wrote:
| But that requires rerouting auxiliary power from life
| support to the shield generators. In Rust you would need to
| use _unsafe_ for that.
| mkoubaa wrote:
| C after an optimizing compiler has chewed through it is faster
| than C
| Jaxan wrote:
| Of course many things can be faster than C, because C is very
| far from modern hardware. If you compile with optimisation
| flags, the generated machine code looks nothing like what you
| programmed in C.
| dijit wrote:
| The kind of code you can write in rust can indeed be faster
| than C, but someone will wax poetic about how anything is
| possible in C and they would be valid.
|
| The major reason that rust can be faster than C, though, is
| that, due to the way the compiler is constructed, you can
| lean on threading idiomatically. The same can be true for Go,
| coroutines vs no coroutines in some cases is going to be faster
| for the use case.
|
| You _can_ write these things to be the same speed or even
| faster in C, but you won't, because it's hard and you will
| introduce more bugs per KLOC in C with concurrency vs Go or
| Rust.
| pornel wrote:
| If you don't count manual SIMD intrinsics or inline assembly as
| C, then Rust and FORTRAN can be faster than C. This is mainly
| thanks to having pointer aliasing guarantees that C doesn't
| have. They can get autovectorization optimizations where C's
| semantics get in the way.
| nindalf wrote:
| Why can't something be faster than C? If a language is able to
| convey more information to a backend like LLVM, the backend
| could use that to produce more optimised code than what it
| could do for C.
|
| For example, if the language is able to say, for any two
| pointers, the two pointers will not overlap - that would enable
| the backend to optimise further. In C this requires an explicit
| restrict keyword. In Rust, it's the default.
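A minimal sketch of the aliasing point above, not taken from the thread. Because a `&mut` reference is exclusive, the compiler is allowed to assume `total` and `step` never overlap, so it may read `*step` once and reuse it. The function name `add_twice` is hypothetical, and whether a given backend actually exploits this is not something the sketch demonstrates.

```rust
/// Adds `*step` to `*total` twice.
///
/// `total` is an exclusive (`&mut`) borrow and `step` a shared one, so
/// safe callers cannot pass overlapping references. The equivalent C
/// function would need `restrict` on its pointer parameters to make
/// the same promise to the compiler.
pub fn add_twice(total: &mut i32, step: &i32) {
    *total += *step;
    *total += *step;
}
```

Calling `add_twice(&mut t, &t)` is rejected by the borrow checker, which is what makes the non-overlap assumption sound.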
|
| By the way this isn't theoretical. Image decoders written in
| Rust are faster than ones written in C, probably because the
| backend is able to autovectorise better.
| (https://www.reddit.com/r/rust/comments/1ha7uyi/memorysafe_pn...).
|
| grep (C) is about 5-10x slower than ripgrep (Rust). That's why
| ripgrep is used to execute all searches in VS Code and not
| grep.
|
| Or a different tack. If you wrote a program that needed to sort
| data, the Rust version would probably be faster thanks to the
| standard library sort being the fastest, across languages
| (https://github.com/rust-lang/rust/pull/124032). Again, faster
| than C.
|
| Happy to give more examples if you're interested.
|
| There's nothing special about C that entitles it to the crown
| of "nothing faster". This would have made sense in 2005, not
| 2025.
| burntsushi wrote:
| Narrow correction on two points:
|
| First, I would say that "ripgrep is generally faster than GNU
| grep" is a true statement. But sometimes GNU grep is faster
| than ripgrep and in many cases, performance is comparable or
| only a "little" slower than ripgrep.
|
| Secondly, VS Code using ripgrep because of its speed is only
| one piece of the picture. Licensing was also a major
| consideration. There is an issue about this where they
| originally considered ripgrep (and ag if I recall correctly),
| but I'm on mobile so I don't have the link handy.
| kllrnohj wrote:
| It is quite easy for C++ and Rust to both be faster than C in
| things larger than toy projects. C is hardly a panacea of
| efficiency, and the language makes useful things very hard to
| do efficiently.
|
| You can contort C to trick it into being fast[1], but it
| quickly becomes an unmaintainable nightmare so almost nobody
| does.
|
| 1: eg, correct use of restrict, manually creating move
| semantics, manually creating small string optimizations, etc...
| gf000 wrote:
| Wtf, since when?
|
| Besides the famous "C is not a low-level language" blog post..
| I don't even get what you are thinking. C is not even the
| performance queen for large programs (the de facto standard
| today is C++ for good reasons), let alone for tiny ultra hot
| loops like codecs and stuff, which are all hand-written
| assembly.
|
| It's not even hard to beat C with something like Rust or C++,
| because you can properly do high level optimizations as the
| language is expressive enough for that.
| YZF wrote:
| I found out I already know Rust:
|
|     unsafe {
|         let x_tmp0 = _mm_clmulepi64_si128(xmm_crc0, crc_fold, 0x10);
|         xmm_crc0 = _mm_clmulepi64_si128(xmm_crc0, crc_fold, 0x01);
|         xmm_crc1 = _mm_xor_si128(xmm_crc1, x_tmp0);
|         xmm_crc1 = _mm_xor_si128(xmm_crc1, xmm_crc0);
|
| Kidding aside, I thought the purpose of Rust was for safety but
| the keyword unsafe is sprinkled liberally throughout this
| library. At what point does it really stop mattering if this is C
| or Rust?
|
| Presumably with inline assembly both languages can emit what is
| effectively the same machine code. Is the Rust compiler a better
| optimizing compiler than C compilers?
| oneshtein wrote:
| > I thought the purpose of Rust was for safety but the keyword
| unsafe is sprinkled liberally throughout this library.
|
| What's wrong with that?
| Filligree wrote:
| The usual answer is: You only need to verify the unsafe blocks,
| not every block. Though 'unsafe' in Rust is actually even less
| safe than regular C, if a bit more predictable, so there's a
| crossover point where you really shouldn't have bothered.
|
| The Rust compiler is indeed better than the C one, largely
| because of having more information and doing full-program
| optimisation. A `vec_foo =
| vec_foo.into_iter().map(...).collect::<Vec<foo>>()`, for example,
| isn't going to do any bounds checks _or_ allocate.
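A small sketch of the pattern described above, under the assumption that the element type stays the same size. The name `double_all` is hypothetical. Reusing the source allocation in `collect` is an optimization the standard library may apply, not a documented guarantee, so the example only checks the resulting values.

```rust
/// Consumes a vector, doubles every element, and collects the result.
///
/// The iterator chain has no indexing, so there are no bounds checks
/// to elide. When source and target element types have the same
/// layout, the standard library may write the results back into the
/// original buffer instead of allocating a new one.
pub fn double_all(values: Vec<u32>) -> Vec<u32> {
    values
        .into_iter()
        .map(|v| v.wrapping_mul(2))
        .collect::<Vec<u32>>()
}
```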
| johnisgood wrote:
| I have been told that "unsafe" affects code outside of that
| block, but hopefully steveklabnik may explain it better
| (again).
|
| > isn't going to do any bounds checks or allocate.
|
| You need to add explicit bounds check or explicitly allocate
| _in C_ though. It is not there if you do not add it yourself.
| LegionMammal978 wrote:
| > I have been told that "unsafe" affects code outside of
| that block, but hopefully steveklabnik may explain it better
| (again).
|
| Poorly-written unsafe code can have effects extending out
| into safe code. But correctly-written unsafe code does not
| have any effects on safe code w.r.t. memory safety. So to
| ensure memory safety, you just have to verify the
| correctness of the unsafe code (and any helper functions,
| etc., it depends on), rather than the entire codebase.
|
| Also, some forms of unsafe code are far less dangerous than
| others in practice. E.g., most of the SIMD functions are
| practically safe to call in every situation, but they all
| have 'unsafe' slapped on them due to being intrinsics.
|
| > You need to add explicit bounds check or explicitly
| allocate _in C_ though. It is not there if you do not add
| it yourself.
|
| Unfortunately, you do need to allocate a new buffer in C if
| you change the type of the elements. The annoying side of
| strict aliasing is that every buffer has a single type
| that's set in stone for all time. (Unless you preemptively
| use unions for everything.)
| uecker wrote:
| C has type-changing stores. If you store to a buffer with
| a new type, it has the new type. Clang does not implement
| this correctly though, but GCC does.
| pornel wrote:
| Buggy unsafe blocks can affect code anywhere (through
| Undefined Behavior, or breaking the API contract).
|
| However, if you verify that the unsafe blocks are correct,
| and the safe API wrapping them rejects invalid inputs, then
| they won't be able to cause unsafety anywhere.
|
| This does reduce how much code you need to review for
| memory safety issues. Once it's encapsulated in a safe API,
| the compiler ensures it can't be broken.
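As an illustration of a safe API wrapping an unsafe block, here is a minimal sketch. The function `last_byte` is hypothetical and not from zlib-rs. The wrapper rejects the one invalid input (an empty slice), so no safe caller can reach the unchecked access with an out-of-bounds index.

```rust
/// Returns the last byte of `bytes`, or `None` if the slice is empty.
pub fn last_byte(bytes: &[u8]) -> Option<u8> {
    if bytes.is_empty() {
        // Invalid input is rejected before any unsafe code runs.
        return None;
    }
    // SAFETY: the slice is non-empty, so `len() - 1` is in bounds.
    Some(unsafe { *bytes.get_unchecked(bytes.len() - 1) })
}
```

Only the three lines around the `unsafe` block need review for memory safety; callers just see an ordinary function returning `Option<u8>`.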
|
| This encapsulation also prevents combinatorial explosion of
| complexity when multiple (unsafe) libraries interact.
|
| I can take zlib-rs, and some multi-threaded job executor
| (also unsafe internally), but I don't need to specifically
| check how these two interact. zlib-rs needs to ensure they
| use slices and lifetimes correctly, the threading library
| needs to ensure it uses correct lifetimes and type bounds,
| and then the compiler will check all interactions between
| these two libraries for me. That's like (M+N) complexity to
| deal with instead of (M*N).
| steveklabnik wrote:
| > I have been told that "unsafe" affects code outside of
| that block, but hopefully steveklabnik may explain it better
| (again).
|
| It's due to a couple of different things interacting with
| each other: unsafe relies on invariants that safe code must
| also uphold, and that the privacy boundary in Rust is the
| module.
|
| Before we get into the unsafe stuff, I want you to consider
| an example. Is this Rust code okay?
|
|     struct Foo {
|         bar: usize,
|     }
|
|     impl Foo {
|         fn set_bar(&mut self, bar: usize) {
|             self.bar = bar;
|         }
|     }
|
| No unsafe shenanigans here. This code is perfectly safe, if
| a bit useless.
|
| Let's talk about unsafe. The canonical example of unsafe
| code being affected outside of unsafe itself is the
| implementation of Vec<T>. Vecs look _something_ like this
| (the real code is different for reasons that don't really
| matter in this context):
|
|     struct Vec<T> {
|         ptr: *mut T,
|         len: usize,
|         cap: usize,
|     }
|
| The pointer is to a bunch of Ts in a row, the length is the
| current number of Ts that are valid, and the capacity is
| the total number of Ts. The length and the capacity are
| different so that memory allocation is amortized; the
| capacity is always greater than or equal to the length.
|
| That property is very important! If the length is greater
| than the capacity, when we try and index into the Vec, we'd
| be accessing random memory.
|
| So now, this function, which is the same as Foo::set_bar,
| is no longer okay:
|
|     impl<T> Vec<T> {
|         fn set_len(&mut self, len: usize) {
|             self.len = len;
|         }
|     }
|
| This is because the unsafe code inside of other methods of
| Vec<T> needs to be able to rely on the fact that len <=
| capacity. And so you'll find that Vec<T>::set_len in Rust
| is marked as unsafe, even though it doesn't contain unsafe
| code. It still requires judicious use to not introduce
| memory unsafety.
|
| And this is why the module being the privacy boundary
| matters: the only way to set len directly in safe Rust code
| is code within the same privacy boundary as the Vec<T>
| itself. And so, that's the same module, or its children.
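The explanation above can be sketched with a toy type. `FixedBuf`, `CAP`, and the module name `fixed` are hypothetical and much simpler than the real `Vec<T>`; the point is only that the `len` field is private to the module, and that `set_len` must be an `unsafe fn` even though its body has no unsafe operations.

```rust
pub mod fixed {
    pub const CAP: usize = 8;

    /// Toy buffer. Invariant relied on by unsafe code: `len <= CAP`.
    pub struct FixedBuf {
        data: [u8; CAP],
        len: usize, // private: only code in this module can write it
    }

    impl FixedBuf {
        pub fn new() -> Self {
            FixedBuf { data: [0; CAP], len: 0 }
        }

        /// Safe: checks capacity, so the invariant is preserved.
        pub fn push(&mut self, byte: u8) -> bool {
            if self.len == CAP {
                return false;
            }
            self.data[self.len] = byte;
            self.len += 1;
            true
        }

        /// # Safety
        /// `len` must be less than or equal to `CAP`.
        pub unsafe fn set_len(&mut self, len: usize) {
            self.len = len; // no unsafe operation, but can break the invariant
        }

        pub fn as_slice(&self) -> &[u8] {
            // SAFETY: every safe method in this module keeps `len <= CAP`.
            unsafe { self.data.get_unchecked(..self.len) }
        }
    }
}
```

If `set_len` were a safe function, `buf.set_len(100)` followed by `buf.as_slice()` would read out of bounds without the caller ever writing `unsafe`.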
| dietr1ch wrote:
| > I thought the purpose of Rust was for safety but the keyword
| unsafe is sprinkled liberally throughout this library.
|
| Which is exactly the point, other languages have unsafe
| implicitly sprinkled in every single line.
|
| Rust tries to bound and explicitly delimit where unsafe code is
| to make review and verification efforts precise.
| datadeft wrote:
| I thought that the point of Rust is to have safe {} blocks
| (implicit) as a default and unsafe {} when you need the
| absolute maximum performance available. You can audit those few
| lines of unsafe code very easily. With C everything is unsafe
| and you can just forget to call free() or call it twice and you
| are done.
| steveklabnik wrote:
| > unsafe {} when you need the absolute maximum performance
| available.
|
| Unsafe code is not inherently faster than safe code, though
| sometimes, it is. Unsafe is for when you want to do something
| that is legal, but the compiler cannot understand that it is
| legal.
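One common case of "legal, but the compiler cannot see that it is legal" is borrowing two different elements of a slice mutably at once. The sketch below is illustrative only; the helper name `two_mut` is hypothetical. The borrow checker cannot prove that two indices are distinct, so the function checks that at runtime and then uses raw pointers.

```rust
/// Returns mutable references to two distinct elements, or `None` if
/// the indices are equal or out of bounds.
pub fn two_mut(xs: &mut [i32], i: usize, j: usize) -> Option<(&mut i32, &mut i32)> {
    if i == j || i >= xs.len() || j >= xs.len() {
        return None;
    }
    let base = xs.as_mut_ptr();
    // SAFETY: both indices are in bounds and different, so the two
    // references point at non-overlapping elements of the same slice.
    unsafe { Some((&mut *base.add(i), &mut *base.add(j))) }
}
```

The unsafe block is not there for speed; it expresses something safe Rust's rules are too conservative to accept directly.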
| WD-42 wrote:
| It's not about performance, it's about undefined behavior.
| akx wrote:
| To quote the Rust book
| (https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html):
|
|     In addition, unsafe does not mean the code inside the block
|     is necessarily dangerous or that it will definitely have
|     memory safety problems: the intent is that as the
|     programmer, you'll ensure the code inside an unsafe block
|     will access memory in a valid way.
|
| Since you say you already know that much Rust, you can be that
| programmer!
| silisili wrote:
| I feel like C programmers had the same idea, and well, we see
| how that works out in practice.
| dijit wrote:
| the problem in those cases is that C can't help but be
| unsafe always.
|
| People can write memory safe code, just not 100% of the
| time.
| sunshowers wrote:
| No, C lacks encapsulation of unsafe code. This is very
| important. Encapsulation is the only way to scale local
| reasoning into global correctness.
| Aurornis wrote:
| Using unsafe blocks in Rust is confusing when you first see it.
| The idea is that you have to opt-out of compiler safety
| guarantees for specific sections of code, but they're clearly
| marked by the unsafe block.
|
| In good practice it's used judiciously in a codebase where it
| makes sense. Those sections receive extra attention and
| analysis by the developers.
|
| Of course you can find sloppy codebases where people reach for
| unsafe as a way to get around Rust instead of writing code the
| Rust way, but that's not the intent.
|
| You can also find die-hard Rust users who think unsafe should
| never be used and make a point to avoid libraries that use it,
| but that's excessive.
| timschmidt wrote:
| Unsafe is a very distinct code smell. Like the hydrogen
| sulfide added to natural gas to allow folks to smell a gas
| leak.
|
| If you smell it when you're not working on the gas lines,
| that's a signal.
| cmrdporcupine wrote:
| Look, no. Just go read the unsafe block in question. It's
| just SIMD intrinsics. No memory access. No pointers. It's
| unsafe in name only.
|
| No need to get all moral about it.
| kccqzy wrote:
| By your line of reasoning, SIMD intrinsics functions
| should not be marked as unsafe in the first place. Then
| why are they marked as unsafe?
| cmrdporcupine wrote:
| There's no standardization of simd in Rust yet, they've
| been sitting in nightly unstable for years:
|
| https://doc.rust-lang.org/std/intrinsics/simd/index.html
|
| So I suspect it's a matter of two things:
|
| 1. You're calling out to what's basically assembly, so
| buyer beware. This is basically FFI into C/asm.
|
| 2. There's no guarantee on what comes out of those
| 128-bit vectors afterwards follows any sanity or
| expectations, so... buyer beware. Same reason
| std::mem::transmute is marked unsafe.
|
| It's really the weakest form of unsafe.
|
| Still entirely within the bounds of a sane person to
| reason about.
| pclmulqdq wrote:
| > they've been sitting in nightly unstable for years
|
| So many very useful features of Rust and its core library
| spend years in "nightly" because the maintainers of those
| features don't have the discipline to see them through.
| cmrdporcupine wrote:
| simd and allocator_api are the two that irritate me
| enough to consider a different language for future
| systems dev projects.
|
| I don't have the personality or time to wade into
| committee type work, so I have no idea what it would take
| to get those two across the finish line, but the
| allocator one in particular makes me question Rust for
| lower level applications. I think it's just not going to
| happen.
|
| If Zig had proper ADTs and something equivalent to borrow
| checker, I'd be inclined to poke at it more.
| steveklabnik wrote:
| > There's no standardization of simd in Rust yet
|
| Of _safe_ SIMD, but some stuff in core::arch is
| stabilized. Here's the first bit called in the example
| of the OP:
| https://doc.rust-lang.org/core/arch/x86/fn._mm_clmulepi64_si...
| CryZe wrote:
| They are in the process of marking them safe, which is
| enabled through the target_feature 1.1 RFC.
|
| In fact, it has already been merged two weeks ago:
| https://github.com/rust-lang/stdarch/pull/1714
|
| The change is already visible on nightly:
| https://doc.rust-lang.org/nightly/core/arch/x86/fn._mm_xor_s...
|
| Compared to stable:
| https://doc.rust-lang.org/core/arch/x86/fn._mm_xor_si128.htm...
|
| So this should be stable in 1.87 on May 15 (Rust's 10
| year anniversary since 1.0)
| timschmidt wrote:
| I don't read any moralizing in my previous comment. And
| it seems to mirror the relevant section in the book:
|
| "People are fallible, and mistakes will happen, but by
| requiring these five unsafe operations to be inside
| blocks annotated with unsafe you'll know that any errors
| related to memory safety must be within an unsafe block.
| Keep unsafe blocks small; you'll be thankful later when
| you investigate memory bugs."
|
| I hope the SIMD intrinsics make it to stable soon so
| folks can ditch unnecessary unsafes if that's the only
| issue.
| SkiFire13 wrote:
| SIMD intrinsics are unsafe because they are available
| only under some CPU features.
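A sketch of the CPU-feature point, assuming an x86_64 target with a portable fallback elsewhere. The function names `xor16` and `xor16_sse2` are hypothetical and this is not how zlib-rs structures its dispatch. The intrinsic path is only entered after a runtime check, which is the precondition the `unsafe` call is promising.

```rust
/// XORs two 16-byte blocks, using SSE2 when the running CPU has it.
pub fn xor16(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("sse2") {
            // SAFETY: SSE2 support was just verified at runtime.
            return unsafe { xor16_sse2(a, b) };
        }
    }
    // Portable fallback for other CPUs and architectures.
    std::array::from_fn(|i| a[i] ^ b[i])
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse2")]
unsafe fn xor16_sse2(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    use std::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_storeu_si128, _mm_xor_si128};
    let mut out = [0u8; 16];
    // SAFETY: unaligned loads/stores on 16-byte arrays we own; the
    // caller guarantees SSE2 is available.
    unsafe {
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        let vr = _mm_xor_si128(va, vb);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, vr);
    }
    out
}
```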
| mrob wrote:
| There's no standard recipe for natural gas odorant, but
| it's typically a mixture of various organosulfur compounds,
| not hydrogen sulfide. See:
|
| https://en.wikipedia.org/wiki/Odorizer#Natural_gas_odorizers
| timschmidt wrote:
| TIL!
| api wrote:
| The idea is that you can trivially search the code base for
| "unsafe" and closely examine all unsafe code, and unless you
| are doing really low-level stuff there should not be much of
| it. Higher level code bases should ideally have none.
|
| It tends to be found in drivers, kernels, vector code, and
| low-level implementations of data structures and allocators
| and similar things. Not typical application code.
|
| As a general rule it should be avoided unless there's a good
| reason to do it. But it's there for a reason. It's almost
| impossible to create a systems language that imposes any kind
| of rules (like ownership etc.) that covers all possible cases
| and all possible optimization patterns on all hardware.
| timschmidt wrote:
| To the extent that it's even possible to write bare metal
| microcontroller firmware in Rust without unsafe, as the
| embedded hal ecosystem wraps unsafe hardware interfaces in
| a modular fairly universal safe API.
| formerly_proven wrote:
| My understanding from Aria Beingessner's and some other
| writings is that unsafe{} rust is significantly harder to
| get right in "non-trivial cases" than C, because the
| semantics are more complex and less specified.
| dwattttt wrote:
| It's hard to compare. Rust has stricter requirements than
| C, but looser requirements don't mean easier: ever bit
| shifted by a variable amount? Hope you never relied on
| shifting "entirely" out of a variable zeroing it.
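To make the shift example concrete: in C, shifting a 32-bit value by 32 or more bits is undefined behavior, so it is not guaranteed to produce zero. In Rust the same shift with a plain `<<` is an overflow (a panic in debug builds), and the standard library offers explicit variants. The helper name `shl_or_zero` below is hypothetical.

```rust
/// Shifts left, returning 0 once every bit has been shifted out.
pub fn shl_or_zero(value: u32, amount: u32) -> u32 {
    // `checked_shl` returns `None` when `amount` is 32 or larger.
    value.checked_shl(amount).unwrap_or(0)
}

/// Shifts left with the amount masked to the bit width, which is what
/// many CPUs do in hardware: a shift by 32 becomes a shift by 0.
pub fn shl_masked(value: u32, amount: u32) -> u32 {
    value.wrapping_shl(amount)
}
```

The two helpers disagree for `amount >= 32`, which is exactly the case a C programmer might have silently relied on.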
| chongli wrote:
| Isn't it the case that once you use unsafe even a single
| time, you lose all of Rust's nice guarantees? As far as I'm
| aware, inside the unsafe block you can do whatever you want
| which means all of the nice memory-safety properties of the
| language go away.
|
| It's like letting a wet dog (who'd just been swimming in a
| nearby swamp) run loose inside your hermetically sealed
| cleanroom.
| timschmidt wrote:
| It seems like you've got it backwards. Even unsafe rust is
| still more strict than C. Here's what the book has to say
| (https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html)
|
| "You can take five actions in unsafe Rust that you can't in
| safe Rust, which we call unsafe superpowers. Those
| superpowers include the ability to:
| * Dereference a raw pointer
| * Call an unsafe function or method
| * Access or modify a mutable static variable
| * Implement an unsafe trait
| * Access fields of a union
|
| It's important to understand that unsafe doesn't turn off
| the borrow checker or disable any other of Rust's safety
| checks: if you use a reference in unsafe code, it will
| still be checked. The unsafe keyword only gives you access
| to these five features that are then not checked by the
| compiler for memory safety. You'll still get some degree of
| safety inside of an unsafe block.
|
| In addition, unsafe does not mean the code inside the block
| is necessarily dangerous or that it will definitely have
| memory safety problems: the intent is that as the
| programmer, you'll ensure the code inside an unsafe block
| will access memory in a valid way.
|
| People are fallible, and mistakes will happen, but by
| requiring these five unsafe operations to be inside blocks
| annotated with unsafe you'll know that any errors related
| to memory safety must be within an unsafe block. Keep
| unsafe blocks small; you'll be thankful later when you
| investigate memory bugs."
| pclmulqdq wrote:
| The way I have heard it described that I think is a bit
| more succinct is "unsafe admits undefined behavior as
| though it was safe."
| Someone wrote:
| But "Dereference a raw pointer", in combination with the
| ability to create raw pointers pointing to arbitrary
| memory addresses (that, you can do even in safe rust)
| allows you to write arbitrary memory from unsafe rust.
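A short sketch of the distinction drawn above: creating a raw pointer is allowed in safe Rust, while dereferencing it needs an `unsafe` block. The function name `bump` is hypothetical. Here the pointer comes from a live `&mut`, so the dereference is valid; a pointer to an arbitrary address would compile the same way but would not be.

```rust
/// Increments the counter through a raw pointer and returns the new value.
pub fn bump(counter: &mut u32) -> u32 {
    // Creating the raw pointer is a safe operation.
    let p: *mut u32 = counter;
    // SAFETY: `p` was derived from a valid, exclusive reference that
    // stays alive for the whole function body.
    unsafe {
        *p += 1;
        *p
    }
}
```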
|
| So, _in theory_, unsafe rust opens the floodgates. _In
| practice_, though, you can use small fragments of unsafe
| code that programmers can fairly easily check to be safe.
|
| Then, once you've convinced yourself that those fragments
| are safe, you can be assured that your whole program is
| safe (using 'safe' in the rust sense, of course)
|
| So, there may be some small islands of unsafe code that
| require extra attention from the programmer, but that
| should be just a tiny fraction of all lines, and you
| should be able to verify those islands in isolation.
| steveklabnik wrote:
| > allows you
|
| This is where the rubber hits the road. Rust does not
| allow you to do this, in the sense that this is possibly
| undefined behavior. That "possibly" is why the compiler
| allows you to write this code, because by saying
| "unsafe", you are promising that this specific arbitrary
| address is legal for you to write to. But that doesn't
| mean that it's always legal to do so.
| timschmidt wrote:
| The compiler won't allow you to compile such code without
| the unsafe. The unsafe is *you* promising the compiler
| that *you* have checked to ensure that the address will
| always be legal. So that the compiler will allow you to
| compile the code.
| steveklabnik wrote:
| Right, I'm saying "allow" has two different connotations,
| and only one of them, the one that you're talking about,
| applies.
| timschmidt wrote:
| I gotcha. I misread and misunderstood. Yes, we agree.
| uecker wrote:
| This description is still misleading. The preconditions
| for the correctness of an unsafe block can very much
| depend on the correctness of the code outside and it is
| easy to find Rust bugs where exactly this was the cause.
| This is very similar to how C out-of-bounds accesses are
| often caused by some logic error elsewhere. Also, an unsafe
| block has to maintain all the invariants the safe Rust
| part needs to maintain correctness.
| iknowstuff wrote:
| No. Correctness of code _outside_ unsafe depends on
| correctness inside those blocks, not the other way around
| uecker wrote:
| Sweet summer child.
| iknowstuff wrote:
| tf are you talking about
| steveklabnik wrote:
| They are (rudely) talking about
| https://news.ycombinator.com/item?id=43382369
| dwattttt wrote:
| In a more helpful framing: safe Rust code doesn't need to
| worry about its own correctness, it just is.
|
| Unsafe code can be incorrect (or unsound), and needs to
| be careful about it. Part of being careful is that safe
| code can call the unsafe code in a way that triggers that
| unsoundness; in that way, safe code can cause undefined
| behaviour in unsafe code.
|
| It's not always the case that this is possible; there are
| unsafe blocks that don't need to depend on safe code for
| their correctness.
| dwattttt wrote:
| It's true, but I think it's only fair that if you hold Rust
| to this analysis, you hold other languages to it too; the
| scrutiny you're implying you need in an unsafe Rust block
| needs to be applied to all C code, because all C code could
| depend on code anywhere else for its safety characteristics.
|
| In practice (in both languages) you check what the actual
| unsafe code does (or "all" code in C's case), note code
| that depends on external actors for safety (it's not all
| C code, nor is it all unsafe Rust blocks), and check
| their callers (and callers' callers, etc).
| uecker wrote:
| What is true is that there are more operations in C which
| can cause undefined behavior and those are more densely
| distributed over the C code, making it harder to screen
| for undefined behavior. This is true and Rust certainly
| has an advantage, but it is not nearly as big of an
| advantage as the "Rust is safe" (please do not look at
| all the unsafe blocks we need to make it also fast!) and
| "all C is unsafe" story wants you to believe.
| dwattttt wrote:
| The places where undefined behaviour can occur are also
| limited in scope; you insist that that part isn't true,
| because operations outside those unsafe blocks can impact
| their safety.
|
| That's only true at the same level of scrutiny as "all C
| operations can cause undefined behaviour, regardless of
| what they are", which I find similarly shallow.
| gf000 wrote:
| Rust is plenty fast, in fact there are countless examples
| of _safe_ rust that will trivially beat out C in
| performance due to no aliasing, enabling better
| vectorization among others. Let alone being simply a more
| expressive language and allowing writing better
| optimizations (e.g. small strings, vs the absolutely
| laughable c-strings that perform terribly, but also you
| can actually get away with sharing more stuff in memory
| vs doing defensive copies everywhere because it is safe
| to do so, etc)
|
| And there are not many things we have statistics on in CS,
| but memory vulnerabilities being absolutely everywhere in
| unsafe languages, and Rust cleaning up the absolute
| majority of them even when only the new parts are written
| in Rust are some of the few we _do_ know, based on
| actual, real life projects at Google/Microsoft among
| others.
|
| A memory safe low-level language is as novel as it gets.
| Rust is absolutely not just hype, it actually delivers
| and you might want to get on with the times.
| lambda wrote:
| So, it's true that unsafe code can depend on
| preconditions that need to be upheld by safe code.
|
| But using ordinary module encapsulation and private
| fields, you can scope the code that needs to uphold those
| preconditions to a particular module.
|
| So the "trusted computing base" for the unsafe code can
| still be scoped and limited, allowing you to reduce the
| amount of code you need to audit and be particularly
| careful about for upholding safety guarantees.
|
| Basically, when writing unsafe code, the actual unsafe
| operations are scoped to only the unsafe blocks, and they
| have preconditions that you need to scope to a particular
| module boundary to ensure that there's a limited amount
| of code that needs to be audited to ensure it upholds all
| of the safety invariants.
|
| Ralf Jung has written a number of good papers and blog
| posts on this topic.
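|
| A sketch of the module-scoping idea, assuming a hypothetical
| `SmallBuf` type (not taken from zlib-rs or from the papers
| mentioned above). The unsafe block relies on `len <= 8`, and
| because `len` is private, only code inside this module can
| break that precondition.

```rust
mod small_buf {
    /// Fixed-capacity byte buffer. Invariant: `len <= 8`.
    pub struct SmallBuf {
        data: [u8; 8],
        len: usize, // private: code outside the module cannot set it
    }

    impl SmallBuf {
        pub fn new() -> Self {
            SmallBuf { data: [0; 8], len: 0 }
        }

        /// Safe code, but part of what must be audited: it is
        /// responsible for keeping `len` within bounds.
        pub fn push(&mut self, byte: u8) -> bool {
            if self.len == self.data.len() {
                return false; // full, invariant preserved
            }
            self.data[self.len] = byte;
            self.len += 1;
            true
        }

        /// Returns the most recently pushed byte, if any.
        pub fn last(&self) -> Option<u8> {
            if self.len == 0 {
                return None;
            }
            // SAFETY: `len` is in 1..=8 here, so `len - 1` is in bounds.
            Some(unsafe { *self.data.get_unchecked(self.len - 1) })
        }
    }
}
```

| A bug in `push` (safe code) could make `last` unsound, which is
| the dependency described above, but the audit stops at the
| module boundary: callers outside `small_buf` cannot reach `len`.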
| uecker wrote:
| And you think one cannot modularize C code and
| encapsulate critical buffer operations in much safer
| APIs? One can; the problem is that a lot of legacy C code
| was not written this way. Also, a lot of newly written C
| code is not written this way, but the reason is often
| that people cut corners when they need to get things done
| with limited time and resources. The same you will see
| with Rust.
| gf000 wrote:
| Even innocent looking C code can be chock-full of UBs
| that can invalidate your "local reasoning" capabilities.
| So, not even close.
| wavemode wrote:
| Care to share an example?
| gf000 wrote:
| This is technically correct, but a bit pedantic.
|
| Sure, you can technically just write your own
| vulnerability for your own program and inject it at an
| unsafe and see the whole world crumble... but the exact
| same is true for any form of FFI calls in any language.
| Is Java memory safe? Yeah, just because I can grab a
| random pointer and technically break anything I want
| won't change that.
|
| The fact that a memory vulnerability _error_ may either
| appear at no place at all _OR_ at the couple hundred
| lines of code throughout the whole project is a night and
| day difference.
| onnimonni wrote:
| Would someone with more experience be able to explain to
| me why these operations can't be "safe"? What is blocking
| rust from producing the same machine code in a "safe"
| way?
| vlovich123 wrote:
| Those specific functions are compiler builtin vector
| intrinsics. The main reason is that they can easily read
| past ends of arrays and have type safety and aliasing
| issues.
|
| By the way, the rust compiler does generate such code
| because under the hood LLVM runs an autovectorizer when
| you turn on optimizations. However, for the
| autovectorizer to do a good job you have to write code in
| a very special way, and you have no way of controlling
| whether it kicked in or, once it did, whether it did a
| good job.
|
| There's work on creating safe abstractions (that also
| transparently scale to the appropriate vector
| instruction), but progress on that has felt slow to me
| personally and it's not available outside nightly
| currently.
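|
| As a rough illustration of "writing code in a very special way",
| here is a hypothetical `xor_in_place` helper in safe Rust. The
| zipped-iterator shape avoids per-element bounds checks, which is
| the kind of loop an autovectorizer can usually handle. Whether
| vector instructions are actually emitted depends on the compiler
| version, optimization level, and target, so treat this as a
| sketch rather than a guarantee.

```rust
/// XORs `src` into `dst` element by element, stopping at the
/// shorter of the two slices.
pub fn xor_in_place(dst: &mut [u8], src: &[u8]) {
    // `zip` bounds the loop by the shorter slice up front, so the
    // body contains no indexing and no panic path.
    for (d, s) in dst.iter_mut().zip(src) {
        *d ^= *s;
    }
}
```

| The only way to confirm what the optimizer did is to inspect the
| generated assembly or to benchmark.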
| NobodyNada wrote:
| Rust's raw pointers are more-or-less equivalent to C
| pointers, with many of the same types of potential
| problems like dangling pointers or out-of-bounds access.
| Rust's references are the "safe" version of doing pointer
| operations; raw pointers exist so that you can express
| patterns that the borrow checker can't prove are sound.
|
| Rust encourages using unsafe to "teach" the language new
| design patterns and data structures; and uses this
| heavily in its standard library. For example, the Vec
| type is a wrapper around a raw pointer, length, and
| capacity; and exposes a safe interface allowing you to
| create, manipulate, and access vectors with no risk of
| pointer math going wrong -- assuming the people who
| implemented the unsafe code inside of Vec didn't make a
| mistake, the external, safe interface is guaranteed to be
| sound no matter what external code does.
|
| Think of unsafe not as "this code is unsafe", but as
| "I've proven this code to be safe, and the borrow checker
| can rely on it to prove the safety of the rest of my
| program."
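|
| A small sketch of "teaching" the language a pattern the borrow
| checker cannot prove on its own: handing out two mutable slices
| into one buffer. It is written in the spirit of the standard
| library's `split_at_mut`, not copied from it, and `split_pair`
| is a hypothetical name.

```rust
use std::slice;

/// Splits one mutable slice into two non-overlapping mutable slices.
/// Panics if `mid > values.len()`.
pub fn split_pair(values: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = values.len();
    let ptr = values.as_mut_ptr();
    // This check is what makes the safe signature sound.
    assert!(mid <= len);

    // SAFETY: `ptr` is valid for `len` elements, `mid <= len`, and the
    // ranges [0, mid) and [mid, len) do not overlap, so the two
    // mutable slices never alias.
    unsafe {
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}
```

| Callers see only a safe function; the borrow checker then
| enforces that both halves are released before the original
| slice is used again.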
| adgjlsfhk1 wrote:
| Often the unsafe code is at the edges of the type system.
| E.g. sometimes the proof of safety is that someone read
| the source code of the C library that you are calling out
| to. It's not useful to think of machine code as safe or
| unsafe. Safety often refers to whether the types of your
| data match the lifetime dataflow.
| rybosome wrote:
| I believe the post you are replying to was referring to
| the fact that you could take actions in that unsafe block
| that would compromise the guarantees of rust; eg you
| could do something silly, leave the unsafe block, then
| hit an "impossible" condition later in the program.
|
| A simple example might be modifying a const value deep
| down in some class, where it only becomes apparent later
| in the program's execution. Hence their analogy of the
| wet dog in a clean room - whatever beliefs you have about
| the structure of memory in your entire program, and
| guaranteed by the compiler, could have been undone by a
| rogue unsafe.
| CooCooCaCha wrote:
| I wouldn't go that far. Bevy for example, uses unsafe
| internally but is VERY strict about it, and every use of
| unsafe requires a comment explaining why the code is safe.
|
| In other words, unsafe works if you use it carefully and
| keep it contained.
| tonyhart7 wrote:
| Right, the point is raising awareness, and the assumption
| is that it's not a 100-or-0 problem.
| SkiFire13 wrote:
| You lose the nice guarantees inside the `unsafe` block, but
| the point is to write a sound and safe interface over it,
| that is an API that cannot lead to UB no matter how other
| safe code calls it. This is basically the encapsulation
| concept, but for safety.
|
| To continue the analogy of the dog, you let the dog get wet
| (=you use unsafe), but you put a cleaning room (=the sound
| and safe API) before your sealed room (=the safe code
| world)
| timeon wrote:
| > unsafe even a single time, you lose all of Rust's nice
| guarantees
|
| Not sure why _one_ would result in _all_. One of Rust's
| advantages is the clear boundary between safe/unsafe.
| wongarsu wrote:
| If your unsafe code violates invariants it was supposed to
| uphold, that can wreck safety properties the compiler was
| trying to uphold elsewhere. If you can achieve something
| without unsafe you definitely should (safe, portable simd
| is available in rust nightly, but it isn't stable yet).
|
| At the same time, unsafe doesn't just turn off all compiler
| checks, it just gives you tools to go around them, as well
| as tools that happen to go around them because of the way
| they work. Rust unsafe is this weird mix of being safer
| than pure C, but harder to grasp; with lots of nuanced
| invariants you have to uphold. If you want to ensure your
| code still has all the nice properties the compiler
| guarantees (which go way beyond memory safety) you would
| have to carefully examine every unsafe block. Which few
| people do, but you generally still end up with a better
| status quo than C/C++ where _any_ code can in principle
| break properties other code was trying to uphold.
| sunshowers wrote:
| What language is the JVM written in?
|
| _All_ safe code in existence running on von Neumann
| architectures is built on a foundation of unsafe code. The
| goal of _all_ memory-safe languages is to provide safe
| abstractions on top of an unsafe core.
| janice1999 wrote:
| Claiming unsafe invalidates "all of the nice memory-safety
| properties" is like saying having windows in your house
| does away with all the structural integrity of your walls.
|
| There's even unsafe usage in the standard library and it's
| used a lot in embedded libraries.
| vlovich123 wrote:
| You lose those guarantees if and only if the code
| within the unsafe block violates the rules of the Rust
| language.
|
| Normally in safe code you can't violate the language rules
| because the compiler enforces various rules. In unsafe
| mode, you can do several things the compiler would normally
| prevent you from doing (e.g. dereferencing a naked
| pointer). If you uphold all the preconditions of the
| language, safety is preserved.
|
| What's unfortunate is that the rules you are required to
| uphold can be more complex than you might anticipate if
| you're trying to use unsafe to write C-like code. What's
| fortunate is that you rarely need to do this in normal code
| and in SIMD which is what the snippet is representing
| there's not much danger of violating the rules.
| pdimitar wrote:
| Where did you even get that weird extreme take from?
|
| O_o
| colonwqbang wrote:
| Can't rust do safe simd? This is just vectorised
| multiplication and xor, but it gets labelled as unsafe. I
| imagine most code that wants to be fast would use simd to
| some extent.
| steveklabnik wrote:
| It's still nightly-only.
| pcwalton wrote:
| > Presumably with inline assembly both languages can emit what
| is effectively the same machine code. Is the Rust compiler a
| better optimizing compiler than C compilers?
|
| rustc uses LLVM just as clang does, so to a first approximation
| they're the same. For any given LLVM IR you can _mostly_ write
| equivalent Rust and C++ that causes the respective compiler to
| emit it (the switch fallthrough thing mentioned in the article
| is interesting though!) So if you're talking about what's
| _possible_ (as opposed to what's _idiomatic_), the question
| of "which language is faster" isn't very interesting.
| AlotOfReading wrote:
| The key difference is that there are invariants you can rely on
| as a user of the library, and they'll be enforced by the
| compiler outside the unsafe blocks. The corresponding C
| invariants mostly aren't enforced by the compiler. Worse, many
| C programmers will actively argue that some amount of undefined
| behavior is "fine".
| jdefr89 wrote:
| Not to mention they link to libc.. All rust code does last I
| checked...
| techjamie wrote:
| There is an option to not link to it for instances like OS
| writing and embedded. Writing everything in pure Rust without
| libc is entirely possible, even if it's an exercise in losing
| your sanity when you're reimplementing every syscall you need
| from scratch.
|
| But even then, your code is calling out to kernel functions
| which are probably written in C or assembly, and therefore
| "dangerous."
|
| Rust code safety is overhyped frequently, but reducing an
| attack surface is still an improvement over not doing so.
| jdefr89 wrote:
| I agree and binary exploitation/Vulnerability Research is
| my area of expertise.. The whole "Let's port everything to
| Rust" is so misguided. Binary exploitation has already
| gotten 20x harder than say ten years ago.. Even so.. Most
| big breaches happen because people reuse their password or
| just give it out... Nation States are pretty much the only
| parties capable of delivering full kill chains that
| exploit, say chrome... That is why I moved to the embedded
| space.. Still so insecure...
| einpoklum wrote:
| > At what point does it really stop mattering if this is C or
| Rust?
|
| That depends. If, for you, safety is something relative and
| imperfect rather than absolute, guaranteed and reliable, then -
| the answer is that once you have the first non-trivial unsafe
| block that has not gotten standard-library-level of scrutiny.
| But if that's your view, you should not be all that starry-eyed
| about how "Rust is a safe language!" to begin with.
|
| On the other hand, if you really do want to rely on Rust's
| strong safety guarantees, then the answer is: From the moment
| you use any library with unsafe code.
|
| My 2 cents, anyway.
| koito17 wrote:
| The purpose of `unsafe` is for the compiler to assume a block
| of code is correct. SIMD intrinsics are marked as unsafe
| because they take raw pointers as arguments.
|
| In safe Rust (the default), memory access is validated by the
| borrow checker and type system. Rust's goal of soundness means
| safe Rust should never cause out-of-bounds access, use-after-
| free, etc; if it does, then there's a bug in the Rust compiler.
| no_wizard wrote:
| How do we know if Rust is safe unless Rust is written purely
| in safe Rust?
|
| Is that not true? Even validators have bugs or miss things
| no?
| steveklabnik wrote:
| > Even validators have bugs
|
| Yep! For example, https://github.com/Speykious/cve-rs is an
| example of a bug in the Rust compiler, which allows
| something that it shouldn't. It's on its way to being
| fixed.
|
| > or miss things no?
|
| This is the trickier part! Yes, even proofs have axioms,
| that is, things that are accepted without proof, that the
| rest of the proof is built on top of. If an axiom is
| incorrect, so is the proof, even though we've proven it.
| int_19h wrote:
| Out of curiosity, _why_ do they take raw pointers as
| arguments, rather than references?
| steveklabnik wrote:
| From the RFC:
| https://rust-lang.github.io/rfcs/2325-stable-simd.html
|
| > The standard library will not deviate in naming or type
| signature of any intrinsic defined by an architecture.
|
| I think this makes sense, just like any other intrinsic:
| unsafe to use directly, but with safe wrappers.
|
| I believe that there are also some SIMD things that would
| have to inherently take raw pointers, as they work on
| pointers that aren't aligned, and/or otherwise not valid
| for references. In theory you could make only those take
| raw pointers, but I think the blanket policy of "follow
| upstream" is more important.
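|
| A sketch of the "unsafe intrinsic, safe wrapper" pattern. The
| `xor16` function is hypothetical; the three intrinsics are the
| real `std::arch::x86_64` ones, and the unaligned load and store
| take raw pointers as described above. The sketch assumes an
| x86_64 target, where SSE2 is part of the baseline, and falls
| back to scalar code elsewhere.

```rust
/// XORs two 16-byte blocks using SSE2 on x86_64.
#[cfg(target_arch = "x86_64")]
pub fn xor16(a: &[u8; 16], b: &[u8; 16]) -> [u8; 16] {
    use std::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_storeu_si128, _mm_xor_si128};

    let mut out = [0u8; 16];
    // SAFETY: SSE2 is always available on x86_64. Each pointer comes
    // from a 16-byte array, and the `loadu`/`storeu` variants have no
    // alignment requirement.
    unsafe {
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, _mm_xor_si128(va, vb));
    }
    out
}

/// Scalar fallback with the same signature for other targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn xor16(a: &[u8; 16], b: &[u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (o, (x, y)) in out.iter_mut().zip(a.iter().zip(b.iter())) {
        *o = x ^ y;
    }
    out
}
```

| The array references in the signature carry the length and
| validity guarantees that the raw-pointer intrinsics cannot
| express, so callers need no unsafe of their own.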
| sesm wrote:
| Rust code emitter is Clang, the same one that Apple uses for C
| on their platforms. I wouldn't expect any miracles there, as
| Rust authors have zero influence over it. If any compiler is
| using any secret Clang magic, that would be Swift or
| Objective-C, since they are developed by Apple.
| nindalf wrote:
| You're conflating clang and LLVM.
| sesm wrote:
| Yes, you are right, should be 'code emitter is LLVM, the
| same that Clang uses for C'
| xxs wrote:
| oddly enough that's not the most optimal version of crc32, e.g.
| it's not an avx512 variant.
| Shorel wrote:
| Awesome find. This really means:
|
| Assembly language faster than C. And faster than Rust. Assembly
| can be very fast.
| bitwize wrote:
| You can use 'unsafe' blocks to delineate places on the hot path
| where you _need_ to take the limiters off, then trust that the
| rest of the code will be safe. In C, _all_ your code is unsafe.
|
| We will see more and more Rust libraries trounce their C
| counterparts in speed, because Rust is more fun to work in
| because of the above. Rust has democratized high-speed and
| concurrent systems programming. Projects in it will attract a
| larger, more diverse developer base -- developers who would be
| loath to touch a C code base for (very justified) fear of
| breaking something.
| dzaima wrote:
| Looks like as of 2 weeks ago the unsafe block should no longer
| be required: https://github.com/rust-lang/stdarch/pull/1714
|
| ..at least outside of loads/stores. From a bit of looking at
| the code though it seems like a good amount of those should be
| doable in a safe way with some abstractions.
| gf000 wrote:
| Rust's borrow checker still checks within unsafe blocks, so
| unless you are _only_ operating with raw pointers (and not
| accessing certain references as raw pointers in some small,
| well-defined blocks) across the whole program it will be
| significantly more safe than C. Especially given all the other
| language benefits, like a proper type system that can encode a
| bunch of invariants, no footguns at every
| line/initialization/cast, etc.
| acdha wrote:
| Yes. I think it's easy to underestimate how much the richer
| language and library ecosystem chip away at the attack
| surface area. So many past vulnerabilities have been in code
| which isn't dealing with low-level interfaces or weird
| performance optimizations and wouldn't need to use unsafe.
| There've been so many vulnerabilities in crypto code which
| weren't the encryption or hashing algorithms but things like
| x509/ASN parsing, logging, or the kind of option/error
| handling logic a Rust programmer would use the type system to
| validate.
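|
| To make the parsing point concrete, here is a sketch of a
| length-checked parser for a made-up tag-length-value record (one
| tag byte, one length byte, then the value). This is not real
| ASN.1/DER, and `parse_tlv` and `ParseError` are hypothetical
| names. The point is only that truncated input becomes an `Err`
| the caller must handle, rather than an out-of-bounds read.

```rust
#[derive(Debug, PartialEq)]
pub enum ParseError {
    /// The input ended before the record was complete.
    Truncated,
}

/// Parses one record and returns `(tag, value, remaining_input)`.
pub fn parse_tlv(input: &[u8]) -> Result<(u8, &[u8], &[u8]), ParseError> {
    // `split_first` returns `None` on empty input instead of reading
    // past the end.
    let (&tag, rest) = input.split_first().ok_or(ParseError::Truncated)?;
    let (&len, rest) = rest.split_first().ok_or(ParseError::Truncated)?;
    let len = usize::from(len);

    // Reject a declared length that exceeds the remaining bytes.
    if rest.len() < len {
        return Err(ParseError::Truncated);
    }
    let (value, remaining) = rest.split_at(len);
    Ok((tag, value, remaining))
}
```

| No unsafe is involved, and a forgotten error branch shows up as
| a type error at compile time.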
| asveikau wrote:
| > At what point does it really stop mattering if this is C or
| Rust?
|
| If I read TFA correctly, they came up with a library that is
| API compatible with the C one, but which they've measured to
| be faster.
|
| At that point I think in addition to safety benefits in other
| parts of the library (apart from unsafe micro optimizations as
| quoted), what they're leveraging is better compiler technology.
| Intuitively, I start to assume that the rust compiler can
| perhaps get away with more optimizations that might not be safe
| to assume in C.
| cb321 wrote:
| I think this _may_ not be a very high bar. zippy in Nim claims to
| be about 1.5x to 2.0x faster than zlib:
| https://github.com/guzba/zippy
| I think there are also faster zlibs around in C than the
| standard install one, such as
| https://github.com/ebiggers/libdeflate (EDIT: also mentioned
| elsethread https://news.ycombinator.com/item?id=43381768 by
| mananaysiempre)
|
| zlib itself seems pretty antiquated/outdated these days, but it
| does remain popular, even as a basis for newer parallel-friendly
| formats such as https://www.htslib.org/doc/bgzip.html
| hinkley wrote:
| Zlib is unapologetically written to be portable rather than
| fast. It is absolutely no wonder that a Rust implementation
| would be faster. It runs on a pathetically small number of
| systems by contrast. This is not a dig at Rust, it's an
| acknowledgement of how many systems exist out there, once you
| include embedded, automotive, aerospace, telecom, industrial
| control systems, and mainframes.
|
| Richard Hipp denounces claims that SQLite is the widest-used
| piece of code in the world and offers zlib as a candidate for
| that title, which I believe he is entirely correct about. I've
| been consciously using it for almost thirty years, and for a
| few years before that without knowing I was.
| lern_too_spel wrote:
| They're comparing against zlib-ng, not zlib. zlib-ng is more
| than twice as fast as zlib for decompression.
| https://github.com/zlib-ng/zlib-ng/discussions/871
|
| libdeflate is not zlib compatible. It doesn't support streaming
| decompression.
| mastax wrote:
| The benchmarks in the parent post are comparing to zlib-ng,
| which is substantially faster than zlib. The zippy claims are
| against "zlib found on a fresh Linux install" which at least
| for Debian is classic zlib.
| JoshTriplett wrote:
| The bar here is not zlib, it's zlib-ng, which aims primarily
| for performance.
|
| libdeflate is an impressive library, but it doesn't help if you
| need to stream data rather than having it all in memory at
| once.
| jrockway wrote:
| Chromium is kind of stuck with zlib because it's the algorithm
| that's in the standards, but if you're making your own protocol,
| you can do even better than this by picking a better algorithm.
| Zstandard is faster and compresses better. LZ4 is much faster,
| but not quite as small.
|
| Some reading:
| https://jolynch.github.io/posts/use_fast_data_algorithms/
|
| (As an aside, at my last job container pushes / pulls were in the
| development critical path for a lot of workflows. It turns out
| that sha256 and gzip are responsible for a lot of the time spent
| during container startup. Fortunately, Zstandard is allowed, and
| blake3 digests will be allowed soon.)
| jeffbee wrote:
| Yeah I just discovered this a few days ago. All the docker-era
| tools default to gzip but if using, say, bazel rules_oci
| instead of rules_docker you can turn on zstd for large speedups
| in push/pull time.
| amorio2341 wrote:
| Not surprised at all, Rust is the future.
| akagusu wrote:
| Bravo. Now Rust has its existence justified.
___________________________________________________________________
(page generated 2025-03-16 23:00 UTC)