[HN Gopher] Porting Libyaml to Safe Rust: Some Thoughts
___________________________________________________________________
Porting Libyaml to Safe Rust: Some Thoughts
Author : agluszak
Score : 89 points
Date : 2024-02-08 17:03 UTC (5 hours ago)
(HTM) web link (simonask.github.io)
(TXT) w3m dump (simonask.github.io)
| vlovich123 wrote:
| Great writeup. This part stuck out to me:
|
| > The justification for "unsafe" C and C++ libraries is often
| that some performance is left on the table with safe Rust,
| primarily due to bounds checking. On modern hardware this
| justification has always seemed a bit suspicious to me, and I
| think this is another piece of (circumstantial) evidence that it
| often just doesn't matter in practice
|
| That's too strong a statement, even though the sentiment is
| correct. Most software will not see a bottleneck from bounds
| checking because the checks aren't in the fast path. All we can
| say in this case is that the work libyaml was doing is not
| impacted by bounds checking. That's not because of the hardware
| but because of software inefficiencies / the nature of the
| problem being solved - most likely libyaml is not running at
| full speed on the underlying hardware anyway, so any bounds
| checking is just noise because there's so much slack already.
|
| However, I have definitely encountered situations, in very
| specific circumstances, where bounds checking is a bottleneck
| when you're trying to actually run at hardware speeds, and
| unsafe is required to recover the performance - of course I
| benchmarked very carefully to narrow it down to the specific
| unsafe that made the difference. I'll also give a shoutout to
| the assume crate, which I've found a much nicer mechanism for
| getting the compiler to elide bounds checking in optimized
| builds while keeping the checks, and validating that the
| assumption holds, in debug builds. The annoying bit is that you
| can't ask the Rust compiler to elide all bounds checks for a
| benchmark to validate the impact of the bounds checking.
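|
| (As a sketch of what I mean - this is not the assume crate's
| actual API, just the underlying idea hand-rolled: a real check
| in debug builds, an optimizer assumption in release builds.)
|
|     // Check the condition in debug builds; turn it into an
|     // optimizer assumption in release builds. If the condition
|     // is ever false in a release build, this is undefined
|     // behaviour, so only use it for invariants you can prove.
|     #[inline(always)]
|     unsafe fn assume_that(cond: bool) {
|         debug_assert!(cond);
|         if !cond {
|             unsafe { std::hint::unreachable_unchecked() }
|         }
|     }
|
|     /// Caller must guarantee `n <= data.len()`.
|     unsafe fn sum_first(data: &[u8], n: usize) -> u64 {
|         // With the assumption visible, the optimizer can
|         // typically drop the per-iteration bounds check below.
|         unsafe { assume_that(n <= data.len()) };
|         (0..n).map(|i| data[i] as u64).sum()
|     }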
|
| Remember - on modern hardware your CPU is capable of executing
| ~4-6 billion operations/s, reading main memory at ~100 GB/s,
| and accessing disk millions of times/s at throughputs of ~1-2
| GB/s. This is all consumer-grade hardware. Typical software
| applications don't target full utilization; instead they aim
| for reasonable performance with a more straightforward solution
| that can be maintained as cheaply as possible, because the
| bottleneck is typically the programmer's time rather than how
| fast the software can be made to run.
| TylerE wrote:
| Throughputs on modern hardware are insane. Latency/time-til-
| first-byte can often still be pretty sucky.
| vlovich123 wrote:
| Care to give an example? A disk read is slow compared to
| compute/memory, but it's still in the tens or hundreds of
| microseconds if I recall correctly, and memory is hundreds of
| nanoseconds. I am convinced latency issues are a remnant of how
| we build abstractions into software stacks and not a fault of
| the hardware.
| epage wrote:
| Unless this was updated after your post, I think it was covered
| in the following line:
|
| > It's certainly possible to build things where bounds checking
| makes a difference. This just isn't that.
|
| I will add that idiomatic Rust causes a good number of bounds
| checks to be optimized away. It's impressive the difference
| this can make. I maintain a parser combinator library that
| started as a fork. With some simple changes, I've seen massive
| performance improvements, and my best guess is that those
| changes allowed bounds checks to be removed.
| norman784 wrote:
| Can you share what kind of changes, in your case, improved
| the performance?
| wredue wrote:
| Avoid directly indexing if possible. Use the collection
| functions and slices instead.
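|
| As a toy illustration: the indexed form has to prove (or check
| at runtime) that `i` is in bounds for `b`, while the iterator
| form hands the compiler the lengths up front:
|
|     // Indexed form: b[i] may be out of bounds, so a runtime
|     // check stays unless the compiler can prove otherwise
|     // (and this version panics if b is shorter than a).
|     fn dot_indexed(a: &[f32], b: &[f32]) -> f32 {
|         let mut acc = 0.0;
|         for i in 0..a.len() {
|             acc += a[i] * b[i];
|         }
|         acc
|     }
|
|     // Iterator form: zip stops at the shorter slice, so there
|     // is nothing to check per element.
|     fn dot_iter(a: &[f32], b: &[f32]) -> f32 {
|         a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
|     }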
| epage wrote:
| I'd add careful use of inlining so the compiler can see
| both the bounds check and the index to know the bounds
| check can be removed.
| vlovich123 wrote:
| Possible I missed it when I read it. Would need the author to
| indicate whether or not it was edited.
|
| Yes, idiomatic Rust can cause a number of bounds checks to be
| elided when using iterators. Typically that's when you have
| all the data that you need to process up front. When you
| don't have that ability (e.g. data is being fed in piecemeal
| and you're copying into a temporary buffer), the bounds
| checking elision is something you need to do yourself.
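|
| E.g. something like this (names made up): one explicit length
| check up front, and the copy itself is a slice-to-slice
| operation with no per-byte indexing:
|
|     /// Hypothetical helper: append an incoming chunk to a
|     /// bounded scratch buffer.
|     fn push_chunk(scratch: &mut Vec<u8>, chunk: &[u8],
|                   limit: usize) -> Result<(), ()> {
|         // Single explicit check up front...
|         if scratch.len() + chunk.len() > limit {
|             return Err(());
|         }
|         // ...then a bulk copy with no per-byte bounds checks.
|         scratch.extend_from_slice(chunk);
|         Ok(())
|     }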
| stouset wrote:
| I think the general takeaway is that for _most_ use cases,
| Rust's additional safety comes at barely any cost. In some
| cases the safety gains performance (by not checking things the
| compiler can assert statically), in some cases there's a hit,
| but it usually ends up somewhere in the middle.
|
| The takeaway is that Rust's safety and maintainability
| generally don't impose corresponding penalties. And in the
| cases where they do and it's critical, you can always punt to
| unsafe.
| norir wrote:
| The paradox of bounds checking is that it both rarely matters
| and always matters.
| hedora wrote:
| I suspect that the information about memory aliasing coming
| from the borrow checker (that things are not aliased) allows
| the compiler to perform many optimizations that C/C++ can't,
| and that the benefit from that is similar in size to the cost
| of bounds checking.
|
| The memcpy vs. memmove thing comes to mind. So does stuff like
| iterating over the indices of one std::vector through a pointer
| while modifying a different std::vector through another
| pointer. If you know you have exclusive access to the vector
| you are reading, then you know its size isn't changing, so you
| can just iterate from 0 to size and elide any bounds checking.
|
| With C++, if the pointers to the vectors could alias, the
| compiler has to bounds check at runtime. Templates and
| link-time optimization help with this sort of thing, but I'm
| guessing Rust still has an advantage.
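|
| Roughly this shape on the Rust side (sketch): the exclusive
| borrow of `dst` guarantees it isn't the same vector as `src`,
| so `src.len()` is loop-invariant and the reads can't be
| invalidated by the pushes:
|
|     // `src` is a shared borrow and `dst` an exclusive one, so
|     // they cannot alias; the compiler is free to hoist
|     // src.len() instead of re-loading it after every push.
|     fn append_doubled(dst: &mut Vec<u32>, src: &[u32]) {
|         for i in 0..src.len() {
|             dst.push(src[i] * 2);
|         }
|     }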
| wyldfire wrote:
| > In order to compare the performance of unsafe-libyaml and the
| safe Rust port, I found a relatively large YAML document (~700
| KiB) and built a very simple benchmark using Criterion.
|
| Is criterion environment-aware enough to be able to purge/drop
| the filesystem cache? It might be interesting to see how the
| performance looks with a cold cache.
|
| > What these numbers do indicate is that the port to safe,
| idiomatic Rust does not produce meaningfully slower code.
|
| I think that's reasonable and I would even suggest that Rust's
| value proposition is somewhat predicated on that. A c2rust port
| like the baseline you started with is probably best as a working
| crutch from which someone can rewrite a bit here / a bit there.
| pkolaczk wrote:
| Criterion has a warmup phase. So I guess both measurements were
| using a hot cache.
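|
| For reference, a minimal Criterion setup along those lines
| (the file name, function names, and warm-up length here are
| made up) reads the document once up front and warms up before
| sampling:
|
|     use std::hint::black_box;
|     use std::time::Duration;
|     use criterion::{criterion_group, criterion_main, Criterion};
|
|     // Stand-in for the parser under test.
|     fn parse_yaml(input: &str) -> usize {
|         input.lines().count()
|     }
|
|     fn parse_benchmark(c: &mut Criterion) {
|         // Read the whole document once so only parsing is
|         // measured, not disk I/O.
|         let input = std::fs::read_to_string("large.yaml")
|             .expect("benchmark document");
|         c.bench_function("parse_large_yaml", |b| {
|             b.iter(|| black_box(parse_yaml(black_box(&input))))
|         });
|     }
|
|     criterion_group! {
|         name = benches;
|         // The warm-up phase means the file and the parser code
|         // are already hot in cache when sampling starts.
|         config = Criterion::default()
|             .warm_up_time(Duration::from_secs(3));
|         targets = parse_benchmark
|     }
|     criterion_main!(benches);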
| vlovich123 wrote:
| Disk I/O isn't relevant here since you're testing the
| performance of the parser, not the speed of reading the string
| from disk. I would expect a file this small to be read into a
| buffer up front, with the benchmark code just invoking the
| parse function on it. Indeed, the document is small enough that
| I'd be worried about it staying in L3 cache the entire time,
| which would overestimate the parsing performance relative to a
| real workload, where the strings being parsed sit at random
| spots in memory.
| viraptor wrote:
| This makes me happy. I've recently done a review of the yaml
| parsers available from Ruby, in order to improve error
| reporting. Unfortunately libyaml is just bad there - many
| indentation issues at the end of the document will report
| errors on line 2 with completely lost context. Unfortunately
| x2, there's no real alternative.
|
| I ended up parsing with libyaml and, when something fails,
| parsing again using Node and displaying errors through
| https://github.com/eemeli/yaml - it's terrible but it works
| better. I hope to be able to use libyaml-safer in the future if
| it does improve on errors.
|
| Also, having found an input that trips an assert in libyaml in
| the past, I'm not trusting it _that_ much.
| wredue wrote:
| I find that you can usually safely disregard the opinion of
| people who state "<thing> probably doesn't matter in practice."
|
| Measure yourselves. Maybe it does matter. Maybe it doesn't. It
| doesn't change the fact that this "it probably doesn't matter
| in practice" sentiment is the primary reason a typical web
| request today is on the order of 20-30x slower than it should
| be.
|
| For that matter, if you're using libyaml to parse one yaml
| file, you're probably fine. If you're AWS and use libyaml,
| you're definitely going to feel this change.
| edflsafoiewq wrote:
| There are an infinite number of ways to do anything, you can't
| try them all. Rules of thumb are necessary for making progress.
| autoexecbat wrote:
| If you're AWS you could probably put some FTEs on optimizing
| the new lib too
| jvanderbot wrote:
| > The interesting takeaway is this: For all of the different
| roles that yaml_string_t takes on, it turns out there is an
| abstraction in the Rust standard library that matches each use
| case precisely. At all points in the libyaml source code where
| strings are used, the use case maps directly to one of the
| several many ways of dealing with strings in Rust. Phew
|
| Reminder! Our forefathers knew and loved ownership concepts,
| but lacked the tools to do anything but mentally map them. We
| stand on the shoulders of giants.
| _benj wrote:
| > Mercifully, C does not have exceptions
|
| This made me smile, because, well, yes - C just segfaults. I
| think there's a way to handle a segmentation fault, though? Or
| is it just the OS that kills the process when it tries to
| access an invalid memory address?
| topaz0 wrote:
| Yes, at least on Linux, the OS sends a signal (namely SIGSEGV)
| to the process, which the process may or may not handle. But
| kill also sends signals (including SIGSEGV if you specify), so
| the answer to your question is that it's not really either/or
| but both.
|
| (edit to add: note that it isn't trivial to handle a segfault
| in particular, because unless you do some trickery you return
| to the same instruction that caused the segfault and just get
| another one; see e.g.
| https://stackoverflow.com/questions/2663456/how-to-write-a-s...
| )
| AceJohnny2 wrote:
| Tangential C style survey... For code like this:
|
|     if (some_function(some_memory) != ERROR) {
|         // ...
|     } else {
|         goto cleanup;
|     }
|
| This approaches what I dub "Happy Path Indentation Syndrome"
| (HPIS), where the "normal"/"happy path" functionality of the code
| happens under the `// ...` inside the passing scope.
|
| If you have several such functions that can each fail, this
| approach means your "happy path" follows the deepening scope /
| indentation.
|
| Instead, I much prefer styling it like this (assuming,
| obviously, that the "happy path" always happens):
|
|     if (some_function(some_memory) == ERROR) {
|         goto cleanup;
|     }
|     // ...
|
| How do people approach this kind of situation?
| AceJohnny2 wrote:
| > _If you have several such functions that can each fail, this
| approach means your "happy path" follows the deepening scope /
| indentation._
|
| In fact, I once had an interview candidate approach a simple
| string parsing problem by checking each char in a nested scope,
| happily going a dozen deep and not seeing a problem!
| whstl wrote:
| Your preferred version is often called a "guard clause" when
| it's at the beginning of a function, although there's nothing
| preventing one from being in the middle...
|
| I do agree that it's much better for readability.
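|
| For what it's worth, the same shape is what early returns and
| the ? operator give you in Rust (toy example, made-up
| function):
|
|     use std::fs;
|
|     // Each fallible step bails out immediately, so the happy
|     // path stays flat instead of nesting.
|     fn first_port(path: &str) -> Result<u16, String> {
|         let text = fs::read_to_string(path)
|             .map_err(|e| e.to_string())?;
|         let line = text.lines().next().ok_or("empty file")?;
|         line.trim().parse().map_err(|e| format!("bad port: {e}"))
|     }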
___________________________________________________________________
(page generated 2024-02-08 23:01 UTC)