[HN Gopher] Porting Libyaml to Safe Rust: Some Thoughts
       ___________________________________________________________________
        
       Porting Libyaml to Safe Rust: Some Thoughts
        
       Author : agluszak
       Score  : 89 points
       Date   : 2024-02-08 17:03 UTC (5 hours ago)
        
 (HTM) web link (simonask.github.io)
 (TXT) w3m dump (simonask.github.io)
        
       | vlovich123 wrote:
       | Great writeup. This part stuck out to me:
       | 
       | > The justification for "unsafe" C and C++ libraries is often
       | that some performance is left on the table with safe Rust,
       | primarily due to bounds checking. On modern hardware this
       | justification has always seemed a bit suspicious to me, and I
       | think this is another piece of (circumstantial) evidence that it
       | often just doesn't matter in practice
       | 
        | That's too strong a statement, even though the sentiment is
        | correct. Most software will not see a bottleneck from bounds
        | checking because it's not in the fast path. All we can say in
        | this case is that the work libyaml was doing is not impacted by
        | bounds checking. That's not because of the hardware but because
        | of software inefficiencies / the nature of the problem being
        | solved - most likely libyaml is not running at full speed on the
        | underlying hardware, so any bounds checking is just noise
        | because there's so much slack already.
       | 
        | However, I have definitely encountered situations, in very
        | specific circumstances, where bounds checking is a bottleneck:
        | when you're trying to actually run at hardware speeds, unsafe is
        | required to recover performance - of course, I benchmarked very
        | carefully to isolate the specific unsafe that addressed the
        | problem. I'll also give a shoutout to the assume crate, which
        | I've found to be a much nicer mechanism for getting the compiler
        | to elide bounds checking in many cases in optimized builds,
        | while keeping the check in debug builds and validating that your
        | assertion is correct. The annoying bit is that you can't ask the
        | Rust compiler to elide all bounds checks for a benchmark to
        | validate the impact of the bounds checking.
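The up-front-assert technique described above can be sketched in safe Rust (a minimal illustration of the idea, not the assume crate's actual API; `sum_first_n` is a hypothetical name):

```rust
// Minimal sketch: a single assert up front gives the optimizer the fact
// `n <= data.len()`, so the per-iteration bounds checks in the loop below
// can be proven redundant and elided in release builds, while debug builds
// still validate the assumption.
fn sum_first_n(data: &[u64], n: usize) -> u64 {
    assert!(n <= data.len()); // one check, hoisted out of the loop
    let mut total = 0;
    for i in 0..n {
        total += data[i]; // provably in bounds after the assert
    }
    total
}

fn main() {
    let v: Vec<u64> = (1..=10).collect();
    assert_eq!(sum_first_n(&v, 5), 15); // 1+2+3+4+5
    println!("ok");
}
```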
       | 
        | Remember - on modern hardware your CPU can execute ~4-6 billion
        | operations/s, read main memory at ~100 GB/s, and access disk
        | millions of times/s at throughputs of ~1-2 GB/s. This is all
        | consumer-grade hardware. Typical software applications aren't
        | trying to run at full utilization; instead they try to offer
        | reasonable performance with a more straightforward solution that
        | can be maintained as cheaply as possible, because the bottleneck
        | is typically the programmer's time rather than how fast the
        | software can be made to run.
        
         | TylerE wrote:
         | Throughputs on modern hardware are insane. Latency/time-til-
         | first-byte can often still be pretty sucky.
        
           | vlovich123 wrote:
            | Care to give an example? A disk read is slow compared to
            | compute/memory, but it's still in the tens or hundreds of
            | microseconds if I recall correctly, and memory is hundreds
            | of nanoseconds. I am convinced latency issues are a remnant
            | of how we build abstractions into software stacks and not a
            | fault of the hardware.
        
         | epage wrote:
         | Unless this was updated after your post, I think it was covered
         | in the following line:
         | 
         | > It's certainly possible to build things where bounds checking
         | makes a difference. This just isn't that.
         | 
         | I will add that idiomatic Rust causes a good number of bounds
          | checks to be optimized away. It's impressive the difference this
         | can make. I maintain a parser combinator library that started
         | as a fork. With some simple changes, I've seen massive
         | performance improvements and my best guess is that those
         | changes allowed bounds checks to be removed.
        
           | norman784 wrote:
           | Can you share what kind of changes, in your case, improved
           | the performance?
        
             | wredue wrote:
             | Avoid directly indexing if possible. Use the collection
             | functions and slices instead.
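A hedged illustration of that advice: iterating with `zip` gives the compiler the loop bound directly, whereas manual indexing leaves a per-element check on the second slice (the function names are invented for the example):

```rust
// Indexing: `b[i]` must be bounds-checked every iteration, because the
// compiler only knows `i < a.len()`, not `i < b.len()`.
fn add_indexed(a: &[i32], b: &[i32]) -> Vec<i32> {
    let mut out = Vec::with_capacity(a.len());
    for i in 0..a.len() {
        out.push(a[i] + b[i]);
    }
    out
}

// Iterators: `zip` stops at the shorter slice, so no element access can
// go out of bounds and no runtime checks are needed.
fn add_iterated(a: &[i32], b: &[i32]) -> Vec<i32> {
    a.iter().zip(b.iter()).map(|(x, y)| x + y).collect()
}

fn main() {
    assert_eq!(add_indexed(&[1, 2], &[3, 4]), vec![4, 6]);
    assert_eq!(add_iterated(&[1, 2], &[3, 4]), vec![4, 6]);
    println!("ok");
}
```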
        
               | epage wrote:
               | I'd add careful use of inlining so the compiler can see
               | both the bounds check and the index to know the bounds
               | check can be removed.
        
           | vlovich123 wrote:
           | Possible I missed it when I read it. Would need the author to
           | indicate whether or not it was edited.
           | 
           | Yes, idiomatic Rust can cause a number of bounds checks to be
           | elided when using iterators. Typically that's when you have
           | all the data that you need to process up front. When you
           | don't have that ability (e.g. data is being fed in piecemeal
           | and you're copying into a temporary buffer), the bounds
           | checking elision is something you need to do yourself.
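One way to do that elision yourself when data arrives piecemeal, sketched under assumed names: slice the destination once, then copy, so there is a single range check rather than one per byte:

```rust
// Hypothetical helper: append an incoming chunk at `pos` in a fixed buffer.
// The slice expression performs one range check up front; `copy_from_slice`
// then needs no per-byte checks (it only verifies the lengths match).
fn append_chunk(buf: &mut [u8], pos: usize, chunk: &[u8]) -> usize {
    let dst = &mut buf[pos..pos + chunk.len()]; // single bounds check
    dst.copy_from_slice(chunk);
    pos + chunk.len()
}

fn main() {
    let mut buf = [0u8; 8];
    let mut pos = 0;
    pos = append_chunk(&mut buf, pos, b"abc");
    pos = append_chunk(&mut buf, pos, b"de");
    assert_eq!(&buf[..pos], b"abcde");
    println!("ok");
}
```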
        
         | stouset wrote:
         | I think the general takeaway is that for _most_ use cases,
         | Rust's additional safety comes at barely any cost. In some
          | cases the safety gains performance (by not checking things that
          | the compiler can assert statically), in some cases there's a
          | hit, but it usually ends up somewhere in the middle.
         | 
         | The takeaway is that Rust's safety and maintainability
         | generally don't impose corresponding penalties. And in the
         | cases they do and it's critical, you can always punt to unsafe.
        
         | norir wrote:
         | The paradox of bounds checking is that it both rarely matters
         | and always matters.
        
         | hedora wrote:
         | I suspect that information about memory aliasing from the
         | borrow checker (that things are not aliased) allows the
         | compiler to add in many optimizations that C/C++ can't perform,
         | and the benefit from that is similar to the cost of bounds
         | checking.
         | 
         | The memcpy vs. memmove thing comes to mind. So does stuff like
         | iterating over the indices of a pointer to one std::vector
         | while modifying another pointer to a std::vector. If you know
         | you have exclusive access to the vector you are reading, then
         | you know the size isn't changing, so you can just iterate from
         | 0 to size and elide any bounds checking.
         | 
          | With C++, if the pointers to the vectors could alias, then it
          | will have to bounds check at runtime. Templates and link-time
          | optimization help with this sort of thing, but I'm guessing
          | Rust still has an advantage.
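The aliasing point can be made concrete in safe Rust (a sketch; the function name is invented): because a `&[i32]` and a `&mut Vec<i32>` can never refer to the same storage, the compiler knows the source's length is fixed for the whole loop:

```rust
// `src` and `dst` are guaranteed non-aliasing by the borrow checker, so
// pushing to `dst` cannot reallocate or resize the memory behind `src`.
// A C++ compiler given two vector pointers cannot assume this.
fn copy_doubled(src: &[i32], dst: &mut Vec<i32>) {
    for &x in src {
        dst.push(x * 2);
    }
}

fn main() {
    let mut out = Vec::new();
    copy_doubled(&[1, 2, 3], &mut out);
    assert_eq!(out, vec![2, 4, 6]);
    println!("ok");
}
```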
        
       | wyldfire wrote:
       | > In order to compare the performance of unsafe-libyaml and the
       | safe Rust port, I found a relatively large YAML document (~700
       | KiB) and built a very simple benchmark using Criterion.
       | 
       | Is criterion environment-aware enough to be able to purge/drop
       | the filesystem cache? It might be interesting to see how the
       | performance looks with a cold cache.
       | 
       | > What these numbers do indicate is that the port to safe,
       | idiomatic Rust does not produce meaningfully slower code.
       | 
       | I think that's reasonable and I would even suggest that Rust's
       | value proposition is somewhat predicated on that. A c2rust port
       | like the baseline you started with is probably best as a working
       | crutch from which someone can rewrite a bit here / a bit there.
        
         | pkolaczk wrote:
         | Criterion has a warmup phase. So I guess both measurements were
         | using a hot cache.
        
         | vlovich123 wrote:
         | Disk I/O isn't relevant here since you're testing the
         | performance of the parser, not the speed of reading the string
         | from disk. I would expect a file this small to be read into a
         | buffer up-front & the benchmark code just invokes the parse
            | function on it. Indeed, the document is small enough that I'd
            | be worried about it hanging out in L3 cache the entire time,
            | which would overestimate parsing performance compared to the
            | typical case of parsing strings sitting in random spots in
            | memory.
        
       | viraptor wrote:
        | This makes me happy. I recently did a review of the YAML parsers
        | available from Ruby, looking to improve error reporting.
        | Unfortunately libyaml is just bad there - many indentation issues
        | at the end of the document get reported as errors on line 2, with
        | the context completely lost. Unfortunately x2, there's no real
        | alternative.
        | 
        | I ended up parsing with libyaml and, when something fails, parsing
        | again using node and displaying errors through
        | https://github.com/eemeli/yaml It's terrible, but it works better.
        | I hope to be able to use libyaml-safer in the future if it does
        | improve on errors.
        | 
        | Also, having found a case that trips an assertion in libyaml in
        | the past, I'm not trusting it _that_ much.
        
       | wredue wrote:
       | I find that you can usually safely disregard the opinion of
       | people who state "<thing> probably doesn't matter in practice."
       | 
        | Measure yourselves. Maybe it does. Maybe it doesn't. Either way,
        | this "it probably doesn't matter in practice" sentiment is the
        | primary reason a typical web request today is on the order of
        | 20-30x slower than it should be.
        | 
        | For that matter, if you're using libyaml to parse one YAML file,
        | you're probably fine. If you're AWS and use libyaml, you're
        | definitely going to feel this change.
        
         | edflsafoiewq wrote:
         | There are an infinite number of ways to do anything, you can't
         | try them all. Rules of thumb are necessary for making progress.
        
         | autoexecbat wrote:
         | If you're AWS you could probably put some FTEs on optimizing
         | the new lib too
        
       | jvanderbot wrote:
       | > The interesting takeaway is this: For all of the different
       | roles that yaml_string_t takes on, it turns out there is an
       | abstraction in the Rust standard library that matches each use
       | case precisely. At all points in the libyaml source code where
        | strings are used, the use case maps directly to one of the
        | several ways of dealing with strings in Rust. Phew
       | 
        | Reminder! Our forefathers knew and loved ownership concepts, but
        | lacked the tools to do anything but mentally map them. We stand
        | on the shoulders of giants.
        
       | _benj wrote:
       | > Mercifully, C does not have exceptions
       | 
        | This made me smile because, while yes, C doesn't have exceptions,
        | it just segfaults. I think there's a way to handle a segmentation
        | fault? Or is it just the OS that kills the process when it tries
        | to access an invalid memory address?
        
         | topaz0 wrote:
         | Yes, at least on linux, the os sends a signal (namely SIGSEGV)
         | to the process, which the process may or may not handle. But
         | kill also sends signals (including SIGSEGV if you specify), so
         | the answer to your question is that it's not really either/or
         | but both.
         | 
          | (edit to add: note that it isn't trivial to handle a segfault
          | in particular because, unless you do some trickery, you return
          | to the same instruction that caused the segfault and just get
          | another one; see e.g.:
         | https://stackoverflow.com/questions/2663456/how-to-write-a-s...
         | )
        
       | AceJohnny2 wrote:
       | Tangential C style survey... For code like this:
        if (some_function(some_memory) != ERROR) {
            // ...
        } else {
            goto cleanup;
        }
       | 
       | This approaches what I dub "Happy Path Indentation Syndrome"
       | (HPIS), where the "normal"/"happy path" functionality of the code
       | happens under the `// ...` inside the passing scope.
       | 
       | If you have several such functions that can each fail, this
       | approach means your "happy path" follows the deepening scope /
       | indentation.
       | 
        | Instead, I much prefer styling it like this (assuming, obviously,
        | that the "happy path" always happens):
        if (some_function(some_memory) == ERROR) {
            goto cleanup;
        }
        // ...
       | 
       | How do people approach this kind of situation?
        
         | AceJohnny2 wrote:
         | > _If you have several such functions that can each fail, this
         | approach means your "happy path" follows the deepening scope /
         | indentation._
         | 
         | In fact, I once had an interview candidate approach a simple
         | string parsing problem by checking each char in a nested scope,
         | happily going a dozen deep and not seeing a problem!
        
         | whstl wrote:
         | Your preferred version is often called "guard clause" when it's
         | in the beginning of functions. Although there's nothing
         | preventing them from being in the middle...
         | 
         | I do agree that it's much better for readability.
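In Rust the same guard-clause shape falls out naturally from early returns and the `?` operator; a small sketch with an invented function name:

```rust
// Guard clauses up top; the happy path stays at the outermost indentation.
fn parse_positive(s: &str) -> Result<u32, String> {
    if s.is_empty() {
        return Err("empty input".to_string()); // guard: fail fast
    }
    // `?` acts as an inline guard: on parse failure, return the error early.
    let n: u32 = s.parse().map_err(|_| "not a number".to_string())?;
    Ok(n)
}

fn main() {
    assert_eq!(parse_positive("42"), Ok(42));
    assert!(parse_positive("").is_err());
    println!("ok");
}
```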
        
       ___________________________________________________________________
       (page generated 2024-02-08 23:01 UTC)