[HN Gopher] `noexcept` affects libstdc++'s `unordered_set`
       ___________________________________________________________________
        
       `noexcept` affects libstdc++'s `unordered_set`
        
       Author : signa11
       Score  : 60 points
       Date   : 2024-08-18 15:40 UTC (7 hours ago)
        
 (HTM) web link (quuxplusone.github.io)
 (TXT) w3m dump (quuxplusone.github.io)
        
       | formerly_proven wrote:
       | Money quote
       | 
       | > The outcome is that libstdc++'s unordered_set has performance
       | characteristics that subtly depend on the true name and
       | noexceptness of the hash function.
       | 
       | > - A user-defined struct H : std::hash<std::string> {} will see
       | smaller allocations and more calls to the hash function, than if
       | you had just used std::hash<std::string> directly. (Because
       | std::hash<std::string> is on the blacklist and H is not.)
       | 
       | > - A user-defined hasher with size_t operator() const noexcept
       | will see smaller allocations and more calls to the hash function
       | (especially during rehashing). One with size_t operator() const
       | will see larger allocations and fewer calls to the hash function.
        
       | z_open wrote:
       | somewhat unrelated but it's worth pointing out that noexcept is
       | more specific for move semantics.
       | 
       | In fact most c++ developers believe that throwing an exception in
       | a noexcept function is undefined behavior. It is not. The
       | behavior is defined to call std::terminate. Which would lead one
       | to ask how does it know to call that. Because noexcept functions
       | have a hidden try catch to see if it should call it. The result
       | is that noexcept can hurt performance, which is surprising
       | behavior. C++ is just complicated.
        
         | colejohnson66 wrote:
         | I'm not well versed in C++'s exception system, but why can't
         | the unwind system itself call std::terminate? Why does it need
         | to be the annotated method (that unwinding returns to)?
        
           | epcoa wrote:
           | Because the exception can't be allowed to escape the function
           | marked noexcept. No matter the actual implementation, the
           | exception has to be effectively caught in that no except
           | function.
        
             | quotemstr wrote:
             | Catching is a table lookup in modern systems. I've yet to
             | see a measurable happy path slowdown caused by adding
             | noexcept.
        
               | AshamedCaptain wrote:
               | Well TFA explains one.
               | 
               | I also find it difficult to conceive of a case where
               | adding noexcept would lead to slower/longer code, other
               | than arbitrary noexcept overloads such as TFA.
        
           | layer8 wrote:
           | It doesn't need to be, but the annotated function can still
           | miss optimization opportunities, because it must be compiled
           | _as if_ it had a try-catch block if the compiler can't prove
           | the absence of exceptions, and this places constraints on
           | reordering and inlining.
           | 
           | On the other hand, the guarantee given by _noexcept_ can
           | enable optimizations in the caller.
        
             | AshamedCaptain wrote:
             | A try { } catch block that calls terminate has no overhead.
             | Normally the constraints on reordering are because e.g.
             | constructor/destructor semantics and other side effects
             | need to be accurately preserved during unwinding, but here
             | any exception is going to result on a call to terminate,
             | and (auto) destructors are not going to run.
             | 
             | This was the entire point of noexcept versus the throw()
             | specifier...
        
               | ack_complete wrote:
               | This is unfortunately not always true, even with a table-
               | based unwinder. In order to detect the noexcept frame and
               | call terminate(), the unwinder must be able to see the
               | stack frame that has noexcept. This means that the
               | compiler must suppress tail call optimization when a
               | noexcept function tail calls a non-noexcept function.
        
               | compiler-guy wrote:
               | It also needs an entry in the exception table, and more
               | if the cold path is moved out of line. So at the very
               | least there are code size issues.
        
           | klyrs wrote:
           | Way more history than you asked for about how we got into
           | this situation with noexcept:
           | 
           | Up to 2011:
           | 
           | https://akrzemi1.wordpress.com/2011/06/10/using-noexcept/
           | 
           | As of '17:
           | 
           | https://devblogs.microsoft.com/oldnewthing/20180928-00/?p=99.
           | ..
           | 
           | (I think this is the final word; I don't see any changes to
           | exceptions in high-level release notes for c++20 and c++23 --
           | did I miss anything?)
        
           | leni536 wrote:
           | It can, and it does on decent runtimes.
        
         | quotemstr wrote:
         | Yeah. Throwing from a noexcept function is often a better
         | abort() than abort() itself because the std::terminate
         | machinery will print information about whatever caused the
         | termination, whereas abort will just SIGABRT.
        
         | flamedoge wrote:
         | what.. noexcept throws exception..? what kind of infinite
         | wisdom led to this
        
           | pavlov wrote:
           | Noexcept terminates if an exception is thrown within. That's
           | a very C++ way to ensure that the exception can't propagate.
        
           | api wrote:
           | There's a reason both Go and Rust eschew exceptions. They're
           | something that superficially seemed like a great idea but
           | that in practice complicate things by creating two exit paths
           | for every function. They also don't play nice with any form
           | of async or dynamic programming.
           | 
           | C++ should never have had them, but we have sane clean C++
           | now. It's called Rust.
        
             | forrestthewoods wrote:
             | Rust panics are basically exceptions, aren't they?
             | Typically they aren't caught without terminating. But you
             | totally can. And if you're writing a service that runs in
             | the background you'll probably have to.
        
               | wizzwizz4 wrote:
               | Catching panics is best-effort only. In general, Rust
               | panics can't be caught. (Even if a program is compiled
               | with panic=unwind, this can change to abort at run-time.)
        
               | forrestthewoods wrote:
               | I don't think that's correct. Panics can be configured to
               | abort instead of unwind. But if panic != abort then
               | catching should be reliable.
               | 
               | https://doc.rust-lang.org/std/panic/fn.catch_unwind.html
        
               | bonzini wrote:
               | Rust Result is basically a checked exception. Java makes
               | you choose between adding an exception to "throws" or
               | catching it, Rust makes you choose between declaring your
               | function as returning Result or checking if you got an
               | Err.
               | 
               | The only difference is that Rust has better syntactic
               | sugar for the latter, but Result is really isomorphic to
               | Java checked exceptions.
               | 
               | Panic could be said to be the same as an unchecked
               | exception, except you have a lot more control on what
               | causes them. The panic you get from calling unwrap() on
               | an Option is the same as a NullPointerException, but you
               | have full control on which points of the program that can
               | generate it.
        
             | bbarnett wrote:
             | Come back and explain to me how wonderfully perfect rust
             | is, after it is as old as C variants. Let's say 2040 at the
             | earliest, maybe 2050.
             | 
             | Legacy will affect rust too. It's not better, just younger.
             | 
             | I tend to think of coding languages as laws passed by
             | parliaments. A lot of legacy laws, and regulations hang
             | around pver time.
        
               | tialaramex wrote:
               | It is better. That doesn't make it _perfect_ but it 's
               | better.
               | 
               | This is to be expected, in fact Rust has to be a _lot_
               | better to even make a showing, because C is the
               | "default" in some sense, you can't just be similarly
               | good, you have to be significantly better for people to
               | even notice.
               | 
               | I expect that long before 2050 there will be other, even
               | better languages, which learn from not only the mistakes
               | Rust learned from, but the mistakes in Rust, and in other
               | languages from this period.
               | 
               | Take Editions. C was never able to figure out a way to
               | add keywords. Simple idea, but it couldn't be done. They
               | had to be kludged as magic with an underscore prefix to
               | take advantage of an existing requirement in the language
               | design, in C++ they decided to take the compatibility hit
               | and invalidate all code using the to-be-reserved words.
               | But in Rust they were able to add several new keywords,
               | no trouble at all, because they'd thought about this and
               | designed the language accordingly. That's what Editions
               | did for them. You can expect future innovation along that
               | dimension in future languages.
        
             | astral303 wrote:
             | If you find that exception-free code that is necessarily
             | littered with exit value checks at every call, which
             | discourages refactoring and adds massive noise, then you
             | can call the decisions to eschew exceptions as "sane" and
             | "clean", but I find the resulting code to be neither. Or
             | practically speaking, exit codes will often not be checked
             | at all, or an error check will be omitted by mistake,
             | thereby interrupting the entire intended error handling
             | chain. Somehow that qualifies as a better outcome or more
             | sane? Perhaps it is for a code base that is not changing
             | (write-once) or is not expected to change (i.e. throwaway
             | code written in "true microservice" style).
        
             | sobellian wrote:
             | IMO the pendulum swung too far with Rust. The experience is
             | better than C++, but the template generics system is not
             | very powerful. They ostensibly made up for this with
             | macros, but (a) they're annoying to write and (b) they're
             | _really_ annoying to read. For these reasons Rust is much
             | safer than C++ but has difficulty providing fluent
             | interfaces like Eigen. There are libraries that try, but
             | AFAICT none match Eigen 's ability to eliminate unnecessary
             | allocation for subexpressions and equivalently performing
             | Rust code looks much more imperative.
        
           | chipdart wrote:
           | > what.. noexcept throws exception..? what kind of infinite
           | wisdom led to this
           | 
           | Not wisdom at all, just a very basic and predictable
           | failsafe.
           | 
           | If a function that is declared to not throw any exception
           | happens to throw one, the runtime handles that scenario as an
           | eggregious violation of its contract. Consequently, as it
           | does with any malformed code, it terminates the application.
        
         | chipdart wrote:
         | > (...) throwing an exception in a noexcept function is
         | undefined behavior.
         | 
         | Small but critical correction: noexcept functions can throw
         | exceptions. What they cannot do is allow exceptions to bubble
         | up to function invokers. This means that it is trivial to
         | define noexcept functions: just add a try-catch block that
         | catches all exceptions.
        
           | tredre3 wrote:
           | I suppose you are technically correct that noexcept can throw
           | to themselves. But that's just being pedantic, isn't it? From
           | the observer/caller point of view the function won't ever
           | throw. It will always return (or abort).
        
           | fluoridation wrote:
           | Hmm... If you were reading the documentation for function
           | foo() and it read "if the argument is negative, foo() throws
           | an exception", would you understand that the function throws
           | an exception and catches it internally before doing something
           | else, or that it throws an exception that the caller must
           | catch?
        
         | yosefk wrote:
         | -fno-exceptions FTW. (Of course you can't always do this. Great
         | to do this early on before it's too late and the code is full
         | of exceptions IMO.)
        
           | ryandrake wrote:
           | -fno-exceptions doesn't get rid of exceptions, it just causes
           | your program to abort when one is thrown, which sounds--kind
           | of worse? How do you deal with (for example) a constructor
           | that fails? And if you're using the Standard Library, it is
           | very difficult to avoid code that might throw an exception.
        
             | rmholt wrote:
             | You avoid any but trivial constructors an place all
             | failable logic in a separate init()
        
               | MauranKilom wrote:
               | In other words, you introduce an invalid state to every
               | object and make constructing objects a lot more
               | cumbersome. The first is the exact opposite of the (imo
               | highly desirable) "make invalid states unrepresentable"
               | principle, and the second is also a pretty extreme cost
               | to productivity. I wouldn't say this is never worth it,
               | but it's a _very_ high price to pay.
        
               | rmholt wrote:
               | I know it sucks but such is the price of C++
               | 
               | I'm not saying it's perfect but it's better than dealing
               | with C++ exceptions. At least with error codes you can
               | manually do some printing to find out what went wrong.
               | With C++ exceptions you don't even get a line number or a
               | stack trace.
        
             | leni536 wrote:
             | > How do you deal with (for example) a constructor that
             | fails?
             | 
             | The usual alternative is to have a factory function that
             | returns std::optional<T> or std::expected<T, Err>. This
             | avoids two-stage init, but has other tradeoffs.
        
         | mgaunard wrote:
         | I believe the original proposal for noexcept was that throwing
         | was UB, and the original author believed strongly it should
         | have been that way.
         | 
         | In the end what got into the standard is that it is well-
         | defined: it calls std::terminate. If I recall that change
         | wasn't made without drama.
        
       | quotemstr wrote:
       | If you care about performance, you should consider using Abseil's
       | hash tables instead of unordered_set and unordered_map. The
       | Abseil containers are mostly drop in replacements for the
       | standard unordered containers, except they have a tweaked API
       | (e.g. erase() returning void) that admits important performance
       | optimizations and (except the node based ones) use open addressed
       | hash tables instead of chaining (i.e. each hash bucket is no
       | longer a linked list). You end up with 2x-3x speedups in a lot of
       | scenarios. The standard containers can't be fixed to match the
       | performance of the Abseil ones because 1) the specification API
       | requires pessimization, 2) Linux distributors are reluctant to
       | break C++ "ABI" (such as it is).
       | 
       | I mean, of course you should follow this article's advice on
       | noexcept, but there's a whole world of associative container
       | performance optimizations out there.
        
         | jeffbee wrote:
         | They are good, but I note that there are now hash tables in
         | Boost that may be even better.
         | 
         | https://www.boost.org/doc/libs/develop/libs/unordered/doc/ht...
        
           | spacechild1 wrote:
           | Thanks a lot! I have been looking for fast hashtables and I
           | didn't know about Boost.Unordered. They even have concurrent
           | (lockfree) versions!
        
         | forrestthewoods wrote:
         | Adding libraries like Abseil, Folly, or Boost is kind of a huge
         | pain in the ass. They're monstrously huge libraries with a
         | kajillion headers.
         | 
         | I recently tried to compile Folly on Windows and gave up. Maybe
         | Abseil is better? I'm not sure. But it's still a HUGE
         | dependency to take on.
        
           | leni536 wrote:
           | Use conan or vcpkg.
        
             | binary132 wrote:
             | Dependencies giving you trouble? Try adding a dependency to
             | handle them!
        
               | leni536 wrote:
               | Adding one dependency to your toolchain which in turn
               | helps you with dealing with an open-ended set of 3rd
               | party dependencies is a good deal.
        
         | jcalvinowens wrote:
         | I 100% agree with the other comment: Abseil and Folly are both
         | broken monstrosities by design... the build system garbage is
         | just one reason almost nobody uses them outside facebook and
         | google.
         | 
         | The chaining hashtable in liburcu is truly lock free, based on
         | a real build system, and in my experience outperforms
         | everything facebook and google have ever published:
         | https://github.com/urcu/userspace-rcu
        
       | leni536 wrote:
       | _Also, I hope that if you're reading this post in a year or two
       | (say, December 2025), these specific examples won't even
       | reproduce anymore. I hope libstdc++ gets on the ball and
       | eliminates some of this weirdness. (In short: They should get rid
       | of the blacklist; pay the 8 bytes by default; introduce a
       | whitelist for trivial hashers specifically; stop checking
       | noexceptness for any reason.)_
       | 
       | I think this would be an ABI break.
        
       | mgaunard wrote:
       | if you want a fast hash map, just use a library that is built
       | specifically around a particular implementation strategy.
       | 
       | std unordered map is slow by design for several reasons.
       | 
       | Things of interest are:
       | 
       | - open addressing method: linear, quadratic, robinhood, hybrid
       | 
       | - the default hash function
       | 
       | - the logic for the number of buckets (power of two, prime) and
       | associated hash projection (fibonacci, modulo).
       | 
       | Boost seems to be doing a decent job at this now.
        
       ___________________________________________________________________
       (page generated 2024-08-18 23:00 UTC)