[HN Gopher] `noexcept` affects libstdc++'s `unordered_set`
___________________________________________________________________
`noexcept` affects libstdc++'s `unordered_set`
Author : signa11
Score : 60 points
Date : 2024-08-18 15:40 UTC (7 hours ago)
(HTM) web link (quuxplusone.github.io)
(TXT) w3m dump (quuxplusone.github.io)
| formerly_proven wrote:
| Money quote
|
| > The outcome is that libstdc++'s unordered_set has performance
| characteristics that subtly depend on the true name and
| noexceptness of the hash function.
|
| > - A user-defined struct H : std::hash<std::string> {} will see
| smaller allocations and more calls to the hash function, than if
| you had just used std::hash<std::string> directly. (Because
| std::hash<std::string> is on the blacklist and H is not.)
|
| > - A user-defined hasher with size_t operator() const noexcept
| will see smaller allocations and more calls to the hash function
| (especially during rehashing). One with size_t operator() const
| will see larger allocations and fewer calls to the hash function.
| z_open wrote:
| somewhat unrelated but it's worth pointing out that noexcept is
| more specific for move semantics.
|
| In fact most c++ developers believe that throwing an exception in
| a noexcept function is undefined behavior. It is not. The
| behavior is defined to call std::terminate. Which would lead one
| to ask how does it know to call that. Because noexcept functions
| have a hidden try catch to see if it should call it. The result
| is that noexcept can hurt performance, which is surprising
| behavior. C++ is just complicated.
| colejohnson66 wrote:
| I'm not well versed in C++'s exception system, but why can't
| the unwind system itself call std::terminate? Why does it need
| to be the annotated method (that unwinding returns to)?
| epcoa wrote:
| Because the exception can't be allowed to escape the function
| marked noexcept. No matter the actual implementation, the
| exception has to be effectively caught in that no except
| function.
| quotemstr wrote:
| Catching is a table lookup in modern systems. I've yet to
| see a measurable happy path slowdown caused by adding
| noexcept.
| AshamedCaptain wrote:
| Well TFA explains one.
|
| I also find it difficult to conceive of a case where
| adding noexcept would lead to slower/longer code, other
| than arbitrary noexcept overloads such as TFA.
| layer8 wrote:
| It doesn't need to be, but the annotated function can still
| miss optimization opportunities, because it must be compiled
| _as if_ it had a try-catch block if the compiler can't prove
| the absence of exceptions, and this places constraints on
| reordering and inlining.
|
| On the other hand, the guarantee given by _noexcept_ can
| enable optimizations in the caller.
| AshamedCaptain wrote:
| A try { } catch block that calls terminate has no overhead.
| Normally the constraints on reordering are because e.g.
| constructor/destructor semantics and other side effects
| need to be accurately preserved during unwinding, but here
| any exception is going to result on a call to terminate,
| and (auto) destructors are not going to run.
|
| This was the entire point of noexcept versus the throw()
| specifier...
| ack_complete wrote:
| This is unfortunately not always true, even with a table-
| based unwinder. In order to detect the noexcept frame and
| call terminate(), the unwinder must be able to see the
| stack frame that has noexcept. This means that the
| compiler must suppress tail call optimization when a
| noexcept function tail calls a non-noexcept function.
| compiler-guy wrote:
| It also needs an entry in the exception table, and more
| if the cold path is moved out of line. So at the very
| least there are code size issues.
| klyrs wrote:
| Way more history than you asked for about how we got into
| this situation with noexcept:
|
| Up to 2011:
|
| https://akrzemi1.wordpress.com/2011/06/10/using-noexcept/
|
| As of '17:
|
| https://devblogs.microsoft.com/oldnewthing/20180928-00/?p=99.
| ..
|
| (I think this is the final word; I don't see any changes to
| exceptions in high-level release notes for c++20 and c++23 --
| did I miss anything?)
| leni536 wrote:
| It can, and it does on decent runtimes.
| quotemstr wrote:
| Yeah. Throwing from a noexcept function is often a better
| abort() than abort() itself because the std::terminate
| machinery will print information about whatever caused the
| termination, whereas abort will just SIGABRT.
| flamedoge wrote:
| what.. noexcept throws exception..? what kind of infinite
| wisdom led to this
| pavlov wrote:
| Noexcept terminates if an exception is thrown within. That's
| a very C++ way to ensure that the exception can't propagate.
| api wrote:
| There's a reason both Go and Rust eschew exceptions. They're
| something that superficially seemed like a great idea but
| that in practice complicate things by creating two exit paths
| for every function. They also don't play nice with any form
| of async or dynamic programming.
|
| C++ should never have had them, but we have sane clean C++
| now. It's called Rust.
| forrestthewoods wrote:
| Rust panics are basically exceptions, aren't they?
| Typically they aren't caught without terminating. But you
| totally can. And if you're writing a service that runs in
| the background you'll probably have to.
| wizzwizz4 wrote:
| Catching panics is best-effort only. In general, Rust
| panics can't be caught. (Even if a program is compiled
| with panic=unwind, this can change to abort at run-time.)
| forrestthewoods wrote:
| I don't think that's correct. Panics can be configured to
| abort instead of unwind. But if panic != abort then
| catching should be reliable.
|
| https://doc.rust-lang.org/std/panic/fn.catch_unwind.html
| bonzini wrote:
| Rust Result is basically a checked exception. Java makes
| you choose between adding an exception to "throws" or
| catching it, Rust makes you choose between declaring your
| function as returning Result or checking if you got an
| Err.
|
| The only difference is that Rust has better syntactic
| sugar for the latter, but Result is really isomorphic to
| Java checked exceptions.
|
| Panic could be said to be the same as an unchecked
| exception, except you have a lot more control on what
| causes them. The panic you get from calling unwrap() on
| an Option is the same as a NullPointerException, but you
| have full control on which points of the program that can
| generate it.
| bbarnett wrote:
| Come back and explain to me how wonderfully perfect rust
| is, after it is as old as C variants. Let's say 2040 at the
| earliest, maybe 2050.
|
| Legacy will affect rust too. It's not better, just younger.
|
| I tend to think of coding languages as laws passed by
| parliaments. A lot of legacy laws, and regulations hang
| around pver time.
| tialaramex wrote:
| It is better. That doesn't make it _perfect_ but it 's
| better.
|
| This is to be expected, in fact Rust has to be a _lot_
| better to even make a showing, because C is the
| "default" in some sense, you can't just be similarly
| good, you have to be significantly better for people to
| even notice.
|
| I expect that long before 2050 there will be other, even
| better languages, which learn from not only the mistakes
| Rust learned from, but the mistakes in Rust, and in other
| languages from this period.
|
| Take Editions. C was never able to figure out a way to
| add keywords. Simple idea, but it couldn't be done. They
| had to be kludged as magic with an underscore prefix to
| take advantage of an existing requirement in the language
| design, in C++ they decided to take the compatibility hit
| and invalidate all code using the to-be-reserved words.
| But in Rust they were able to add several new keywords,
| no trouble at all, because they'd thought about this and
| designed the language accordingly. That's what Editions
| did for them. You can expect future innovation along that
| dimension in future languages.
| astral303 wrote:
| If you find that exception-free code that is necessarily
| littered with exit value checks at every call, which
| discourages refactoring and adds massive noise, then you
| can call the decisions to eschew exceptions as "sane" and
| "clean", but I find the resulting code to be neither. Or
| practically speaking, exit codes will often not be checked
| at all, or an error check will be omitted by mistake,
| thereby interrupting the entire intended error handling
| chain. Somehow that qualifies as a better outcome or more
| sane? Perhaps it is for a code base that is not changing
| (write-once) or is not expected to change (i.e. throwaway
| code written in "true microservice" style).
| sobellian wrote:
| IMO the pendulum swung too far with Rust. The experience is
| better than C++, but the template generics system is not
| very powerful. They ostensibly made up for this with
| macros, but (a) they're annoying to write and (b) they're
| _really_ annoying to read. For these reasons Rust is much
| safer than C++ but has difficulty providing fluent
| interfaces like Eigen. There are libraries that try, but
| AFAICT none match Eigen 's ability to eliminate unnecessary
| allocation for subexpressions and equivalently performing
| Rust code looks much more imperative.
| chipdart wrote:
| > what.. noexcept throws exception..? what kind of infinite
| wisdom led to this
|
| Not wisdom at all, just a very basic and predictable
| failsafe.
|
| If a function that is declared to not throw any exception
| happens to throw one, the runtime handles that scenario as an
| eggregious violation of its contract. Consequently, as it
| does with any malformed code, it terminates the application.
| chipdart wrote:
| > (...) throwing an exception in a noexcept function is
| undefined behavior.
|
| Small but critical correction: noexcept functions can throw
| exceptions. What they cannot do is allow exceptions to bubble
| up to function invokers. This means that it is trivial to
| define noexcept functions: just add a try-catch block that
| catches all exceptions.
| tredre3 wrote:
| I suppose you are technically correct that noexcept can throw
| to themselves. But that's just being pedantic, isn't it? From
| the observer/caller point of view the function won't ever
| throw. It will always return (or abort).
| fluoridation wrote:
| Hmm... If you were reading the documentation for function
| foo() and it read "if the argument is negative, foo() throws
| an exception", would you understand that the function throws
| an exception and catches it internally before doing something
| else, or that it throws an exception that the caller must
| catch?
| yosefk wrote:
| -fno-exceptions FTW. (Of course you can't always do this. Great
| to do this early on before it's too late and the code is full
| of exceptions IMO.)
| ryandrake wrote:
| -fno-exceptions doesn't get rid of exceptions, it just causes
| your program to abort when one is thrown, which sounds--kind
| of worse? How do you deal with (for example) a constructor
| that fails? And if you're using the Standard Library, it is
| very difficult to avoid code that might throw an exception.
| rmholt wrote:
| You avoid any but trivial constructors an place all
| failable logic in a separate init()
| MauranKilom wrote:
| In other words, you introduce an invalid state to every
| object and make constructing objects a lot more
| cumbersome. The first is the exact opposite of the (imo
| highly desirable) "make invalid states unrepresentable"
| principle, and the second is also a pretty extreme cost
| to productivity. I wouldn't say this is never worth it,
| but it's a _very_ high price to pay.
| rmholt wrote:
| I know it sucks but such is the price of C++
|
| I'm not saying it's perfect but it's better than dealing
| with C++ exceptions. At least with error codes you can
| manually do some printing to find out what went wrong.
| With C++ exceptions you don't even get a line number or a
| stack trace.
| leni536 wrote:
| > How do you deal with (for example) a constructor that
| fails?
|
| The usual alternative is to have a factory function that
| returns std::optional<T> or std::expected<T, Err>. This
| avoids two-stage init, but has other tradeoffs.
| mgaunard wrote:
| I believe the original proposal for noexcept was that throwing
| was UB, and the original author believed strongly it should
| have been that way.
|
| In the end what got into the standard is that it is well-
| defined: it calls std::terminate. If I recall that change
| wasn't made without drama.
| quotemstr wrote:
| If you care about performance, you should consider using Abseil's
| hash tables instead of unordered_set and unordered_map. The
| Abseil containers are mostly drop in replacements for the
| standard unordered containers, except they have a tweaked API
| (e.g. erase() returning void) that admits important performance
| optimizations and (except the node based ones) use open addressed
| hash tables instead of chaining (i.e. each hash bucket is no
| longer a linked list). You end up with 2x-3x speedups in a lot of
| scenarios. The standard containers can't be fixed to match the
| performance of the Abseil ones because 1) the specification API
| requires pessimization, 2) Linux distributors are reluctant to
| break C++ "ABI" (such as it is).
|
| I mean, of course you should follow this article's advice on
| noexcept, but there's a whole world of associative container
| performance optimizations out there.
| jeffbee wrote:
| They are good, but I note that there are now hash tables in
| Boost that may be even better.
|
| https://www.boost.org/doc/libs/develop/libs/unordered/doc/ht...
| spacechild1 wrote:
| Thanks a lot! I have been looking for fast hashtables and I
| didn't know about Boost.Unordered. They even have concurrent
| (lockfree) versions!
| forrestthewoods wrote:
| Adding libraries like Abseil, Folly, or Boost is kind of a huge
| pain in the ass. They're monstrously huge libraries with a
| kajillion headers.
|
| I recently tried to compile Folly on Windows and gave up. Maybe
| Abseil is better? I'm not sure. But it's still a HUGE
| dependency to take on.
| leni536 wrote:
| Use conan or vcpkg.
| binary132 wrote:
| Dependencies giving you trouble? Try adding a dependency to
| handle them!
| leni536 wrote:
| Adding one dependency to your toolchain which in turn
| helps you with dealing with an open-ended set of 3rd
| party dependencies is a good deal.
| jcalvinowens wrote:
| I 100% agree with the other comment: Abseil and Folly are both
| broken monstrosities by design... the build system garbage is
| just one reason almost nobody uses them outside facebook and
| google.
|
| The chaining hashtable in liburcu is truly lock free, based on
| a real build system, and in my experience outperforms
| everything facebook and google have ever published:
| https://github.com/urcu/userspace-rcu
| leni536 wrote:
| _Also, I hope that if you're reading this post in a year or two
| (say, December 2025), these specific examples won't even
| reproduce anymore. I hope libstdc++ gets on the ball and
| eliminates some of this weirdness. (In short: They should get rid
| of the blacklist; pay the 8 bytes by default; introduce a
| whitelist for trivial hashers specifically; stop checking
| noexceptness for any reason.)_
|
| I think this would be an ABI break.
| mgaunard wrote:
| if you want a fast hash map, just use a library that is built
| specifically around a particular implementation strategy.
|
| std unordered map is slow by design for several reasons.
|
| Things of interest are:
|
| - open addressing method: linear, quadratic, robinhood, hybrid
|
| - the default hash function
|
| - the logic for the number of buckets (power of two, prime) and
| associated hash projection (fibonacci, modulo).
|
| Boost seems to be doing a decent job at this now.
___________________________________________________________________
(page generated 2024-08-18 23:00 UTC)