[HN Gopher] Advancing the state of the art for std:unordered_map...
___________________________________________________________________
Advancing the state of the art for std:unordered_map
implementations
Author : fanf2
Score : 35 points
Date : 2022-06-19 18:12 UTC (4 hours ago)
(HTM) web link (bannalia.blogspot.com)
(TXT) w3m dump (bannalia.blogspot.com)
| 0x23 wrote:
| There also has been a lot research lately in the field. For
| example, [1,2] has a lot of new methods on concurrent or space
| efficient hash tables.
|
| [1] http://algo2.iti.kit.edu/maier.php
|
| [2] https://github.com/TooBiased/growt
| syspec wrote:
| Really interesting post, love reading these kind of details into
| how collection types I take for granted work, and are iterated on
| kccqzy wrote:
| > Back in the day, open-addressing approaches were not regarded
| as sufficiently mature, so closed addressing was taken as the
| safe implementation of choice.
|
| I always chuckle when I read statements like this because it
| didn't match my own experience as a learner. When I learned
| computer science, I was taught hash tables right after arrays,
| but before linked lists or even pointers. So open addressing had
| always been the default for hash tables for me. I'm curious to
| hear what other people's learning journeys are like.
| RicoElectrico wrote:
| I lack experience to tell - does anybody use chaining in
| production?
| zeroonetwothree wrote:
| unordered_map uses chaining
| masklinn wrote:
| IIRC php and go use separate chaining (unless php switched
| while I wasn't looking).
| ghusbands wrote:
| It is conceptually simpler, especially once deletion is
| involved, and is the more common implementation.
| OskarS wrote:
| The fact that the STL api leaks these kind of implementation
| details is such a shame. Multiplied globally over all users of
| unordered_map, the amount of CPU time that has been wasted
| because of that decision boggles the mind. Has to be one of the
| biggest library design blunders in the history of the industry
| (up there with C strings being null-terminated). They should just
| add a new map to the standard with a better API.
| BeetleB wrote:
| > The fact that the STL api leaks these kind of implementation
| details is such a shame.
|
| There were good reasons for it to do so. C++ is often used for
| performance reasons, and requiring low level access was a means
| to prevent some implementation being standards compliant yet
| with horrible performance.
|
| I once had a coworker who had spent some time on the standards
| committee. He said they would dive into the technical details
| and tradeoffs to an insane level (utilizing journal papers,
| theses, and real world benchmarks). He often would quip that
| the amount of times some coworker came to him with a claim that
| they have a better implementation than the standard's and were
| correct, was zero. Sure, if you're a Boost developer, you may
| have the chops to improve on the standard, but you really need
| to be at that level.
|
| The other issue, of course, is that when you go into such
| detail, then yes - the standard is a bit brittle. The solution
| is for the standard to introduce another unordered_map (with a
| different name) that allows for open addressing to the
| standard, and keep the old one for backwards compatibility.
| Anyone who's spent time with the algorithms library knows that
| this is a common approach: Adding new functions to improve
| deficiencies in the previous ones.
|
| Anti-disclaimer: I hate C++. Not a fan.
| forrestthewoods wrote:
| C++ STL design is broken. The language shouldn't define an API
| for std, it should provide an implementation. AFAIK no other
| major language defines the std library the way C++ does. It's
| so bad and problematic.
| MauranKilom wrote:
| Sorry, can you elaborate? Some things in the STL require
| compiler support, so you can't really "provide" an
| implementation in the narrow sense.
|
| Also, how does providing a concrete/reference implementation
| prevent the API problems? Hyrum's law [0] means that you
| would make the situation _worse_ because now people will
| depend on every detail you can think of. And if you want to
| improve the implementation in a new version, you now are
| (most likely) breaking ABI on _every_ platform at once,
| instead of giving different platforms the ability to find
| compromises as needed. Just look at the recent "no ABI break
| in C++23" discussions to see how contentious that topic is.
|
| [0]: https://www.hyrumslaw.com/
| forrestthewoods wrote:
| STL should "just" be a library written against the language
| spec. The fact that there are so many different, unrelated
| implementations is crazy.
|
| The STL spec is so over designed that they strongly imply,
| if not mandate, a particular implementation. But they don't
| actually provide an implementation so every platform has to
| implement one. However those implementations select
| different trade-offs which means different behaviors so
| actually developers wind up writing their own version that
| is consistent. It's basically the worst outcome.
| olliej wrote:
| I am always sad that the default STL collections leak so much
| implementation detail that they are stuck on such abysmally
| performing algorithms.
|
| That every serious project ends up with its own hashmap, etc is
| such a shame.
| usefulcat wrote:
| IME, if you want a fast hash map, the very first thing you should
| do is ignore anything that uses closed addressing. If you really
| d I need one of the extra guarantees provided by
| std::unordered_map, see if you can't get it by using a container
| of unique_ptr<T> or shared_ptr<T> instead of unordered_map<T>.
___________________________________________________________________
(page generated 2022-06-19 23:00 UTC)