[HN Gopher] Scientists find optimal space-time balance for hash ...
___________________________________________________________________
Scientists find optimal space-time balance for hash tables
Author : Brajeshwar
Score : 71 points
Date : 2024-02-09 15:01 UTC (7 hours ago)
(HTM) web link (www.quantamagazine.org)
(TXT) w3m dump (www.quantamagazine.org)
| vlovich123 wrote:
| For anyone curious this is about a 2023 paper proving that a 2022
| theoretical hash table construction achieved not only an upper
| bound in performance in terms of time and space, the construction
| also achieved a lower bound. It's a theoretical construction so
| not actually a practical one you'd find in your software toolkit
| (constants are too large for most applications, likely even
| realistic databases). The referenced "practical" IcebergHT data
| structure isn't even one you'd likely encounter as it seems to be
| a research hash table focusing on persistent memory which isn't
| hardware you typically would encounter.
| jakedowns wrote:
| love when things like this are ready and waiting on the shelf
| for when we need them
| opticfluorine wrote:
| In 200 years this is going to be like when we discover yet
| another proof of Euler or Gauss, 20 years after a current
| mathematician arrived at the same result.
| TeMPOraL wrote:
| Or it could be like multipoint touch screens, early version
| of which were a solved problem in 1980s or earlier, and
| just needed compute power to catch up, which it did over
| subsequent decades.
|
| Or it could be like code=data and homoiconity and other CS
| fundamentals that were figured out 40+ years ago, but are
| _still_ mostly ignored by software industry /culture.
| madars wrote:
| Somewhat off-topic but does anyone have a bookmarklet that would
| remove annoying scrolling behavior on sites like Quanta? I.e.
| pressing space bar should immediately scroll down a page, not do
| it slowly with awkward mechanics.
|
| In exchange I'm happy to share a bookmarklet that "unsticks"
| sticky elements of the page (e.g. an ever-visible header like on
| WaPo, though they are not the worst offender):
| https://pastebin.com/Vh594168
| btdmaster wrote:
| I think they're using sscroll from npm, but not entirely sure.
| I was able to fix it, but too well, in that now it'll scroll
| with unbounded speed:
| https://tio.run/##LY5BDsIgEEX3PcXsgKQS9wYXRpMu3HkCAlMdbYFQ2q...
| jiggawatts wrote:
| This kind of thing is unfortunately not useful at any scale.
|
| Real computer systems have performance that varies due to memory
| locality and size.
|
| Locality because of things such as cache hierarchy and even the
| size of cache lines and memory pages.
|
| Size because of physical implementation: larger memories are
| physically bigger and hence further away. The speed of light
| makes access to larger memories inescapably slower.
|
| The best current hashtable implementations are all quite far from
| the purely theoretical computer science optimums, but are faster
| despite this because they take these factors into account.
|
| Back when CPUs were simple and had no virtual memory or caches,
| there was a good correspondence between theoretical CS algorithms
| and their real implementations.
|
| Now? Everything I see published in this space is basically pure
| maths with little or no practical utility. It's still
| interesting, sure, but it's a bit sad that the theorists have
| retreated into a virtual world to escape the messy details of our
| reality.
| trentnelson wrote:
| Hey, if you're looking for a real-world pragmatic and
| performant implementation of a theoretically-cool algorithm, my
| https://github.com/tpn/perfecthash project might fit the bill.
|
| It's geared to generating perfect hash tables with the fastest
| possible lookup/index times (for 32-bit keys), for key sets in
| the <=100,000 range. (It scales well up to millions of keys,
| but the solving time takes a lot longer.)
| kevinventullo wrote:
| What is a theorist if not someone who works on interesting
| problems which have a tenuous-at-best relationship with
| reality?
| andersa wrote:
| Yep, it's the same thing every time. "We propose this new thing
| with optimal time complexity blah blah blah" - don't care. Show
| benchmark. What do you mean it's slower than this basic but
| cache efficient hash table implemented in 100 lines of c++?
| sylware wrote:
| optimal for which use cases?
___________________________________________________________________
(page generated 2024-02-09 23:00 UTC)