[HN Gopher] SIMD Binary Heap Operations
___________________________________________________________________
SIMD Binary Heap Operations
Author : ryandotsmith
Score : 50 points
Date : 2025-08-12 00:09 UTC (2 days ago)
(HTM) web link (0x80.pl)
(TXT) w3m dump (0x80.pl)
| dooglius wrote:
| is_heap doesn't seem like a particularly useful operation though,
| generally a heap is intentionally constructed as such via
| push_heap/pop_heap
| simonask wrote:
| I think it's fairly useful. It means that you can convert a
| contiguous array to a heap with a fast O(n) read-only check
| instead of O(n log n) writes, so if you know that's the common
| case, you can detect it up front and only revert to normal
| binary heap insertion if it returns false.
| madars wrote:
| By the way, you can also construct heap in linear time -
| instead of doing n consecutive insertions, at least half of
| which require log n work, you can apply sift-down operations
| from bottom up, beginning with the last non-leaf node and
| working backwards to the root.
|
| That way, roughly half the nodes are leaves (requiring no
| work), a quarter are at the second-to-last level (requiring
| at most 1 comparison/swap), an eighth at the third-to-last
| level (requiring at most 2 comparisons/swaps), and so on.
| Summing up 1 n/4 + 2 n/8 + ... gets you O(n) total
| complexity.
|
| See https://en.wikipedia.org/wiki/Heapsort?useskin=monobook#V
| ari...
| gotoeleven wrote:
| I think the parent comment is asking what process or
| algorithm is there that would result in an array that was
| sometimes but not always a heap, and you'd want to do
| something based on whether the array was in fact a heap or
| not? Like in your example, what process do you have in mind
| that might result in a heap and might not?
| mananaysiempre wrote:
| Indeed that part seems more like an artist's study than an
| attempt at actual usefulness, but that's okay when figuring out
| if anything at all in the neighbourhood of what you're trying
| to do with SIMD is even possible. As far as constructing heaps,
| don't forget about (linear-time) heapify, which can be
| significantly faster if you have a bunch of elements and want
| to construct a heap with all of them in it. (This doesn't get
| you a linear-time heap sort because you'll still pay the full
| linearithmic price for the subsequent pop_heaps.)
| camel-cdr wrote:
| make_heap should be vectorizable, that would be more useful. I
| can also see a path to vectorize bulk insert, but that seems
| harder.
| dragontamer wrote:
| I don't believe it's possible to vectorize the classic heap.
|
| I've seen vectorized and SIMD heap implementations. They are
| a different data structure entirely. You basically work with
| 16-sorted items and then sort your working set (16-items)
| with any node, allowing you to generalize the push down or
| push up operations of a heap. (Sort both lists. Top16 make a
| node and stay here. Bottom16 push down and recurse).
|
| This is very very similar to a classic heap but you need a
| lot of operations so that you have enough work to SIMD.
|
| Sorting is after all, a highly parallelized operation (Bionic
| Sort, MergePath, etc) and is a good basis for generalizing
| single threaded data structures into a multi thread or SIMD
| version.
| CalChris wrote:
| It's 2025 and the simd operations in this aren't that obscure. So
| I wonder how close std::simd gets to the instrinsics' performance
| with clang and gcc.
| Sesse__ wrote:
| The fastest way of doing a heap I've found is generally: Don't.
| For many of the relevant operations (graph search, merging
| streams, etc.), you can do just as well with a winner-tree; it
| can usually be updated branch-free with min/max operations,
| whereas with a heap you'll usually have completely unpredictable
| branches for every single operation.
|
| A winner-tree, or its counterpart the loser-tree (for min instead
| of max), is a very simple binary tree: Bottom layer is all your
| values (2^N of them). The layer above that is the highest of
| pairs of values. The layer above that is the highest of pairs of
| pairs. And so on, until you get to the top of the tree, which
| contains the largest value. Updating a value is trivial; you
| overwrite the relevant one at the bottom, and then run exactly
| log2(n) max operations upwards until the you hit the root.
| Inserting and deleting may, of course, be more complicated.
| DennisL123 wrote:
| A winner tree uses extra space, doesn't it? That might exclude
| it from certain applications to be an alternative. Four-ary
| heaps are roughly (ymmv) twice as fast as binary heaps by
| exploiting cache locality in a better way for small key/value
| types. And it seems to be a sweet spot since eight-ary heaps
| don't deliver additional improvements.
| Sesse__ wrote:
| Yes, since the inner nodes duplicate information, it uses
| more space (roughly twice as much). I've found them generally
| most effective for things like merging 256 streams or doing
| Dijkstra with not super-many nodes (e.g. Contraction
| Hierarchies). If you have millions or billions of entries,
| then cache considerations start becoming more important than
| branch mispredictions and you want something like a B-heap.
| za_creature wrote:
| That's a nice niche you found (spoken from one heap fan to
| another) but I have to say I strongly disagree with your
| use of *roughly* twice as much
|
| At best you were off by one but in the context of
| performance, you'd want to assign that extra to a super-
| root of +-inf in order to save log2n extra range checks per
| heap-up, no?
| Sesse__ wrote:
| I don't think I understand what you are saying. Anyway,
| if you are strapped for space, do not use a winner tree.
| za_creature wrote:
| Ah, I meant that for a classic heap, it's convenient to
| assign h[0] to the limit of your goal function (e.g. -inf
| for a min heap) cause that way you can skip the while i>0
| check and just do while h[i>>1] > min(h[i], h[i+1]),
| asserting that the condition is definitely false when i <
| 2
|
| I guess that it's not as important when you're explicitly
| willing to pay the piper and run the full log2n
| iterations just for the sake of branch predictability.
|
| Thanks for the algo though, before today I never would've
| thought about doing a branchless downheap with i = i<<1 +
| h[i<<1] != h[i]
| _ache_ wrote:
| Can't see the Website.
|
| I'm the only one with a HTTPS problem here? Bad domain,
| `art.mahajana.net` instead of 0x80.pl.
| superjan wrote:
| i have wasted several weeks worth of evenings on vectorizing
| heaps (4ary heaps: with SIMD, you're not limited to binary
| heaps). It did not provide any speedup. I'd expect that halving
| heap depth would help but no. Still don't know why.
___________________________________________________________________
(page generated 2025-08-14 23:01 UTC)