[HN Gopher] SIMD Binary Heap Operations
       ___________________________________________________________________
        
       SIMD Binary Heap Operations
        
       Author : ryandotsmith
       Score  : 50 points
       Date   : 2025-08-12 00:09 UTC (2 days ago)
        
 (HTM) web link (0x80.pl)
 (TXT) w3m dump (0x80.pl)
        
       | dooglius wrote:
       | is_heap doesn't seem like a particularly useful operation though,
       | generally a heap is intentionally constructed as such via
       | push_heap/pop_heap
        
         | simonask wrote:
         | I think it's fairly useful. It means that you can convert a
         | contiguous array to a heap with a fast O(n) read-only check
         | instead of O(n log n) writes, so if you know that's the common
         | case, you can detect it up front and only revert to normal
         | binary heap insertion if it returns false.
        
           | madars wrote:
           | By the way, you can also construct heap in linear time -
           | instead of doing n consecutive insertions, at least half of
           | which require log n work, you can apply sift-down operations
           | from bottom up, beginning with the last non-leaf node and
           | working backwards to the root.
           | 
           | That way, roughly half the nodes are leaves (requiring no
           | work), a quarter are at the second-to-last level (requiring
           | at most 1 comparison/swap), an eighth at the third-to-last
           | level (requiring at most 2 comparisons/swaps), and so on.
           | Summing up 1 n/4 + 2 n/8 + ... gets you O(n) total
           | complexity.
           | 
           | See https://en.wikipedia.org/wiki/Heapsort?useskin=monobook#V
           | ari...
        
           | gotoeleven wrote:
           | I think the parent comment is asking what process or
           | algorithm is there that would result in an array that was
           | sometimes but not always a heap, and you'd want to do
           | something based on whether the array was in fact a heap or
           | not? Like in your example, what process do you have in mind
           | that might result in a heap and might not?
        
         | mananaysiempre wrote:
         | Indeed that part seems more like an artist's study than an
         | attempt at actual usefulness, but that's okay when figuring out
         | if anything at all in the neighbourhood of what you're trying
         | to do with SIMD is even possible. As far as constructing heaps,
         | don't forget about (linear-time) heapify, which can be
         | significantly faster if you have a bunch of elements and want
         | to construct a heap with all of them in it. (This doesn't get
         | you a linear-time heap sort because you'll still pay the full
         | linearithmic price for the subsequent pop_heaps.)
        
         | camel-cdr wrote:
         | make_heap should be vectorizable, that would be more useful. I
         | can also see a path to vectorize bulk insert, but that seems
         | harder.
        
           | dragontamer wrote:
           | I don't believe it's possible to vectorize the classic heap.
           | 
           | I've seen vectorized and SIMD heap implementations. They are
           | a different data structure entirely. You basically work with
           | 16-sorted items and then sort your working set (16-items)
           | with any node, allowing you to generalize the push down or
           | push up operations of a heap. (Sort both lists. Top16 make a
           | node and stay here. Bottom16 push down and recurse).
           | 
           | This is very very similar to a classic heap but you need a
           | lot of operations so that you have enough work to SIMD.
           | 
           | Sorting is after all, a highly parallelized operation (Bionic
           | Sort, MergePath, etc) and is a good basis for generalizing
           | single threaded data structures into a multi thread or SIMD
           | version.
        
       | CalChris wrote:
       | It's 2025 and the simd operations in this aren't that obscure. So
       | I wonder how close std::simd gets to the instrinsics' performance
       | with clang and gcc.
        
       | Sesse__ wrote:
       | The fastest way of doing a heap I've found is generally: Don't.
       | For many of the relevant operations (graph search, merging
       | streams, etc.), you can do just as well with a winner-tree; it
       | can usually be updated branch-free with min/max operations,
       | whereas with a heap you'll usually have completely unpredictable
       | branches for every single operation.
       | 
       | A winner-tree, or its counterpart the loser-tree (for min instead
       | of max), is a very simple binary tree: Bottom layer is all your
       | values (2^N of them). The layer above that is the highest of
       | pairs of values. The layer above that is the highest of pairs of
       | pairs. And so on, until you get to the top of the tree, which
       | contains the largest value. Updating a value is trivial; you
       | overwrite the relevant one at the bottom, and then run exactly
       | log2(n) max operations upwards until the you hit the root.
       | Inserting and deleting may, of course, be more complicated.
        
         | DennisL123 wrote:
         | A winner tree uses extra space, doesn't it? That might exclude
         | it from certain applications to be an alternative. Four-ary
         | heaps are roughly (ymmv) twice as fast as binary heaps by
         | exploiting cache locality in a better way for small key/value
         | types. And it seems to be a sweet spot since eight-ary heaps
         | don't deliver additional improvements.
        
           | Sesse__ wrote:
           | Yes, since the inner nodes duplicate information, it uses
           | more space (roughly twice as much). I've found them generally
           | most effective for things like merging 256 streams or doing
           | Dijkstra with not super-many nodes (e.g. Contraction
           | Hierarchies). If you have millions or billions of entries,
           | then cache considerations start becoming more important than
           | branch mispredictions and you want something like a B-heap.
        
             | za_creature wrote:
             | That's a nice niche you found (spoken from one heap fan to
             | another) but I have to say I strongly disagree with your
             | use of *roughly* twice as much
             | 
             | At best you were off by one but in the context of
             | performance, you'd want to assign that extra to a super-
             | root of +-inf in order to save log2n extra range checks per
             | heap-up, no?
        
               | Sesse__ wrote:
               | I don't think I understand what you are saying. Anyway,
               | if you are strapped for space, do not use a winner tree.
        
               | za_creature wrote:
               | Ah, I meant that for a classic heap, it's convenient to
               | assign h[0] to the limit of your goal function (e.g. -inf
               | for a min heap) cause that way you can skip the while i>0
               | check and just do while h[i>>1] > min(h[i], h[i+1]),
               | asserting that the condition is definitely false when i <
               | 2
               | 
               | I guess that it's not as important when you're explicitly
               | willing to pay the piper and run the full log2n
               | iterations just for the sake of branch predictability.
               | 
               | Thanks for the algo though, before today I never would've
               | thought about doing a branchless downheap with i = i<<1 +
               | h[i<<1] != h[i]
        
       | _ache_ wrote:
       | Can't see the Website.
       | 
       | I'm the only one with a HTTPS problem here? Bad domain,
       | `art.mahajana.net` instead of 0x80.pl.
        
       | superjan wrote:
       | i have wasted several weeks worth of evenings on vectorizing
       | heaps (4ary heaps: with SIMD, you're not limited to binary
       | heaps). It did not provide any speedup. I'd expect that halving
       | heap depth would help but no. Still don't know why.
        
       ___________________________________________________________________
       (page generated 2025-08-14 23:01 UTC)