[HN Gopher] Consistently faster and smaller compressed bitmaps w...
       ___________________________________________________________________
        
       Consistently faster and smaller compressed bitmaps with Roaring
       (2016)
        
       Author : hambandit
       Score  : 50 points
       Date   : 2024-11-01 18:57 UTC (7 days ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | o11c wrote:
       | Title is confusing: this is not about the original "Roaring", but
       | an extension of it called "Roaring+Run".
       | 
       | Here, "bitmap" = "set of sometimes-compact integers". The
       | "uncompressed" and several "rle" implementations are obvious. Hm,
       | except this only seems to be talking about a particularly-naive
       | RLE approach (suitable for storage but not computation)? If
       | you're doing computation I expect you to use absolute offsets
       | rather than relative ones which means you can just do binary
       | search (the only downside of this is that you can't use variable-
       | length integers now).
       | 
       | Roaring is just a fixed 2-level trie, where the outer node is
       | always an array of pointers and where the inner nodes can be
       | either uncompressed bitvectors (if dense) or an array of low-half
       | integers (if sparse). Also, it _only_ works for 32-bit integers
       | at a fundamental level; significant changes are needed for 64-bit
       | integers.
       | 
       | This paper adds a third representation for the inner node, the
       | bsearch'able absolute RLE method I mentioned earlier (before even
       | reading the paper beyond the abstract).
       | 
       | Overall there's neither anything novel nor anything particularly
       | exciting about this paper, but if you ignore all the self-
       | congratulations it might work as a decent intro to the subject?
       | Except maybe not since there are a lot of things it fails to
       | mention (the ping-pong problem, deduplicated tries, the approach
       | of using a separate set for sparse values in the same range, the
       | "overall sparse but locally semi-dense" representation that uses
       | fixed-size single-word bitsets, ...)
        
         | Drup wrote:
         | You seem well versed into that corner. Do you have a good (and
         | reasonably complete) introduction/exploration for these memory-
         | efficient data-structure for computation ?
         | 
         | I've been working on memory representation of algebraic data
         | types quite a bit, and I've always wondered if we could combine
         | them with succinct data-structures.
        
           | pram wrote:
           | Theres actually a whole website about it! I found it useful
           | when I was doing deeper research into ElasticSearch:
           | https://roaringbitmap.org
        
       | pncnmnp wrote:
       | Hey everyone, if you're looking for a more approachable guide on
       | bitmap compression, I wrote a blog post on it this year:
       | https://pncnmnp.github.io/blogs/roaring-bitmaps.html. It covers
       | run-length encoding, BBC, WAH, Concise, and even Roaring Bitmaps.
        
         | skybrian wrote:
         | That's a good read, thanks! The history is interesting, but in
         | practice, I'm wondering if there's any reason not to skip it
         | and just learn about roaring bitmaps?
        
           | pncnmnp wrote:
           | Absolutely! For inspiration on how roaring bitmaps can be
           | used in practice, check out https://roaringbitmap.org/. The
           | blog post is part of a book I am writing on obscure data
           | structures (https://pncnmnp.github.io/blogs/dsa-book.html),
           | so explaining the history was a way to dive into the
           | evolution of this topic and the limitations of each
           | implementation, in order to motivate a SOTA.
        
       | softwaredoug wrote:
       | Roaring bitmaps are really useful for doing phrase search in
       | search engines.
       | 
       | Basically you can find cases where one term is immediately before
       | another by left shifting the right terms roaring encoded
       | positions in all documents and bitwise-anding the similarly
       | roaring-encoded payload of the preceding term. All with a highly
       | compressed representation of term positions.
       | 
       | With something like numpy you can do this in a handful of logical
       | operations in python.
       | 
       | https://softwaredoug.com/blog/2024/01/21/search-array-phrase...
        
       ___________________________________________________________________
       (page generated 2024-11-08 23:01 UTC)