[HN Gopher] Consistently faster and smaller compressed bitmaps w...
___________________________________________________________________
Consistently faster and smaller compressed bitmaps with Roaring
(2016)
Author : hambandit
Score : 50 points
Date : 2024-11-01 18:57 UTC (7 days ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| o11c wrote:
| Title is confusing: this is not about the original "Roaring", but
| an extension of it called "Roaring+Run".
|
| Here, "bitmap" = "set of sometimes-compact integers". The
| "uncompressed" and several "rle" implementations are obvious. Hm,
| except this only seems to be talking about a particularly-naive
| RLE approach (suitable for storage but not computation)? If
| you're doing computation I expect you to use absolute offsets
| rather than relative ones which means you can just do binary
| search (the only downside of this is that you can't use variable-
| length integers now).
|
| Roaring is just a fixed 2-level trie, where the outer node is
| always an array of pointers and where the inner nodes can be
| either uncompressed bitvectors (if dense) or an array of low-half
| integers (if sparse). Also, it _only_ works for 32-bit integers
| at a fundamental level; significant changes are needed for 64-bit
| integers.
|
| This paper adds a third representation for the inner node, the
| bsearch'able absolute RLE method I mentioned earlier (before even
| reading the paper beyond the abstract).
|
| Overall there's neither anything novel nor anything particularly
| exciting about this paper, but if you ignore all the self-
| congratulations it might work as a decent intro to the subject?
| Except maybe not since there are a lot of things it fails to
| mention (the ping-pong problem, deduplicated tries, the approach
| of using a separate set for sparse values in the same range, the
| "overall sparse but locally semi-dense" representation that uses
| fixed-size single-word bitsets, ...)
| Drup wrote:
| You seem well versed into that corner. Do you have a good (and
| reasonably complete) introduction/exploration for these memory-
| efficient data-structure for computation ?
|
| I've been working on memory representation of algebraic data
| types quite a bit, and I've always wondered if we could combine
| them with succinct data-structures.
| pram wrote:
| Theres actually a whole website about it! I found it useful
| when I was doing deeper research into ElasticSearch:
| https://roaringbitmap.org
| pncnmnp wrote:
| Hey everyone, if you're looking for a more approachable guide on
| bitmap compression, I wrote a blog post on it this year:
| https://pncnmnp.github.io/blogs/roaring-bitmaps.html. It covers
| run-length encoding, BBC, WAH, Concise, and even Roaring Bitmaps.
| skybrian wrote:
| That's a good read, thanks! The history is interesting, but in
| practice, I'm wondering if there's any reason not to skip it
| and just learn about roaring bitmaps?
| pncnmnp wrote:
| Absolutely! For inspiration on how roaring bitmaps can be
| used in practice, check out https://roaringbitmap.org/. The
| blog post is part of a book I am writing on obscure data
| structures (https://pncnmnp.github.io/blogs/dsa-book.html),
| so explaining the history was a way to dive into the
| evolution of this topic and the limitations of each
| implementation, in order to motivate a SOTA.
| softwaredoug wrote:
| Roaring bitmaps are really useful for doing phrase search in
| search engines.
|
| Basically you can find cases where one term is immediately before
| another by left shifting the right terms roaring encoded
| positions in all documents and bitwise-anding the similarly
| roaring-encoded payload of the preceding term. All with a highly
| compressed representation of term positions.
|
| With something like numpy you can do this in a handful of logical
| operations in python.
|
| https://softwaredoug.com/blog/2024/01/21/search-array-phrase...
___________________________________________________________________
(page generated 2024-11-08 23:01 UTC)