[HN Gopher] A visual interactive guide to Bloom filters
___________________________________________________________________
A visual interactive guide to Bloom filters
Author : flyingsky
Score : 166 points
Date : 2024-02-20 09:58 UTC (13 hours ago)
(HTM) web link (samwho.dev)
(TXT) w3m dump (samwho.dev)
| machekb wrote:
| I appreciate the author's dedication to making animations
| accessible!
| samwho wrote:
| Thank you so much. I still have a way to go. People who
| struggle with low contrast will struggle with the default
| colour scheme, for example. But I'm trying to make progress
| with every new post <3
| dndn1 wrote:
| These are beautiful interactives along with the rest by Sam!
| samwho wrote:
| You're very kind! I'm a big believer in writing you can really
| lose yourself in, and try and achieve that with interactivity
| and visuals. The aesthetics of my posts is very important to
| me.
| samwho wrote:
| I'm the author of this post, and am happy to answer any
| questions. :)
| mzs wrote:
| I really like this husky graphic, did you create it?
|
| https://samwho.dev/images/sage-warning.png
| samwho wrote:
| I didn't, it is AI generated. I wasn't sure if I'd re-use
| them after the Memory Allocation post, but they've proven to
| work really well and people like them. I've been considering
| getting proper, consistent sets commissioned. One of the
| things I want to do in future is make it so you can pet them,
| which could require some animation work. Open to artists
| reaching out to me about this. :)
| nhggfu wrote:
| article about a related app - that some guy created in 30
| mins to allow people to "pet" virtual animals:
| https://www.theguardian.com/games/2024/feb/15/stroke-of-
| geni...
| gwern wrote:
| DALL-E 3? I've seen some similar images out of it while
| trying for more pointilist views. (DALL-E 3 has a certain
| obsession with regularity & centering.)
| samwho wrote:
| Midjourney! The prompt for the puppy with his head cocked
| to one side was: "playful happy cartoon huskie puppy
| asking an insightful question looking at the camera,
| question mark above, muted colors, block print." Never
| could convince the model to give me that question mark.
| martinky24 wrote:
| Awesome resource! Bloom filters are one of my favorite data
| structure (that I wish I got to employ more often in my line of
| work... but alas).
| samwho wrote:
| i-just-think-they're-neat.gif
|
| Have loved bloom filters for a long time. If probabilistic data
| structures that make use of hashing are your thing, you should
| also check out: https://djhworld.github.io/hyperloglog/
| svraghavan wrote:
| Thanks to the author for all the hard work. Its easy to consume
| and understand for beginners too.
| samwho wrote:
| This is really reassuring to hear, thank you! I make sure my
| posts are reviewed by people with a range of experience and
| familiarity, without them the posts wouldn't be as accessible
| as they are. I'm glad this is working. :)
| nalgeon wrote:
| Sam's visual tutorials are as good as ever (doggos are very
| important, by the way).
|
| And if you want to try some code, I've prepared three toy Bloom
| filter implementations to accompany the article:
|
| JavaScript:
| https://codapi.org/embed/?engine=browser&sandbox=javascript&...
|
| Python:
| https://codapi.org/embed/?sandbox=python&src=gist:e7bde93f98...
|
| Go:
| https://codapi.org/embed/?sandbox=go&src=gist:e7bde93f98c5e4...
| who-shot-jr wrote:
| Amazing! Very well explained.
| samwho wrote:
| Thank you!
| the_arun wrote:
| What are the business use cases we use bloom filters? Some
| examples would be great. Thanks!
| ajoseps wrote:
| the article provide 2 real world examples at Akamai and Google
| samwho wrote:
| On top of those, the Wikipedia page also has a list of
| examples: https://en.wikipedia.org/wiki/Bloom_filter#Examples
| dvirsky wrote:
| One example: I personally used it to deduplicate recently
| viewed ads in an app, to avoid spamming users with the same ad
| too much. One interesting challenge there was that we needed to
| make sure that the client and server encode and decode the
| filter exactly the same, despite being implemented in different
| languages.
| samwho wrote:
| Ooh that sounds fun.
|
| Something I spent time thinking about, but wasn't able to
| find a huge amount of information on, is how you could use
| compression alongside bloom filters. You could make enormous
| bloom filters that make use of run length encoding or sparse
| bitmaps. You sacrifice insert and lookup speed but you could
| make enormous bloom filters this way.
| dvirsky wrote:
| Bloom Filters are already lossy compression, so depending
| on how sparse they are I'm not sure you'll get too much
| benefit out of it, but maybe I'm just thinking of much,
| much smaller filters.
|
| BTW we ended up open sourcing that BF library that encodes
| and decodes filters in multiple languages, the company has
| been out of business for nearly a decade but the project is
| still out there https://github.com/EverythingMe/inbloom
| pncnmnp wrote:
| I'm not sure if it's a big deal for business, but for my
| bachelor's thesis, we showed how we could use a type of
| counting bloom filters (mentioned in the blog) known as
| Spectral Bloom Filters, for client-side searching on static
| websites. Basically, it means doing all the searching and
| ranking directly on the client side, without needing to send
| any requests to the server.
|
| https://pncnmnp.github.io/blogs/spectral-bloom-filters.html
| nhggfu wrote:
| very well written and informative.
| samwho wrote:
| Thank you!
| ctxc wrote:
| Flows beautifully. Looking forward to the next one Sam!
| samwho wrote:
| Thank you, ctxc! Topic is decided but it's going to be a tricky
| one to get right, so it's a few months away at least I'm
| afraid.
| franky47 wrote:
| A wonderful article, thank you.
|
| Rather than using multiple hash functions, would it make more
| sense to use a single algorithm over (prefix | input), with k
| different prefixes? This may allow computing those hashes in
| parallel, using SIMD for example, and caching the prehash state
| of the prefixes.
|
| Edit: looks like there has been some research on this:
| https://ieeexplore.ieee.org/document/8462781
| boyd wrote:
| It's fairly common to use two hash functions and then compute
| the remaining n hashes as hn(x) = h1(x) + n * h2(x).
|
| Paper: https://link.springer.com/chapter/10.1007/11841036_42
| naugtur wrote:
| I remember discussing bloom and cuckoo filters a while back and
| someone linked.to.great papers from late 2010s with much more
| advanced probabilistic structures that were handling modification
| much better or had other interesting properties. Anyone here
| knows where to look for these? I didn't take notes...
| thinkharderdev wrote:
| Not sure, but I recently came across xorsat filters
| (https://eecs.ceas.uc.edu/~weaversa/XORSATFilters.pdf). It's
| not a great replacement for bloom filters in many use cases
| because they are not modifiable but have some other interesting
| properties:
|
| 1. They are very close to the information theoretic minimum
| needed to encode the information so are more space-efficient
| than bloom filters or cuckoo filters.
|
| 2. They have a natural way to store metadata (at the cost of
| additional space of course) so you can associate a few bits of
| metadata with each set bit at pretty reasonable space cost.
| Bloomfu wrote:
| Very good explanation of Bloom filters, but are they optimum from
| a complexity point of view given a false positive rate and a
| bounded number of bits? Are there other kind of filters if we
| relax de condicion of zero false negative rate?, are there other
| better but more complex data structures able to provide the same
| effect?
___________________________________________________________________
(page generated 2024-02-20 23:00 UTC)