[HN Gopher] A visual interactive guide to Bloom filters
       ___________________________________________________________________
        
       A visual interactive guide to Bloom filters
        
       Author : flyingsky
       Score  : 166 points
       Date   : 2024-02-20 09:58 UTC (13 hours ago)
        
 (HTM) web link (samwho.dev)
 (TXT) w3m dump (samwho.dev)
        
       | machekb wrote:
       | I appreciate the author's dedication to making animations
       | accessible!
        
         | samwho wrote:
         | Thank you so much. I still have a way to go. People who
         | struggle with low contrast will struggle with the default
         | colour scheme, for example. But I'm trying to make progress
         | with every new post <3
        
       | dndn1 wrote:
       | These are beautiful interactives along with the rest by Sam!
        
         | samwho wrote:
         | You're very kind! I'm a big believer in writing you can really
         | lose yourself in, and try and achieve that with interactivity
         | and visuals. The aesthetics of my posts is very important to
         | me.
        
       | samwho wrote:
       | I'm the author of this post, and am happy to answer any
       | questions. :)
        
         | mzs wrote:
         | I really like this husky graphic, did you create it?
         | 
         | https://samwho.dev/images/sage-warning.png
        
           | samwho wrote:
           | I didn't, it is AI generated. I wasn't sure if I'd re-use
           | them after the Memory Allocation post, but they've proven to
           | work really well and people like them. I've been considering
           | getting proper, consistent sets commissioned. One of the
           | things I want to do in future is make it so you can pet them,
           | which could require some animation work. Open to artists
           | reaching out to me about this. :)
        
             | nhggfu wrote:
             | article about a related app - that some guy created in 30
             | mins to allow people to "pet" virtual animals:
             | https://www.theguardian.com/games/2024/feb/15/stroke-of-
             | geni...
        
             | gwern wrote:
             | DALL-E 3? I've seen some similar images out of it while
             | trying for more pointilist views. (DALL-E 3 has a certain
             | obsession with regularity & centering.)
        
               | samwho wrote:
               | Midjourney! The prompt for the puppy with his head cocked
               | to one side was: "playful happy cartoon huskie puppy
               | asking an insightful question looking at the camera,
               | question mark above, muted colors, block print." Never
               | could convince the model to give me that question mark.
        
       | martinky24 wrote:
       | Awesome resource! Bloom filters are one of my favorite data
       | structure (that I wish I got to employ more often in my line of
       | work... but alas).
        
         | samwho wrote:
         | i-just-think-they're-neat.gif
         | 
         | Have loved bloom filters for a long time. If probabilistic data
         | structures that make use of hashing are your thing, you should
         | also check out: https://djhworld.github.io/hyperloglog/
        
       | svraghavan wrote:
       | Thanks to the author for all the hard work. Its easy to consume
       | and understand for beginners too.
        
         | samwho wrote:
         | This is really reassuring to hear, thank you! I make sure my
         | posts are reviewed by people with a range of experience and
         | familiarity, without them the posts wouldn't be as accessible
         | as they are. I'm glad this is working. :)
        
       | nalgeon wrote:
       | Sam's visual tutorials are as good as ever (doggos are very
       | important, by the way).
       | 
       | And if you want to try some code, I've prepared three toy Bloom
       | filter implementations to accompany the article:
       | 
       | JavaScript:
       | https://codapi.org/embed/?engine=browser&sandbox=javascript&...
       | 
       | Python:
       | https://codapi.org/embed/?sandbox=python&src=gist:e7bde93f98...
       | 
       | Go:
       | https://codapi.org/embed/?sandbox=go&src=gist:e7bde93f98c5e4...
        
       | who-shot-jr wrote:
       | Amazing! Very well explained.
        
         | samwho wrote:
         | Thank you!
        
       | the_arun wrote:
       | What are the business use cases we use bloom filters? Some
       | examples would be great. Thanks!
        
         | ajoseps wrote:
         | the article provide 2 real world examples at Akamai and Google
        
           | samwho wrote:
           | On top of those, the Wikipedia page also has a list of
           | examples: https://en.wikipedia.org/wiki/Bloom_filter#Examples
        
         | dvirsky wrote:
         | One example: I personally used it to deduplicate recently
         | viewed ads in an app, to avoid spamming users with the same ad
         | too much. One interesting challenge there was that we needed to
         | make sure that the client and server encode and decode the
         | filter exactly the same, despite being implemented in different
         | languages.
        
           | samwho wrote:
           | Ooh that sounds fun.
           | 
           | Something I spent time thinking about, but wasn't able to
           | find a huge amount of information on, is how you could use
           | compression alongside bloom filters. You could make enormous
           | bloom filters that make use of run length encoding or sparse
           | bitmaps. You sacrifice insert and lookup speed but you could
           | make enormous bloom filters this way.
        
             | dvirsky wrote:
             | Bloom Filters are already lossy compression, so depending
             | on how sparse they are I'm not sure you'll get too much
             | benefit out of it, but maybe I'm just thinking of much,
             | much smaller filters.
             | 
             | BTW we ended up open sourcing that BF library that encodes
             | and decodes filters in multiple languages, the company has
             | been out of business for nearly a decade but the project is
             | still out there https://github.com/EverythingMe/inbloom
        
         | pncnmnp wrote:
         | I'm not sure if it's a big deal for business, but for my
         | bachelor's thesis, we showed how we could use a type of
         | counting bloom filters (mentioned in the blog) known as
         | Spectral Bloom Filters, for client-side searching on static
         | websites. Basically, it means doing all the searching and
         | ranking directly on the client side, without needing to send
         | any requests to the server.
         | 
         | https://pncnmnp.github.io/blogs/spectral-bloom-filters.html
        
       | nhggfu wrote:
       | very well written and informative.
        
         | samwho wrote:
         | Thank you!
        
       | ctxc wrote:
       | Flows beautifully. Looking forward to the next one Sam!
        
         | samwho wrote:
         | Thank you, ctxc! Topic is decided but it's going to be a tricky
         | one to get right, so it's a few months away at least I'm
         | afraid.
        
       | franky47 wrote:
       | A wonderful article, thank you.
       | 
       | Rather than using multiple hash functions, would it make more
       | sense to use a single algorithm over (prefix | input), with k
       | different prefixes? This may allow computing those hashes in
       | parallel, using SIMD for example, and caching the prehash state
       | of the prefixes.
       | 
       | Edit: looks like there has been some research on this:
       | https://ieeexplore.ieee.org/document/8462781
        
         | boyd wrote:
         | It's fairly common to use two hash functions and then compute
         | the remaining n hashes as hn(x) = h1(x) + n * h2(x).
         | 
         | Paper: https://link.springer.com/chapter/10.1007/11841036_42
        
       | naugtur wrote:
       | I remember discussing bloom and cuckoo filters a while back and
       | someone linked.to.great papers from late 2010s with much more
       | advanced probabilistic structures that were handling modification
       | much better or had other interesting properties. Anyone here
       | knows where to look for these? I didn't take notes...
        
         | thinkharderdev wrote:
         | Not sure, but I recently came across xorsat filters
         | (https://eecs.ceas.uc.edu/~weaversa/XORSATFilters.pdf). It's
         | not a great replacement for bloom filters in many use cases
         | because they are not modifiable but have some other interesting
         | properties:
         | 
         | 1. They are very close to the information theoretic minimum
         | needed to encode the information so are more space-efficient
         | than bloom filters or cuckoo filters.
         | 
         | 2. They have a natural way to store metadata (at the cost of
         | additional space of course) so you can associate a few bits of
         | metadata with each set bit at pretty reasonable space cost.
        
       | Bloomfu wrote:
       | Very good explanation of Bloom filters, but are they optimum from
       | a complexity point of view given a false positive rate and a
       | bounded number of bits? Are there other kind of filters if we
       | relax de condicion of zero false negative rate?, are there other
       | better but more complex data structures able to provide the same
       | effect?
        
       ___________________________________________________________________
       (page generated 2024-02-20 23:00 UTC)