[HN Gopher] Fleur - A bloom filter implementation in C
___________________________________________________________________
Fleur - A bloom filter implementation in C
Author : adulau
Score : 50 points
Date : 2022-07-26 12:13 UTC (1 days ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| lambdaloop wrote:
| I love that name! Based on the metrics at the bottom, is it right
| that it's 3 times faster than the Go implementation? What makes
| it so fast?
| FreakLegion wrote:
| Without looking at the implementations I'd guess it's Go's
| bounds checking.
| drfuchs wrote:
| Could be faster still if it didn't calloc/free a "Fingerprint"
| buffer each time you add or query an entry. Since its individual
| BloomFilters are not thread-safe anyway, it could allocate the
| buffer once at BloomFilter creation time and be done with it.
| mitchellh wrote:
| If you want a networked version (note: not based on Fleur):
| https://github.com/armon/bloomd In the source there is a
| "libbloom" as well that you could copy directly into your
| codebase if you wanted an embedded version. The networked version
| has a simple Redis-like text protocol so you can just use
| `telnet` to play around with it, too.
|
| Also written in C, with great performance (no comparison to this
| one, I haven't done it), and has been used in production by many
| companies for many years (since ~2012 or 2013).
|
| Just pointing out other implementations if anyone is curious!
| llimllib wrote:
| Neat! Because it's a thing I do[1], the hash algorithms in use:
|
| Fleur (and DCSO/bloom and DCSO/flor): fnv
|
| bloomd: a combination of SpookyHash and murmur[2]
|
| [1]: https://llimllib.github.io/bloomfilter-tutorial/ (I'll
| update it to add bloomd and spooky)
|
| [2]:
| https://github.com/armon/bloomd/blob/23c19a7f5cbb35d7c3d970b...
|
| [3]: I have a vague recollection of somebody telling me why
| combining two hashes in the way bloomd does for k >= 4 is a
| good idea but I can't remember - anybody have a good reference
| for me to link to? (edit: nvm, I already link the paper on my
| page! sheesh)
| FreakLegion wrote:
| It's in the source, _Less Hashing, Same Performance: Building
| a Better Bloom Filter_ : https://www.eecs.harvard.edu/~michae
| lm/postscripts/rsa2008.p...
|
| I've used this trick at scale on network gear and it works
| great.
| llimllib wrote:
| hah yeah I noticed a minute later, the link is right in the
| comments, and I'd already linked the paper in my article! I
| feel like this is one of those things I re-learn every 5
| years
| ascar wrote:
| I vaguely remember having read about multiple hashes for
| bloom filters in the past, but I have trouble extracting
| the actual approach from the linked paper.
|
| Could you give a summary? The paper is quite mathematical
| and seems to lack a clear description of how to actually
| use the two hashes without reading in depth.
| jstrieb wrote:
| I appreciate the Go-style docstring comments in the code!
|
| For anyone trying to understand Bloom filters who may be
| interested in checking out an alternative C implementation
| (designed to be compiled to WebAssembly), I'll link mine below.
|
| I used it to build a Bloom filter of every Hacker News submission
| ever for a privacy-preserving browser extension that tells you if
| the page you're currently on has been submitted before.
|
| https://github.com/jstrieb/hackernews-button/blob/master/blo...
___________________________________________________________________
(page generated 2022-07-27 23:00 UTC)