[HN Gopher] Show HN: Hackernews-Button - Bloom Filter-based, pri...
___________________________________________________________________
Show HN: Hackernews-Button - Bloom Filter-based, private HN
browsing extension
Author : jstrieb
Score : 43 points
Date : 2021-03-01 11:09 UTC (11 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| [deleted]
| sktrdie wrote:
| Have you compared this with using Merkle Trees? Would be curious
| how big the size of a merkle branch is in comparison to the bloom
| filter set.
| jstrieb wrote:
| I haven't investigated this (or considered Merkle Trees), but
| will follow up if/when I get a chance to test it.
|
| I've tried to design the code so it shouldn't be too hard to do
| a drop-in replacement of Bloom filters with something else if a
| better data structure comes along.
| jstrieb wrote:
| Hello! This Firefox extension shows approximately how many points
| (if any) the current page scored on Hacker News using Bloom
| filters. Clicking the extension navigates to the best discussion
| for the current page. It is great for reading interesting
| comments about things you may not have known were submitted to HN
| in the past. I have been using it for the past two months while
| building it, and honestly it has been really fun to read
| insightful comments I otherwise never would have seen. It's also
| handy for figuring out whether to submit a page, since it shows
| whether it's been submitted before or is new territory.
|
| Rather than determining if the current site has been submitted by
| querying the Firebase/Algolia APIs with every page you visit, the
| extension contains regularly-updating Bloom filters for all
| submitted HN stories to preserve user privacy. There is a single
| C library underlying _both_ the code to generate Bloom filters,
| _and_ the actual extension code (it is compiled to WebAssembly so
| the C functions can be called from JavaScript). I have included
| an architecture overview in the README to help a prospective
| reader get through the important parts of the code, and have
| tried to justify design decisions via extensive comments within
| the source.
|
| I am happy to answer any questions! I'm also always looking for
| constructive criticism.
| stunt wrote:
| Just add a feature to submit a page when it isn't listed. I
| wouldn't be using it, but it make sense to have that too.
| jstrieb wrote:
| This is a good idea, thanks! The only feature I haven't added
| but eventually plan to is a right-click context menu with a
| bunch of additional actions. For example "search by page
| title" (as opposed to URL), "see all past submissions,"
| "manually add a site to the Bloom filter," etc.
|
| Your idea to add a "submit page" option would fit very nicely
| with this.
| ignoramous wrote:
| > _Rather than determining if the current site has been
| submitted by querying the Firebase /Algolia APIs with every
| page you visit, the extension contains regularly-updating Bloom
| filters for all submitted HN stories to preserve user privacy._
|
| Nice!
|
| I built a pi-hole esque stub dns-resolver that uses Bloom
| Filters generated from hostfiles (60 MiB, 5M entries --> 2 MiB
| with 1% false positives) and it worked like a charm. At some
| point, I also looked into Xor Filters which are apparently even
| lighter and faster but couldn't find a JavaScript
| implementation for it [0].
|
| I; however, stopped using Bloom Filters because its
| immutability meant building it over and over again which was a
| pain. Inverted Bloom Filters [1] or Spectral Bloom Filters [2]
| might have been useful since they can be updated in-place.
| Instead, I went for storing hostnames in a Finite State
| Automata [3], which while not as compact as Bloom Filters,
| could be updated in-place, are deterministic, and search
| faster. Likely not a fit for your use-case however.
|
| PinSketches, otoh, might be a fit for accomplishing efficient
| set reconciliation [4] of the filters you're distributing.
|
| [0] https://github.com/FastFilter/xorfilter#implementations-
| of-x...
|
| [1] https://www.youtube.com/watch?v=eIs9nJ-JFvA
|
| [2] https://pncnmnp.github.io/blogs/spectral-bloom-filters.html
|
| [3] http://stevehanov.ca/blog/?id=115
|
| [4] https://github.com/sipa/minisketch
| jstrieb wrote:
| Thanks for the links! I am not sure how well Xor Filters
| would fit here because they're meant to be immutable, whereas
| the extension has a content script that adds any submissions
| from /new or the front page to the Bloom filter so you can
| navigate directly back to the discussion.
|
| I had not heard of PinSketches before this, and will
| definitely look into them. As I mentioned in another comment:
| the way the project is set up, it should be possible to do a
| drop-in replacement with a better data structure, should one
| come along. Thanks!
| jacobobryant wrote:
| I just installed it. I don't have much to add, but wanted to
| say: wow, this is awesome. I appreciated the How It Works
| section. I made a tiny open-source web app in a similar vein,
| though it just uses the naive approach (querying algolia on
| each page load).
| jstrieb wrote:
| Thanks! Extensions are powerful and dangerous, so I wanted to
| do everything possible to make it easy to read the code for
| this one and understand how it works.
|
| I also really like using <details> tags so that interested
| users can dive into the documentation without everyone having
| to be hit with a massive README.
___________________________________________________________________
(page generated 2021-03-01 23:01 UTC)