[HN Gopher] Show HN: Hackernews-Button - Bloom Filter-based, pri...
       ___________________________________________________________________
        
       Show HN: Hackernews-Button - Bloom Filter-based, private HN
       browsing extension
        
       Author : jstrieb
       Score  : 43 points
       Date   : 2021-03-01 11:09 UTC (11 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | [deleted]
        
       | sktrdie wrote:
       | Have you compared this with using Merkle Trees? Would be curious
       | how big the size of a merkle branch is in comparison to the bloom
       | filter set.
        
         | jstrieb wrote:
         | I haven't investigated this (or considered Merkle Trees), but
         | will follow up if/when I get a chance to test it.
         | 
         | I've tried to design the code so it shouldn't be too hard to do
         | a drop-in replacement of Bloom filters with something else if a
         | better data structure comes along.
        
       | jstrieb wrote:
       | Hello! This Firefox extension shows approximately how many points
       | (if any) the current page scored on Hacker News using Bloom
       | filters. Clicking the extension navigates to the best discussion
       | for the current page. It is great for reading interesting
       | comments about things you may not have known were submitted to HN
       | in the past. I have been using it for the past two months while
       | building it, and honestly it has been really fun to read
       | insightful comments I otherwise never would have seen. It's also
       | handy for figuring out whether to submit a page, since it shows
       | whether it's been submitted before or is new territory.
       | 
       | Rather than determining if the current site has been submitted by
       | querying the Firebase/Algolia APIs with every page you visit, the
       | extension contains regularly-updating Bloom filters for all
       | submitted HN stories to preserve user privacy. There is a single
       | C library underlying _both_ the code to generate Bloom filters,
       | _and_ the actual extension code (it is compiled to WebAssembly so
       | the C functions can be called from JavaScript). I have included
       | an architecture overview in the README to help a prospective
       | reader get through the important parts of the code, and have
       | tried to justify design decisions via extensive comments within
       | the source.
       | 
       | I am happy to answer any questions! I'm also always looking for
       | constructive criticism.
        
         | stunt wrote:
         | Just add a feature to submit a page when it isn't listed. I
         | wouldn't be using it, but it make sense to have that too.
        
           | jstrieb wrote:
           | This is a good idea, thanks! The only feature I haven't added
           | but eventually plan to is a right-click context menu with a
           | bunch of additional actions. For example "search by page
           | title" (as opposed to URL), "see all past submissions,"
           | "manually add a site to the Bloom filter," etc.
           | 
           | Your idea to add a "submit page" option would fit very nicely
           | with this.
        
         | ignoramous wrote:
         | > _Rather than determining if the current site has been
         | submitted by querying the Firebase /Algolia APIs with every
         | page you visit, the extension contains regularly-updating Bloom
         | filters for all submitted HN stories to preserve user privacy._
         | 
         | Nice!
         | 
         | I built a pi-hole esque stub dns-resolver that uses Bloom
         | Filters generated from hostfiles (60 MiB, 5M entries --> 2 MiB
         | with 1% false positives) and it worked like a charm. At some
         | point, I also looked into Xor Filters which are apparently even
         | lighter and faster but couldn't find a JavaScript
         | implementation for it [0].
         | 
         | I; however, stopped using Bloom Filters because its
         | immutability meant building it over and over again which was a
         | pain. Inverted Bloom Filters [1] or Spectral Bloom Filters [2]
         | might have been useful since they can be updated in-place.
         | Instead, I went for storing hostnames in a Finite State
         | Automata [3], which while not as compact as Bloom Filters,
         | could be updated in-place, are deterministic, and search
         | faster. Likely not a fit for your use-case however.
         | 
         | PinSketches, otoh, might be a fit for accomplishing efficient
         | set reconciliation [4] of the filters you're distributing.
         | 
         | [0] https://github.com/FastFilter/xorfilter#implementations-
         | of-x...
         | 
         | [1] https://www.youtube.com/watch?v=eIs9nJ-JFvA
         | 
         | [2] https://pncnmnp.github.io/blogs/spectral-bloom-filters.html
         | 
         | [3] http://stevehanov.ca/blog/?id=115
         | 
         | [4] https://github.com/sipa/minisketch
        
           | jstrieb wrote:
           | Thanks for the links! I am not sure how well Xor Filters
           | would fit here because they're meant to be immutable, whereas
           | the extension has a content script that adds any submissions
           | from /new or the front page to the Bloom filter so you can
           | navigate directly back to the discussion.
           | 
           | I had not heard of PinSketches before this, and will
           | definitely look into them. As I mentioned in another comment:
           | the way the project is set up, it should be possible to do a
           | drop-in replacement with a better data structure, should one
           | come along. Thanks!
        
         | jacobobryant wrote:
         | I just installed it. I don't have much to add, but wanted to
         | say: wow, this is awesome. I appreciated the How It Works
         | section. I made a tiny open-source web app in a similar vein,
         | though it just uses the naive approach (querying algolia on
         | each page load).
        
           | jstrieb wrote:
           | Thanks! Extensions are powerful and dangerous, so I wanted to
           | do everything possible to make it easy to read the code for
           | this one and understand how it works.
           | 
           | I also really like using <details> tags so that interested
           | users can dive into the documentation without everyone having
           | to be hit with a massive README.
        
       ___________________________________________________________________
       (page generated 2021-03-01 23:01 UTC)