[HN Gopher] The mounting cost of stale ad blocking rules (2018)
       ___________________________________________________________________
        
       The mounting cost of stale ad blocking rules (2018)
        
       Author : fossislife
       Score  : 20 points
       Date   : 2021-06-13 18:13 UTC (4 hours ago)
        
 (HTM) web link (brave.com)
        
       | yummypaint wrote:
       | Can someone explain the step-like shapes in the curve in the "time
       | to filter a request" plot? I was under the impression that ad
       | blockers used hash tables or a similar structure with O(1)
       | lookups, regardless of the address being checked. Are these some
       | kind of cache misses?
        
         | edoceo wrote:
         | It's not a simple address check (lookup); it's a pattern match,
         | like looping over a bunch of regexes. Those 1000+ stale patterns
         | slow it down.
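A minimal sketch of the kind of matching edoceo describes, assuming a heavily simplified rule syntax in which `*` is a wildcard and `^` is a separator; real EasyList syntax (anchors like `||`, options like `$third-party`) and Brave's optimized engine are more involved. The helper names `rule_to_regex` and `LinearMatcher` are hypothetical. Because a URL that matches nothing must be tested against every rule, each stale rule adds a fixed cost to every miss.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Compile a simplified, hypothetical subset of EasyList syntax to a regex.

    Only `*` (any characters) and `^` (a separator character or end of URL)
    are handled; anchors such as `||` and options such as `$third-party`
    are not.
    """
    parts = []
    for ch in rule:
        if ch == "*":
            parts.append(".*")
        elif ch == "^":
            # Separator: anything except a letter, digit, or one of _ . % -
            parts.append(r"(?:[^\w.%-]|$)")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


class LinearMatcher:
    """Tests a URL against every rule in order, so a miss costs one regex
    test per rule."""

    def __init__(self, rules):
        self.rules = [(rule, rule_to_regex(rule)) for rule in rules]
        self.checks = 0  # total number of regex tests run so far

    def match(self, url: str):
        for rule, regex in self.rules:
            self.checks += 1
            if regex.search(url):
                return rule  # first matching rule wins
        return None


rules = ["/ads/*", "banner^", "tracker.js"]
m = LinearMatcher(rules)
print(m.match("https://example.com/ads/top.png"))  # /ads/*
print(m.match("https://example.com/index.html"))   # None
print(m.checks)                                     # 4

# The same miss with 1,000 extra rules that never match real traffic.
stale = [f"/never-seen-{i}/*" for i in range(1000)]
m2 = LinearMatcher(rules + stale)
m2.match("https://example.com/index.html")
print(m2.checks)                                    # 1003
```

The unmatched request costs 3 regex tests with the short list and 1,003 with the padded one, even though none of the extra rules ever fires.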
        
         | mdoms wrote:
         | The steps are an artifact of the X-axis, which is logarithmic.
         | The first step occurs from 1 resource to 2 resources, and so on.
         | 
         | A hash table can't be used for this kind of check because the
         | rules are patterns, so each resource URL has to be matched
         | against them. That said, I'm sure there are special cases, like
         | hash tables of domain names, that cover a large portion of the
         | rules.
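A sketch of the special case mdoms mentions, assuming only rules of the exact form `||domain^` are indexed: those go into a hash set, and a request's hostname is checked by walking its suffixes (`cdn.ads.example`, `ads.example`, `example`), which costs one set lookup per hostname label rather than one test per rule. Everything else falls back to a linear scan, simplified here to plain substring matching. `DomainIndexedMatcher` is hypothetical; the article only describes Brave's engine as an optimized implementation, so this is not a description of it.

```python
import re
from urllib.parse import urlsplit


class DomainIndexedMatcher:
    """Hypothetical two-tier matcher: a domain hash set plus a pattern list."""

    # Assumed rule shape for indexing: `||` + lowercase domain + `^`.
    DOMAIN_RULE = re.compile(r"^\|\|([a-z0-9.-]+)\^$")

    def __init__(self, rules):
        self.domains = set()   # O(1) membership tests
        self.patterns = []     # scanned linearly, as before
        for rule in rules:
            m = self.DOMAIN_RULE.match(rule)
            if m:
                self.domains.add(m.group(1))
            else:
                # Simplification: treat any other rule as a literal substring.
                self.patterns.append(re.compile(re.escape(rule)))

    def blocked(self, url: str) -> bool:
        host = (urlsplit(url).hostname or "").lower()
        labels = host.split(".")
        # Try the host and each parent domain: a.b.example -> b.example -> example
        for i in range(len(labels)):
            if ".".join(labels[i:]) in self.domains:
                return True
        return any(p.search(url) for p in self.patterns)


matcher = DomainIndexedMatcher(["||ads.example^", "||tracker.test^", "/banner/"])
print(matcher.blocked("https://cdn.ads.example/x.js"))    # True  (domain set)
print(matcher.blocked("https://news.test/banner/1.png"))  # True  (pattern scan)
print(matcher.blocked("https://news.test/article"))       # False
```

The design choice is the one the comment implies: the more rules that fit the indexable shape, the shorter the list that still has to be scanned on every request.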
        
       | pmoriarty wrote:
       | _"... we applied EasyList to both the Alexa 5k, a curated list of
       | the 5,000 most popular sites on the web, and a random sampling of
       | 5,000 sites from the Alexa 1,000,000 (ensuring no duplicate
       | sites). Our measurement was in several steps:_
       | 
       |  _1. Use Selenium and the DevTools Protocol to record every URL
       | requested when rendering and executing a website._
       | 
       |  _2. Add additional automation to randomly select three distinct
       | same-domain URLs from anchor tags on a page._
       | 
       |  _3. Used the above automation to visit the homepage of each
       | site, and a maximum of three child pages, and recorded all URLs
       | requested for images, script files, and other web resources._
       | 
       |  _4. Determine which of those URLs would be blocked by the
       | version of EasyList fetched on that day, using Brave's optimized
       | ad-block implementation._
       | 
       | ...
       | 
       |  _We found that the vast majority of EasyList rules are not used
       | when browsing popular websites; 3,268 of 39,198 (~8%) of network
       | and exception rules were used during our crawls (these
       | measurements exclude element rules)."_
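Step 2 of the quoted methodology (pick three distinct same-domain links from a page's anchor tags) can be sketched with the standard library alone. The article used Selenium and the DevTools Protocol to render pages; this sketch only covers link selection from HTML that has already been fetched, interprets "same-domain" as "same hostname" (an assumption), and the function name `pick_child_pages` is hypothetical.

```python
import random
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class AnchorCollector(HTMLParser):
    """Collects the href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def pick_child_pages(page_url, html, k=3, rng=random):
    """Return up to k distinct same-host URLs linked from the page."""
    parser = AnchorCollector()
    parser.feed(html)
    host = urlsplit(page_url).hostname
    candidates = set()
    for href in parser.hrefs:
        # Resolve relative links and drop #fragments so /a and /a#x count once.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlsplit(absolute)
        if (parts.scheme in ("http", "https")
                and parts.hostname == host
                and absolute != page_url):
            candidates.add(absolute)
    # Sort first so a seeded rng gives reproducible picks.
    return rng.sample(sorted(candidates), min(k, len(candidates)))


html = ('<a href="/a">A</a><a href="/b#top">B</a>'
        '<a href="https://other.example/">X</a>'
        '<a href="/c">C</a><a href="/d">D</a><a href="/a">dup</a>')
picks = pick_child_pages("https://site.example/", html, rng=random.Random(0))
print(picks)  # three of /a, /b, /c, /d on site.example
```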
       | 
       | That doesn't mean that EasyList is not useful for browsing the
       | rest of the internet.
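The "~8% of rules used" figure comes from tallying which rules fired at least once over all recorded request URLs. A minimal sketch of that tally, assuming a caller-supplied `matches(rule, url)` predicate (hypothetical; in the article Brave's ad-block engine played that role) and counting every rule that matches a URL rather than only the first, which the article does not specify. Like the quoted measurement, it covers network rules only, not element rules.

```python
def rule_usage(rules, requested_urls, matches):
    """Count how many crawled request URLs each rule matched.

    Returns the per-rule counts and the fraction of distinct rules
    that matched at least once.
    """
    hits = dict.fromkeys(rules, 0)  # also de-duplicates repeated rules
    for url in requested_urls:
        for rule in hits:
            if matches(rule, url):
                hits[rule] += 1
    used = sum(1 for count in hits.values() if count > 0)
    fraction = used / len(hits) if hits else 0.0
    return hits, fraction


rules = ["/ads/", "banner", "/never-used/", "popup"]
urls = [
    "https://a.example/ads/1.png",
    "https://b.example/img/banner.gif",
    "https://c.example/index.html",
]
hits, fraction = rule_usage(rules, urls, lambda rule, url: rule in url)
print(hits)      # {'/ads/': 1, 'banner': 1, '/never-used/': 0, 'popup': 0}
print(fraction)  # 0.5

# The article's figure: 3,268 of 39,198 network and exception rules used.
print(f"{3268 / 39198:.1%}")  # 8.3%
```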
        
         | seoaeu wrote:
         | Yeah, there are likely statistical methods you could use to
         | estimate the number of stale rules, but "try the top 5000 and
         | ~0.5% of the rest" isn't one of them.
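One such method, sketched under the assumption that the 5,000 random sites were drawn uniformly from the 995,000 Alexa sites outside the top 5k: treat the top 5k as a census and weight each sampled tail site by the inverse of its inclusion probability (a Horvitz-Thompson style estimate), so each tail hit stands for 199 sites. The same arithmetic shows why zero observed hits is weak evidence that a rule is stale: a rule that fires on 100 tail sites is missed entirely by such a sample roughly 60% of the time. The function names are hypothetical.

```python
TOP_N = 5_000              # Alexa top sites, all crawled (a census)
POPULATION = 1_000_000     # Alexa 1M
TAIL = POPULATION - TOP_N  # 995,000 sites outside the top 5k
SAMPLE = 5_000             # sites randomly sampled from the tail


def estimated_sites(hits_top: int, hits_sample: int) -> float:
    """Estimated number of the 1M sites on which a rule fires.

    Top-stratum hits count once; each sampled tail hit is weighted by
    TAIL / SAMPLE (= 199), the inverse of a tail site's inclusion probability.
    """
    return hits_top + hits_sample * (TAIL / SAMPLE)


def miss_probability(m: int) -> float:
    """Chance that a simple random sample of SAMPLE tail sites contains none
    of the m tail sites a rule fires on (hypergeometric:
    C(TAIL - m, SAMPLE) / C(TAIL, SAMPLE), computed as a product)."""
    p = 1.0
    for i in range(m):
        p *= (TAIL - SAMPLE - i) / (TAIL - i)
    return p


print(estimated_sites(hits_top=2, hits_sample=3))  # 599.0
print(f"{miss_probability(100):.0%}")              # 60%
```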
        
       | gorhill wrote:
       | A lot of work has been done to remove stale filters since this
       | article came out; see:
       | 
       | https://twitter.com/fanboynz/status/1344796683612299265
        