[HN Gopher] The mounting cost of stale ad blocking rules (2018)
___________________________________________________________________
The mounting cost of stale ad blocking rules (2018)
Author : fossislife
Score : 20 points
Date : 2021-06-13 18:13 UTC (4 hours ago)
(HTM) web link (brave.com)
(TXT) w3m dump (brave.com)
| yummypaint wrote:
| Can someone explain the steplike shapes in the curve in the "time
| to filter a request" plot? I was under the impression that ad
| blockers used hash tables or a similar structure with O(1)
| lookups regardless of the address being checked. Are these some
| kind of cache misses?
| edoceo wrote:
| It's not a simple address check (lookup); it's a pattern match,
| like looping over a bunch of regexes. Those 1000+ stale patterns
| slow it down.
| mdoms wrote:
| The steps are an artifact of the X-axis, which is logarithmic.
| The first step occurs from 1 resource to 2 resources, and so on.
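|
| A minimal sketch of why integer counts look like steps on a log
| axis. It assumes a base-10 axis (the plot's actual base is not
| stated here); log_gap is a hypothetical helper name.

```python
import math

def log_gap(n):
    """Horizontal distance on a log10 axis between counts n and n + 1."""
    return math.log10(n + 1) - math.log10(n)

# Small counts sit far apart on the axis, large counts almost touch.
for n in (1, 2, 10, 100):
    print(f"{n} -> {n + 1}: {log_gap(n):.3f} of a decade")
# 1 -> 2: 0.301 of a decade
# 2 -> 3: 0.176 of a decade
# 10 -> 11: 0.041 of a decade
# 100 -> 101: 0.004 of a decade
```

| Going from 1 to 2 takes up about 30% of a decade on the axis, so
| the first few integer values show up as wide flat segments.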
|
| A hash table can't be used for this kind of check because it
| uses patterns, so resources need to be compared on a pattern.
| Although I'm sure there are special cases, like hash tables of
| domain names, which cover a large portion of the rules.
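|
| Roughly what that split could look like, as a sketch only. This
| is not Brave's implementation; the rules, BLOCKED_HOSTS,
| PATTERN_RULES and should_block are all made up for illustration,
| and real EasyList filters use their own syntax, not plain regex.

```python
import re
from urllib.parse import urlsplit

# Hypothetical rules, not taken from EasyList.
BLOCKED_HOSTS = {"ads.example.com", "tracker.example.net"}
PATTERN_RULES = [re.compile(p) for p in (r"/banner[0-9]*\.", r"[?&]ad_id=")]

def should_block(url):
    host = urlsplit(url).hostname or ""
    # Fast path: check the host and each parent domain with O(1) set lookups.
    labels = host.split(".")
    for i in range(len(labels)):
        if ".".join(labels[i:]) in BLOCKED_HOSTS:
            return True
    # Slow path: each pattern is tried in turn, so the cost grows with
    # the number of pattern rules, whether or not they ever match.
    return any(rule.search(url) for rule in PATTERN_RULES)

print(should_block("https://cdn.ads.example.com/x.js"))  # True
print(should_block("https://example.org/index.html"))    # False
```

| Only the second half scales with the rule count, which is where
| stale patterns would add up.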
| pmoriarty wrote:
| _"... we applied EasyList to both the Alexa 5k, a curated list of
| the 5,000 most popular sites on the web, and a random sampling of
| 5,000 sites from the Alexa 1,000,000 (ensuring no duplicate
| sites). Our measurement was in several steps:_
|
| _1. Use Selenium and the DevTools Protocol to record every URL
| requested when rendering and executing a website._
|
| _2. Add additional automation to randomly select three distinct
| same-domain URLs from anchor tags on a page._
|
| _3. Used the above automation to visit the homepage of each
| site, and a maximum of three child pages, and recorded all URLs
| requested for images, script files, and other web resources._
|
| _4. Determine which of those URLs would be blocked by the
| version of EasyList fetched on that day, using Brave's optimized
| ad-block implementation._
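|
| Step 2 above could be sketched like this, using only the Python
| standard library. The article does not show its code, so
| AnchorCollector and pick_child_pages are hypothetical names, and
| "same-domain" is assumed to mean an exact hostname match.

```python
import random
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit

class AnchorCollector(HTMLParser):
    """Collects absolute, fragment-free URLs from <a href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            absolute = urljoin(self.base_url, href)
            self.links.add(urldefrag(absolute)[0])

def pick_child_pages(base_url, html, k=3, rng=random):
    """Randomly pick up to k distinct same-host URLs linked from the page."""
    collector = AnchorCollector(base_url)
    collector.feed(html)
    host = urlsplit(base_url).hostname
    same_host = sorted(
        u for u in collector.links
        if urlsplit(u).hostname == host and u != base_url
    )
    return rng.sample(same_host, min(k, len(same_host)))
```

| Passing rng=random.Random(seed) makes the selection repeatable
| between crawls; by default it uses the global random module.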
|
| ...
|
| _We found that the vast majority of EasyList rules are not used
| when browsing popular websites; 3,268 of 39,198 (~8%) of network
| and exception rules were used during our crawls (these
| measurements exclude element rules). "_
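|
| The "rules used" figure is a tally of rules that matched at
| least one recorded URL. A sketch under simplifying assumptions:
| rules are plain regex strings here (real EasyList filters are
| not), and used_rule_share is a hypothetical helper.

```python
import re

def used_rule_share(rules, urls):
    """Return (used_rules, fraction_used) for unique regex rule strings."""
    if not rules:
        return set(), 0.0
    compiled = {rule: re.compile(rule) for rule in rules}
    used = {
        rule for rule, rx in compiled.items()
        if any(rx.search(u) for u in urls)
    }
    return used, len(used) / len(rules)

# Made-up rules and URLs, for illustration only.
rules = [r"/ads/", r"tracker\.js", r"never-seen"]
urls = ["https://x.example/ads/1.png", "https://x.example/app.js"]
used, share = used_rule_share(rules, urls)
print(sorted(used), f"{share:.0%}")  # ['/ads/'] 33%

# The figures quoted above: 3,268 used out of 39,198 rules.
print(f"{3268 / 39198:.1%}")  # 8.3%
```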
|
| That doesn't mean that EasyList is not useful for browsing the
| rest of the internet.
| seoaeu wrote:
| Yeah, there are likely statistical methods you could use to
| estimate the number of stale rules, but "try the top 5000 and
| ~0.5% of the rest" isn't that.
| gorhill wrote:
| Much work has been done since this article came out to remove
| stale filters, see:
|
| https://twitter.com/fanboynz/status/1344796683612299265
___________________________________________________________________
(page generated 2021-06-13 23:01 UTC)