[HN Gopher] AdFlush
___________________________________________________________________
AdFlush
Author : grac3
Score : 235 points
Date : 2024-05-28 06:32 UTC (1 days ago)
(HTM) web link (dl.acm.org)
(TXT) w3m dump (dl.acm.org)
| jarbus wrote:
| I didn't realize this was an active area of research, love this.
| alexcason wrote:
| Looks like this is the associated repo on GitHub:
| https://github.com/SKKU-SecLab/AdFlush
| KennyBlanken wrote:
| ...and of course only a Chrome plugin is available.
| 3abiton wrote:
| > We tested AdFlush on a dataset of 10,000 real-world websites,
| achieving an F1 score of 0.98, thereby outperforming AdGraph (F1
| score: 0.93), WebGraph (F1 score: 0.90), and WTAgraph (F1 score:
| 0.84). Additionally, AdFlush significantly reduces computational
| overhead, requiring 56% less CPU and 80% less memory than
| AdGraph. We also assessed AdFlush's robustness against
| adversarial manipulations, demonstrating superior resilience with
| F1 scores ranging from 0.89 to 0.98
|
| Neat results, I wonder how it compares to uBO or the different
| blacklists. I assume it self-updates with newer techniques and can
| detect certain patterns?
| Mkengine wrote:
| You can find the comparison to uBO under 5.5
| h4kor wrote:
| How does this compare to list based solutions? An
| overblocking/underblocking comparison would be great
| Havoc wrote:
| How realtime is this? Or well enough to not be noticeable while
| browsing
| mrbluecoat wrote:
| I'd be okay with a hybrid approach: lists for real-time
| blocking and machine learning for passive analysis to augment
| the lists over time.
| nomilk wrote:
| AdFlush (F1 Score: 0.98) seems to do better than some other
| adblockers: AdGraph (F1 score: 0.93), WebGraph (F1 score: 0.90),
| and WTAgraph (F1 score: 0.84), but it begs the question: why not
| compare to the most popular adblockers: uBlock Origin, Adblock
| Plus etc.
|
| I think the authors want to compare apples with apples, so they
| only compare their algorithm to other adblockers that use
| algorithms, as opposed to those which use crowdsourced lists. The
| paper somewhat acknowledges this:
|
| > _However, manual maintenance of these filter lists requires
| significant human effort_
|
| Seems like one of those tasks where crowdsourcing scales so
| nicely (only one person has to report an ad for it to go into a
| crowdsourced list that blocks it for millions of others) that it
| makes an algorithmic approach unnecessary.
| 1oooqooq wrote:
| practical solutions don't get you published
| ko27 wrote:
| "Practical solutions" also leave you vulnerable to cat and
| mouse games against sites that block or bypass adblockers
| (even with ublock origin). The end game is to have
| heuristic/AI adblocking which would directly hook into
| browser rendering so that it becomes undetectable. Obviously
| leading browsers do not support this for extensions, but
| forking Chromium wouldn't be so hard.
| 1oooqooq wrote:
| "doing thing X works and everyone uses it, so bad actors
| invest time against thing X. While thing Y isn't used by
| anyone so bad actors aren't spending time to work around
| it, q.e.d. we prove thing Y is better".
|
| i don't really buy your argument
| ko27 wrote:
| The argument is that Y is more robust.
| RamRodification wrote:
| _> only one person has to report an ad for it to go into a
| crowdsourced list that blocks it for millions of others_
|
| Is it that easy? Sounds very abusable
| rvnx wrote:
| Yes, and some list maintainers accept money to add or remove
| you from the list (officially, or unofficially through a
| secondary maintainer, depending on the list), but otherwise
| it's no different than getting a domain marked as malware or
| phishing (with a few paid editors on Phishtank or
| VirusTotal).
|
| It's easier to get a domain added than removed, and for the
| "corruption"/"racketeering" part, it's a "win-win" for the
| adblockers and the list maintainers.
|
| Adblockers also often pay browsers to be integrated by
| default (AdGuard, Adblock Plus, etc), and then they negotiate
| with publishers to whitelist some domains (not necessarily
| the most obvious, can just be analytics).
|
| "We offer your domain to be unblocked on xx millions of
| devices by default, this will give you a revenue uplift
| of +yy%"
| JAlexoid wrote:
| Humans are really the primary attack vectors for any
| security system.
| bityard wrote:
| Which lists do this? Do any of them ship with uBlock
| Origin?
| kmlx wrote:
| yes, one of my clients was hit by this and i was tasked with
| solving the situation.
|
| i had to create a ticket in a repo explaining why blocking a
| whole domain instead of a single subdomain was actually
| pretty bad. they approved it and reverted the change.
|
| finding where exactly i had to open the ticket and what to
| write was a "down the rabbit hole" experience.
| pbhjpbhj wrote:
| Domains are cheap, don't serve content on an ad domain
| maybe?
|
| Sounds like perhaps your task was to ensure a company's ads
| got through an adblocker?
| __jonas wrote:
| You could be right but you are definitely jumping to a
| conclusion here.
|
| The default lists used by uBlock for example include
| things like error tracking telemetry, Sentry for example.
|
| I can see why people want to block that stuff (privacy)
| but it's not exactly an "ad"
| kmlx wrote:
| my task was to rectify an issue in one of these crowd
| sourced lists of ad servers.
|
| they were blocking a whole domain instead of blocking the
| ad-serving subdomain.
|
| the issue was rectified, the main domain was replaced by
| the ad-serving subdomain.
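
The difference between blocking a whole domain and blocking only its ad-serving subdomain can be sketched as a suffix match on hostname labels. This is a minimal illustration, not how any particular list or blocker implements matching; `is_blocked` and the example hostnames are hypothetical.

```python
def is_blocked(hostname: str, blocked: set[str]) -> bool:
    """Return True if the hostname or any of its parent domains is listed."""
    labels = hostname.lower().rstrip(".").split(".")
    # For "cdn.ads.example.com" this checks, in turn:
    # "cdn.ads.example.com", "ads.example.com", "example.com", "com".
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))


# Listing the whole domain also takes out the main site.
whole_domain = {"example.com"}
# Listing only the ad-serving subdomain leaves the main site reachable.
ad_subdomain = {"ads.example.com"}
```

With `whole_domain`, both `www.example.com` and `ads.example.com` match; with `ad_subdomain`, only `ads.example.com` and hosts beneath it do.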
| hathawsh wrote:
| Still, as pbhjpbhj suggested, if I were publishing both
| content and ads, I would consider publishing the ads on a
| different domain (not just a subdomain) to reduce
| technical issues. Domains with ugly names are very cheap.
| fckgw wrote:
| Yes, but the effects of that abuse are observable and easily
| fixable. If suddenly a whole site goes offline for a bunch of
| people a change like that is likely to get reversed very
| quickly.
| Cthulhu_ wrote:
| The filter based adblockers are at risk though, with Google's
| new extension thingy that - at least a few years ago, I haven't
| heard from it since - limited the amount of rules. If there's a
| non-rule based system that is 98% effective then that would
| circumvent the arbitrary rule limits that Google set.
| AlexandrB wrote:
| My understanding is that under manifest v3[1] _only_ a list
| of rules is allowed. An algorithmic ad blocker wouldn't be
| able to work _at all_.
|
| [1] https://arstechnica.com/gadgets/2023/11/google-chrome-
| will-l...
| GioM wrote:
| This is true. Extensions currently (manifest v2) are able
| to evaluate net requests dynamically, and are able to
| modify requests according to a dynamic ruleset that the
| extension can retrieve from some filter list published on
| the internet.
|
| Under manifest v3, extensions are not able to dynamically
| inspect requests, instead, they may only apply rules to net
| requests. Even worse, there is a limitation of only 5000
| rules per extension!! [1]
|
| Even WORSE worse, under Chrome's manifest v3 rules, the
| extension cannot load any external code! Meaning that
| blocklists _must be packaged with the extension_. [2] Now,
| one might read that link as not affecting block lists, since
| a block list is not a "library" and it's not "code" so
| long as it's just a list of textual rules.... however,
| google considers the following to be a violation: "Building
| an interpreter to run complex commands fetched from a
| remote source, even if those commands are fetched as data".
| [3]
|
| Sneaky sneaky. An extension update (and hence new app store
| submission) is required to update filter lists.
|
| In other words, dynamic net requests are banned, and
| remotely-updated blocklists are banned as well.
|
| [1] https://developer.mozilla.org/en-US/docs/Mozilla/Add-
| ons/Web...
|
| [2] https://developer.chrome.com/docs/extensions/develop/mi
| grate...
|
| [3] https://developer.chrome.com/docs/webstore/program-
| policies/...
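
For readers who have not seen what "only apply rules to net requests" looks like in practice, here is a rough sketch that turns a list of domains into declarativeNetRequest-style block rules, serialized as the JSON a packaged ruleset file would contain. The helper name `build_rules`, the example domains, and the chosen resource types are assumptions for illustration; the rule fields (`id`, `priority`, `action`, `condition.urlFilter`, `condition.resourceTypes`) follow Chrome's documented rule format.

```python
import json


def build_rules(domains, first_id=1):
    """Build one block rule per domain (hypothetical helper, illustration only)."""
    rules = []
    for offset, domain in enumerate(domains):
        rules.append({
            "id": first_id + offset,          # rule ids must be unique and >= 1
            "priority": 1,
            "action": {"type": "block"},
            "condition": {
                # "||" anchors the filter at a domain boundary, "^" is a separator.
                "urlFilter": f"||{domain}^",
                "resourceTypes": ["script", "image", "xmlhttprequest"],
            },
        })
    return rules


if __name__ == "__main__":
    # Example domains are placeholders, not entries from any real filter list.
    print(json.dumps(build_rules(["ads.example.com", "tracker.example.net"]), indent=2))
```

Each list entry becomes one rule, which is why per-extension rule limits matter so much for list-based blockers.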
| nolist_policy wrote:
| Chrome allows at least 30000 static rules + 30000 dynamic
| rules[1].
|
| [1] https://developer.chrome.com/docs/extensions/referenc
| e/api/d...
| avhon1 wrote:
| That's not enough. Just uBlock Origin's default list
| "uBlock filters - Ads" already accounts for over 38,000
| rules. EasyList is over 87,000!
| Gud wrote:
| The day Google starts blocking ad blocking users is the day
| the exodus starts from Google services.
| inversetelecine wrote:
| I think you're overestimating the number of people who 1)
| care and 2) use adblocking extensions or any extension for
| that matter.
|
| Google knows what will likely happen, and pays people lots
| of money to know.
| TheNewsIsHere wrote:
| I think you are unfortunately correct about this.
|
| I am consistently blown away when I inadvertently
| experience the Internet without ad-blocking. It's
| absolute garbage.
|
| I am sad that people are either OK with this or don't
| care. For many they don't know any better, and asking
| many of those same groups to install and manage plugins
| is a fraught request.
| hedora wrote:
| 32.8% of global users use an ad blocker. (33% of
| Americans.) [1]
|
| Chrome's market share is about 65% [2]. If their recent
| manifest changes eventually break ad blocking (which
| seems to be the goal), it'll lose a bunch of market share
| (I guess they're optimizing for short-term profit).
|
| [1] https://backlinko.com/ad-blockers-users [2]
| https://gs.statcounter.com/browser-market-share
| Spoom wrote:
| Without commenting on Google[1], I think this sort of
| thing is true in the short term but less true in the long
| term. I expect that, were Chrome to ban ad blockers,
| technical folks will start to teach non-technical folks
| in their orbit how to e.g. install Firefox to regain ad-
| blocking capability. I think it would take some number of
| years but there would be a pushback in the medium- to
| long-term.
|
| 1. Googler, opinion solely my own.
| 7734128 wrote:
| Next month
|
| https://www.pcmag.com/news/rip-ublock-origin-google-
| proceeds....
| treyd wrote:
| They'd massively alienate a large and motivated subset of their
| userbase with the ability to build viable alternatives to
| Google products or at least build more active means to
| circumvent their platform restrictions.
| rustcleaner wrote:
| Do you remember IE exodus to Firefox pre-2010? Yeah
| Google better watch its hyperback.
| MajimasEyepatch wrote:
| I suspect that such a move would draw significant scrutiny
| from regulators, potentially far outweighing any impacts
| from users switching browsers on their own.
| bityard wrote:
| I don't know what you mean. They are already blocking
| adblock users on YouTube and there is certainly no exodus
| happening there. A few people complain about it and get a
| handful of upvotes on social media from their friends, but
| it hasn't even come close to rising to "backlash" status.
| klaussilveira wrote:
| Isn't this the case for a bloom filter (vacuum maybe)? You
| can have very few rules.
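
To make the Bloom filter idea concrete, here is a minimal sketch of one holding blocked hostnames in a fixed-size bit array. It is an assumption-laden illustration: the class, sizes, and hashing scheme are made up for this example, and nothing here implies Chrome's rule engine could consume such a structure. Note the trade-off: a Bloom filter never misses an added entry but can report false positives, which for a blocker means occasionally blocking something legitimate.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter over strings (illustrative, not tuned)."""

    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive several bit positions by salting one hash function with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


blocked = BloomFilter()
blocked.add("ads.example.com")
```

Membership is then checked with `"ads.example.com" in blocked`; the memory used stays constant no matter how many hostnames are added, at the cost of a rising false-positive rate.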
| 4ggr0 wrote:
| I guess that's why uBO Lite exists :) I started using it a
| couple of months ago instead of uBlock Origin, and still
| haven't seen any ads since.
|
| https://github.com/uBlockOrigin/uBOL-home
| ladzoppelin wrote:
| I think eventually there is nothing that can stop certain
| ads on Chrome once specific APIs are removed, even using
| manifest 3. Maybe someone could chime in on this as it's
| really confusing now since Google keeps pushing back the
| date to remove manifest 2. (This might be outdated info)
| 4ggr0 wrote:
| Yeah, it generally does feel like a "Catch me if you can"
| situation. I'm sure that there will be different ad-
| blockers once those APIs are removed, as there seems to
| be a very strong desire from some people not to see ads.
|
| I hope we'll not end up in a DRM-like system where ads
| are somehow really baked in and content stops working for
| lay-people if they try to circumvent ads.
| downrightmike wrote:
| We'll create a shim to render the page in the background
| and use AI to remove ads and then serve the result to the
| user, at the least. Fuck ads and malvertising
| specialist wrote:
| Yes and: There will be a tipping point where it'll be
| easier to allow the content rather than blocking the
| garbage. Dynamic screen scraping, more or less.
| Centigonal wrote:
| If Google's goal is to thwart adblockers by creating
| limitations on what browser extensions can do, then creating
| a browser extension that blocks ads within the current set of
| limitations is a temporary solution at best.
| avmich wrote:
| Google doesn't control the browser, the user does.
| Centigonal wrote:
| Google controls the APIs that extension writers can use.
| They are currently using that control to impose limits on
| what adblocker extensions can do. [1][2]
|
| You could download the Chromium source and patch it to
| change the extensions APIs (or better, just use Firefox),
| but the majority of users won't do this, and extension
| writers aren't going to make a version for a patched
| Chromium browser unless it has significant market share
| and support.
|
| [1] https://nordvpn.com/blog/manifest-v3-ad-blockers/
|
| [2] https://www.eff.org/deeplinks/2021/12/chrome-users-
| beware-ma...
| ndriscoll wrote:
| You could always provide an extension that loads itself
| as a .dll/.so. I don't see much difference in friction
| between adding an extension through google's website vs.
| download setup.exe from somewhere. Of course like you
| say, using less user-hostile software is preferable.
| ysavir wrote:
| That might work for highly tech savvy people, but that's
| a very small minority of users. Google will still make ad
| blocking near-impossible for 99.99% of its users.
| babypuncher wrote:
| Such extensions would be trivially easy for Google to
| break with Chrome updates. You also cannot distribute an
| extension like that through any of the usual extension
| stores.
|
| Better to just use a browser that actually respects its
| users.
| jonathankoren wrote:
| Firefox has 2.9%. Safari has 18.12%. Everything else is
| Chrome or reskinned Chrome, with Chrome itself being
| 65.3%.
|
| Unless you're running that 20%, Google controls it, and
| they basically write the standards anymore.
| avmich wrote:
| Oh, of course if you run Google-written software without
| modifications, you're not really controlling it. So if
| you want to control it, either go inside and tinker with
| the code, or - easier? - switch to a non-Google browser.
|
| I thought this is rather obvious, at least for those
| worried about experience. Do you think all those who
| realize they're suffering from ads don't think about
| using a non-Chromium browser?
| babypuncher wrote:
| Real easy problem to solve by just switching back to Firefox
| shpx wrote:
| The first thing you see when you open Firefox is an ad for
| Amazon and Expedia.
| _al_ wrote:
| there is an entire section in the paper sub-titled: _Comparison
| with uBlock Origin_..
| dale_glass wrote:
| The future is here.
|
| If I recall, in Permutation City there's some part where somebody
| deals with spam with AI. The user tries to use a simulation to
| listen to potential spam to filter it, while the spam tries to
| figure out whether a real person is listening to it and only
| tries to spam when a real person is there.
|
| Or something along those lines, it's been a long time since I
| read it.
| mannycalavera42 wrote:
| https://chromewebstore.google.com/search/adflush
|
| https://imgflip.com/i/8s3nur
| marcod wrote:
| The instructions are on their GitHub page
|
| https://github.com/SKKU-SecLab/AdFlush/tree/main?tab=readme-...
|
| But since the first webpage I tried still had huge ads, I
| turned uBlock back on ;)
| tjpnz wrote:
| I use a combination of UBO, PiHole and AdGuard on my mobile
| devices. Can't say I've seen an ad in the last year. Is this
| trying to solve an existing problem or speculating on where
| things could go in future?
| rgrmrts wrote:
| I'm curious why you're using 3 separate methods. Do you miss
| things with just one? AFAIK all 3 use similar block lists and
| are configurable.
|
| I'm building a pi-hole type solution for myself and essentially
| want all the filtering and blocking to happen at my firewall
| and not on my client (phone, laptop, tablet).
| bluish29 wrote:
| I think pi-hole (Adguard home) is a useful DNS-level ad blocker
| which can be used on network/router level. But it is limited,
| UBO provides you more flexibility to block cosmetics and
| certain ads that cannot be done via dns. There will be
| overlap of course but it is worth it. I agree that adguard
| here seems redundant and UBO itself recommends against using
| another ad blocker to avoid interference and websites adblock
| discovery.
|
| However you might end up using
|
| 1. pi-hole on router
|
| 2. Adguard as device level DNS
|
| 3. UBO on Firefox (android only)
|
| It is possible but not recommended and wasteful. 1/2 and 3 is
| enough.
| tjpnz wrote:
| AdGuard is for things I take off the home network, for
| example when I'm at work. It's true I could use AdGuard for
| both scenarios but I do like the additional visibility and
| configurability Pi-Hole provides.
| zikduruqe wrote:
| Try AdGuardHome. https://github.com/AdguardTeam/AdGuardHome
|
| I basically have all my devices use it when I am on my
| network, and when I am off my network, my Wireguard
| connection (or Tailscale depending...) uses my home DNS
| server.
| Night_Thastus wrote:
| uBlock only works in web browsers. It doesn't work in phone
| apps, smart TVs, anything integrated into the OS, etc.
|
| That's why I use uBlock and PiHole, which I deem is enough.
| rpastuszak wrote:
| Oh boy, that didn't take long. Just last year I made Butter
| https://butter.sonnet.io as an excuse to talk about this:
|
| > This project is a half-serious, half-assed attempt to
| demonstrate that in the next few years the process of blocking
| this type of content could be almost entirely automated. Yes, it
| would be wasteful from a computational and human potential
| perspective, and otherwise completely unnecessary, but hey, more
| money would change hands!
| YmiYugy wrote:
| Without comparison to the accuracy of crowdsourced blocklists
| it's not that valuable. Maybe there is a group of hopelessly
| overworked blocklist maintainers/contributors, that I'm not aware
| of. If so, their cries for help don't seem to make the HN front
| page. From a user perspective, blocking banner ads feels like a
| basically solved problem. I think the real pain point here is
| that for large chunks of the web, there is no distinction between
| ads and content.
| JAlexoid wrote:
| There will never be a solution to native ads. It's part of the
| content you choose to consume, that someone produced.
|
| The only way to avoid native ads is to stop consuming content
| that relies on ads.
| yjftsjthsd-h wrote:
| That really depends on what you mean by "native ads"; if you
| mean "blog posts that appear legitimate but push a product"
| then maybe not (although I wouldn't _totally_ rule it out
| with LLMs), but if you just mean that the ads are inline I
| have to disagree since ex. SponsorBlock already exists.
| cess11 wrote:
| In some jurisdictions advertising has to be named as such,
| there it will be at least theoretically possible to create
| filters if the platform is compliant.
| nemomarx wrote:
| Stuff like sponsor block works pretty well? If the native ad
| is separable from the rest of it you can just skip ahead, and
| most of those things are still a signposted sponsor break
| for now. I can imagine extensions to do something similar in
| articles by removing affiliate links, etc.
| 93po wrote:
| or have LLMs recreate the content without the native ad
| beefnugs wrote:
| That is nonsense, if we know about 10 exact brands by name,
| then we can block their mentioning anywhere
| YmiYugy wrote:
| I think it depends on what solution space you are willing to
| explore. There is the possibility for regulatory action that
| restricts native ads. It seems plausible that a flood of AI
| content tanks the prices for native ads, so some might pivot
| to original content + regular ads, which might also become
| more profitable if regulatory action weakens the oligopolies
| of that space. Aside from high level market shifts and
| regulatory action, there is of course also the possibility of
| technical solutions that can help you to avoid native ads.
| gastonmorixe wrote:
| Nice! I'd love to know if AI-Ad / tracking / telemetry / etc
| blocking could be improved for MITM network layer filtering not
| just the browser.
| seized wrote:
| > We tested AdFlush on a dataset of 10,000 real-world websites,
| achieving an F1 score of 0.98, thereby outperforming AdGraph (F1
| score: 0.93), WebGraph (F1 score: 0.90), and WTAgraph (F1 score:
| 0.84).
|
| ... Has anyone even heard of these ad blockers before?
| flakiness wrote:
| These are all academic research projects.
| pradn wrote:
| What's fascinating here is AdFlush is a classical feature
| engineering approach: define a bunch of features on the data
| manually, and then use ML to figure out the most useful /
| impactful ones. This is not the "throw terabytes of data and see
| what happens" approach we see with LLMs. It's a bit funny to even
| point this out because I don't recall the last time a feature-
| engineered ML project made it to the HN front page.
|
| Features can be brittle, but they are understandable. The paper's
| appendix [1] lists the 27 features that will likely make a
| request/resource "ad-related". These include interesting ones
| like JS AST depth, average JS identifier length, the "bracket to
| dot notations ratio in JS", and a number of graph measures for
| the graph of scripts.
|
| And contrary to what comments in this thread are saying, they do
| compare against a blocklist-based adblocker: uBlock Origin.
| That's in section 5.5. They say they outperform uBlock Origin.
| But even they say they don't reduce overall page time bc their
| algorithm is expensive.
|
| [1]: https://dl.acm.org/doi/pdf/10.1145/3589334.3645698
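
As a rough idea of what two of the features named above might look like, here is a regex-based sketch that estimates average identifier length and the bracket-to-dot notation ratio from raw JS source. This is not the paper's implementation: a real extractor would work on a parsed AST, whereas these patterns are crude approximations (they count keywords and words inside strings as identifiers, for instance), and all names here are hypothetical.

```python
import re

# Crude approximations; a real extractor would parse the script into an AST.
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")
DOT_ACCESS = re.compile(r"\.\s*[A-Za-z_$]")             # obj.prop
BRACKET_ACCESS = re.compile(r"[A-Za-z0-9_$\])]\s*\[")   # obj["prop"] or obj[i]


def rough_features(js_source: str) -> dict:
    """Estimate two lexical features of a script (illustrative only)."""
    identifiers = IDENTIFIER.findall(js_source)
    avg_len = sum(map(len, identifiers)) / len(identifiers) if identifiers else 0.0
    dots = len(DOT_ACCESS.findall(js_source))
    brackets = len(BRACKET_ACCESS.findall(js_source))
    # With no dot accesses at all, fall back to the raw bracket count.
    ratio = brackets / dots if dots else float(brackets)
    return {"avg_identifier_length": avg_len, "bracket_to_dot_ratio": ratio}
```

The intuition is that obfuscated or minified ad scripts tend to have short identifiers and favor `obj["prop"]` over `obj.prop`, so both numbers shift in a way a classifier can pick up.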
| tofof wrote:
| More specifically, page load time was 2.7 seconds without
| adblocker, decreased to 2.1 with uBlock Origin, but increased
| roughly 2.4x to 6.6 seconds with AdFlush, or increased to 3.4
| seconds with AdFlush retaining prior predictions.
|
| The superior score was an F1 of 0.86 vs 0.84 for AdFlush vs
| uBlock Origin, and it's not clear to me that this is a
| statistically significant difference. They do not claim it is.
| pradn wrote:
| Thanks for extracting the details. It doesn't seem like
| they'll be competitive with blocklist-based approaches like
| uBlock Origin, because their features are fundamentally
| expensive to compute - parsing JS and such, not just matching
| URLs against a list of regexes.
| aembleton wrote:
| Seems like it could work in the background to build up new
| rules for uBlockOrigin to deploy
| blacksmith_tb wrote:
| That seems to argue for a first pass with a blocklist to
| filter out the well-known ad providers, and then possibly a
| followup step with the ML to catch things that are trying
| harder to slip by? But the extensions would have to cooperate
| to make that possible.
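
The two-pass idea can be sketched as a cheap blocklist lookup followed by a classifier that only runs on whatever the list did not catch. This is a hypothetical arrangement, not something either extension exposes; `make_hybrid_blocker`, the `classifier` callable, and the threshold are all assumptions for illustration.

```python
from typing import Callable


def make_hybrid_blocker(
    blocklist: set[str],
    classifier: Callable[[str], float],
    threshold: float = 0.9,
) -> Callable[[str, str], bool]:
    """Return a should_block(hostname, url) function (illustrative sketch)."""

    def should_block(hostname: str, url: str) -> bool:
        # Pass 1: cheap exact lookup against the known-bad hostnames.
        if hostname in blocklist:
            return True
        # Pass 2: only pay for the expensive model on requests the list missed.
        return classifier(url) >= threshold

    return should_block


# A stand-in scoring function; a real one would be a trained model.
def toy_classifier(url: str) -> float:
    return 0.95 if "banner" in url else 0.1


should_block = make_hybrid_blocker({"ads.example.com"}, toy_classifier)
```

Because most requests to well-known ad hosts are settled in the first pass, the expensive feature extraction would run far less often than in a pure ML blocker.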
| andirk wrote:
| I like the strategy of using flags to say "look into this
| suspicious part of the code" over a hardcoded block list. And
| also block shitty JS via "JS AST depth, average JS identifier
| length" etc even if it's not an ad but just bad code.
|
| For Brave browser users, you can see what hardcoded lists
| you're using at brave://adblock .
|
| As for the whole cat and mouse game, how to detect an "ad" if
| it's served with the content fully server-side? Now _that_ needs
| some serious ML to decipher.
| dylan604 wrote:
| > how to detect an "ad" if it's served with the content fully
| server-side? Now _that_ needs some serious ML to decipher.
|
| This has been my red line on where I will allow ads vs
| blocking them. If a site is hosting their own ads, that's
| acceptable to me. If they are using an ad provider, that is
| not. The newspaper example is my go to. If you wanted your ad
| in a paper, you called the paper and took out an ad. Today's
| equivalent would be every time you opened the paper, a slight
| delay while it randomly chose the highest bids for the ad
| space while potentially also inserting something that would
| slowly eat your hands. That's a nope.
|
| You are obviously in the camp that feels entitled to be able
| to read anything at anytime without allowing for a website to
| earn money by wanting to block all ads regardless of their
| origin.
| cimnine wrote:
| So, this begs the question when we'll see ML put in place to
| avoid AdBlocker detection. Or ads as we know them just disappear
| from the web and are replaced with other kinds of ML-enabled ads.
| I imagine deep-fake models used for interchangeable product
| placement in videos or pictures or so.
| Night_Thastus wrote:
| Always a joy to see efforts in the ongoing battle against
| advertisements.
|
| There are few things I feel radical about, and Ads are one of
| them. I believe they are a drain in several ways:
|
| They waste computational resources and electricity on both ends.
| They compromise the visual design and layout of webpages. They
| distract and take mental energy away from the user. They make the
| internet (and anywhere ads exist) more "ugly" and less
| aesthetically pleasing - which negatively impacts mental health.
| They often sell low-quality services/products or outright scams,
| which harms those least educated and poorest individuals.
|
| Death to advertisement! On billboards! On television! On the
| internet!
|
| Ads are a parasite on the human mind that need to go away,
| forever.
| btbuildem wrote:
| They are a scourge and a tell-tale sign that we've grown far
| beyond excess and into absurd territory where more effort is
| spent on bending our minds to consume a thing than it took to
| make the thing in the first place.
| p3rls wrote:
| Death to small media companies! You should have gotten some VC
| money if you wanted to make products for people, you poor
| pieces of shit.
| karaterobot wrote:
| Blocking image ads seems like a relatively well-solved problem. I
| mean, speaking as someone who can't stand ads, I don't see very
| many of them anymore when I'm on desktop.
|
| The harder, more pernicious type of ads are the modals that pop
| up when your cursor moves toward the back button, or when you
| scroll down a certain distance on the page. "Wait! Before you go,
| take a moment to give us your email address!"
|
| Those can be blocked, but by the time you've seen them, they've
| already done all the damage they can do--which is to say, they've
| annoyed you.
|
| I wish somebody could come up with a way to detect and stop them.
| I spent an afternoon trying to come up with reusable techniques
| to detect these popups, but there are just too many
| possibilities.
| flakiness wrote:
| This can be a Copilot+PC's killer feature :-)
___________________________________________________________________
(page generated 2024-05-29 23:00 UTC)