[HN Gopher] Proof-of-work to protect lore.kernel.org and git.ker...
___________________________________________________________________
Proof-of-work to protect lore.kernel.org and git.kernel.org against
AI crawlers
Author : luu
Score : 37 points
Date : 2025-04-02 21:59 UTC (1 hour ago)
(HTM) web link (social.kernel.org)
(TXT) w3m dump (social.kernel.org)
| skeptrune wrote:
| I am really enjoying seeing this use-case for PoW gain
| popularity. Hopefully it normalizes the technique and it can
| start to become more common for anti-spam systems.
| Sesse__ wrote:
| Why do you assume that spammers and AI crawlers do not have
| access to large amounts of compute? You can make it more
| expensive, but these crawlers already have made it clear that
| they do not care particularly about cost (or they would not
| crawl so completely indiscriminately).
| arccy wrote:
| um no? sending an http request is quite a bit different than
| some forced pow calculation
| charcircuit wrote:
| Spammers are willing to dedicate more processing power than
| regular users, so this doesn't make much sense: either the
| difficulty is meaningless to them, or it ruins the user
| experience for normal people.
| cowboylowrez wrote:
| >Difficulty is set at 4 leading zeroes, unless you're coming from
| US in which case there's also a tariff of 5 more leading zeroes.
|
| isn't linux afraid of retaliatory tariffs? should I stock up on
| linuxes just in case? I've already beefed up toilet paper
| reserves.
| perihelions wrote:
| If they go overboard people will start switching to FreeTrade-
| BSD
| lionkor wrote:
| Sanctioning Linux, due to its totalitarian government (bdfl)
| chr15m wrote:
| This is infinitely better than using CloudFlare. I hope it works
| and more people adopt it.
| ToucanLoucan wrote:
| Genuine question: How? Is there a downside to CloudFlare I'm
| not aware of?
| Rebelgecko wrote:
| Cloudflare will just straight up block me sometimes, with no
| way to see the page. For whatever reason this used to happen
| to me a lot with car dealer websites. Maybe checking lots of
| different dealerships' inventory looking for a specific car
| made me look like a bot.
|
| And even in cases where Cloudflare forces a captcha, this PoW
| ran much more quickly than I could solve one by hand
| nosioptar wrote:
| It was nearly instant on my shitty old phone.
| megous wrote:
| Blocking me from contributing to any gitlab hosted project
| for ~4 years already. I wanted to send a glib2 patch today,
| again realized that, no, I still can't sign up to CF
| protected gitlab instances. :)
|
| Makes me appreciate the Linux kernel mailing list based
| contribution method. Very open, very simple.
|
| At this point I guess CF will never fix the compatibility bugs
| in their interstitial pages and captcha with non-default
| Firefox setups.
| g-b-r wrote:
| For what it's worth, ensuring that JIT is enabled for
| challenges.cloudflare.com can help a lot.
|
| No, not to the point of making it bearable, but at least it
| becomes rarer for it to take minutes.
| g-b-r wrote:
| It routinely takes at least a minute overall on gitlab, from
| a budget phone.
|
| Other sites with Cloudflare take a mere twenty seconds or so,
| and others just never let you through at all.
|
| Those checks are a serious contender for the worst thing that
| ever happened to the web.
| perihelions wrote:
| Any chance there's some way, going forwards, to dual-purpose
| these webserver PoW's, so they solve some socially beneficial
| compute problem at the same time? I recall reading ideas like
| that in the early days of cryptocurrency, before humans ruined
| it.
|
| - Server: here's a bit of a cancer protein
|
| - Client: okay, here's some compute
|
| - Verifier: the compute checks out
|
| - Server: okay, you are authorized to access cat.gif
| wizzwizz4 wrote:
| It's difficult to break important problems down into NP-hard
| problems. Search problems are, afaik, the current state-of-the-
| art; but to my knowledge "a bit of a cancer protein" isn't
| useful, and "an entire cancer protein" would take a few hours
| at _least_.
| xnx wrote:
| Very true. Perhaps a better system would be to credit
| "points" for solving fewer/larger problems that could be
| spent a bit at a time? That sounds even more complex than
| charging regular money though.
| sva_ wrote:
| > Difficulty is set at 4 leading zeroes, unless you're coming
| from US in which case there's also a tariff of 5 more leading
| zeroes.
|
| > You can see it in action on this recently decommissioned system
| I'm using for testing purposes: https://ams.source.kernel.org/
|
| Something seriously wrong with it. When I run it with my normal
| German/EU home connection, it does ~17k iterations. When I run it
| with a US Atlanta VPN, it only takes ~6k iterations.
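The variance reported above is inherent to this style of challenge. A minimal sketch of a leading-zero hash PoW (the challenge string and function names here are hypothetical, not the actual Anubis scheme) shows why iteration counts swing so much between runs: each attempt is an independent trial, so the number of iterations is geometrically distributed around its mean.

```python
import hashlib
import itertools

def solve(challenge: str, difficulty: int) -> tuple[int, int]:
    """Search for a nonce such that sha256(challenge + nonce) has a
    hex digest starting with `difficulty` zeroes.
    Returns (nonce, iterations tried)."""
    target = "0" * difficulty
    for i in itertools.count():
        digest = hashlib.sha256(f"{challenge}{i}".encode()).hexdigest()
        if digest.startswith(target):
            return i, i + 1

# With 4 hex zeroes, each attempt succeeds with probability 16**-4,
# so runs average ~65536 iterations but individual runs vary wildly:
# ~6k and ~17k are both entirely plausible outcomes of the same setup.
nonce, iterations = solve("example-challenge", 4)
```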
| Rebelgecko wrote:
| I think that part was a joke
| lionkor wrote:
| I think OP, like me, wishes it wasn't (it would be very
| funny)
| xena wrote:
| It's luck-based, I'm working on making a check that's more
| deterministic, but I'm also trying to figure out how to not
| lock out big-endian systems in the process.
|
| I may have to just give up on that though :(
| unsnap_biceps wrote:
| It appears they're using https://github.com/TecharoHQ/anubis for
| the proof of work proxy
| stevenhuang wrote:
| I enjoyed their succinct project description:
|
| > Weighs the soul of incoming HTTP requests using proof-of-work
| to stop AI crawlers
| abetusk wrote:
| I think these solutions are really novel and interesting but I'd
| like to point out that this is literally one of the use cases for
| cryptocurrency, or microtransactions in general.
| Cryptocurrencies, at least the PoW ones, offload the proof-of-
| work so that it doesn't need to be done in real time.
|
| Paying fractions of a penny to view websites has minimal impact
| on average users but is punishing to spammers.
| mschuster91 wrote:
| The problem is, it kills anonymity - it allows at the very
| least the government to tie _each_ web page visit, each
| resource load, back to a real person.
|
| And no, "anonymous mixer" services don't work either. They're
| yet another layer of useless profiteering middlemen, which the
| web already has more than enough of.
| pitaj wrote:
| What about Monero/XMR? Isn't it fully anonymous by design?
| solid_fuel wrote:
| This is one of the use-cases of proof-of-work, but the rest of
| what makes something cryptocurrency isn't necessary. There is
| no need for a blockchain here, and the cost can be paid
| directly in compute time.
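The asymmetry behind "paid directly in compute time" is that verification is a single hash regardless of how long the client searched. A sketch of the server side (hypothetical names, assuming the same leading-zero-hex scheme discussed above):

```python
import hashlib

def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    # One hash call, no matter how many thousands of attempts the
    # client spent finding the nonce. This cost asymmetry is the
    # whole mechanism: nothing has to be settled later, so no
    # ledger, blockchain, or currency is required.
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)
```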
| sva_ wrote:
| I'd rather just be able to click a link and not have to worry
| about wallet, transaction fees, keeping my keys safe, etc.
| klysm wrote:
| > Paying fractions of a penny to view websites has minimal
| impact
|
| Although true in an ideal financial sense, it's demonstrably
| false because having a pay wall of any kind will severely limit
| usage.
| bhouston wrote:
| I am not sure we want to prevent AI crawlers; rather, we want
| the crawlers to just not negatively affect the websites.
|
| We want AI automation everywhere and crawling is important.
| wnoise wrote:
| The DoSes are AI-for-training-data, not AI-for-automation. AI-
| for-automation is for the time being going to be the same order
| of magnitude as standard activity.
| budududuroiu wrote:
| > We want AI automation everywhere
|
| Who is we? I definitely don't want AI automation everywhere
| shanemhansen wrote:
| I wonder how well this will actually work.
|
| The core problem is that a lot of crawlers aren't spending
| their own money. They are part of a botnet, so they are just
| spending the victims' money.
|
| But hopefully most of the crawlers aren't botnets or funded by
| free VC money so they have an economic incentive to avoid
| crawling systems requiring proof-of-work.
| xena wrote:
| This is absolutely surreal to see in action! I hope that I can
| manage to afford to not have to do my dayjob anymore.
| dharmab wrote:
| Context for others: Xe is the author of the software used for
| this (https://anubis.techaro.lol/docs/)
| sakras wrote:
| Maybe I'm missing something, but why do people expect PoW to be
| effective against companies whose whole existence revolves
| around acquiring more compute?
| xmcqdpt2 wrote:
| I was under the impression that the bad crawlers exist because
| it's cheaper to reload the data all the time than to cache it
| somewhere. If this changes the cost balance, those companies
| might decide to download only once instead of over and over
| again, which would probably be satisfactory to everyone.
___________________________________________________________________
(page generated 2025-04-02 23:01 UTC)