[HN Gopher] Proof-of-work to protect lore.kernel.org and git.ker...
       ___________________________________________________________________
        
       Proof-of-work to protect lore.kernel.org and git.kernel.org against
       AI crawlers
        
       Author : luu
       Score  : 37 points
       Date   : 2025-04-02 21:59 UTC (1 hour ago)
        
 (HTM) web link (social.kernel.org)
 (TXT) w3m dump (social.kernel.org)
        
       | skeptrune wrote:
       | I am really enjoying seeing this use-case for PoW gain
       | popularity. Hopefully it normalizes the technique and it can
       | start to become more common for anti-spam systems.
        
         | Sesse__ wrote:
         | Why do you assume that spammers and AI crawlers do not have
         | access to large amounts of compute? You can make it more
         | expensive, but these crawlers already have made it clear that
         | they do not care particularly about cost (or they would not
         | crawl so completely indiscriminately).
        
           | arccy wrote:
           | um no? sending an http request is quite a bit different than
           | some forced pow calculation
        
         | charcircuit wrote:
         | Spammers are willing to dedicate more processing power than
         | regular users. It doesn't make sense to do. It's either
         | meaningless or ruins the user experience for normal people.
        
       | cowboylowrez wrote:
       | >Difficulty is set at 4 leading zeroes, unless you're coming from
       | US in which case there's also a tariff of 5 more leading zeroes.
       | 
       | isn't linux afraid of retaliatory tariffs? should I stock up on
       | linuxes just in case? I've already beefed up toilet paper
       | reserves.
        
         | perihelions wrote:
         | If they go overboard people will start switching to FreeTrade-
         | BSD
        
         | lionkor wrote:
         | Sanctioning Linux, due to its totalitarian government (bdfl)
        
       | chr15m wrote:
       | This is infinitely better than using CloudFlare. I hope it works
       | and more people adopt it.
        
         | ToucanLoucan wrote:
         | Genuine question: How? Is there a downside to CloudFlare I'm
         | not aware of?
        
           | Rebelgecko wrote:
           | Cloudflare will just straight up block me sometimes, with no
           | way to see the page. For whatever reason this used to happen
           | to me a lot with car dealer websites. Maybe checking lots of
           | different dealerships' inventory looking for a specific car
           | made me look like a bot.
           | 
           | And even in cases where Cloudflare forces a captcha, this POW
           | ran much more quickly than I could solve one by hand
        
             | nosioptar wrote:
             | It was nearly instant on my shitty old phone.
        
           | megous wrote:
           | Blocking me from contributing to any GitLab-hosted project
           | for ~4 years already. I wanted to send a glib2 patch today
           | and realized again that, no, I still can't sign up to
           | CF-protected GitLab instances. :)
           | 
           | Makes me appreciate the Linux kernel mailing list based
           | contribution method. Very open, very simple.
           | 
           | At this point I guess CF will never fix compatibility bugs in
           | their interstitial pages, and in captcha, with a non-default
           | setup of Firefox.
        
             | g-b-r wrote:
             | For what it's worth, ensuring that JIT is enabled for
             | challenges.cloudflare.com can help a lot.
             | 
             | No, not to the point of making it bearable, but at least it
             | becomes rarer for it to take minutes.
        
           | g-b-r wrote:
           | It routinely takes at least a minute overall on gitlab, from
           | a budget phone.
           | 
           | Other sites with Cloudflare only take some nice twenty
           | seconds, others just never ever let you go through.
           | 
           | Those checks are a serious contender for the worst thing that
           | ever happened to the web.
        
       | perihelions wrote:
       | Any chance there's some way, going forwards, to dual-purpose
       | these webserver PoW's, so they solve some socially beneficial
       | compute problem at the same time? I recall reading ideas like
       | that in the early days of cryptocurrency, before humans ruined
       | it.
       | 
       | - Server: here's a bit of a cancer protein
       | 
       | - Client: okay, here's some compute
       | 
       | - Verifier: the compute checks out
       | 
       | - Server: okay, you are authorized to access cat.gif
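
The server/client/verifier exchange listed above could be sketched roughly as follows. Integer factoring is used purely as a stand-in for "useful" work, because it is expensive to compute and cheap to check; the Task type and all function names are hypothetical, and nothing here reflects an existing system.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Task:
    """Hypothetical work unit: a composite number the client must factor."""
    n: int


def client_compute(task: Task) -> Tuple[int, int]:
    """Expensive side: trial division until a nontrivial factor is found."""
    d = 2
    while d * d <= task.n:
        if task.n % d == 0:
            return d, task.n // d
        d += 1
    raise ValueError("n is prime; there is no nontrivial factorization")


def verifier_check(task: Task, answer: Tuple[int, int]) -> bool:
    """Cheap side: two range checks and one multiplication."""
    p, q = answer
    return 1 < p < task.n and 1 < q < task.n and p * q == task.n


def server_authorize(task: Task, answer: Tuple[int, int],
                     resource: str) -> Optional[str]:
    """Hand out the resource only if the submitted work checks out."""
    return resource if verifier_check(task, answer) else None


if __name__ == "__main__":
    task = Task(10007 * 10009)           # server: here is some work
    answer = client_compute(task)        # client: here is some compute
    print(server_authorize(task, answer, "cat.gif"))
```

The sketch only works because the verifier's check is far cheaper than the client's search. Finding socially useful problems with that property, in pieces small enough for a page load, is the hard part.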
        
         | wizzwizz4 wrote:
         | It's difficult to break important problems down into NP-hard
         | problems. Search problems are, afaik, the current state-of-the-
         | art; but to my knowledge "a bit of a cancer protein" isn't
         | useful, and "an entire cancer protein" would take a few hours
         | at _least_.
        
           | xnx wrote:
           | Very true. Perhaps a better system would be to credit
           | "points" for solving fewer/larger problems that could be
           | spent a bit at a time? That sounds even more complex than
           | charging regular money though.
        
       | sva_ wrote:
       | > Difficulty is set at 4 leading zeroes, unless you're coming
       | from US in which case there's also a tariff of 5 more leading
       | zeroes.
       | 
       | > You can see it in action on this recently decommissioned system
       | I'm using for testing purposes: https://ams.source.kernel.org/
       | 
       | Something seriously wrong with it. When I run it with my normal
       | German/EU home connection, it does ~17k iterations. When I run it
       | with a US Atlanta VPN, it only takes ~6k iterations.
        
         | Rebelgecko wrote:
         | I think that part was a joke
        
           | lionkor wrote:
           | I think OP, like me, wishes it wasn't (it would be very
           | funny)
        
         | xena wrote:
         | It's luck-based. I'm working on making a check that's more
         | deterministic, but I'm also trying to figure out how to not
         | lock out big-endian systems in the process.
         | 
         | I may have to just give up on that though :(
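
To put numbers on "luck-based": if each attempt is an independent hash and success means the hex digest starts with 4 zeroes (an assumption about how difficulty is counted, not something stated in the thread), then the number of attempts before the first success follows a geometric distribution. The small Python sketch below, with illustrative function names, computes the average and the chance of finishing within a given number of attempts.

```python
def success_probability(difficulty: int) -> float:
    """Chance that one random hex digest starts with `difficulty` zeroes."""
    return 16.0 ** -difficulty


def expected_attempts(difficulty: int) -> int:
    """Mean of the geometric distribution: 1 / p."""
    return 16 ** difficulty


def prob_done_within(attempts: int, difficulty: int) -> float:
    """P(first success happens within `attempts` tries)."""
    p = success_probability(difficulty)
    return 1.0 - (1.0 - p) ** attempts


if __name__ == "__main__":
    print(expected_attempts(4))
    for n in (6_000, 17_000):
        print(n, round(prob_done_within(n, 4), 3))
```

Under that assumption the average is 65536 attempts, with roughly a 9% chance of finishing within 6k attempts and roughly a 23% chance within 17k. Run-to-run differences of the size reported above are therefore expected and do not depend on where the client connects from.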
        
       | unsnap_biceps wrote:
       | It appears they're using https://github.com/TecharoHQ/anubis for
       | the proof of work proxy
        
         | stevenhuang wrote:
         | I enjoyed their succinct project description:
         | 
         | > Weighs the soul of incoming HTTP requests using proof-of-work
         | to stop AI crawlers
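
For readers unfamiliar with the "leading zeroes" scheme mentioned in the thread, here is a minimal Python sketch of a hash-based proof-of-work challenge. It assumes the difficulty counts leading zero hex digits of a SHA-256 digest over a challenge string plus a nonce. The function names and the challenge format are illustrative and are not taken from Anubis's code.

```python
import hashlib
from itertools import count
from typing import Tuple


def digest_for(challenge: str, nonce: int) -> str:
    """SHA-256 hex digest of the challenge string followed by the nonce."""
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge: str, difficulty: int) -> Tuple[int, str]:
    """Client side: try nonces 0, 1, 2, ... until the digest has enough zeroes."""
    prefix = "0" * difficulty
    for nonce in count():
        digest = digest_for(challenge, nonce)
        if digest.startswith(prefix):
            return nonce, digest


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash is enough to check the client's answer."""
    return digest_for(challenge, nonce).startswith("0" * difficulty)


if __name__ == "__main__":
    nonce, digest = solve("example-challenge", 4)
    print(nonce, digest, verify("example-challenge", nonce, 4))
```

The asymmetry is the point of the technique: the client needs about 16^difficulty hashes on average, while the server needs one.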
        
       | abetusk wrote:
       | I think these solutions are really novel and interesting but I'd
       | like to point out that this is literally one of the use cases for
       | cryptocurrency, or microtransactions in general.
       | Cryptocurrencies, at least the PoW ones, offload the proof-of-
       | work so that it doesn't need to be done in real time.
       | 
       | Paying fractions of a penny to view websites has minimal impact
       | on average users but is punishing to spammers.
        
         | mschuster91 wrote:
         | The problem is, it kills anonymity - it allows at the very
         | least the government to tie _each_ web page visit, each
         | resource load, back to a real person.
         | 
         | And no, "anonymous mixer" services don't work either. They're
         | yet another layer of useless profiteering middlemen, which the
         | web already has more than enough of.
        
           | pitaj wrote:
           | What about Monero/XMR? Isn't it fully anonymous by design?
        
         | solid_fuel wrote:
         | This is one of the use-cases of proof-of-work, but the rest of
         | what makes something cryptocurrency isn't necessary. There is
         | no need for a blockchain here, and the cost can be paid
         | directly in compute time.
        
         | sva_ wrote:
         | I'd rather just be able to click a link and not have to worry
         | about wallet, transaction fees, keeping my keys safe, etc.
        
         | klysm wrote:
         | > Paying fractions of a penny to view websites has minimal
         | impact
         | 
         | Although true in an ideal financial sense, it's demonstrably
         | false because having a paywall of any kind will severely limit
         | usage.
        
       | bhouston wrote:
       | I am not sure we want to prevent AI crawlers but rather we want
       | the crawlers to just not negatively affect the websites.
       | 
       | We want AI automation everywhere and crawling is important.
        
         | wnoise wrote:
         | The DoSes are AI-for-training-data, not AI-for-automation. AI-
         | for-automation is for the time being going to be the same order
         | of magnitude as standard activity.
        
         | budududuroiu wrote:
         | > We want AI automation everywhere
         | 
         | Who is we? I definitely don't want AI automation everywhere
        
       | shanemhansen wrote:
       | I wonder how well this will actually work.
       | 
       | The core problem is that a lot of crawlers aren't spending their
       | own money. They are part of a botnet, so they are just spending
       | the victim's money.
       | 
       | But hopefully most of the crawlers aren't botnets or funded by
       | free VC money so they have an economic incentive to avoid
       | crawling systems requiring proof-of-work.
        
       | xena wrote:
       | This is absolutely surreal to see in action! I hope that I can
       | manage to afford to not have to do my dayjob anymore.
        
         | dharmab wrote:
         | Context for others: Xe is the author of the software used for
         | this (https://anubis.techaro.lol/docs/)
        
       | sakras wrote:
       | Maybe I'm missing something, but why do people expect PoW to be
       | effective against companies whose whole existence revolves around
       | acquiring more compute?
        
         | xmcqdpt2 wrote:
         | I was under the impression that the bad crawlers exist because
         | it's cheaper to reload the data all the time than to cache it
         | somewhere. If this changes the cost balance, those companies
         | might decide to download only once instead of over and over
         | again, which would probably be satisfactory to everyone.
        
       ___________________________________________________________________
       (page generated 2025-04-02 23:01 UTC)