Post AxCT4nBtDcm6qW0hM0 by Codeberg@social.anoxinon.de
 (DIR) Post #AxCT4laHBz5rrZP95U by Codeberg@social.anoxinon.de
       2025-08-15T16:43:42Z
       
       2 likes, 3 repeats
       
       We apologize for a period of extreme slowness today. The army of AI crawlers just leveled up and hit us very badly. The good news: We're keeping up with the additional load of new users moving to Codeberg. Welcome aboard, we're happy to have you here. After adjusting the AI crawler protections, performance significantly improved again.
       
 (DIR) Post #AxCT4mOy9VdwOn7d6O by Codeberg@social.anoxinon.de
       2025-08-15T16:45:43Z
       
       2 likes, 0 repeats
       
       It seems like the AI crawlers learned how to solve the Anubis challenges. Anubis is a tool hosted on our infrastructure that requires browsers to do some heavy computation before accessing Codeberg again. It really saved us tons of nerves over the past months, because it spared us from manually maintaining blocklists by giving us a working way to tell "real browsers" from "AI crawlers".
       
 (DIR) Post #AxCT4nBtDcm6qW0hM0 by Codeberg@social.anoxinon.de
       2025-08-15T16:47:09Z
       
       0 likes, 0 repeats
       
       However, we can confirm that at least Huawei networks now send the challenge responses, and they do seem to take a few seconds to actually compute the answers. It looks plausible, so we assume that the AI crawlers leveled up their computing power to emulate more of a real browser's behaviour and bypass the diversity of challenges that platforms enabled to fend off the bot army.
       
 (DIR) Post #AxCT4o5Bu10jc1srYG by Codeberg@social.anoxinon.de
       2025-08-15T16:49:09Z
       
       0 likes, 0 repeats
       
       We have a list of explicitly blocked IP ranges. However, a configuration oversight on our part only applied these blocks on the "normal" routes; the "anubis-protected" routes didn't consult the blocklist at all. This was not a problem while Anubis also kept the crawlers off those routes. However, now that they managed to break through Anubis, there was nothing stopping these armies. It took us a while to identify and fix the config issue, but we're safe again (for now).
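
       For illustration only, a minimal sketch of the kind of check involved - hypothetical Python, not Codeberg's actual configuration. The point is that the blocklist has to be consulted before any routing decision, so that "normal" and anubis-protected routes are covered alike:

       from ipaddress import ip_address, ip_network

       # Hypothetical example ranges (documentation prefixes), not a real blocklist.
       BLOCKED_RANGES = [ip_network("203.0.113.0/24"), ip_network("198.51.100.0/24")]

       def is_blocked(client_ip: str) -> bool:
           addr = ip_address(client_ip)
           return any(addr in net for net in BLOCKED_RANGES)

       def serve_normal(path: str) -> str:
           return f"200 OK (normal route {path})"

       def serve_challenge(path: str) -> str:
           return f"200 OK (challenge-protected route {path})"

       def handle_request(client_ip: str, path: str) -> str:
           # The blocklist check runs before routing; the oversight described
           # above amounted to running it for only one group of routes.
           if is_blocked(client_ip):
               return "403 Forbidden"
           if path.startswith("/challenge/"):
               return serve_challenge(path)
           return serve_normal(path)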
       
 (DIR) Post #AxCT4p0cSUwqU8kj44 by Suiseiseki@freesoftwareextremist.com
       2025-08-15T16:58:58.297285Z
       
       3 likes, 1 repeats
       
       @Codeberg >now that they managed to break through Anubis
       There was no break - it's a simple matter of changing the useragent, or if for some reason there's still a challenge, simply utilizing the plentiful computing power that is available on their servers (which far outstrips the processing power mobile devices have).
       Anubis is evil and is proprietary malware - please do not attack your users with proprietary malware.
       If you want to stop scraper bots, start serving GNUzip bombs - you can't scrape when your server RAM is full.

       dd if=/dev/zero bs=1G count=10 | gzip > /tmp/10GiB.gz
       dd if=/dev/zero bs=1G count=100 | gzip > /tmp/100GiB.gz
       dd if=/dev/zero bs=1G count=1025 | gzip > /tmp/1TiB.gz

       nginx:
               #serve gzip bombs
               location ~* /bombs-path/.*\.gz {
                       add_header Content-Encoding "gzip";
                       default_type "text/html";
               }
               #serve zstd bombs
               location ~* /bombs-path/.*\.zst {
                       add_header Content-Encoding "zstd";
                       default_type "text/html";
               }

       Then it's a matter of bait links that the user won't see, but bots will.
       
 (DIR) Post #AxCT4vQ0d0p4Mcg5TM by Codeberg@social.anoxinon.de
       2025-08-15T16:52:46Z
       
       1 likes, 0 repeats
       
       For the load average auction, we offer these numbers from one of our physical servers. Who can offer more? (It was not the "wildest" moment, but the only one for which we have a screenshot.)
       
 (DIR) Post #AxCViFucZC097ssW1o by Codeberg@social.anoxinon.de
       2025-08-15T17:15:11Z
       
       0 likes, 0 repeats
       
       @gturri Anubis sends a challenge. The browser needs to compute the answer with "heavy" work. The server then has "light" work and verifies the challenge. As far as we can tell, the crawlers actually do the computation and send the correct response. ~f
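
       For readers wondering what "heavy" and "light" mean here, a minimal hashcash-style sketch of the general idea - illustrative Python only, not Anubis's actual algorithm or difficulty settings:

       import hashlib
       import secrets

       DIFFICULTY = 4  # leading zero hex digits required; an illustrative value

       def solve(challenge: str) -> int:
           """Client side ("heavy"): try nonces until the hash meets the target."""
           nonce = 0
           while True:
               digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
               if digest.startswith("0" * DIFFICULTY):
                   return nonce
               nonce += 1

       def verify(challenge: str, nonce: int) -> bool:
           """Server side ("light"): a single hash confirms the submitted answer."""
           digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
           return digest.startswith("0" * DIFFICULTY)

       challenge = secrets.token_hex(16)  # server issues a random challenge
       answer = solve(challenge)          # browser burns CPU finding the answer
       assert verify(challenge, answer)   # server checks it cheaply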
       
 (DIR) Post #AxCViHFvZSdfIFr2sS by Suiseiseki@freesoftwareextremist.com
       2025-08-15T17:28:34.232975Z
       
       0 likes, 0 repeats
       
       @Codeberg @gturri >Calling our usage of anubis an attack on our users is far-fetched.
       Subjecting the users to software that the users don't control (remote JavaScript) is always an attack.
       The recent refresh-challenge is fine, but you don't need Anubis to do that.
       Another user-respecting option is to set temporary cookies for files on the site - poorly programmed scrapers won't include those cookies on subsequent requests.
       Yes, poorly programmed scrapers just do the operation and then continue scraping.
       Decently programmed scrapers just change their useragent and continue scraping at a lesser rate, unimpeded by Anubis.
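
       A rough sketch of that cookie idea, assuming a hypothetical Python handler (not a drop-in configuration): issue a short-lived cookie on the first visit and refuse clients that never send it back.

       import secrets
       import time

       ISSUED: dict[str, float] = {}  # token -> expiry; would need periodic cleanup in practice
       COOKIE_TTL = 300  # seconds

       def first_visit() -> tuple[str, dict[str, str]]:
           """No cookie yet: hand one out and ask the client to retry (e.g. via a refresh)."""
           token = secrets.token_urlsafe(16)
           ISSUED[token] = time.time() + COOKIE_TTL
           headers = {"Set-Cookie": f"visit={token}; Max-Age={COOKIE_TTL}", "Refresh": "0"}
           return "Please retry with cookies enabled.", headers

       def cookie_is_valid(token: str | None) -> bool:
           """A scraper that drops cookies between requests never passes this check."""
           return token is not None and ISSUED.get(token, 0.0) > time.time()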
       
 (DIR) Post #AxCWEGveK3HBXtcria by Codeberg@social.anoxinon.de
       2025-08-15T17:11:44Z
       
       0 likes, 0 repeats
       
       @Suiseiseki Anubis is the option that saved us a lot of work over the past months. We are not happy about it being open core or using GitHub sponsors, but we acknowledge the position from the maintainer: https://codeberg.org/forgejo/discussions/issues/319#issuecomment-6382369
       Calling our usage of anubis an attack on our users is far-fetched. But feel free to move elsewhere, or host an alternative without resorting to extreme measures. We're happy to see working proof that any other protection can be scaled up to the level of Codeberg. ~f
       
 (DIR) Post #AxCWEI5xzB7hABScRE by Codeberg@social.anoxinon.de
       2025-08-15T17:12:55Z
       
       0 likes, 0 repeats
       
       @Suiseiseki BTW, we're also actively following the work around iocaine, e.g. https://come-from.mad-scientist.club/@algernon/statuses/01K2N54XEVTEYYAASHZ0P48FBT
       However, as far as we can see, it does not sufficiently protect from crawling. As the bot armies successfully spread over many servers and addresses, damaging one of them doesn't prevent the next one from doing harmful requests, unfortunately. ~f
       
 (DIR) Post #AxCWEIyYiCn9tV0DWy by Suiseiseki@freesoftwareextremist.com
       2025-08-15T17:34:21.319713Z
       
       1 likes, 0 repeats
       
       @Codeberg If you hang every single scraper that comes along, that not only protects you, but also everyone else that it's scraping.
       
 (DIR) Post #AxCWcG4b9gz3qGFVc8 by Zergling_man@sacred.harpy.faith
       2025-08-15T17:38:33.851331Z
       
       0 likes, 0 repeats
       
       @Codeberg @Suiseiseki >can be scaled up to the level of Codeberg
       He says, on the federated network.
       1) Put a /botsfuckoff/ path redirect to a script that randomly generates 200 links to itself whenever it's accessed
       2) Deny it in robots.txt
       3) Put a hidden link to it at the top of the home page
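
       A minimal sketch of step 1, using only Python's standard library (the path and port are placeholders): every request into the maze returns 200 fresh links back into it. Steps 2 and 3 are then a Disallow line for the path in robots.txt and an invisible anchor to it on the home page.

       import random
       import string
       from http.server import BaseHTTPRequestHandler, HTTPServer

       def random_slug(n: int = 12) -> str:
           return "".join(random.choices(string.ascii_lowercase, k=n))

       class BaitHandler(BaseHTTPRequestHandler):
           def do_GET(self):
               if not self.path.startswith("/botsfuckoff/"):
                   self.send_response(404)
                   self.end_headers()
                   return
               # Every hit returns 200 fresh links back into the same maze.
               links = "\n".join(
                   f'<a href="/botsfuckoff/{random_slug()}">{random_slug()}</a>'
                   for _ in range(200)
               )
               body = f"<html><body>{links}</body></html>".encode()
               self.send_response(200)
               self.send_header("Content-Type", "text/html")
               self.send_header("Content-Length", str(len(body)))
               self.end_headers()
               self.wfile.write(body)

       if __name__ == "__main__":
           HTTPServer(("127.0.0.1", 8080), BaitHandler).serve_forever()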
       
 (DIR) Post #AxCWkHn6eUtXUDtZUO by SuperDicq@minidisc.tokyo
       2025-08-15T17:40:07.823Z
       
       3 likes, 0 repeats
       
       @Codeberg@social.anoxinon.de @Suiseiseki@freesoftwareextremist.com A lot of users cannot pass Anubis challenges because Anubis does not support every browser and is also incompatible with popular security-focused browser extensions such as JShelter. Asking your users to enable JavaScript and to disable security extensions like JShelter in order to visit your website is very bad, don't you agree? I don't think it is far-fetched to call it an attack on your users at all.
       
 (DIR) Post #AxCWn2UE0iOetBNPIe by Zergling_man@sacred.harpy.faith
       2025-08-15T17:40:22.538907Z
       
       0 likes, 0 repeats
       
       @Suiseiseki @Codeberg (And then throw however many slowing measures you want on it; no human is meant to be accessing it anyway)
       
 (DIR) Post #AxCWw2CSRfjqbfM64m by Zergling_man@sacred.harpy.faith
       2025-08-15T17:42:11.852624Z
       
       1 likes, 0 repeats
       
       @Suiseiseki @Codeberg Oh forgot relevant pic
       
 (DIR) Post #AxCX6gwLpSbpr7u4Ey by phnt@fluffytail.org
       2025-08-15T17:44:13.797760Z
       
       3 likes, 1 repeats
       
       @Suiseiseki @Codeberg >If you want to stop scraper bots, start serving GNUzip bombs
       Unironically illegal in certain jurisdictions, since it is considered a denial of service against someone.
       
 (DIR) Post #AxCXHw147JWdnaTArA by zacchiro@mastodon.xyz
       2025-08-15T17:39:58Z
       
       0 likes, 0 repeats
       
       @Codeberg so, to clarify, do you have evidence that the bots were actually solving Anubis challenges, or was the slowdown due to the configuration issue? (I think it's inevitably going to happen if Anubis gets traction. I'm just curious if we're already there or not.) Thanks for your work and transparency on all this.
       
 (DIR) Post #AxCXHwePl107lcsbBo by Suiseiseki@freesoftwareextremist.com
       2025-08-15T17:46:13.509998Z
       
       0 likes, 0 repeats
       
       @zacchiro @Codeberg Yes, the major problem with Anubis is that the only people who get impacted by it are the users - bots either bypass it, or don't run it, or have more computing power and are happy to wait for a long time.
       
 (DIR) Post #AxCXTzK1eTXvWi5McS by Suiseiseki@freesoftwareextremist.com
       2025-08-15T17:48:25.754977Z
       
       1 likes, 0 repeats
       
       @daemon_nova @SuperDicq @Codeberg There are actually effective techniques to deal with scrapers without attacking people with JavaScript. Poorly programmed scrapers seem to just fetch the anubis page over and over, which doesn't help with DoS mitigation, while the scraper eating a GNUzip bomb does. My servers are free.
       
 (DIR) Post #AxCYKQHceiJMCgxqj2 by noisytoot@berkeley.edu.pl
       2025-08-15T17:48:23.840689Z
       
       0 likes, 0 repeats
       
       @Suiseiseki @Codeberg In what way is Anubis proprietary?
       
 (DIR) Post #AxCYKRDlAYod70AHLM by Suiseiseki@freesoftwareextremist.com
       2025-08-15T17:57:53.202311Z
       
       0 likes, 1 repeats
       
       @noisytoot @Codeberg The user doesn't have the 4 freedoms (https://www.gnu.org/philosophy/free-sw.en.html#four-freedoms) with arbitrary remote JavaScript execution.
       With the freedom and security disaster of arbitrary remote code execution;
       - The user can't read the software before running it.
       - The user can't change the software before running it.
       - The user cannot choose to run an older version if they prefer.
       - The user cannot make a modified version and share that with others.
       Therefore, such JavaScript is not free software: even if it is under a free license, all the issues of the https://www.gnu.org/philosophy/javascript-trap.html and of https://www.gnu.org/philosophy/who-does-that-server-really-serve.html apply.
       The only JavaScript that respects the user is JavaScript under a free license that the user actively chooses to download and execute. Browsers do not offer an interface that allows for that - currently extensions like Haketilo and Greasemonkey come closest, but browsers severely restrict what user-loaded JavaScript is allowed to do (for example, it seems Firefox prioritises the "CSP" of a remote site over the user's wishes, which was causing issues with uBlock Origin on sites that would deny the loading of uBlock's scripts to stop uBlock from working, until uBlock found a bypass).
       Therefore, the only reasonable solution to the JavaScript problem is to disable JavaScript, with the only JavaScript being executed being that of free software extensions.
       
 (DIR) Post #AxCYyqHxO5bGRQOVAO by Suiseiseki@freesoftwareextremist.com
       2025-08-15T18:05:13.375078Z
       
       1 likes, 0 repeats
       
       @phnt @Codeberg Cope and seethe. There is no denial of service against any human - any human will notice that the request seems to be timing out and cancel the request. Humans that use decent downloading software like GNU wget won't notice any issues either. Only curl infidel scrapers that are carrying out a DoS attack will be struck.
       
 (DIR) Post #AxCa1LeZqQpWFcjX7o by noisytoot@berkeley.edu.pl
       2025-08-15T18:06:52.691948Z
       
       0 likes, 0 repeats
       
       @Suiseiseki @Codeberg >browsers do not offer an interface that allows for that
       That sounds like an issue with those browsers, and does not make Anubis proprietary. You could make a more user-freedom-respecting browser. If there was a program that automatically downloaded, say, GIMP, and ran it on your computer, would the existence of that program make GIMP proprietary?
       
 (DIR) Post #AxCa1N7KP1QEn5C0Zs by Suiseiseki@freesoftwareextremist.com
       2025-08-15T18:16:51.651738Z
       
       0 likes, 0 repeats
       
       @noisytoot @Codeberg Although the root of the issue is the browser silently executing JavaScript without giving the user freedom, abusing that flaw to waste the user's CPU cycles, or to make the site completely inaccessible to the user, in order to avoid implementing real, actually effective solutions to the problem is not excusable.
       If the user chooses to install a program, and chooses to run it, that automatically downloads GIMP and runs it, that would be free software if released under a free license, as the user could;
       - Read the software before running it.
       - Change the software before running it.
       - Choose to run an older version if they prefer.
       - Make a modified version and share that with others.
       If you turned GIMP into SaaSS, that would render GIMP effectively proprietary (which is why GIMP and most GNU packages and most software should be licensed under the AGPLv3-or-later - as then at least the user would have the ability to run their own server and have freedom, or realize that there's a native version and use that instead).
       
 (DIR) Post #AxCaIZn0XaA112tLXc by Suiseiseki@freesoftwareextremist.com
       2025-08-15T18:19:59.619397Z
       
       0 likes, 0 repeats
       
       @Codeberg Load average - no evil Anubis.
       
 (DIR) Post #AxCauaVSK7boF9o6uu by mdione@en.osm.town
       2025-08-15T18:25:26Z
       
       0 likes, 0 repeats
       
       @Suiseiseki @phnt @Codeberg "infidel"? What are you, a pantomime religious extremist? Or an AI? :-P
       
 (DIR) Post #AxCaubWuWCMnPxUmp6 by Suiseiseki@freesoftwareextremist.com
       2025-08-15T18:26:49.292427Z
       
       0 likes, 1 repeats
       
       @mdione @phnt @Codeberg I'm a free software extremist of the Church of Emacs.
       https://www.gnu.org/fun/jokes/gospel.html
       
 (DIR) Post #AxCaxhLtKNdSLNohIu by phnt@fluffytail.org
       2025-08-15T18:27:25.551660Z
       
       0 likes, 1 repeats
       
       @RedTechEngineer @Suiseiseki @Codeberg You can argue however you want. It is the truth. You are causing a disruption to someone's systems by doing it, and doing it at a scale that classifies as a denial of service under some laws. Back in the dial-up days, you could send garbage packets to someone's modem with your T1 link, and that was and is also a denial of service.
       
 (DIR) Post #AxCb4tH1P0kHLQy6dM by phnt@fluffytail.org
       2025-08-15T18:28:43.195918Z
       
       0 likes, 1 repeats
       
       @mdione @Suiseiseki @Codeberg No, he's just autistic about a group of licenses and an MIT researcher.
       
 (DIR) Post #AxCbJYgeE2M7q8wuHI by Suiseiseki@freesoftwareextremist.com
       2025-08-15T18:31:23.126372Z
       
       0 likes, 0 repeats
       
       @phnt @RedTechEngineer @Codeberg There is an attacker that is trying to cause disruptions to your systems at scale, and defending yourself against such an attack is quite reasonable. Scrapers haven't been a problem lately, but I reckon if they become one again, I'll go and start throttling their connections to 5 bits/second (not a denial of service - they get a response eventually).
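
       A toy sketch of what that kind of throttling could look like - illustrative Python over a raw socket, with the pacing as an assumption rather than a tested value: one byte roughly every 1.6 seconds is about 5 bits/second.

       import socket
       import time

       BITS_PER_SECOND = 5

       def drip_feed(conn: socket.socket, payload: bytes) -> None:
           """Send the response one byte at a time, paced to the configured bit rate."""
           delay = 8 / BITS_PER_SECOND  # seconds per byte (8 bits)
           for i in range(len(payload)):
               conn.sendall(payload[i:i + 1])
               time.sleep(delay)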
       
 (DIR) Post #AxCbO2fWhAdpzAYw7s by Suiseiseki@freesoftwareextremist.com
       2025-08-15T18:32:11.378391Z
       
       0 likes, 0 repeats
       
       @phnt @mdione @Codeberg (I was professionally diagnosed as not autistic).
       
 (DIR) Post #AxCbaP3RI8ugt9dqkK by Codeberg@social.anoxinon.de
       2025-08-15T18:30:51Z
       
       0 likes, 0 repeats
       
       @zacchiro Yes, the crawlers completed the challenges. We tried to verify if they are sharing the same cookie value across machines, but that doesn't seem to be the case.
       
 (DIR) Post #AxCbaPlkdOMJ6aNEoa by Suiseiseki@freesoftwareextremist.com
       2025-08-15T18:34:24.141984Z
       
       0 likes, 0 repeats
       
       @Codeberg @zacchiro Clearly the solution now is to conclude that if you run JavaScript, you aren't human. The scraper problem is quite easily solved then;

       <script>document.body.innerHTML = 'We have detected that you have JavaScript enabled in your browser, please disable it to prove you are a human and continue.'</script>
       
 (DIR) Post #AxCbo60ctm7lsHr5jU by Suiseiseki@freesoftwareextremist.com
       2025-08-15T18:36:53.628872Z
       
       0 likes, 0 repeats
       
       @phnt @RedTechEngineer @Codeberg Evil Anubis causes disruptions to someone's system by causing massive resource waste via JavaScript. Mobile users, for example, can potentially be unable to use their devices for several minutes while an Anubis challenge is running. But that's not a denial of service?
       
 (DIR) Post #AxCchsxias4jcw66vg by efraim@tooot.im
       2025-08-15T18:46:50Z
       
       0 likes, 0 repeats
       
       @Codeberg I like the idea of them figuring out how to solve the Anubis challenge only to be blocked afterward
       
 (DIR) Post #AxCdQE5ZPGlPtLuhKC by Suiseiseki@freesoftwareextremist.com
       2025-08-15T18:54:54.140535Z
       
       0 likes, 0 repeats
       
       @danjones000 @Codeberg The only way is to give scrapers some delicious bait that humans won't follow, but the scraper will. At the end of the bait, you can put gzip bombs, or, more complicated, multiple bait links, where multiple visits cause the IP to be temporarily nullrouted (a human may visit the bait once). Trying to identify the scraper via fingerprinting and/or JavaScript is doomed to fail, as scrapers can use the same browsers as users (firefox+xdotool will do, but headless browsers tend to be more reliable and less resource-intensive).
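
       A sketch of the "multiple visits" rule under stated assumptions: the threshold is arbitrary, and the blackhole route is the standard iproute2 command, which needs root and would need a matching cleanup job later.

       import subprocess
       from collections import Counter

       BAIT_HITS: Counter[str] = Counter()
       THRESHOLD = 3  # a curious human might follow the bait once; scrapers keep coming back

       def record_bait_hit(client_ip: str) -> None:
           """Count hits on the bait path and nullroute an address once it crosses the threshold."""
           BAIT_HITS[client_ip] += 1
           if BAIT_HITS[client_ip] == THRESHOLD:
               # Temporarily blackhole the address (iproute2); remove it again later, e.g. from cron.
               subprocess.run(["ip", "route", "add", "blackhole", client_ip], check=False)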
       
 (DIR) Post #AxCeJCI9mzo2FI0s5o by Codeberg@social.anoxinon.de
       2025-08-15T19:03:23Z
       
       1 likes, 0 repeats
       
       @thesamesam Unfortunately, I'm not sure if encouraging anyone to reinforce the vendor lock-in of Microsoft GitHub by making maintainers financially dependent on that platform is in the spirit of our mission. ~f
       
 (DIR) Post #AxCiQFi7sNO2h0mOSu by mana_z@mastodon.social
       2025-08-15T19:50:57Z
       
       0 likes, 0 repeats
       
       @efraim @Codeberg ...and spending a good amount of their corporate compute budgets just to walk away empty-handed. I hope they learn, or go bankrupt, or both
       
 (DIR) Post #AxDvCMd0qxWKeg3VWy by argv_minus_one@mastodon.sdf.org
       2025-08-15T19:40:01Z
       
       0 likes, 0 repeats
       
       @Codeberg These companies are evidently willing to pay an absolutely staggering cost to do their scraping. I wonder, are they paying with their own money, or are they “borrowing” some unsuspecting strangers' compromised computers/routers/etc to do the work?
       
 (DIR) Post #AxDvCO0RjJrKve1jhA by argv_minus_one@mastodon.sdf.org
       2025-08-15T19:41:25Z
       
       0 likes, 0 repeats
       
       @Codeberg @cadey Also, does the Anubis client use multiple cores to do its work? If not, could that be done? Perhaps that would increase the cost for the bots without increasing the delay for humans.
       
 (DIR) Post #AxDvCP3Jq7keAqNXoO by cadey@pony.social
       2025-08-15T20:00:43Z
       
       0 likes, 0 repeats
       
       @argv_minus_one @Codeberg it already does: https://github.com/TecharoHQ/anubis/blob/main/web/js/algorithms/fast.mjs
       
 (DIR) Post #AxDvCQ9Nl4CBZwDttw by sharlatan@mastodon.social
       2025-08-15T20:23:30Z
       
       1 likes, 0 repeats
       
       @cadey @argv_minus_one @Codeberg maybe suggest migrating to Codeberg instead of hosting on the same platform which is used for LLM training :blobcatcoffee:
       
 (DIR) Post #AxI6MHuV7peX2x0g2i by thanius@mastodon.chuggybumba.com
       2025-08-16T07:19:45Z
       
       0 likes, 0 repeats
       
       @Codeberg Perhaps it's time to stop letting robots solve puzzles and instead feed them bombs. Do we know how well a ZIP bomb works on these crawlers?
       
 (DIR) Post #AxI6MIs3YPI81esEs4 by Suiseiseki@freesoftwareextremist.com
       2025-08-18T10:12:37.767719Z
       
       0 likes, 0 repeats
       
       @thanius @Codeberg GNU zip bombs work on scrapers, as DEFLATE is a supported HTTP compression protocol.
       
 (DIR) Post #AxSntcyy2XHjZAqHxI by Nudhul@shitposter.world
       2025-08-23T14:07:44.919175Z
       
       0 likes, 1 repeats
       
       @phnt @Suiseiseki @Codeberg yuropean take
       
 (DIR) Post #AxtBIhpKVxtct2DghM by nigger@detroitriotcity.com
       2025-09-05T07:32:27.259324Z
       
       0 likes, 1 repeats
       
       @Suiseiseki @mdione @phnt @Codeberg hail Kakoune