2025-07-23 Testing a reset of the one week jail
===============================================

I've had a number of people complaining about being banned and I've been wondering whether I should reduce the ban-time for the escalation jail. Right now, if you've been banned for 1h five times in a 24h period, you're banned for one week. What if I reduced this one week to three days ("one long weekend"), for example?

As a test, I wanted to reset all the one week bans and see whether load explodes or not. If most of the scraping has ceased, then perhaps there would be no problem?

So around 10:30 local time, I made a backup of the weekly jail:

```
# fail2ban-client get butlerian-jihad-week banned > butlerian-jihad-week.json
# sed s/\'/\"/g < butlerian-jihad-week.json | jq length
2862
# fail2ban-client get butlerian-jihad-week banned
[]
```

The backup has 2862 entries, and the jail is now empty. Let's watch Munin!

The graph shows the one week jail holding steady at nearly 3000 entries, then dropping to zero moments ago.

40 minutes later. What are you dooooing‽ 10 minutes after the reset, the 1h jail already has over 700 entries, and 10 minutes after that, it has 850 entries.

And who's doing it?

```
# asn-find (grep "^2025-07-23 1.* \[butlerian-jihad\] Ban " /var/log/fail2ban.log|cut -c 89-)\
  | cut -f2-3 | sort | uniq
16567   NETRIX-16567, US
17035   NBCUNI-17035, US
19855   MASERGY, US
209366  SEMRUSH-AS, CY
35485   MAILUP-SPA, IT
39630   ASPTECH, GB
7922    COMCAST-7922, US
8069    MICROSOFT-CORP-MSN-AS-BLOCK, US
8075    MICROSOFT-CORP-MSN-AS-BLOCK, US
```

The only obvious thing I can see is that Semrush and Microsoft are trying to crawl the site using their bots: one for marketing and search engine optimisation, the other for its search engine. But remember my post "2023-10-04 Search engines, the deal is off!": I am no longer convinced that being listed by search engines is in my best interest.

A few hours later, the scene appears unchanged.
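The escalation rule above (five one-hour bans within 24h lead to a one-week ban) amounts to a sliding window over fail2ban's own log. Here is a minimal sketch of that window in shell and awk, which could also be used to see who is hovering just below the limit. It assumes fail2ban's default log line format; the function `escalations` and its arguments are hypothetical and not part of fail2ban. It takes the IP address from the last field rather than a fixed column offset like `cut -c 89-`, which silently breaks if anything earlier in the line changes width.

```sh
#!/bin/sh
# Hypothetical helper, not part of fail2ban: print each IP the first time it
# collects MAXBANS "Ban" lines from JAIL within a sliding FINDTIME window
# (seconds). Reads fail2ban.log on stdin and assumes the default line format:
#   2025-07-23 10:45:12,345 fail2ban.actions [123]: NOTICE  [jail] Ban 192.0.2.1
# Usage: escalations JAIL [MAXBANS=5] [FINDTIME=86400]
escalations() {
  awk -v jail="[$1]" -v maxbans="${2:-5}" -v findtime="${3:-86400}" '
    # Days since 1970-01-01 for a Gregorian date (only differences matter here).
    function days(y, m, d) {
      if (m <= 2) { y--; m += 12 }
      return 365*y + int(y/4) - int(y/100) + int(y/400) + int((153*(m-3)+2)/5) + d - 719469
    }
    # Only plain "Ban" lines count; "Unban" and "Restore Ban" lines are skipped.
    NF >= 4 && $(NF-1) == "Ban" && $(NF-2) == jail {
      split($1, dt, "-")        # YYYY-MM-DD
      split($2, tm, "[:,]")     # HH:MM:SS,mmm
      t = days(dt[1]+0, dt[2]+0, dt[3]+0) * 86400 + tm[1]*3600 + tm[2]*60 + tm[3]
      ip = $NF                  # last field, instead of a fixed column offset
      n[ip]++
      at[ip, n[ip]] = t
      if (!(ip in first)) first[ip] = 1
      # Slide the window start past bans that are FINDTIME or more seconds old.
      while (at[ip, first[ip]] <= t - findtime) first[ip]++
      if (n[ip] - first[ip] + 1 >= maxbans && !(ip in flagged)) {
        flagged[ip] = 1
        print ip
      }
    }'
}
```

Something like `escalations butlerian-jihad 4 < /var/log/fail2ban.log` would list the addresses that collected four one-hour bans within a day, one short of the week.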
There are four waves of blocks, each going up to between 1000 and 1500 banned entries and coming back down again. If that's the picture, then perhaps the long-term punitive ban doesn't have to be that long.

And what's the situation 24h later? The fifth wave generates entries in the one-week ban. But the one-hour ban cycles continue, so more entries keep getting added to the one-week ban.

And another day later: the situation has stabilised around 1400 blocked entries.

And another day later: the bots were back and now we're at 2900 entries blocked for a week. Every day is worse. If you look at the graph below, you'll notice that on Saturday morning, just after 2:00, there was a huge wave of about 6000 entries that got blocked for an hour. The same wave repeats around 9:00, 13:00 and 18:00. And then, nearly 24h later, just before 2:00, the fifth such wave rolls in. At that point, the bans are added to both the one-hour and the one-week jail.

The one-week ban now has 9000 entries after a surge at 1:50 in the morning added about 6000 new entries.

#Administration #Butlerian_Jihad

2025-07-25. So now, two days after the reset, I'd like to see who's hovering just below the limits I've set. If they seem to be innocent, that would be an argument to raise the limits. If they seem to be bots, that would be an argument to lower the limits or, given that the system load is OK at the moment, to leave them as they are. Let's go through the various tests.

First, "active autonomous systems". On a good day, the limit is 500 requests per autonomous system.

```
# show-active-autonomous-systems --top 3
count percent ASN    AS
234   4.54    45102  ALIBABA-CN-NET Alibaba US Technology Co., Ltd., CN
172   3.33    208258 ACCESS2IT Access2.IT Network, NL
158   3.06    12880  DCI-AS, IR
total: 5158
```

Clearly, Alibaba is a bot hoster in China. What about Access2.IT in the Netherlands?

```
# 2h-access-log | asn-access-log 208258 | log-request | rank-lines
176 /
```

Clearly, that is also a bot. 2h is 120 minutes.
This company is checking more than once per minute whether my sites are up. That angers me. They probably sell uptime data to other companies.

What about the autonomous system from Iran?

```
# 2h-access-log | asn-access-log 12880 | log-request | rank-lines
164 /
```

The same bullshit! 😲 So my conclusion is that the limit could be lower! 😠

Next, let's look at "expensive end-points". These are the end-points I'm planning to do away with, eventually.

```
# 2h-access-log \
  | egrep '\baction=(rss|rc)\&|\bsearch=' \
  | awk '{print $2}' \
  | asncounter --top 3 --no-prefixes 2>/dev/null
count percent ASN  AS
27    4.42    7922 COMCAST-7922, US
11    1.8     6939 HURRICANE, US
8     1.31    7713 TELKOMNET-AS-AP PT Telekomunikasi Indonesia, ID
total: 611
```

Comcast in the United States seems to request all sorts of things:

```
# 2h-access-log | asn-access-log 7922 | log-request | rank-lines
7 /1pdc/2024/
4 /view/index.rss
4 /osr/feed.xml
4 /osr/atom.xml
3 /
2 /wiki/MontagInZ%C3%BCrich/Discord_Server
2 /wiki/feed/full/
2 /upload/?filename=2017-04-07_Worldbuilding-1.jpg
1 /zoom-frm.el
1 /zen?action=rss&all=0&days=90&rcfilteronly=%22WikioStyle%22&showedit=1
```

Note how only one of these requests is for one of the "forbidden" URLs. But apparently it's also attempting to upload images! Looks like spiders, if you ask me.

What about Hurricane in the United States?

```
# 2h-access-log !^social | asn-access-log 6939 | log-user-agent | rank-lines
17 Feedly
8 Feedbin feed-id:1702619 - 1 subscribers
6 Feedbin feed-id:2621878 - 1 subscribers
6 Feedbin feed-id:1965060 - 1 subscribers
6 Feedbin feed-id:1482607 - 3 subscribers
6 Feedbin feed-id:1482606 - 1 subscribers
6 Feedbin feed-id:1244032 - 8 subscribers
4 Feedbin feed-id:2982258 - 1 subscribers
4 Feedbin feed-id:2584020 - 2 subscribers
4 Feedbin feed-id:1821891 - 1 subscribers
```

What the hell is wrong with these people? Do I really have this many interesting feeds?
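The pipelines above rely on the author's own helpers (`2h-access-log`, `asn-access-log`, `log-request`, `log-user-agent`, `rank-lines`), whose source isn't part of this post. For anyone wanting to try something similar, here is a minimal stand-in for the last three. It assumes Apache's `vhost_combined` log format, with the virtual host first and the client IP second, which would be consistent with the `awk '{print $2}'` in the expensive end-points pipeline. The function names are hypothetical, and the ASN lookup is left out.

```sh
#!/bin/sh
# Hypothetical stand-ins for the author's log-request, log-user-agent and
# rank-lines helpers, assuming Apache's vhost_combined log format:
#   vhost:port client - - [time] "request" status bytes "referer" "user-agent"
# Splitting on double quotes puts the request line in $2 and the agent in $6.

# Print the request target (path plus query string) of each log line.
log_request() {
  awk -F'"' '{ split($2, r, " "); print r[2] }'
}

# Print the user agent of each log line.
log_user_agent() {
  awk -F'"' '{ print $6 }'
}

# Count identical lines, most frequent first, keeping the top N (default 10).
rank_lines() {
  sort | uniq -c | sort -rn | head -n "${1:-10}" | sed 's/^ *//'
}
```

Something like `log_request < access.log | rank_lines` produces output in the same "count line" shape as above. Reading such counts is simple arithmetic: 176 requests for `/` in 120 minutes is about 1.5 requests per minute, and four requests per feed in two hours is one every 30 minutes.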
```
# 2h-access-log !^social | asn-access-log 6939 | log-request | rank-lines
9 /view/index.rss
8 /osr/feed.xml
6 /view/RPG.rss
5 /files/osr-discord.xml
4 /wiki/?action=journal;title=Roleplaying%20Games;full=1;search=tag:RPG
4 /view/index.rss?action=journal;title=Roleplaying%20Games;full=1;search=tag:RPG
4 /rpg/feed.xml
4 /podcast/hh.xml
3 /wiki/feed/full/RPG
3 /wiki/feed/full/Old_School
```

Maybe. 😬 Four requests in two hours means one request every half hour, for every feed. I guess it makes sense. In any case, it seems that these autonomous systems are hitting a lot of expensive end-points, but in general they are all focused on feed processing.

The autonomous system from Indonesia looks even more like a bot: it also goes through all the archives, for every single page:

```
# 2h-access-log !^social | asn-access-log 7713 | log-request | rank-lines
1 /wiki/Older_Upgrading_Issues
1 /wiki/CategoryWiki
1 /wiki?action=rss&all=1&days=1&full=1&rcidonly=wiki_feeds&showedit=0
1 /wiki?action=rss&all=0&days=7&diff=1&full=1&rcidonly=CommentHabillerUnFilRss&showedit=1
1 /wiki?action=rss&all=0&days=28&rcidonly=2004-07-12&showedit=0
1 /wiki?action=rc&from=1749992400&rcidonly=GermanXpCommunity&showedit=1&upto=1750597200
1 /wiki?action=rc&all=1&from=1750742594&rcidonly=WikiToHTML&upto=1751001794
1 /wiki?action=rc&all=0&days=28&rcfilteronly=%22DifficultPerson%22&showedit=0
1 /wiki?action=rc&all=0&days=14&rcidonly=HoofSmith&rollback=1&showedit=0
1 /wiki?action=admin&id=UserInterfaceValidator
```

Hopefully most of that will be fixed once the forms all change from GET to POST. 😬 In any case, I don't feel like lifting this limit!

The last test involves banning all the autonomous systems that are hitting "no bots" warnings. The web server redirects to this warning for various requests, including all user agents containing the words "bot", "spider", "crawl", etc.

I'm going to change all my robots.txt files to the following:

```
User-agent: *
Disallow: /
DisallowAITraining: /
```

2025-07-31.
In any case, the adventure continues. A spike at 6:00 in the morning, and then continual growth of blocked entries. There are now 14000 entries blocked for a week.

Netstat confirms 1000 simultaneous connections at 6:00 in the morning, up from an average of 100 simultaneous connections. The victim was once again Community Wiki, with at least 15 processes attempting to serve the wiki. Load climbed to 16 for a short moment and then fell back down. The defense seems to have worked.
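One way to get at a connection count like the one from netstat, and to see who is holding those connections, is a minimal sketch that counts established connections to ports 80 and 443 per remote address. It assumes the column layout of Linux `netstat -tn` from net-tools and a web server on the standard ports; the function name `connections_by_peer` is hypothetical. Summing the first column gives the total number of simultaneous connections.

```sh
#!/bin/sh
# Hypothetical helper: count established HTTP(S) connections per remote
# address from `netstat -tn` output on stdin. Assumes the Linux net-tools
# column layout: Proto Recv-Q Send-Q Local-Address Foreign-Address State.
connections_by_peer() {
  awk '$6 == "ESTABLISHED" && $4 ~ /:(80|443)$/ {
         peer = $5
         sub(/:[0-9]+$/, "", peer)   # drop the remote port, works for IPv6 too
         n[peer]++
       }
       END { for (p in n) print n[p], p }' | sort -rn
}
```

Something like `netstat -tn | connections_by_peer | head` would then show the busiest peers first.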