Post Ax0XQlwEclLhFguAls by TagHunt@infosec.exchange
(DIR) Post #Ax0CnQWqIC4kTTzOSG by stux@mstdn.social
2025-08-09T19:00:01Z
0 likes, 0 repeats
So according to this article: https://www.dropsitenews.com/p/meta-facebook-tech-copyright-privacy-whistleblower
#Meta is scraping the media proxies of mstdn, masto and .coffee.
If this is true, this is very worrying and pisses me off very much. No wonder our media loads so crappy if they are constantly tapping in.
Fuck #Meta to hell
(DIR) Post #Ax0CyP9JlGtdNjDipk by stux@mstdn.social
2025-08-09T19:02:01Z
0 likes, 0 repeats
Does anyone know a good way to block them?
I do not want anything to do with their shady business, and they should stay away from ours.
(DIR) Post #Ax0DBuXiLuk4Vm4WZ6 by stux@mstdn.social
2025-08-09T19:04:22Z
0 likes, 0 repeats
@alex Nope, unfortunately.
Let's nuke FB HQ
(DIR) Post #Ax0DMccHLIvk6DJXrk by loganer@mastodon.social
2025-08-09T19:06:22Z
0 likes, 0 repeats
@stux Maybe some kind of rate limiter?
It would work if their requests all originate from the same IP address (which is likely, considering they are large companies).
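A per-IP rate limiter along the lines suggested here could look like the following nginx sketch. The zone name `perip`, the 5 requests/second rate and the burst of 20 are illustrative assumptions, not tuned values. A per-address limit only helps while the crawler actually stays on a few addresses, and legitimate users sharing one NAT address count as a single client. If the instance sits behind a proxy or CDN, `$binary_remote_addr` is the proxy's address unless the realip module is configured.

```nginx
# http {} context. Zone name "perip" and all numbers are illustrative assumptions.
# Key on the client address: 10 MB of shared memory, 5 requests/second per address.
limit_req_zone $binary_remote_addr zone=perip:10m rate=5r/s;

server {
    # ... existing listen / server_name / location blocks ...

    # Inherited by every location that has no limit_req of its own.
    # Up to 20 excess requests are served immediately; anything beyond is rejected.
    limit_req zone=perip burst=20 nodelay;
    # Reply with 429 Too Many Requests instead of the default 503.
    limit_req_status 429;
}
```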
(DIR) Post #Ax0DNHqY39hpyLJ1Ie by Extelec@mstdn.social
2025-08-09T19:06:30Z
0 likes, 0 repeats
@stux Not just Meta; none of them respect any content anymore. There's no point in robots.txt. They will not rate limit whatsoever. They all chop and change IPs and user agents to evade limiting. They regularly kill smaller servers dead, then move on (if you're lucky). They are trashing the Web for small hosts.
(DIR) Post #Ax0DUOLcHFzwA7E2oS by grob@mstdn.social
2025-08-09T19:07:45Z
0 likes, 0 repeats
@stux Does that mean Fedi people can take part in the class action lawsuit that could do "immense harm not only to a single AI company, but to the entire fledgling AI industry and to America’s global technological competitiveness," as stated by said industry reps? :think_bread: https://arstechnica.com/tech-policy/2025/08/ai-industry-horrified-to-face-largest-copyright-class-action-ever-certified/
via @Lazarou https://mastodon.social/@Lazarou/114994400792642672
(DIR) Post #Ax0DkZ2UIJf5xEzQkS by drahardja@sfba.social
2025-08-09T19:10:41Z
0 likes, 0 repeats
@stux I think this is the problem that Cloudflare has been trying to solve: they have a team that watches and profiles bot behavior across their clients and blocks bots that supposedly belong to AI crawlers.
I have a lot of mixed feelings about what this means for the web, and about how it creates yet another for-profit intermediary between users and sites, but here you go: modern problems require modern solutions and all that.
https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/
(DIR) Post #Ax0Do9QbSQrPgJyp9s by djsf@fosstodon.org
2025-08-09T19:11:20Z
0 likes, 0 repeats
@stux lol. They prohibit, shadowban, or outright ban any acknowledgement of the existence of the fediverse on their own platforms, while also scraping it for content illegally. Duplicitous and vile.
(DIR) Post #Ax0DrZSN8mF5HCRELg by jorijn@toot.community
2025-08-09T19:11:55Z
0 likes, 0 repeats
@stux yes. Block their user agent. We did: https://github.com/toot-community/platform/blob/main/manifests/applications/ingress-nginx/helm-values.yaml#L23
(DIR) Post #Ax0DtM4Rsc3mL96ByC by stux@mstdn.social
2025-08-09T19:12:19Z
0 likes, 0 repeats
@jorijn oh! is that enough? 😀
(DIR) Post #Ax0E7EoxdCGJuRTTrk by stux@mstdn.social
2025-08-09T19:14:43Z
0 likes, 0 repeats
@alex Yes, I'm gonna test this:
if ($http_user_agent ~* "Meta-ExternalAgent") { return 403; }
(DIR) Post #Ax0EQSNXRqanSZoPc8 by xyhhx@nso.group
2025-08-09T19:18:15Z
0 likes, 0 repeats
@stux what about iocaine, quixotic, etc?
(DIR) Post #Ax0EdVR3OX68hkg8uW by maxheadroom@hub.uckermark.social
2025-08-09T19:20:36Z
0 likes, 0 repeats
@stux maybe have a look at my setup: https://repos.mxhdr.net/maxheadroom/Traefik-Bot-Blocking
(DIR) Post #Ax0EglnN1cySJSfojw by athos@bolha.one
2025-08-09T19:21:11Z
0 likes, 0 repeats
@stux Block their entire ASN? It will at least make them spend money on using another network.
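Blocking by network rather than by user agent can be sketched with nginx's `geo` module, which maps the client address to a variable. The two CIDR blocks below are RFC 5737 documentation placeholders, not Meta's real prefixes. The real ones would have to be looked up from the routes announced by the ASN in question. The variable name `$meta_network` is hypothetical. As with any address-based rule, behind a proxy or CDN the client address is the proxy's unless the realip module is configured.

```nginx
# http {} context. These CIDRs are RFC 5737 documentation placeholders,
# NOT Meta's real prefixes: replace them with the prefixes announced by
# the ASN you want to block. $meta_network is a hypothetical name.
geo $meta_network {
    default          0;
    192.0.2.0/24     1;
    198.51.100.0/24  1;
}

server {
    # ... existing server configuration ...

    # geo matches on $remote_addr by default.
    if ($meta_network) {
        return 403;
    }
}
```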
(DIR) Post #Ax0EkXexWVHhd8XP7I by stux@mstdn.social
2025-08-09T19:21:54Z
0 likes, 0 repeats
Just edited our nginx configs and added
if ($http_user_agent ~* "Meta-ExternalAgent") { return 403; }
to the server block.
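The single `if` line above does the job on its own. The sketch below does the same thing with `map`, which keeps all the user agents in one place if more names need to be added later. The variable name `$block_ai_bot` and the second user-agent pattern are assumptions; check the crawler names Meta currently documents. Like any user-agent rule, this only catches crawlers that announce themselves honestly.

```nginx
# http {} context: one case-insensitive regex per crawler user agent.
# The pattern list is an assumption, not an authoritative registry.
map $http_user_agent $block_ai_bot {
    default                  0;
    ~*meta-externalagent     1;
    ~*meta-externalfetcher   1;
}

server {
    # ... existing server configuration ...

    # An empty string or "0" is false in nginx's if, so only matched agents get 403.
    if ($block_ai_bot) {
        return 403;
    }
}
```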
(DIR) Post #Ax0EwwJFhXqbYVIQXw by loganer@mastodon.social
2025-08-09T19:24:08Z
0 likes, 0 repeats
@stux :) https://http.cat/403
(DIR) Post #Ax0Ex6z60pGCfAwz9k by kevinrns@mstdn.social
2025-08-09T19:24:11Z
0 likes, 0 repeats
@stux Needs hashtags. (lol)
(DIR) Post #Ax0FE4hPlICGUylieO by TheMNWolf@furry.engineer
2025-08-09T19:27:13Z
0 likes, 0 repeats
@stux I'm very curious to know what effect this has on the traffic. From the articles I read, they usually aren't that forthcoming about their scraping attempts.
(DIR) Post #Ax0FX3h5iuI6sofa3U by stux@mstdn.social
2025-08-09T19:30:40Z
0 likes, 0 repeats
In addition, I've also enabled Cloudflare's AI scrape block for masto.ai and .coffee.
mstdn.social's DNS is at Hetzner.
(DIR) Post #Ax0FYGeB39WXLUsoQS by Viss@mastodon.social
2025-08-09T19:30:34Z
0 likes, 0 repeats
@stux I wonder if adding another line to that stanza, specifically so that everything hitting that rule gets logged to a separate file (so you can harvest all the source IPs), would be handy?
Because I'm sure they will do the Anthropic thing soon and start moving their scrapers into various other clouds to get around people blocking their ASN and user agents. If you can fingerprint their patterns, block them by pattern >:D
Reverse-Cambridge-Analytica them.
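The separate-log idea maps onto the `if=` parameter of nginx's `access_log`, which skips a request when the condition is empty or "0". The sketch below assumes the same user-agent pattern as the rule above and uses a hypothetical log path. The first field of the predefined `combined` format is the client address, so the file doubles as a list of source IPs for later pattern-based or address-based blocking.

```nginx
# http {} context; the user-agent pattern is an assumption.
map $http_user_agent $is_meta_crawler {
    default               0;
    ~*meta-externalagent  1;
}

server {
    # Flagged requests also go to their own file (hypothetical path).
    access_log /var/log/nginx/meta-crawler.log combined if=$is_meta_crawler;
    # access_log at server level replaces inherited ones, so keep the normal log too.
    access_log /var/log/nginx/access.log combined;

    # ... existing server configuration ...

    # Blocked requests still reach the log phase, so they are recorded above.
    if ($is_meta_crawler) {
        return 403;
    }
}
```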
(DIR) Post #Ax0Fu9NQlDQxAWU6Do by kevinrns@mstdn.social
2025-08-09T19:34:54Z
0 likes, 0 repeats
@stux Is there an Instance Runner newsletter for all nodes to learn from your efforts?
(DIR) Post #Ax0FvWAFWUOa8FyGye by TagHunt@infosec.exchange
2025-08-09T19:35:05Z
0 likes, 0 repeats
@stux Maybe use Anubis on everything possible.
Also make the robots.txt and user-agent blocking more aggressive.
Also also, you can try blocking whole IP ranges of data centers related to Meta.
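On the robots.txt point, nginx can answer `/robots.txt` itself. The sketch below assumes `meta-externalagent` is the token the crawler looks for. It overrides whatever robots.txt the Mastodon app would otherwise serve for that host. As others point out in this thread, robots.txt is only a request, which crawlers may ignore.

```nginx
server {
    # ... existing server configuration ...

    # Exact-match location: answer robots.txt directly, overriding the app's file.
    # The user-agent token is an assumption; compliance is voluntary.
    location = /robots.txt {
        default_type text/plain;
        return 200 "User-agent: meta-externalagent\nDisallow: /\n";
    }
}
```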
(DIR) Post #Ax0G0ZHVMTB7FOlzF2 by stux@mstdn.social
2025-08-09T19:36:02Z
0 likes, 0 repeats
@kevinrns When Ghost's new federation works fully, I'll set up a small blog on a subdomain of mstdn with info, posts and also financial data, etc. 💪
(DIR) Post #Ax0G72bvEEmFwqPSEq by kevinrns@mstdn.social
2025-08-09T19:37:14Z
0 likes, 0 repeats
@stux Thanks so much for your constant efforts on our behalf, you're a hero and a general inspiration. #mastodon
(DIR) Post #Ax0G8hzYrLt4ZzGHsu by paul@oldfriends.live
2025-08-09T19:37:23Z
0 likes, 0 repeats
@stux I've been looking at The Ultimate Nginx Bad Bot Blocker: https://github.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker
(DIR) Post #Ax0H06gDngkuUog5vE by nvsr@best-friends.chat
2025-08-09T19:47:06Z
0 likes, 0 repeats
@stux Returning HTTP code 444 from nginx just closes the connection, freeing up resources faster :partyblobcat:
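The 444 variant of the earlier rule looks like the sketch below. The user-agent pattern is the same assumption as before. 444 is a non-standard, nginx-specific code that closes the connection without sending a response header. The request still appears in the access log with status 444, so the crawler's traffic stays measurable.

```nginx
server {
    # ... existing server configuration ...

    if ($http_user_agent ~* "meta-externalagent") {
        # nginx-specific: drop the connection without sending any response.
        return 444;
    }
}
```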
(DIR) Post #Ax0JwzVeIsjZFbwGBs by kevinrns@mstdn.social
2025-08-09T20:20:14Z
0 likes, 0 repeats
@stux 👏 👏
(DIR) Post #Ax0KQlAYBWiZSkGTaq by jorijn@toot.community
2025-08-09T20:25:32Z
0 likes, 0 repeats
@stux yes 👍
(DIR) Post #Ax0LaR3nu5rtwjHq3k by WillyECoyote69@mstdn.social
2025-08-09T20:38:32Z
0 likes, 0 repeats
@stux Block those MoFu's !
(DIR) Post #Ax0MiDvwgzbtVNtA36 by swelljoe@mas.to
2025-08-09T20:51:07Z
0 likes, 0 repeats
@stux My theory for why we're in this mess has long been that the leadership and functionaries in the Democratic Party prefer fundraising off the outrageous things Trump and his cronies do to winning elections, because it's simply more profitable for them.
The outrageous corruption of the Republicans makes Democrats look normal by contrast, but the motivations are similar. They're rich and like being rich, and if losing makes them richer, that's what they've got to do.
(DIR) Post #Ax0N8fL7Q6by25GuKu by lumiworx@mastodon.social
2025-08-09T20:55:53Z
0 likes, 0 repeats
@stux Try the following in .htaccess to be extremely specific about known domains you'd want to limit. Each domain gets 2 lines, but you can stack as many as you'd like in any order.
Remove the # from the commented lines if you don't already have those 2 enabled earlier in some other directives.
<IfModule mod_rewrite.c>
  # RewriteEngine on
  # Options +FollowSymlinks
  RewriteCond %{HTTP_REFERER} explicit-domains\.com [NC]
  RewriteRule .* - [F]
</IfModule>
(DIR) Post #Ax0NlAYnn4s1rl22Yy by nf3xn@mastodon.social
2025-08-09T21:02:49Z
0 likes, 0 repeats
@stux This is why I think instances should be authenticated-only. Mastodon has an optional authorized fetch ("secure mode"). I had assumed my instance had it enabled, but no. :(
Cloudflare recently called out Perplexity for bypassing their AI Labyrinth.
(DIR) Post #Ax0QNg4jGbnWWKSIaW by SpaceLifeForm@infosec.exchange
2025-08-09T21:32:13Z
0 likes, 0 repeats
@stux The robots.txt file is just an 'ask' and means nothing.
See Cloudflare and Perplexity pointing at each other.
(DIR) Post #Ax0RCJ7mMCNO6n1Ye0 by exostence@mastodon.social
2025-08-09T21:41:21Z
0 likes, 0 repeats
@stux https://techcrunch.com/2025/08/04/perplexity-accused-of-scraping-websites-that-explicitly-blocked-ai-scraping/
Once you put information out there, it appears corporations and other unscrupulous actors take that as an invitation to use the "free data" for whatever they want. They have weaponized it more than once (Clearview et al.), and I see nothing that prevents them from doing this again. "Public domain", they'll cry. I've minimized my web presence, but there's nothing to suggest they will stop at what they consider "public domain".
(DIR) Post #Ax0RQUJBZWmqqi7ZxY by timezoneless@mstdn.social
2025-08-09T21:43:57Z
0 likes, 0 repeats
@stux Long-term persistent activity? Any way to allow a set number of requests per hour or something?
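A fixed request budget per crawler, rather than per address, can be sketched as a `limit_req_zone` keyed by a `map` variable. nginx does not count requests whose key is an empty string, so ordinary visitors are never limited. Every request matching the pattern shares one bucket regardless of which IP it comes from. nginx expresses rates only per second (`r/s`) or per minute (`r/m`), so an hourly budget has to be approximated: 60r/m works out to about 3600 requests per hour, plus the burst. The pattern, zone name and numbers are assumptions.

```nginx
# http {} context. Requests whose key is "" are not counted, so only matching
# user agents are limited. Pattern, zone name and numbers are assumptions.
map $http_user_agent $ai_bot_key {
    default               "";
    # A constant key: all matching requests share one bucket, whatever their IP.
    ~*meta-externalagent  meta;
}

# 60 requests per minute, i.e. roughly 3600 per hour for the whole crawler.
limit_req_zone $ai_bot_key zone=aibots:10m rate=60r/m;

server {
    # ... existing server configuration ...

    limit_req zone=aibots burst=10 nodelay;
    limit_req_status 429;
}
```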
(DIR) Post #Ax0TyBrypiwRPsav56 by RVLara23@mastodon.social
2025-08-09T22:12:24Z
0 likes, 0 repeats
@stux 6 million unique sites? My god. I think whatever they're scraping from the Fedi is the LEAST of our worries if we're talking about that many sites. They are a pretty evil company and I don't see any easy way to bring them down.
(DIR) Post #Ax0XQlwEclLhFguAls by TagHunt@infosec.exchange
2025-08-09T22:51:14Z
0 likes, 0 repeats
@stux Fuck the legal systems as well.
You can't even sue them for impeding business and theft of intellectual property, because they have the pay-to-win lawyers.
(DIR) Post #Ax0YCJOq28Hidnx808 by Owen_G_Richards@writing.exchange
2025-08-09T22:59:47Z
0 likes, 0 repeats
@stux I doubt this will be acceptable to many, but I use a captcha to keep the bots at bay... (mine's homegrown, not a 'free'/'purchase' option).
(DIR) Post #Ax0YOqsTOAfZwinGZE by WideEyedCurious@mstdn.social
2025-08-09T23:02:05Z
0 likes, 0 repeats
@stux I use Cloudflare at work and have enabled every AI-scraping bot feature available in the free plan, since we all know they’re ignoring anything we put in the robots.txt file.
(DIR) Post #Ax0YSAD7xtzym929iq by nitinkhanna@mastodon.social
2025-08-09T23:02:40Z
0 likes, 0 repeats
@stux wonder if Anubis or Fail2ban can do more for you…
(DIR) Post #Ax0Yu9fpYUrhDao4q8 by eoinoneill@mastodon.gamedev.place
2025-08-09T23:07:43Z
0 likes, 0 repeats
@stux Wouldn't it be more effective to fight fire with fire? Create a mass of fake-ish looking content and serve it up as real content to the scraper, effectively poisoning the AI: fake user posts, fake images with improper alt text.
This way, they might not catch on right away. A 403 means they'll instantly change their methodology because they know they're blocked.
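Serving decoy content instead of a 403 can be sketched as routing matched requests to a different upstream. The sketch assumes a separate generator of junk pages is already running; tools like iocaine, mentioned upthread, are built for that. The upstream names and the decoy's port 8080 are hypothetical. 127.0.0.1:3000 is Mastodon's default web port. When `proxy_pass` contains a variable, nginx first looks the resulting name up among the defined upstream groups, so no resolver is needed here.

```nginx
# http {} context. Upstream names and the decoy port are hypothetical.
upstream mastodon_web {
    server 127.0.0.1:3000;
}

upstream decoy_backend {
    # Whatever serves generated junk pages to crawlers.
    server 127.0.0.1:8080;
}

# Pick a backend per request from the user agent (pattern is an assumption).
map $http_user_agent $backend_pool {
    default               mastodon_web;
    ~*meta-externalagent  decoy_backend;
}

server {
    # ... existing server configuration ...

    location / {
        proxy_pass http://$backend_pool;
    }
}
```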
(DIR) Post #Ax0a2DhSZNITyayiTg by VulcanTourist@autistics.life
2025-08-09T23:20:23Z
0 likes, 0 repeats
@stux Can't Meta's IP blocks be rejected from accessing Fedi domains? It would probably turn into a game of whack-a-mole, but it's better than doing nothing.
(DIR) Post #Ax0bn20VxatVHDmBl2 by nicdex@techhub.social
2025-08-09T23:40:02Z
0 likes, 0 repeats
@stux Damn it, looks like TechHub is on there too >:(
(DIR) Post #Ax0boJiBcZlgTXogIS by hellosilverpatriot@social.vivaldi.net
2025-08-09T23:40:15Z
0 likes, 0 repeats
@stux Scraping for AI training is fair use, so long as you are polite and don't use too much bandwidth.
(DIR) Post #Ax0gABYp0U8fNo7SgS by Crovanian@mastodon.social
2025-08-10T00:29:03Z
0 likes, 0 repeats
@stux huh, neat. Can zuck fuck off please, thank you :)
(DIR) Post #Ax0hwSX7J4CVsOu8jw by honor2025@mastodon.social
2025-08-10T00:48:59Z
0 likes, 0 repeats
@stux I quit Facebook in 2012 after the NSA/GCHQ data collection leaks.
In 2019 I finally quit WhatsApp.
Since then, I'm using Briar messenger, SimpleX Chat and Mastodon.
(DIR) Post #Ax0shHgUSVzBNiFthI by unixjunk1e@infosec.exchange
2025-08-10T02:49:29Z
0 likes, 0 repeats
@stux A leak from what orifice ?
(DIR) Post #Ax109yCEh26B19jInI by greensofshade@mastodon.social
2025-08-10T04:13:06Z
0 likes, 0 repeats
@stux nooo, they wouldn't do the thing everyone was worried about from the start
(DIR) Post #Ax117rE3jqkym28gd6 by EchteNachtraaf@mastodon.social
2025-08-10T04:23:55Z
0 likes, 0 repeats
@stux block? We should be suing
(DIR) Post #Ax12TSzQczINvrACXo by Distante@mastodon.social
2025-08-10T04:39:03Z
0 likes, 0 repeats
@stux Does this thing work? https://www.404media.co/the-open-source-software-saving-the-internet-from-ai-bot-scrapers/?ref=daily-stories-newsletter
(DIR) Post #Ax17BKY9lfo6JZZwOG by ferds@metalhead.club
2025-08-10T05:31:46Z
0 likes, 0 repeats
@stux @thomas You know about this?
(DIR) Post #Ax2AKhcLrlp3U6f4M4 by htwj@mastodon.coffee
2025-08-10T17:41:46Z
0 likes, 0 repeats
@stux wonder if this is one of those rare cases one can use a 451…