Post AuGHJ4PQifZF6L0SW0 by errant@mastodon.sdf.org
(DIR) Post #AuGBk35d8KZV6SE6L2 by ozzelot@mstdn.social
2025-05-19T18:27:19Z
0 likes, 0 repeats
@prahou anubisamogus
(DIR) Post #AuGFsPb3jjeH2ZAcy0 by ozzelot@mstdn.social
2025-05-19T19:13:41Z
0 likes, 0 repeats
@apophis We are not allowed to have nice things. Perhaps the whole web is over. But what way is there to make something unscrapable by design? @prahou
(DIR) Post #AuGGBLRlOFmbfhqmTA by ozzelot@mstdn.social
2025-05-19T19:17:06Z
0 likes, 0 repeats
@prahou The site police: Hello. Our insectidrones overheard you talking about a website, or a slushie, we are not sure. Either way, you have now mandatorily volunteered to let a robot see it. @apophis
(DIR) Post #AuGGY9D73CEoNjs2wi by ozzelot@mstdn.social
2025-05-19T19:21:12Z
0 likes, 0 repeats
@prahou "Is that a CD burner or is your wallet just happy to see us?" @apophis
(DIR) Post #AuGGu2RurxNvmcEWv2 by ozzelot@mstdn.social
2025-05-19T19:25:11Z
0 likes, 0 repeats
@glitzersachen Scraping every piece of copyrighted material known to man is also pretty illegal in some jurisdictions... @apophis @prahou
(DIR) Post #AuGHJ4PQifZF6L0SW0 by errant@mastodon.sdf.org
2025-05-19T18:57:47Z
0 likes, 0 repeats
@prahou wyatt@soc.megatokyo.moe Wait, doesn't this defeat the stated purpose of Anubis, then? If you can just pretend to be links2 or curl, then what is that damned check even good for? Just set your scraper to the curl user agent and away you go.
(DIR) Post #AuGHJ5jfmtM1DPU8hs by eris@p.enes.lv
2025-05-19T19:29:38Z
0 likes, 0 repeats
Many malicious crawlers pretend to be a mainstream web browser, and all the mainstream web browsers say they're Mozilla/5.0 (for historical reasons, see https://webaim.org/blog/user-agent-string-history/ ). There are servers out there that will block you if you use a non-web-browser User-Agent. I once got IP-blocked (temporarily, I hope; I have a dynamic IP) for using wget recursively on a website of some 20 articles, right after hitting the second article. After getting a new IP & changing the User-Agent to my browser's one, it worked fine without any hiccups. Basically, it's easier to detect & block crawlers with a non-Mozilla user agent without having to resort to Anubis (automatic heavy rate limiting, auto-blocking known offenders, etc). CC: @prahou@merveilles.town
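To make the split described in this post concrete, here is a rough Python sketch, not Anubis's actual code or configuration. It assumes a policy where any client claiming to be a browser (User-Agent starting with "Mozilla/") is sent to a challenge, while clients with honest non-browser User-Agents such as curl or wget are handled by a plain per-IP request limit. The names `Gatekeeper`, `claims_browser`, and the limit value are hypothetical, and a real setup would reset the counters per time window.

```python
from collections import defaultdict

# Hypothetical limit, chosen only for illustration.
MAX_REQUESTS = 10


def claims_browser(user_agent: str) -> bool:
    """Return True if the User-Agent claims to be a mainstream browser."""
    return user_agent.strip().startswith("Mozilla/")


class Gatekeeper:
    """Toy policy: challenge self-declared browsers, rate-limit everything else."""

    def __init__(self, limit: int = MAX_REQUESTS):
        self.limit = limit
        # Requests seen per IP from non-browser User-Agents (never reset here).
        self.hits = defaultdict(int)

    def decide(self, ip: str, user_agent: str) -> str:
        if claims_browser(user_agent):
            # Crawlers hiding behind a browser UA end up here, with real browsers.
            return "challenge"
        # Honest tools (curl, wget, links2) are cheap to count and block.
        self.hits[ip] += 1
        if self.hits[ip] > self.limit:
            return "block"
        return "allow"
```

Under this sketch, a scraper that switches to a curl User-Agent skips the challenge but runs into the per-IP limit instead, which is the trade-off the post describes.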
(DIR) Post #AuGHJYPp9lhyAFQCIK by ozzelot@mstdn.social
2025-05-19T19:29:48Z
0 likes, 0 repeats
@glitzersachen Also, I'm not even sure captcha-style scrambled text is a good approach. Hasn't that been retired because robots just got very good at solving it? @apophis @prahou
(DIR) Post #AuGRlRVjQ99OOq7brE by ozzelot@mstdn.social
2025-05-19T21:26:53Z
0 likes, 0 repeats
@apophis @prahou That's a hell of an acoustic coupler