Post ASBfY4lZD66v4UIsCm by magicaltrout@qoto.org
Post #ASBenj9zD1xmjxcnFw by simon@fedi.simonwillison.net
2023-01-31T01:38:44Z
0 likes, 1 repeats
Here's a fun mystery to ponder on a Monday evening... the GitHub robots.txt file at https://github.com/robots.txt includes the following rules:

```
Disallow: /ekansa/Open-Context-Data
Disallow: /ekansa/opencontext-*
```

https://github.com/ekansa/Open-Context-Data is the ONLY repository on the whole of GitHub that gets explicitly listed in robots.txt like that.

There's clearly a story there! I wonder what it is.
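A minimal sketch of how a crawler might evaluate those two rules, assuming the matching semantics standardized in RFC 9309: a rule matches from the start of the URL path, `*` matches any run of characters, and a trailing `$` anchors the end. The helper names (`rule_to_regex`, `is_disallowed`) and the sample paths are hypothetical, and the sketch ignores `User-agent` groups and `Allow` lines; it is not GitHub's or any particular crawler's code.

```python
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex (hypothetical helper).

    Assumes RFC 9309 conventions: '*' matches any run of characters,
    a trailing '$' anchors the end of the path, and everything else
    is matched literally as a prefix of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then turn each escaped '*' back into '.*'.
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, disallow_rules: list[str]) -> bool:
    """Return True if any Disallow pattern matches from the start of the path."""
    # re.match only matches at the beginning of the string, giving prefix semantics.
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)


# The two rules quoted from github.com/robots.txt above.
RULES = ["/ekansa/Open-Context-Data", "/ekansa/opencontext-*"]

print(is_disallowed("/ekansa/Open-Context-Data/blob/master/README.md", RULES))  # True
print(is_disallowed("/ekansa/opencontext-legacy-xml", RULES))  # True (hypothetical repo name)
print(is_disallowed("/ekansa/some-other-repo", RULES))  # False
```

Under these semantics the trailing `*` is effectively redundant: because every rule is already a prefix match, `/ekansa/opencontext-` on its own would block the same paths.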
Post #ASBfY4lZD66v4UIsCm by magicaltrout@qoto.org
2023-01-31T01:46:46Z
0 likes, 0 repeats
@simon whilst I can't see the top tweet, this is probably the story... https://twitter.com/ekansa/status/1137052076062650368
Post #ASBfp7OOhOd08ianL6 by nmaggioni@mastodon.nmaggioni.xyz
2023-01-31T01:49:23Z
0 likes, 0 repeats
@simon Apparently crawlers made too much noise: https://news.ycombinator.com/item?id=20454327

Weird how that's the only repo to get that treatment, though. Maybe it's a relic of the past that nobody ever cleaned up?

Some more info in the parent HN comment.
Post #ASBg4HEeNzMZ0hGHeS by ryan@social.lol
2023-01-31T01:50:52Z
0 likes, 0 repeats
@simon This one is amusing, too:

```
Disallow: /Explodingstuff/
```

That account has a repo hosting active malware! https://github.com/Explodingstuff/WannaCry
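A minimal sketch that checks this rule with Python's standard-library `urllib.robotparser`. Only the `Disallow` line is quoted above; the `User-agent: *` group it sits in and the `ExampleBot` agent name are assumptions for illustration. The point is that a plain rule like this is a prefix match on the URL path.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt fragment: the Disallow line is the one quoted above,
# but the "User-agent: *" group around it is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /Explodingstuff/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
# Record a fetch time so can_fetch() consults the parsed rules.
parser.modified()

# Anything under the /Explodingstuff/ prefix is blocked...
print(parser.can_fetch("ExampleBot", "https://github.com/Explodingstuff/WannaCry"))  # False
# ...but the bare account URL lacks the trailing slash, so the prefix does not match.
print(parser.can_fetch("ExampleBot", "https://github.com/Explodingstuff"))  # True
```

Because the rule ends in a slash, it covers every path beneath the account but not the bare `https://github.com/Explodingstuff` URL itself.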
Post #ASBgLEIYYhyOpAAk9g by j00bar@fosstodon.org
2023-01-31T01:54:05Z
0 likes, 0 repeats
@simon https://mobile.twitter.com/ekansa/status/1137052076062650368
Post #ASBgdAaP6CyqWX4hMG by uep@octodon.social
2023-01-31T01:55:50Z
0 likes, 0 repeats
@simon guessing: it's large, hasn't changed in forever, and contains a whole pile of links to opencontext.org that might have caused some reflected load problem there.

No idea why it's the only one like that, though.
Post #ASBhkg9qDOewQFnLKi by simon@fedi.simonwillison.net
2023-01-31T02:11:59Z
0 likes, 1 repeats
@j00bar Eric Kansa said: "Ill-conceived experiment to use GitHub for public version control of some of @OpenContext's legacy XML. Web crawlers went from Open Context to a gazillion XML documents in GitHub. Evidently they didn't like it."

The number of "ill-conceived experiments" I have running on GitHub now is into three digits... would be pretty legendary to have one permanently honored in their robots.txt file a decade later!
Post #ASBiPWOxORDk2MLAQK by edsu@social.coop
2023-01-31T02:18:50Z
0 likes, 0 repeats
@simon @j00bar cc/ @ekansa