* * * * *

Still no information on who “The Knowledge AI” is or was

> Back in July 2019 I was investigating some bad bots [1] on my website when
> I came across the bot that identified itself simply as “The Knowledge AI
> (Artificial Intelligence)” that was the number one robot hitting my site
> [2]. Most bots that identify themselves will give a URL to a page that
> describes their usage, like Barkrowler [3] (to pick one that recently
> crawled my site). But not so “The Knowledge AI”. That was all it said, “The
> Knowledge AI”. It was very hard to Google, but I wouldn’t be surprised if
> it was OpenAI.
>
> The earliest I can find “The Knowledge AI” crawling my site was April of
> 2018, and despite starting on April 16th, it was the second most active
> robot that month. In May it was the number one bot, and it stayed there
> through October of 2022 (about 4½ years), after which it pretty much
> dropped, from 32,000+ in October of 2022 to 85 in November of 2022. It was
> sporadic, showing up in single digit hits until January of 2024. It may be
> still crawling my site, but if it is, it is no longer identifying itself.
>
> I don’t know if “The Knowledge AI” was an LLM company crawling, but if it
> was, not giving a link to explain the bot is suspicious. It’s the rare
> crawler that doesn’t identify itself with at least a URL to describe it.
> The fact that it took the number one crawling spot on my site for 4½ years
> is suspicious. As robots go, it didn’t affect the web server all that much
> (I’ve come across worse ones), and well over 90% of its requests were valid
> (unlike MJ12, which had a 75% failure rate). And my /robots.txt file
> doesn’t exclude any robot from scanning, so I can’t really complain about
> it.
>
> “My comment on “Mitigating SourceHut's partial outage caused by aggressive crawlers | Lobsters” [4]”

Even though the log data is a few years old, I don't think that IPs change
from ASN (Autonomous System Number) to ASN all that much (but I could be
wrong on that). I checked the IPs used by “The Knowledge AI” in May 2018,
and in October 2022, and they didn't change that much. They were still in
the same /24 networks across that time.

Looking up the information today is very disappointing: Hurricane Electric
LLC [5], a backbone provider. So no real information about who “The
Knowledge AI” might have been.

Sigh.

[1] gopher://gopher.conman.org/0Phlog:2019/07/09.1
[2] gopher://gopher.conman.org/0Phlog:2019/07/09.1
[3] https://www.babbar.tech/crawler
[4] https://lobste.rs/s/dmuad3/mitigating_sourcehut_s_partial_outage#c_mygeyl
[5] https://www.he.net/

Email Sean Conner at sean@conman.org.