Post ASWBk5zEZvlT2TWgOu by roufamatic@hachyderm.io
(DIR) More posts by roufamatic@hachyderm.io
(DIR) Post #ASWBAlkHqIY47vgp3Q by lauren@mastodon.laurenweinstein.org
2023-02-09T23:17:11Z
0 likes, 0 repeats
**** robots.txt for AI? **** I'm beginning to think that robots.txt will need to be expanded to include AI-specific provisions, so that websites can specify which of their content they do not want used by various AI-based systems. It increasingly appears clear what one Achilles' Heel of AI Chatbots may be. Not just accuracy problems, but also their apparent propensity to lift the text of human authors, then obfuscate or just use it as is, as if self-authored. Even including citations and links is unlikely to help, because click-through rates on these from chatbot replies are likely to be extremely low. This is going to be an enormous problem. -L
(DIR) Post #ASWBPlEu0U28tu9tFg by delfuego@mastodon.cloud
2023-02-09T23:19:43Z
0 likes, 0 repeats
@lauren This is a brilliant idea — really, this is phenomenal. We need to figure out how to get critical mass for this!
(DIR) Post #ASWBk5zEZvlT2TWgOu by roufamatic@hachyderm.io
2023-02-09T23:23:31Z
0 likes, 0 repeats
@lauren What happens when AI chatbots start learning from other AI chatbots? Not that I have any problem at all with being able to opt out, but it's wild to think that AI bots could actually reach some peak quality bar and then only get worse over time as they regurgitate the same things that other bots have already spit out.
(DIR) Post #ASWCXErSN718kiJzo8 by byterhymer@mastodon.social
2023-02-09T23:32:25Z
0 likes, 0 repeats
@lauren I don't hold high hopes. Also see: robots.txt already being ignored with abandon, also see: Do Not Track provisions, being ignored with abandon, also see: the utter disaster of cookies. Basically: humans can't be trusted to not give in to their worst impulses to mine data for a potential profit. It's probably hardwired in; the Judeo-Christian deity trying to destroy the world in 40 days & nights of flooding even failed to cleanse the world of his own scourge. We seem intractably stuck.
(DIR) Post #ASWFWjRDVCil9pUw9A by lauren@mastodon.laurenweinstein.org
2023-02-10T00:05:58Z
0 likes, 0 repeats
@byterhymer In conjunction with legislation that in this particular case might possibly be bipartisan, it could still be a useful tool, particularly if it gets ahead of the issue. I suspect courts would look favorably on this, even conservative ones.
(DIR) Post #ASWGNn3ul4wRkHtmam by byterhymer@mastodon.social
2023-02-10T00:15:31Z
0 likes, 0 repeats
@lauren I don't mean to be a Negative Nancy, but my experiences with the so-called "Justice" system in the USA have been unmitigated disasters. Despite that: I dig your sentiments. Fight for the users! Not the abusers, right? I feel as if most things of merit are a never ending uphill battle these days? Courts are very regional. I enjoyed 毒舌大狀 (2023) aka "A Guilty Conscience" & Amazing Grace (2006) depicted how England outlawed slavery after 17 years. USA 2023: carceral slavery persists. ;(
(DIR) Post #ASWHxumDeXnG3EPYgK by lauren@mastodon.laurenweinstein.org
2023-02-10T00:33:17Z
0 likes, 0 repeats
@byterhymer The majors could probably be convinced to go along. That would accomplish a great deal.
(DIR) Post #ASWKWMfUevf8JEou6y by bapril@infosec.exchange
2023-02-10T01:01:47Z
0 likes, 0 repeats
@lauren Interesting idea. The thing I think is missing is consent. A default stance permitting all uses unless otherwise marked is, to me, problematic. What happens if I learn about the ai.txt file after I’ve been indexed? Does this mean I gave permission to use my content? Can I force the AI to re-train (no!)? It seems backwards to put the onus on the site owner to specify what they don’t want crawled/used by AI. Given how much resource we spend on SEO, it’s clearly not too much to ask a site operator to list what they want crawled in a file.
(DIR) Post #ASWSnV6nKYp52iGvYW by lauren@mastodon.laurenweinstein.org
2023-02-10T02:34:40Z
0 likes, 0 repeats
@bapril Any kind of robots.txt expansion (as I've outlined) would be a tough slog. But changing the fundamental nature of robots.txt (essentially, what it doesn't tell you not to do you're allowed to do by default within reasonable limits) would be *much* harder in a practical sense I believe.
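A minimal sketch of that default-allow behavior, using Python's standard urllib.robotparser. The crawler name ExampleAIBot and all paths are hypothetical, and robots.txt has no AI-specific directives today, so this only illustrates the existing per-user-agent mechanism:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. "ExampleAIBot" is a made-up crawler name,
# not a real user agent; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /essays/

User-agent: *
Disallow: /private/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Explicitly disallowed for the named crawler.
essays_blocked_for_ai = not parser.can_fetch("ExampleAIBot", "/essays/post.html")

# Never mentioned for that crawler, so permitted by default.
unlisted_path_allowed = parser.can_fetch("ExampleAIBot", "/photos/cat.html")

# A crawler with no group of its own falls back to the "*" group,
# which does not restrict /essays/.
other_bot_allowed = parser.can_fetch("SomeOtherBot", "/essays/post.html")

print(essays_blocked_for_ai, unlisted_path_allowed, other_bot_allowed)
```

Anything the file does not mention stays allowed, and the whole scheme only works if the crawler chooses to consult the file at all.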
(DIR) Post #ASWbyzYbMaSgFYpgsS by searchmaven@mastodon.social
2023-02-10T04:17:33Z
0 likes, 0 repeats
@lauren Good recommendations here that I’m starting to apply to my client sites. https://www.searchenginejournal.com/how-to-block-chatgpt-from-using-your-website-content/478384/
(DIR) Post #ASXDfD1VNowiTtLVxI by bapril@infosec.exchange
2023-02-10T11:19:47Z
0 likes, 0 repeats
@lauren The robots.txt ship has sailed (with the wrong defaults IMO). There’s nothing either of us is going to do about that. I read your proposal as either a new capability within the existing file, or a new file entirely. This would provide the opportunity to re-evaluate the defaults and make informed consent by the source the primary unit of measurement rather than “tell me (as an AI operator) which things you are likely to sue me over.” I get it: informed consent is not popular. It alters the “scrape everything you can as fast as you can and assert fair-use when challenged” equation, which is how most of these models got to critical mass so fast.
(DIR) Post #ASXa27eHjIWzSqoeWm by SoftwareTheron@mas.to
2023-02-10T15:30:22Z
0 likes, 0 repeats
@lauren Automated checking of link content might amount to opening an attack surface for malware.