https://lwn.net/SubscriberLink/970072/93a5696aa497d415/

Gentoo bans AI-created contributions

By Joe Brockmeier
April 18, 2024

Gentoo Council member Michal Gorny posted an RFC to the gentoo-dev mailing list in late February about banning "'AI'-backed (LLM/GPT/whatever) contributions" to the Gentoo Linux project. Gorny wrote that the spread of the "AI bubble" indicated a need for Gentoo to formally take a stand on AI tools. After a lengthy discussion, the Gentoo Council voted unanimously this week to adopt his proposal and ban contributions generated with AI/ML tools.

The case against

In his RFC, he laid out three broad areas of concern: copyrights, quality, and ethics. On the copyright front, he argued that LLMs are trained on copyrighted material and the companies behind them are unconcerned with copyright violations. "In particular, there's a good risk that these tools would yield stuff we can't legally use." He questioned the quality of LLM output, though he did allow that LLMs might "provide good assistance if you are careful enough".
But, he said, there's no guarantee contributors are aware of the risks.

He minced no words about his view of the ethics of the use of AI. Gorny took issue with everything from the energy consumption driven by AI to labor issues and "all kinds of spam and scam". The only reasonable course of action, he said, would be to ban the use of those tools altogether in creating works for Gentoo:

    In other words, explicitly forbid people from using ChatGPT, Bard, GitHub Copilot, and so on, to create ebuilds, code, documentation, messages, bug reports and so on for use in Gentoo.

He added that this only extended to works created expressly for the Gentoo project, and did not encompass upstream projects using things like ChatGPT. Andreas K. Huttel asked whether there were objections to packaging AI software for Gentoo. This did not elicit a response in favor or against on the list, but the AI policy page expressly mentions that the policy does not prohibit packaging AI-related software.

Is this necessary?

Rich Freeman wrote that he thought it made sense to consider the use of AI, but suggested the Gentoo developer certificate of origin (DCO) already had the necessary language to prohibit AI-generated contributions. "Perhaps we ought to just re-advertise the policy that already exists?" He also poked at the ethical case laid out by Gorny, and suggested it would alienate some contributors even if the majority of the project was in favor.

Freeman said it was not a bad idea to reiterate that Gentoo didn't want contributions that were just piped out of a GPT application into forums, bug reports, commits, etc., but didn't think that it required any new policy. Ulrich Mueller replied that there is overlap with existing policy, but did not find it redundant and supported the idea of a clarification on how to deal with AI-generated code.

Sam James agreed with the proposal but worried that it was "slightly performative [...] given that we can't really enforce it".
Gorny wrote that it was unlikely that the project could detect these contributions, or that it would want to actively pursue finding them. The point, he said, is to make a statement that they are undesirable.

Oskari Pirhonen wanted to know about cases where a contributor uses ChatGPT to help with writing documentation or commit messages (but not code) because they don't have "an excellent grasp of English". If those contributions explicitly called out AI-generated content, would those be acceptable? Gorny said that would not help much, and dismissed the quality of content generated by ChatGPT. Mueller wanted to know where the line was: "Are translation tools like DeepL allowed? I don't see much of a copyright issue for these."

In a rare dissent, Matt Jolly responded that Gentoo would always have poor quality contributions, and could simply use common sense to filter out low-quality LLM material. "We already have methods for weeding out low quality contributions and bad faith contributors - let's trust in these and see what we can do to strengthen these tools and processes." He argued in favor of using LLMs for code documentation and asked why he had to type out an explanation of what his code does if an LLM can generate something that only requires some editing. The proposal, he said, was a bad idea and banning LLMs "at this point is just throwing the baby out with the bathwater". Guidelines would be fine, even a ban on completely AI-generated works, but he was opposed to "pre-emptively banning useful tools".

James replied that tools trained on Gentoo's current repository should be OK, as well as using LLMs to assist with commit messages. But, he said, a lot of FOSS projects were seeing too much AI spam and were not interested in picking the "possibly good" parts out.

David Seifert responded in support of the RFC and asked if it could be added to the next Gentoo Council meeting agenda.
Gorny said that he had been asked for a specific motion and provided this language:

    It is expressly forbidden to contribute to Gentoo any content that has been created with the assistance of Natural Language Processing artificial intelligence tools. This motion can be revisited, should a case been made over such a tool that does not pose copyright, ethical and quality concerns.

Approved

Given the ratio of comments in favor of banning AI-generated contributions to objections to such a ban, it is not surprising that the council voted to accept Gorny's proposal. Now the question is how Gentoo implements the ban. In an emailed response to questions, Gorny said that Gentoo is relying on trust in its contributors to adhere to the policy rather than trying to police contributions to see if they were generated with AI/ML tools:

    In both cases, our primary goal is to make it clear what's acceptable and what's not, and politely ask our contributors to respect that. If we receive contributions that contain really "weird" mistakes, the kind that [do not] seem likely to be caused by a human error, we're going to start asking questions, but I think that's the best we can do.

As AI/ML continues to dominate the tech industry's agenda, Gentoo is unusual in looking to shut it out rather than trying to join the party. How well the policy works, and how soon it is tested, will be interesting to see.

-----------------------------------------

Posted Apr 18, 2024 17:31 UTC (Thu) by gmgod (subscriber, #143864)

Beyond the potential copyright violation, there is also the waste of time associated with these for documentation/commit message purposes.
Prompting an AI tool to do "say that the app was missing a feature about how to handle numbers in a commit message" is going to generate a novel's worth of text that the person with broken English won't be able to vet. And anything beyond that half-broken prompt will be assumptions on the AI side that humans are going to waste time reading and finding mostly consistent until they read the code and figure the description might not even match! Wasting even more time.

I'm sorry but words have meaning. Using AI as a fluff generator is probably the worst disrespect you can show to your reader. I much prefer broken English.

Posted Apr 18, 2024 17:40 UTC (Thu) by snajpa (subscriber, #73467)

So much for theory. And now, any practical example of this AI-driven spammy contribution? In the projects I watch closely, the situation you and the article are describing is mostly purely theoretical. To me it seems like a signal that there are enough contributors to the project, when they can raise barriers to avoid problems that aren't even there.

Posted Apr 18, 2024 17:40 UTC (Thu) by atnot (subscriber, #124910)

I personally increasingly feel like this is going to solve itself when the companies that offer these services start charging for their actual cost. Instead of subsidizing it as they do now under the assumption that it'll give them a valuable market position, increase their valuation with hype and that the models will soon be obsolete and replaced by magnitudes better and cheaper ones anyway etc. It's just kind of hard to imagine someone paying $100/mo or sitting there with their GPU roaring for hours on end while coding just for some moderately improved autocomplete.

Posted Apr 18, 2024 17:54 UTC (Thu) by snajpa (subscriber, #73467)

Umm, haven't they said the same thing about the shared e-scooters, ride-sharing, couch-sharing, etc.? That it will solve itself? :) As long as there are always new investors ready to pour resources in, it won't solve itself, certainly not in the way you think. They might actually manage to make inference dirt cheap, so they could afford to stay at these subscription levels, while even making profit. I don't see why not. The hardware hasn't even really started moving in the direction of cheaper inference yet, but it will.

Posted Apr 18, 2024 17:58 UTC (Thu) by snajpa (subscriber, #73467)

btw the improved autocomplete from Github is $100/year, not $100/month - and so far, at least to me, it's been worth every penny :)

Posted Apr 18, 2024 18:00 UTC (Thu) by snajpa (subscriber, #73467)

(*and* I got three RTX 3090 sitting around here just so that I can play around these so-called improved autocompletes :D weren't even that expensive, 2nd hand from a miner)

Posted Apr 18, 2024 18:55 UTC (Thu) by atnot (subscriber, #124910)

Sorry, but $100 is just nowhere near enough to cover the cost of running these things. Microsoft charges their enterprise customers roughly 4x that and not even they have remotely turned a profit on it. In fact to my knowledge, not a single company has ever turned a profit with an LLM offering at any price point. And they'd be yelling it from the rooftops if they did. It's also notable that even at that price, they have to give deep discounts to enterprise customers so that they can proudly announce companies like McKinsey getting on board.
Not because they have any use for it either mind you, but to be able to "better answer our customers questions about AI".

Posted Apr 18, 2024 21:17 UTC (Thu) by snajpa (subscriber, #73467)

At that scale, they also have massive opportunities to optimize and cut the total amount of work they need to do, just by looking at the data that goes through and balancing it against the compute costs (using heuristics, for example, how often is the suggested code accepted, etc.).

Posted Apr 18, 2024 17:42 UTC (Thu) by Karellen (subscriber, #67644)

From Matt Jolly's email linked in the article:

    we're always going to have BS/Spam PRs and bugs - I don't really think that the content being generated by LLM is really any worse.

Isn't part of the issue with LLMs not just that the quality can be low, but that the quantity of low-quality submissions jumps by orders of magnitude if LLM-powered submissions are allowed? See, for example, Clarkesworld ceasing to accept submissions altogether because of the volume of low-quality LLM-powered dross.

Also, doesn't explicitly banning LLM-generated contributions simplify the rejection process? If it's allowed provided the quality is good enough, you could end up spending way too much time arguing with bad-faith actors about whether the contributions they submitted are good enough or not. Whereas being able to just say "Policy says no." makes dealing with such people a lot more straightforward.

Posted Apr 18, 2024 18:59 UTC (Thu) by atai (subscriber, #10977)

> Isn't part of the issue with LLMs not just that the quality can be low, but that the quantity of low-quality submissions jumps by orders of magnitude

not really true in the context of spam generation (so not directly comparable to FOSS contribution but still matters) AI has improved the quality to make it possible to democratize good spam among all spammers.

Posted Apr 18, 2024 19:32 UTC (Thu) by flussence (subscriber, #85566)

Copyright and ethics, sure; it's well documented at this point LLMs will steal entire chunks of GPLed code wholesale with the serial numbers filed off, and their proponents are so far up their own asses that not even a bolt of lightning and voice from the heavens would get them to shut up. The purpose of the system is what it does: which is to steal labour from the undercompensated in novel ways outside the law. Much like Open Source(tm) has become.

But I don't think Gentoo has a leg to stand on regarding contribution quality, not while nobody seems to mind all the low-effort automated spam coming from within the house. The bugzilla is littered with tens of thousands of script-generated snowclone "QA" reports (not to be confused with clear, proofread, actionable bug reports), and almost nobody reads them, let alone acts on them, because the signal-to-noise ratio is somewhere between a windows UAC prompt and ph*ronix. Drive-by spamming one of the single digit bug IDs should've been a massive wake up call that this process is FUBAR, but alas.

How much does all that *cost*? The project still can't even scrape together enough resources or willing contributors to upgrade or moderate its phpbb2 forums, for so long that it's statistically likely that a few people who once made fun of what a farce it is may have died of old age at this point.

Posted Apr 18, 2024 20:50 UTC (Thu) by kleptog (subscriber, #1183)

Honestly, this feels like a rerun of the "you can't use a spell/grammar checker on your school assignment because that's cheating".

Right now people are using prompts in chatbots, but in a few years it will be seamlessly integrated into all sorts of products. It's only going to get faster and cheaper as time goes on.

Posted Apr 18, 2024 21:04 UTC (Thu) by mb (subscriber, #50428)

> but in a few years it will be seamlessly integrated into all sorts of products

Right. That won't resolve the open questions, though. Just processing copyrighted material through some sort of "AI" filter should not make the Copyright go away. Or alternatively, any program processing any data shall be allowed to remove Copyright. Cannot choose both.

Posted Apr 18, 2024 21:14 UTC (Thu) by snajpa (subscriber, #73467)

> Right. That won't resolve the open questions, though.

I have a feeling that trend is going to accelerate. Open questions kinda rendered "obsolete" by even more pressing new open questions :-D

Posted Apr 18, 2024 22:22 UTC (Thu) by Wol (subscriber, #4433)

> Honestly, this feels like a rerun of the "you can't use a spell/grammar checker on your school assignment because that's cheating".

My feeling in all of this is IFF you use an AI to help you write a valid report (of whatever sort) that's fine. The AI is the *assistant*. If, however, the AI is the *author* then you don't want to go near it with a barge pole.

In other words, if there is a *human* involved, who has sanity checked it for hallucinations, accuracy, what-have-you, then that's fine. If the human sending it can't be bothered, then why should the human receiving it bother, either?
And if it's the AI bot that's sending it, then you REALLY don't want to know!

Cheers,
Wol

Copyright (c) 2024, Eklektix, Inc. Comments and public postings are copyrighted by their creators. Linux is a registered trademark of Linus Torvalds