OpenAI says it's "impossible" to create useful AI models without copyrighted material

"Copyright today covers virtually every sort of human expression" and cannot be avoided.

Benj Edwards - Jan 9, 2024 8:58 pm UTC

ChatGPT developer OpenAI recently acknowledged that it needs copyrighted material to develop AI tools like ChatGPT, The Telegraph reports, saying such tools would be "impossible" to build without it. The statement came in a submission to the UK House of Lords Communications and Digital Select Committee's inquiry into large language models.

AI models like ChatGPT and the image generator DALL-E gain their abilities from training sessions fed, in part, by large quantities of content scraped from the public Internet without the permission of rights holders. (In OpenAI's case, however, some of the training content is licensed.) This sort of free-for-all scraping is part of a longstanding tradition in academic machine learning research, but now that deep learning AI models have gone commercial, the practice has come under intense scrutiny.
"Because copyright today covers virtually every sort of human expression--including blogposts, photographs, forum posts, scraps of software code, and government documents--it would be impossible to train today's leading AI models without using copyrighted materials," wrote OpenAI in the House of Lords submission. OpenAI further wrote that limiting training data to public domain books and drawings "created more than a century ago" would not produce AI systems that "meet the needs of today's citizens."

The statement follows a lawsuit filed last month by The New York Times against OpenAI and Microsoft, a significant investor in OpenAI, alleging that the companies unlawfully used the newspaper's content in their products. OpenAI responded to the lawsuit on its website on Monday, claiming that the suit lacks merit and affirming its support for journalism and its partnerships with news organizations.

OpenAI's defense largely rests on the legal principle of fair use, which permits limited use of copyrighted content without the owner's permission under specific circumstances. The company asserts that copyright law does not prohibit training AI models on such material. "Training AI models using publicly available internet materials is fair use, as supported by long-standing and widely accepted precedents," OpenAI wrote in its Monday blog post. "We view this principle as fair to creators, necessary for innovators, and critical for US competitiveness."

This is not the first time OpenAI has claimed fair use regarding its AI training data. In August, we reported on a similar situation in which OpenAI defended its use of publicly available materials as fair use in response to a copyright lawsuit involving comedian Sarah Silverman.
OpenAI claimed that the authors in that lawsuit "misconceive[d] the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence."

Benj Edwards is an AI and Machine Learning Reporter for Ars Technica. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.