https://opensource.org/blog/the-open-source-ai-definition-v-1-0-rc1-is-available-for-comments

October 2, 2024 * News * Stefano Maffulli

The Open Source AI Definition RC1 is available for comments

A little over a month after v.0.0.9, we have a Release Candidate version of the Open Source AI Definition. This was reached with lots of community feedback: 5 town hall meetings, several comments on the forum and on the draft, and in-person conversations at events in Austria, China, India, Senegal, and Argentina.

There are three relevant changes to the part of the definition pertaining to the "preferred form to make modifications to a machine learning system."

The feature that will draw most attention is the new language of Data Information. It clarifies that all the training data needs to be shared and disclosed. The updated text comes from many conversations with several individuals who engaged passionately with the design process, on the forum, in person and on hackmd. These conversations helped describe four types of data: open, public, obtainable and unshareable data, well described in the FAQ. The legal requirements are different for each. All are required to be shared in the form that the law allows them to be shared.

Two new features are equally important. RC1 clarifies that Code must be complete, enough for downstream recipients to understand how the training was done. This was done to reinforce the importance of the training, for transparency, security and other practical reasons. Training is where innovation is happening at the moment and that's why you don't see corporations releasing their training and data processing code.
We believe, given the current status of knowledge and practice, that this is required to meaningfully fork (study and modify) AI systems.

Last, there is new text that is meant to explicitly acknowledge that it is admissible to require copyleft-like terms for any of the Code, Data Information and Parameters, individually or as bundled combinations. A demonstrative scenario is a consortium that owns rights to training code and a dataset and decides to distribute the code+data bundle under legal terms that tie the two together, with copyleft-like provisions. This sort of legal document doesn't exist yet but the scenario is plausible enough that it deserves consideration. This is another area that OSI will monitor carefully as we start reviewing these legal terms with the community.

A note about science and reproducibility

The aim of Open Source is not and has never been to enable reproducible software. The same is true for Open Source AI: reproducibility of AI science is not the objective. Open Source's role is merely not to be an impediment to reproducibility. In other words, one can always add more requirements on top of Open Source, just like the Reproducible Builds effort does.

Open Source means giving anyone the ability to meaningfully "fork" (study and modify) a system, without requiring additional permissions, to make it more useful for themselves and for everyone else. This is why OSD #2 requires that the "source code" must be provided in the preferred form for making modifications. This way everyone has the same rights and ability to improve the system as the original developers, starting a virtuous cycle of innovation. Forking in the machine learning context has the same meaning as with software: having the ability and the rights to build a system that behaves differently from the original. Things that a fork may achieve are: fixing security issues, improving behavior, removing bias.
All these are possible thanks to the requirements of the Open Source AI Definition.

What's coming next

With the release candidate cycle starting today, the drafting process will shift focus: no new features, only bug fixes. We'll watch for newly raised issues, in particular major flaws that may require significant rewrites to the text. The main focus will be on the accompanying documentation, the Checklist and the FAQ. We also realized that in our zeal to solve the problem of data that needs to be provided but cannot be supplied by the model owner for good reasons, we had failed to make clear the basic requirement that "if you can share the data you must." We have already made adjustments in RC1 and will be seeking views on how to better express this in an RC2.

In the coming weeks, until the 1.0 release on October 28, we'll focus on:

* Getting more endorsers to the Definition
* Continuing to collect feedback on hackmd and forum, focusing on new, unseen-before concerns
* Preparing the artifacts necessary for the launch at All Things Open
* Iterating on the Checklist and FAQ, preparing them for deployment.

Link to the Open Source AI Definition Release Candidate 1