Open-source AI must reveal its training data, per new OSI definition
The Open Source Initiative (OSI) has released its official definition of “open” artificial intelligence, setting the stage for a clash with tech giants like Meta — whose models don’t fit the rules.
OSI has long set the industry standard for what constitutes open-source software, but AI systems include elements that aren’t covered by conventional licenses, like model training data. Now, for an AI system to be considered truly open source, it must provide:
Access to details about the data used to train the AI so others can understand and re-create it
The complete code used to build and run the AI
The settings and weights from the training, which help the AI produce its results
This definition directly challenges Meta’s Llama, widely…