How open source is shaping AI developments
The Linux Foundation outlines efforts to bolster enterprise AI adoption through a framework for managing and deploying AI applications, standardised tooling and open data alternatives
The Linux Foundation is doubling down on efforts to drive a more open environment for building artificial intelligence (AI) applications and facilitate enterprise adoption of the technology through open source.
“In open source, there are specific areas where there are opportunities to take advantage of collective development – and these aren’t just about source code,” said Jim Zemlin, executive director of the Linux Foundation, at the KubeCon + CloudNativeCon + Open Source Summit in Hong Kong this week.
One of the key projects that the Linux Foundation spearheaded this year is the Open Platform for Enterprise AI (Opea), a framework to make it easier for organisations to deploy and manage AI applications.
Opea, Zemlin explained, was designed to be the “Kubernetes of AI” in the enterprise, providing a standardised, open source platform for companies to build and deploy their AI models more efficiently.
“If we all start using the Opea framework, it will be much easier for you and everybody else to improve this platform and get quickly to what you want, which is the actual AI application in your enterprise.”
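To make that concrete, the sketch below shows how an internal application might call a retrieval-augmented chat service deployed on such a platform. The endpoint, port, route and payload shape are hypothetical placeholders rather than Opea's documented interface, and the example assumes the Python requests library is installed.

```python
# Minimal sketch: an internal tool querying a RAG-style chat service that a
# platform such as Opea might stand up. The URL, route and JSON fields below
# are hypothetical placeholders, not Opea's documented interface.
import requests

response = requests.post(
    "http://localhost:8888/v1/chatqna",          # hypothetical service endpoint
    json={"messages": "Summarise our travel expense policy."},
    timeout=30,
)
response.raise_for_status()
print(response.json())                           # the generated answer payload
```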
Zemlin also touched on the Linux Foundation’s work on the Unified Acceleration (UXL) Foundation, an effort by semiconductor manufacturers and industry players like Google Cloud to create a common hardware abstraction layer for AI workloads.
“Nvidia’s Cuda is the de facto standard for accelerated workloads around AI, but we see an opportunity for open, abstracted APIs [application programming interfaces] that can work across multiple silicon architectures,” he said. “This will help drive more competition and make it easier for developers to create tools for a variety of hardware.”
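The toy sketch below illustrates the general idea of such an abstraction layer: application code targets one interface, and a concrete backend is chosen at run time. The VectorAddBackend protocol and CpuBackend class are invented for illustration and are not UXL or oneAPI APIs.

```python
# Toy illustration of a hardware abstraction layer: the application calls one
# interface, and a concrete backend (CPU, GPU, NPU, ...) is selected at run
# time. These classes are hypothetical and not part of UXL or oneAPI.
from typing import Dict, List, Protocol, Sequence


class VectorAddBackend(Protocol):
    name: str
    def add(self, a: Sequence[float], b: Sequence[float]) -> List[float]: ...


class CpuBackend:
    name = "cpu"
    def add(self, a: Sequence[float], b: Sequence[float]) -> List[float]:
        return [x + y for x, y in zip(a, b)]


def select_backend(available: Dict[str, VectorAddBackend],
                   preferred: Sequence[str]) -> VectorAddBackend:
    # Walk the preferred accelerator list; the application code stays the same
    # when a GPU or NPU backend is registered later.
    for name in preferred:
        if name in available:
            return available[name]
    raise RuntimeError("no suitable backend available")


backend = select_backend({"cpu": CpuBackend()}, ["gpu", "npu", "cpu"])
print(backend.name, backend.add([1.0, 2.0], [3.0, 4.0]))
```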
AI safety
On AI safety, an area that open source is well placed to address given the transparent nature of open source development, Zemlin pointed to the Coalition for Content Provenance and Authenticity (C2PA), a project focused on ensuring the authenticity of digital content.
“In the world of generative AI, ensuring content authenticity is going to be deeply important,” he said. “C2PA provides a digital immutable watermarking technology that can track content from the creator all the way to the publisher, helping us to understand what’s real and what’s AI-generated.”
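The deliberately simplified sketch below models that creator-to-publisher chain: each actor appends a claim bound to the content hash and to the previous claim, so tampering anywhere breaks the chain. It is not the C2PA manifest format, and the HMAC "signature" stands in for the public-key signatures a real implementation would use.

```python
# Simplified model of a provenance chain: each claim binds to the content hash
# and the previous claim. This is NOT the C2PA manifest format; the HMAC below
# merely stands in for real public-key signatures.
import hashlib
import hmac
import json


def add_claim(chain, content: bytes, actor: str, key: bytes):
    claim = {
        "actor": actor,
        "content_hash": hashlib.sha256(content).hexdigest(),
        "prev_claim": chain[-1]["claim_hash"] if chain else None,
    }
    payload = json.dumps(claim, sort_keys=True).encode()
    claim["claim_hash"] = hashlib.sha256(payload).hexdigest()
    claim["signature"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return chain + [claim]


asset = b"generated image bytes"
chain = add_claim([], asset, "creator", b"creator-key")
chain = add_claim(chain, asset, "publisher", b"publisher-key")
print(json.dumps(chain, indent=2))
```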
Zemlin also highlighted the Model Openness Framework, designed to help organisations evaluate and categorise the level of openness associated with different AI models – even those deemed as open source.
“Because there are so many moving parts in the production and deployment of large language models, we created a grading system to help organisations understand what components are open and included in a model versus what’s not,” he said.
Read more about AI and open source in APAC
- IBM will work with AI Singapore on technical exchanges to enhance Sea-Lion and make the region’s first LLM available to data scientists and engineers through its AI use case library.
- Red Hat is expanding its reach into smaller firms as well as the automotive and other industries to fuel its ‘high-double digit’ growth in the region.
- Data61’s open source seL4 microkernel project will be supported by a new foundation created under the auspices of the Linux Foundation.
- Agoda CTO Idan Zalzberg explains why the online travel agency with a massive technology footprint prefers to run things in-house with open source software and not rely too much on public cloud services.
The framework defines three levels of openness for an AI model. Level one, the most open, requires all of the data, instructions and components used to create the model to be openly available.
The second level is "open tooling", where the majority of the tooling and infrastructure is available as open source. At level three, the data itself is not open, but data cards describing the datasets are available.
Zemlin emphasised the importance of this nuanced, risk-based approach, as it reflects the complexities of the AI ecosystem, where openness is not binary but a spectrum across a model's various components.
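Read as code, that grading might look something like the sketch below. The ModelRelease fields and the classify_openness helper are invented for illustration and are not the Model Openness Framework's actual schema or terminology.

```python
# Illustrative sketch of the three-level grading described above. The field
# names and classify_openness() are hypothetical, not the Model Openness
# Framework's actual schema.
from dataclasses import dataclass


@dataclass
class ModelRelease:
    data_open: bool             # training data openly available
    tooling_open: bool          # training/eval code and infrastructure open
    data_cards_available: bool  # descriptive data cards published


def classify_openness(release: ModelRelease) -> str:
    if release.data_open and release.tooling_open:
        return "Level 1: data, instructions and components all openly available"
    if release.tooling_open:
        return "Level 2: open tooling, most tooling and infrastructure open source"
    if release.data_cards_available:
        return "Level 3: data closed, but data cards describe the datasets"
    return "Not open under this sketch"


print(classify_openness(
    ModelRelease(data_open=False, tooling_open=True, data_cards_available=True)))
```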
In support of model production, he pointed to the LF AI & Data Foundation, which is driving a collection of projects that provide open source implementations for every aspect of the machine learning and large language model production process. These projects include Delta Lake, an open source storage framework for data lakehouses, and Monocle, an AI observability service.
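As a small taste of one of those pieces, the snippet below writes and reads a local table with the open source deltalake Python bindings (pip install deltalake pandas); the path and column names are illustrative.

```python
# Write and read a versioned table with the open source `deltalake` Python
# bindings. The path and columns are illustrative; a real lakehouse would
# point at object storage rather than a local directory.
import pandas as pd
from deltalake import DeltaTable, write_deltalake

df = pd.DataFrame({"doc_id": [1, 2], "source": ["wiki", "handbook"]})
write_deltalake("./training_docs", df, mode="overwrite")

table = DeltaTable("./training_docs")
print(table.version())        # transaction-log version of the table
print(table.to_pandas())      # read the data back as a DataFrame
```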
Zemlin also touched on the Linux Foundation’s efforts in open data, such as the Overture Maps project, a collaboration between Amazon, Microsoft, TomTom and Meta to create the world’s largest shared geospatial dataset.
“You see a whole new world where data is now being sold in order to train large language models, and it’s only available to the people who have the most money to buy that data,” he said.
“There’s an opportunity for us to also have an open alternative to closed data, but it’s something that’s going to require resources and a very focused approach. Our Overture Maps effort is an early example of how that can be done.”
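For a sense of how such open datasets are consumed, the sketch below queries GeoParquet files with DuckDB from Python (pip install duckdb). The bucket path and column selection are placeholders; the real Overture Maps release locations and schema are documented by the project.

```python
# Query openly published GeoParquet map data with DuckDB. The S3 path below is
# a placeholder, not the actual Overture Maps release layout.
import duckdb

con = duckdb.connect()
con.sql("INSTALL httpfs; LOAD httpfs;")  # enable reading remote parquet files

rows = con.sql("""
    SELECT *
    FROM read_parquet('s3://example-bucket/overture-release/theme=places/*.parquet')
    LIMIT 5
""").fetchall()
print(rows)
```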