How Kubernetes is becoming a platform for AI

Data scientists and artificial intelligence engineers need a host of tools to build AI-powered applications – this requires a complex IT environment

Kubernetes is fast approaching the point of being ready for the enterprise, according to the Cloud Native Computing Foundation (CNCF), and artificial intelligence (AI) is one of the application areas that could benefit from the container orchestration platform.

Opening the first KubeCon conference in China, Dan Kohn, executive director of the CNCF, said: “Graduated projects like Kubernetes have crossed the chasm and are ready for early adoption.”

AI featured prominently among the technical areas showcased at the Shanghai conference.

Containers make it easy to distribute and re-use applications, along with the infrastructure needed to run them. Packaging applications in containers has been used in the scientific community to support peer review, by enabling scientists to share the applications used in their research. Similarly, AI tends to require a full ecosystem of software components and expensive graphics processing units (GPUs) to accelerate AI training.

During a Kubernetes AI panel discussion, Dave Aronchick, a product manager at Google, said: “One of the fundamental things about orchestration is that it offers a totally different way to do things by using cloud-native technologies to break down monolithic applications into microservices.”

This concept is behind the Kubeflow project, which aims to provide a base layer on which workloads for machine learning and AI can be deployed and run.

The challenge with AI and machine learning, according to Aronchick, is that many different software libraries need to be brought together. For instance, the TensorFlow library, which is used in applications such as image recognition, is just one aspect of a complex framework for machine learning. “Kubernetes is a base layer that lets you build a true end-to-end platform,” he said.

Xin Zhang, CEO of startup Calcloud, said artificial intelligence required a new way to think about operating systems. “Every business is data-driven,” he said. “A new operating system is needed for the AI age.”

Many components are needed to make an effective AI-driven application. For Zhang, the challenge for data scientists is that it is not just about the code. “The algorithm is only a fraction of the solution. You need to manage all the GPUs and CPUs, and retuning of the AI model, as well as distributed training,” he said.

Data scientists are generally not well placed to manage all the infrastructure requirements for machine learning and AI. Kubernetes offers a way to connect these infrastructure components through an open source project called Kubeflow. “By leveraging Kubeflow, you can lower the barrier for data scientists,” Zhang said.

Xinglang Wang, a principal engineer at eBay, said AI had a high barrier to entry, but packaging tools in a Kubernetes cluster made it easier for businesses to get started on an AI project. He said eBay used Kubernetes to create a unified AI platform that enables the sharing of data and AI models. The platform also provides automation that enables eBay to train and deploy AI models.

One of the big users at the KubeCon Shanghai event was Chinese e-commerce retailer JD.com. Explaining the use of AI at JD.com, principal architect Yuan Chen described how the company was running one of the largest Kubernetes clusters in the world.

While Kubernetes was traditionally used to support a microservices architecture, he said: “Everything is now driven by AI, so we have to use Kubernetes for AI. It is the right infrastructure for deep learning to train the AI models. AI scientists are expensive, so they should focus on their algorithms and not have to worry about deploying containers.”

One example of JD.com’s application of AI is the use of a containerised Kubernetes cluster for machine learning to manage product image quality control, identification and categorisation. Chen said the company uses dynamic workload management to balance workloads between expensive GPUs and processing that can be done on cheaper CPUs. 
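As a rough illustration of that split, and not a description of JD.com's actual system, the sketch below builds Kubernetes Pod manifests for two jobs and asks for a GPU only when a job needs one. The job names, image names and the needs_gpu flag are hypothetical. The "nvidia.com/gpu" resource name assumes a cluster running the NVIDIA device plugin.

```python
# Hypothetical sketch: route each job to a GPU-backed or CPU-only Pod spec.
# Job names, image names and the needs_gpu flag are illustrative assumptions.

def build_pod_manifest(name, image, needs_gpu, cpu="2", memory="4Gi"):
    """Return a Kubernetes Pod manifest (as a plain dict) for one job."""
    limits = {"cpu": cpu, "memory": memory}
    if needs_gpu:
        # Assumes the NVIDIA device plugin is installed, which exposes GPUs
        # as the extended resource "nvidia.com/gpu" (whole units only).
        limits["nvidia.com/gpu"] = 1
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {"name": name, "image": image, "resources": {"limits": limits}}
            ],
        },
    }


# Training goes to a GPU node; cheaper preprocessing stays on CPU nodes.
jobs = [
    ("train-image-classifier", "example.com/trainer:latest", True),
    ("resize-product-images", "example.com/preprocess:latest", False),
]
manifests = [build_pod_manifest(name, image, gpu) for name, image, gpu in jobs]
```

Each dict could be serialised to YAML or JSON and submitted to a cluster. Kubernetes would then place the GPU job only on a node that advertises a free GPU, leaving the CPU-only job free to run on cheaper nodes.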

Chen said machine learning could also be applied to Kubernetes itself, to help administer containers. “It is all about efficiency and saving costs by using machine learning to improve Kubernetes,” he said. At JD.com, machine learning is being used for efficient workload scheduling and maximising utilisation of IT resources.
