Storage technology explained: Kubernetes, containers and persistent storage

In this guide, we look at the market-leading container platform Kubernetes, how it works, the challenges with persistent storage and backup, and how they have been overcome

Containerisation is synonymous with cloud-native application development, and Kubernetes is key among the container orchestration platforms available.

In this article, we look at containerisation, what defines it, how Kubernetes fits with containerisation, how Kubernetes is organised, and how it handles persistent storage and data protection.

We also look at the container storage interface (CSI), which provides the Kubernetes drivers that link to storage array makers’ hardware.

Finally, we look at the Kubernetes management platforms provided by the key storage suppliers.

What is containerisation?

Containerisation is a form of virtualisation, perhaps best understood by comparing it with “traditional” server virtualisation.

Server virtualisation – think VMware, Nutanix – creates a hypervisor layer that masks the server’s physical resources and on which numerous logical servers, known as virtual machines, run.

Application containerisation does away with that hypervisor layer and works with the server OS. Containers encapsulate all that’s needed for an application to run, and can be created, spun up, cloned, scaled and extinguished very rapidly.

Containers are “lighter”, without the need for the hypervisor and multiple iterations of the virtualisation OS. They require fewer server resources and are very portable across on-premise and cloud environments. That makes containers well-suited to workloads that see massive spikes in demand, especially on the web.

Containers also work on the microservices principle, in which discrete application functionality is built into small as-code instances built around application programming interfaces (APIs) that link them together – this is in contrast to the large, monolithic applications of the past.

Containers and microservices are also synonymous with the iterative software development methodologies of DevOps.

What is Kubernetes?

Kubernetes is a container orchestrator. It’s not the only one. There’s also Apache Mesos, Docker Swarm and Nomad, while Red Hat OpenShift is a platform built on Kubernetes. In the cloud, there is Amazon Elastic Container Service (ECS), alongside managed Kubernetes offerings such as Azure Kubernetes Service and Google Kubernetes Engine. And there are VMware Tanzu products that manage Kubernetes in its virtualisation environment.

Container orchestrators handle functions such as the creation, management, automation and load balancing of containers, and their relationship to hardware, including storage. Containers are organised, in Kubernetes-speak, in pods, each of which is a collection of one or more containers.

In this explainer, we’ll focus on Kubernetes. As mentioned, it’s not the only container orchestrator, but according to some research, it’s the overwhelming market leader with a 97%-plus share.

How is Kubernetes organised?

The container is the basic unit that contains application runtime and code, plus dependencies, libraries, etc. Containers are stateless in that they don’t store any data or information about previous states. They are supremely portable, clone-able, scalable and so on because they take everything they need with them. That statelessness is also a potential Achilles heel, as we shall see.

Next are pods, which host and manage one or more containers, and which are themselves grouped in clusters. The containers in a pod can serve different functions, such as a UI or a backend database, but they are held on the same node (ie, server) and are close to each other and so communicate quickly.

Nodes are physical machines or VMs within them that run pods. They can be master nodes or worker nodes. Master nodes are the control plane that manages deployment of and the state of the Kubernetes cluster.

Master node components include: the API server, via which interaction with the cluster takes place; a scheduler that determines the best nodes on which to run pods; the controller manager, which helps maintain the required state of the cluster, such as the number of replicas to be maintained; and etcd, which is a key-value store that holds the state of the cluster.
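The controller manager's job of maintaining the required state can be pictured as a reconciliation loop that compares what is wanted with what is observed. The following is a minimal sketch of that idea only; the function name, the pod names and the "delete the last pods first" policy are assumptions made for illustration and are not how Kubernetes itself is implemented.

```python
def reconcile(desired_replicas, running_pods):
    """Return the actions needed to move the observed state to the desired state.

    running_pods is a list of pod names; the result is a list of
    (action, pod_name) tuples. This is a toy model, not Kubernetes code.
    """
    diff = desired_replicas - len(running_pods)
    if diff > 0:
        # Too few replicas: ask for the missing pods to be created.
        return [("create", None)] * diff
    if diff < 0:
        # Too many replicas: remove the surplus, taking pods from the end
        # of the list (an arbitrary policy chosen for this sketch).
        return [("delete", name) for name in running_pods[diff:]]
    # Observed state already matches the desired state.
    return []


print(reconcile(3, ["web-0"]))
print(reconcile(1, ["web-0", "web-1", "web-2"]))
```

The point of the sketch is that the cluster is driven by a declared target, here the replica count, rather than by one-off commands.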

Worker nodes run containers with tasks delegated by the master nodes. Worker nodes comprise: the kubelet, which is the primary interface between the worker node and the master node control plane; kube-proxy, which handles network communications to pods; and the container runtime, which is the software that actually runs containers.
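To make the delegation of work from the control plane to worker nodes more concrete, here is a toy scheduler. It is a sketch under heavy assumptions: it considers only free CPU, whereas the real Kubernetes scheduler filters and scores nodes on many more criteria. The node names and figures are hypothetical.

```python
def schedule(pod_cpu, nodes):
    """Pick the worker node with the most free CPU that can fit the pod.

    nodes maps a node name to its free CPU cores. Returns the chosen
    node name, or None if no node has enough free capacity.
    """
    # Filter step: keep only nodes that can fit the pod at all.
    candidates = {name: free for name, free in nodes.items() if free >= pod_cpu}
    if not candidates:
        return None
    # Score step: prefer the node with the most headroom.
    return max(candidates, key=candidates.get)


workers = {"worker-a": 1, "worker-b": 4, "worker-c": 3}  # hypothetical cluster
print(schedule(2, workers))
```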

What is the challenge with storage and Kubernetes?

At its most basic, storage in Kubernetes is ephemeral. That means it is not persistent and won’t be available after the container is deleted. Native Kubernetes storage is written into the container and created from temporary scratch space on the host machine that only exists for the lifespan of the Kubernetes pod.
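As an illustration of ephemeral storage, the sketch below builds a pod manifest that mounts an emptyDir volume, a scratch volume that is removed along with the pod. The manifest is expressed as a Python dictionary and printed as JSON, which Kubernetes accepts as well as YAML. The pod name, image and mount path are placeholders, not values from any real deployment.

```python
import json

# Hypothetical pod with a scratch volume that lives only as long as the pod.
ephemeral_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "scratch-demo"},
    "spec": {
        "containers": [
            {
                "name": "app",
                "image": "example.com/app:1.0",  # placeholder image
                "volumeMounts": [{"name": "scratch", "mountPath": "/tmp/scratch"}],
            }
        ],
        # emptyDir is created when the pod starts and deleted with the pod.
        "volumes": [{"name": "scratch", "emptyDir": {}}],
    },
}

print(json.dumps(ephemeral_pod, indent=2))
```

Anything the application writes under the mount path is lost when the pod goes away, which is the limitation that persistent volumes address.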

But, of course, enterprise applications require persistent storage and Kubernetes does have ways of effecting that.  

How does Kubernetes provide persistent storage?

Kubernetes supports persistent storage that can be written to a wide range of on-premise and cloud formats, including file, block and object, and in data services, such as databases.

Storage can be referenced from inside the pod, but this is not recommended because it violates the principle of portability. Instead, Kubernetes uses persistent volumes (PVs) and persistent volume claims (PVCs) to define storage and application requirements.

PVs and PVCs decouple storage and allow it to be consumed by a pod in a portable way.

A PV – which is not portable across Kubernetes clusters – defines storage in the cluster that has been profiled by its performance and capacity parameters. It defines a persistent storage volume and contains details such as performance/cost class, capacity, volume plugin used, paths, IP addresses, usernames and passwords, and what to do with the volume after use.
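A PV definition along these lines might look like the following sketch, again written as a Python dictionary and printed as JSON. The volume name, class name, server address and export path are invented for illustration, and NFS is used only because it shows the path and IP address details mentioned above.

```python
import json

# Hypothetical PV backed by an NFS export; all names and addresses are made up.
pv = {
    "apiVersion": "v1",
    "kind": "PersistentVolume",
    "metadata": {"name": "pv-nfs-fast-01"},
    "spec": {
        "capacity": {"storage": "100Gi"},           # capacity
        "accessModes": ["ReadWriteMany"],           # how pods may mount it
        "storageClassName": "fast",                 # performance/cost class
        "persistentVolumeReclaimPolicy": "Retain",  # what to do after use
        "nfs": {"server": "10.0.0.10", "path": "/exports/data"},  # plugin details
    },
}

print(json.dumps(pv, indent=2))
```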

Meanwhile, a PVC describes a request for storage for the application that will run in Kubernetes. PVCs are portable and travel with the containerised application. Kubernetes figures out what storage is available from defined PVs and binds the PVC to it.
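The matching of a claim to an available volume can be sketched as below. This is a simplified model, not the Kubernetes binding logic: it uses flattened dictionaries rather than full manifests, understands only the Gi and Ti suffixes, and picks the smallest volume that fits. All names are hypothetical.

```python
def to_gib(quantity):
    """Convert a quantity such as '10Gi' or '1Ti' to a number of GiB."""
    units = {"Gi": 1, "Ti": 1024}  # only these two suffixes are handled here
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    raise ValueError(f"unsupported quantity: {quantity}")


def bind(claim, volumes):
    """Bind the claim to the smallest unbound PV that satisfies it.

    Returns the name of the chosen PV, or None if nothing fits.
    """
    fits = [
        v for v in volumes
        if not v["bound"]
        and v["storageClassName"] == claim["storageClassName"]
        and set(claim["accessModes"]) <= set(v["accessModes"])
        and to_gib(v["capacity"]) >= to_gib(claim["request"])
    ]
    if not fits:
        return None
    best = min(fits, key=lambda v: to_gib(v["capacity"]))
    best["bound"] = True  # mark the PV as taken
    return best["name"]


volumes = [
    {"name": "pv-a", "capacity": "50Gi", "accessModes": ["ReadWriteOnce"],
     "storageClassName": "fast", "bound": False},
    {"name": "pv-b", "capacity": "20Gi", "accessModes": ["ReadWriteOnce"],
     "storageClassName": "fast", "bound": False},
]
claim = {"request": "10Gi", "accessModes": ["ReadWriteOnce"], "storageClassName": "fast"}
print(bind(claim, volumes))
```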

PVCs are referenced by name in the pod’s YAML configuration file so that the claim travels with it, while the PVC itself specifies capacity, access mode, storage class and so on.
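The relationship between a pod and its claim can be sketched as two manifests, shown here as Python dictionaries. The claim name, storage class, image and mount path are placeholders. The helper function is not part of Kubernetes; it is there only to show where the reference to the claim sits in the pod definition.

```python
import json

# Hypothetical claim asking for 10Gi from a class called "fast".
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "orders-data"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": "fast",
        "resources": {"requests": {"storage": "10Gi"}},
    },
}

# Hypothetical pod that mounts the claim by name.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "orders"},
    "spec": {
        "containers": [
            {
                "name": "db",
                "image": "example.com/orders-db:1.0",  # placeholder image
                "volumeMounts": [{"name": "data", "mountPath": "/var/lib/data"}],
            }
        ],
        "volumes": [
            {"name": "data", "persistentVolumeClaim": {"claimName": "orders-data"}}
        ],
    },
}


def claims_used(pod_manifest):
    """List the PVC names that a pod manifest refers to."""
    return [
        v["persistentVolumeClaim"]["claimName"]
        for v in pod_manifest["spec"].get("volumes", [])
        if "persistentVolumeClaim" in v
    ]


print(json.dumps(pvc, indent=2))
print(claims_used(pod))
```

The pod names only the claim, never the underlying array or cloud volume, which is what keeps the application definition portable.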

The StatefulSet, among other things, uses volume claim templates to create a separate PVC for each pod it manages, so that every replica gets its own persistent storage.
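A StatefulSet creates its per-pod claims from a volume claim template and names them after the template, the set and the pod's ordinal. The helper below is a small sketch that reproduces that naming pattern; the function itself is hypothetical, and the set and template names are examples only.

```python
def statefulset_claim_names(set_name, template_name, replicas):
    """Return the PVC names expected for each replica of a StatefulSet.

    Names follow the pattern <template>-<statefulset>-<ordinal>.
    """
    return [f"{template_name}-{set_name}-{i}" for i in range(replicas)]


print(statefulset_claim_names("web", "www", 3))
```

Because the names are stable, a replaced pod with the same ordinal picks up the same claim, and therefore the same data, as its predecessor.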

A collection of PVs can be grouped into a storage class, which specifies the storage volume plugin used, the external – such as cloud – provider and the name of the CSI driver (see below).

Often, one storage class will be marked as “default” so that it is used whenever a PVC doesn’t specify a storage class. A storage class can also be created for old data that may need to be accessed by containerised applications.
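The role of a default storage class can be sketched as follows. The two storage class manifests are hypothetical, including the provisioner name csi.example.com, and the resolve_class helper is a simplified stand-in for what the cluster does when a claim arrives without a class. The annotation key is the one Kubernetes uses to mark a default class.

```python
DEFAULT_ANNOTATION = "storageclass.kubernetes.io/is-default-class"

# Two hypothetical storage classes; "standard" is marked as the default.
storage_classes = [
    {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": "standard", "annotations": {DEFAULT_ANNOTATION: "true"}},
        "provisioner": "csi.example.com",  # made-up CSI driver name
    },
    {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": "fast"},
        "provisioner": "csi.example.com",
    },
]


def resolve_class(claim_spec, classes):
    """Return the storage class name a claim would end up using.

    An explicit storageClassName is returned as-is; otherwise the class
    annotated as default is used, or None if there is no default.
    """
    name = claim_spec.get("storageClassName")
    if name is not None:
        return name
    for sc in classes:
        annotations = sc["metadata"].get("annotations", {})
        if annotations.get(DEFAULT_ANNOTATION) == "true":
            return sc["metadata"]["name"]
    return None


print(resolve_class({"storageClassName": "fast"}, storage_classes))
print(resolve_class({}, storage_classes))
```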

What is CSI?

CSI stands for container storage interface. CSI describes drivers for Kubernetes and other container orchestrators provided by storage suppliers to expose their capacity to containerised applications as persistent storage.

At the time of writing, there are more than 130 CSI drivers available for file, block and object storage in hardware and cloud formats.

CSI provides an interface that defines the configuration of persistent storage external to the orchestrator, its input/output (I/O), and advanced functionality such as snapshots and cloning.
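Snapshot functionality exposed through CSI is requested with a VolumeSnapshot object. The sketch below builds one for an existing claim; the helper function, the claim name and the snapshot class name are assumptions for illustration, and whether snapshots work at all depends on the driver in use.

```python
import json


def snapshot_of(claim_name, snapshot_class):
    """Build a VolumeSnapshot manifest for an existing PVC (names are examples)."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": f"{claim_name}-snap"},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            # The snapshot is taken of the volume bound to this claim.
            "source": {"persistentVolumeClaimName": claim_name},
        },
    }


print(json.dumps(snapshot_of("orders-data", "example-snapclass"), indent=2))
```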

A CSI volume can be used to define PVs. For example, you can create PVs and storage classes that point to external storage defined by a CSI plugin, with provisioning triggered by a PVC that specifies it.
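Provisioning triggered by a PVC can be pictured roughly as follows: given a claim and the storage class it names, a PV is produced that points at a volume on the external system through the CSI driver. This is a toy model, not the actual provisioner; the function, the driver name csi.example.com, the PV naming and the volume handle are all invented for the example.

```python
def provision(claim, storage_class, volume_id):
    """Build the PV that a (toy) dynamic provisioner would create for a claim."""
    meta = claim["metadata"]
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": f"pv-{meta['name']}"},  # naming is illustrative
        "spec": {
            "capacity": {"storage": claim["spec"]["resources"]["requests"]["storage"]},
            "accessModes": claim["spec"]["accessModes"],
            "storageClassName": storage_class["metadata"]["name"],
            # Storage classes default to deleting the volume after use.
            "persistentVolumeReclaimPolicy": storage_class.get("reclaimPolicy", "Delete"),
            # The CSI section names the driver and the volume on the backend.
            "csi": {"driver": storage_class["provisioner"], "volumeHandle": volume_id},
            "claimRef": {
                "name": meta["name"],
                "namespace": meta.get("namespace", "default"),
            },
        },
    }


claim = {
    "metadata": {"name": "orders-data"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": "fast",
        "resources": {"requests": {"storage": "10Gi"}},
    },
}
fast = {"metadata": {"name": "fast"}, "provisioner": "csi.example.com"}
print(provision(claim, fast, "vol-0001")["spec"]["csi"])
```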

What do storage suppliers offer to help with K8s storage and data protection?

The components of Kubernetes are numerous and modular. Perhaps unsurprisingly, storage array vendors have taken advantage of the possibility to wrap a further management layer over that and to make provision of storage and data services simpler for admins. Here, we look at storage supplier products in that space.

Requirements here range from configuration of resources according to the profile of storage required by applications to the source and target of backups and other data protection functionality, all of which can rapidly change.

Dell EMC, IBM, HPE, Hitachi, NetApp and Pure Storage all have container management platforms that allow developers to write storage and data protection requirements into code more easily while also allowing traditional IT functions such as data protection to be managed without deep skills.

All use CSI drivers in some form to offer provisioning and management of storage and backup to their own, and, in some cases, any storage environment, including those in the cloud.

What do Dell Container Storage Modules do?

Dell’s Container Storage Modules (CSM) are based on CSI drivers. While basic CSI drivers help in provisioning, deleting, mapping and unmapping volumes of data, Dell CSMs aid automation, control and simplicity.

Several CSMs allow customers to access storage array features to which they normally wouldn’t have access. These CSM plug-ins target specific functionalities or data services, including replication, observability, resiliency, app mobility (cloning), snapshots, authorisation (ie, access to storage resources), and encryption.

Dell’s CSMs allow customers to make existing storage container-ready by providing access to Dell’s storage arrays rather than using additional software to access those features.

What does IBM’s Red Hat OpenShift do for containers?

IBM’s acquisition of Red Hat, announced in 2018 and completed in 2019, gave it the OpenShift portfolio, which is the main focus of its containerisation management efforts.

OpenShift uses Kubernetes persistent volume claims via CSI drivers to allow developers to request storage resources. PVCs can access persistent volumes from anywhere in the OpenShift platform.

The OpenShift Container Platform supports many popular PV plugins on-site and in the cloud, including Amazon EBS, Azure Files, Azure Managed Disks, Google Cloud Persistent Disk, Cinder, iSCSI, Local Volume, NFS and VMware vSphere.

Hyper-converged infrastructure provider Nutanix also uses OpenShift as a container deployment platform.

How does HPE’s Ezmeral Runtime Enterprise help manage containers?

HPE has developed its own Kubernetes management platform, HPE Ezmeral Runtime Enterprise, which can be deployed via HPE’s Synergy environment.

It’s a software platform designed to deploy cloud-native and non-cloud-native applications using Kubernetes and can run on bare-metal or virtualised infrastructure, on-premise or in any cloud. It goes further than just app deployment, with data management including out to the edge.

Ezmeral delivers persistent container storage and configuration automation to set up container high availability (HA), backup and restore, security validation and monitoring to minimise manual admin tasks.

What does Hitachi Kubernetes Service do for container deployments?

In 2021, Hitachi joined the Kubernetes storage fray with Hitachi Kubernetes Service (HKS), which allows customers to manage container storage in on-premise datacentres and the three main public clouds.

HKS allows deployment of Hitachi Unified Compute Platform as a Kubernetes-managed private cloud across local and hybrid cloud environments. 

HKS uses CSI drivers to manage persistent volumes directly on Kubernetes nodes, which distinguishes it from the container-native offerings of other suppliers.

How does NetApp Astra help deploy and manage containers?

NetApp’s Astra is its container management platform. It comprises a number of components, including Astra Control, for Kubernetes application lifecycle management; Astra Control Service, for data management of Kubernetes workloads in public clouds; Astra Control Center, for on-premise Kubernetes workloads; and Astra Trident, for CSI storage provisioning and management. There is also Astra Automation and its APIs and SDK for Astra workflows.

What functionality does Pure Storage Portworx provide to container deployments?

Portworx is Pure Storage’s container platform, and gives it container-native provisioning, connectivity and performance configuration for Kubernetes clusters. It can discover storage and provide persistent capacity for enterprise applications with access to block, file, object and cloud storage.

Customers can use Portworx to build pools of storage, manage provisioning and provide advanced functionality such as backup, disaster recovery, security, auto-scaling and migration across local storage and cloud storage in the main cloud providers.
