What are ‘mature’ stateful applications?

BlueK8s is a new open source Kubernetes initiative from ‘big data workloads’ company BlueData. The project’s focus tells us a little about the direction in which containerised cloud-centric applications are growing.

Kubernetes is a portable and extensible open source platform for managing containerised workloads and services (essentially it is a container ‘orchestration’ system) that facilitates both declarative configuration and automation.

The first open project in the BlueK8s initiative is Kubernetes Director (aka KubeDirector), for deploying and managing distributed ‘stateful applications’ with Kubernetes.

Apps can be stateful or stateless.

A stateful app is a program that saves client data from the activities of one session for use in the next session — the data that is saved is called the application’s state.
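To make the distinction concrete, here is a minimal Python sketch. The names stateless_greet and StatefulGreeter are hypothetical and used for illustration only; they do not come from BlueK8s or Kubernetes. The stateless function depends only on its input, while the stateful class remembers earlier calls.

```python
# Illustrative sketch only: all names here are hypothetical.

def stateless_greet(name: str) -> str:
    """Stateless: the result depends only on the input, nothing is remembered."""
    return f"Hello, {name}"


class StatefulGreeter:
    """Stateful: remembers how many times each client has been greeted."""

    def __init__(self) -> None:
        # This dictionary is the application's state.
        self.visits: dict[str, int] = {}

    def greet(self, name: str) -> str:
        # Update the saved state, then use it in the response.
        self.visits[name] = self.visits.get(name, 0) + 1
        return f"Hello, {name} (visit {self.visits[name]})"
```

Calling stateless_greet twice with the same name gives the same answer each time, whereas StatefulGreeter.greet gives a different answer on the second call because the visit count has been kept.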

The company reminds us that Kubernetes adoption is accelerating for stateless applications and microservices… and the community is beginning to evolve and mature the capabilities required for stateful applications.

Mature stateful apps?

What the company really means here is large-scale, distributed and typically complex stateful applications.

These large-scale distributed stateful applications span use cases in analytics, data science, machine learning (ML) and deep learning (DL), as well as AI and big data. The problem is that these apps are still complex and challenging to deploy with Kubernetes.

Typically, stateless applications are microservices or containerised applications that have no need for long-running [data] persistence and aren’t required to store data.

Cloud native web services (such as a web server or front end web user interface) can often be run as containerised stateless applications since HTTP is stateless by nature: there is no dependency on the local container storage for the workload.

Stateful applications, as stated above, are services that save data to storage and use that data; persistence and state are essential to running the service.
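Why persistence matters can be sketched in a few lines of Python. This is an assumption-laden illustration, not anything taken from KubeDirector: PersistentCounter and demo are hypothetical names, and a local JSON file stands in for whatever persistent storage a real deployment would use. The point is that state written to storage is still there after the service is restarted.

```python
import json
import os
import tempfile


class PersistentCounter:
    """Hypothetical service whose state is saved to storage on every update."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.count = 0
        # On start-up, reload any state left behind by a previous run.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self.count = json.load(f)["count"]

    def increment(self) -> int:
        self.count += 1
        # Write the new state to storage so that it survives a restart.
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump({"count": self.count}, f)
        return self.count


def demo() -> tuple[int, int]:
    """Simulate a restart by creating a second instance over the same file."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "state.json")  # stand-in for persistent storage
        first = PersistentCounter(path)
        first.increment()
        first.increment()
        restarted = PersistentCounter(path)  # the "restarted" service
        return first.count, restarted.increment()
```

Here demo() returns (2, 3): the restarted instance picks up the saved count of 2 and carries on from there. If the file lived only inside a container's local, ephemeral storage, that count would be lost on restart, which is the dependency on storage that makes stateful apps harder to run.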

Example uses

These mature stateful apps include databases as well as complex distributed applications for big data and AI use cases: e.g. multi-service environments for large-scale data processing, data science and machine learning that employ open source frameworks such as Hadoop, Spark, Kafka, and TensorFlow as well as a variety of different commercial tools for analytics, business intelligence, ETL and visualization.

Kumar Sreekanti, co-founder and CEO of BlueData, explains that in enterprise deployments, each of these different tools and applications needs to interoperate in a single cohesive environment for an end-to-end distributed data pipeline. Yet they [mature stateful apps that is] typically have many interdependent services and they require persistent storage that can survive service restarts. They have dependencies on storage and networking, and state is distributed across multiple configuration files.

Sreekanti points out that the Kubernetes ecosystem has added building blocks such as StatefulSets – as well as open source projects including the Operator framework, Helm, Kubeflow, Airflow, and others – that have begun to address some of the requirements for packaging, deploying, and managing stateful applications.

But, claims BlueData, there are still gaps in the deployment patterns and tooling for complex distributed stateful applications in large-scale enterprise environments.

BlueData recently joined the Cloud Native Computing Foundation (CNCF) – the organisation behind Kubernetes and other cloud native open source projects – in order to foster collaboration in this area with developers and end users in the Kubernetes ecosystem.

KubeDirector is currently in pre-alpha and under active development.