Containerisation in the enterprise - New Relic: MELT-ing Kubernetes observability

As businesses continue to modernise their server estate and move towards cloud-native architectures, the elephant in the room is the monolithic core business application that cannot easily be rehosted without significant risk and disruption. 

These days, it is often more efficient to deploy an application in a container than in a virtual machine. Computer Weekly examines the modern trends, dynamics and challenges faced by organisations now migrating to the micro-engineered world of software containerisation.

As all good software architects know, a container is a ‘logical’ computing environment in which code is engineered to let a guest application run abstracted away from the underlying host system’s hardware and software infrastructure resources.

So, what do enterprises need to think about when it comes to architecting, developing, deploying and maintaining software containers?

This post is written by Kevin Downs in his capacity as solutions strategy director at New Relic. The company is known for its digital intelligence observability platform, built to help software engineers instrument everything, then analyse, troubleshoot and optimise the software stack.

Downs writes as follows…

In the software delivery lifecycle, observability is the practice of collecting, visualising and applying intelligence to all of your metrics, events, logs and traces in order to gain an understanding of your software system. If monitoring tells you when something is wrong, observability lets you understand why.

Observability is not a new concept, but as organisations adopt Kubernetes and move towards more complex and interconnected sets of computing resources, it is becoming increasingly important. That is because in modern IT shops, tools can sprawl across multiple teams and sources, which makes it harder to manage and instrument assets across the IT estate. Consolidating all of your data into a single observability platform allows you to organise your toolbox and find the right tool for each job. By bringing a set of disparate tools together in this way, engineering teams can achieve observability into their Kubernetes environment and pinpoint where issues in their software originate.

Many recent technologies and practices, including microservices, cloud, containers, serverless, DevOps, site reliability engineering (SRE) and infrastructure-as-code, allow software professionals to deploy production code much more frequently. The gains are real, but these approaches also introduce a higher degree of complexity, volatility and fragmentation into the software architecture, especially when it comes to ensuring availability, quality, performance and end-user experience.

Observability addresses these challenges by rethinking monitoring techniques, which is especially valuable when working with containers.

With a connected view of all software telemetry data, real-time observability lets teams monitor the performance of a digital architecture and makes its behaviour transparent. IT operations teams can achieve Kubernetes observability with a set of disparate tools covering the four telemetry types known as ‘MELT’ (metrics, events, logs and traces), as well as the open source instrumentation framework Prometheus.

So let’s take each of these in turn.

Metrics

The first element required for IT teams to achieve Kubernetes observability is metrics. Observability software needs to be able to consume the metrics that diverse teams have adopted. For Kubernetes, IT teams should capture metrics for the cluster, its nodes and its pods, as well as ‘Kube state’ metrics, which describe the state of Kubernetes objects such as deployments and pods. Metrics are a good starting point for observability because they are low-overhead to collect, inexpensive to store, dimensional for quick analysis and a great way to measure overall health.
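Because metrics are dimensional, the same samples can be rolled up at cluster, node or pod level simply by grouping on a different label. The sketch below illustrates that idea in plain Python. The `Sample` type, the `aggregate` helper and the metric and label names are hypothetical, chosen for illustration; they are not taken from Kubernetes, kube-state-metrics or any New Relic API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sample:
    """One metric sample: a name, a set of dimension labels and a numeric value."""
    name: str
    labels: dict[str, str]
    value: float


def aggregate(samples: list[Sample], metric: str, by: str) -> dict[str, float]:
    """Sum every sample of `metric`, grouped by the value of the label `by`."""
    totals: dict[str, float] = defaultdict(float)
    for s in samples:
        # Only samples of the requested metric that carry the grouping label count.
        if s.name == metric and by in s.labels:
            totals[s.labels[by]] += s.value
    return dict(totals)


# Hypothetical samples; metric and label names are illustrative only.
samples = [
    Sample("cpu_cores_used", {"cluster": "prod", "node": "node-a", "pod": "web-1"}, 0.5),
    Sample("cpu_cores_used", {"cluster": "prod", "node": "node-a", "pod": "web-2"}, 0.25),
    Sample("cpu_cores_used", {"cluster": "prod", "node": "node-b", "pod": "db-1"}, 1.5),
]

print(aggregate(samples, "cpu_cores_used", by="node"))     # {'node-a': 0.75, 'node-b': 1.5}
print(aggregate(samples, "cpu_cores_used", by="cluster"))  # {'prod': 2.25}
```

The same three samples answer a per-node question and a per-cluster question without being collected twice, which is what makes dimensional metrics cheap to analyse.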

Events


The next element, events, is often the most overlooked telemetry type, but is actually the most critical and must be part of every observability solution. Events are discrete, detailed records of significant occurrences that merit analysis. Kubernetes offers a wealth of events you need to capture with your observability solution. Examples include alerts, deployments, transactions and errors, all of which enable fine-grained analysis in real time.
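As an illustration of the fine-grained analysis events make possible, the sketch below correlates errors with recent deployments of the same workload. It is a minimal sketch, assuming events are stored as simple timestamped records; the `Event` type, its fields, the `errors_after_deployments` helper and the sample data are all hypothetical and do not follow the schema of the Kubernetes Event object.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    """A discrete record of something significant that happened at one point in time."""
    timestamp: datetime
    kind: str    # hypothetical categories, e.g. "deployment", "alert", "transaction", "error"
    source: str  # the workload the event refers to
    detail: str


def errors_after_deployments(events: list[Event], window: timedelta) -> list[tuple[Event, Event]]:
    """Pair each error with any deployment of the same source in the preceding `window`."""
    deployments = [e for e in events if e.kind == "deployment"]
    pairs = []
    for err in (e for e in events if e.kind == "error"):
        for dep in deployments:
            if dep.source == err.source and dep.timestamp <= err.timestamp <= dep.timestamp + window:
                pairs.append((dep, err))
    return pairs


# Hypothetical events for two workloads.
t0 = datetime(2024, 1, 1, 12, 0)
events = [
    Event(t0, "deployment", "checkout", "image tag v2"),
    Event(t0 + timedelta(minutes=2), "error", "checkout", "HTTP 500 on /pay"),
    Event(t0 + timedelta(minutes=3), "error", "search", "timeout"),
    Event(t0 + timedelta(minutes=30), "error", "checkout", "HTTP 500 on /pay"),
]

for dep, err in errors_after_deployments(events, window=timedelta(minutes=10)):
    print(f"{err.detail!r} at {err.timestamp:%H:%M} followed deployment {dep.detail!r}")
# prints: 'HTTP 500 on /pay' at 12:02 followed deployment 'image tag v2'
```

Only the error two minutes after the checkout deployment is reported: the search timeout belongs to a different workload, and the later checkout error falls outside the ten-minute window.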

Logs

While events and logs share some similarities, the two are often confused. Events sit at a higher level of abstraction than logs, which provide far more detail. Logs are important in observability when an engineer is in deep debugging mode and trying to understand a problem. They provide high-fidelity data and detailed context around an event, so engineers can recreate what happened millisecond by millisecond.

Adding Kubernetes logs to the mix helps operators solve problems as and when they arise. You should make sure your observability solution lets you filter logs according to your needs. For example, if you are troubleshooting a pod, you only really need to view logs about that pod.
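Kubernetes’ own `kubectl logs <pod-name>` already scopes output to a single pod; an observability platform needs the same kind of filter across logs it has collected centrally. The sketch below assumes, hypothetically, that log records arrive as JSON lines carrying `pod` and `namespace` fields; those field names, the `filter_logs` helper and the sample lines are illustrative, not a New Relic or Kubernetes format.

```python
import json
from typing import Iterable, Iterator, Optional


def filter_logs(lines: Iterable[str], pod: str, namespace: Optional[str] = None) -> Iterator[dict]:
    """Yield only the JSON log records that belong to one pod (and, optionally, one namespace)."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON rather than failing the whole query
        if not isinstance(record, dict):
            continue  # valid JSON, but not a structured log record
        if record.get("pod") != pod:
            continue
        if namespace is not None and record.get("namespace") != namespace:
            continue
        yield record


# Hypothetical log lines from two pods, plus one unparseable line.
raw = [
    '{"ts": "12:00:01.120", "namespace": "shop", "pod": "checkout-7d9f", "msg": "payment request received"}',
    '{"ts": "12:00:01.125", "namespace": "shop", "pod": "search-5b2c", "msg": "query served"}',
    "not json",
    '{"ts": "12:00:01.131", "namespace": "shop", "pod": "checkout-7d9f", "msg": "card declined"}',
]

for r in filter_logs(raw, pod="checkout-7d9f"):
    print(r["ts"], r["msg"])
# 12:00:01.120 payment request received
# 12:00:01.131 card declined
```

Using a generator keeps the filter streaming, so it can sit in front of a large log source without loading everything into memory first.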

Traces

Traces are valuable for showing the end-to-end latency of individual calls in a distributed architecture. These calls give specific insight into the myriad customer journeys through a system. Traces enable engineers to understand those journeys, find bottlenecks and identify errors so they can be fixed and optimised.

A microservices environment demands that operators, developers and site reliability engineers have access to application traces. Metrics, events and logs each relate to individual components of your overall application environment. Traces tie all of those components together, showing how an application is connected and helping to pinpoint issues or optimisation opportunities.
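To make the idea of end-to-end latency and bottlenecks concrete, the sketch below models a trace as a tree of timed spans linked by parent IDs, a common way tracing systems represent a request. The `Span` type, the helper functions, the service names and the timings are hypothetical. The bottleneck calculation uses "self time" (a span’s duration minus its direct children’s) and assumes, for simplicity, that child calls do not overlap.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One timed operation within a trace; parent_id links it to the call that made it."""
    span_id: str
    parent_id: Optional[str]
    service: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def end_to_end_latency(spans: list[Span]) -> float:
    """Latency of the whole request: the duration of the root span (the one with no parent)."""
    root = next(s for s in spans if s.parent_id is None)
    return root.duration_ms


def bottleneck(spans: list[Span]) -> tuple[str, float]:
    """Return the service with the largest self time (own duration minus direct children's).

    Assumes children of a span run sequentially; overlapping children would be double-counted.
    """
    child_time: dict[str, float] = {}
    for s in spans:
        if s.parent_id is not None:
            child_time[s.parent_id] = child_time.get(s.parent_id, 0.0) + s.duration_ms
    worst = max(spans, key=lambda s: s.duration_ms - child_time.get(s.span_id, 0.0))
    return worst.service, worst.duration_ms - child_time.get(worst.span_id, 0.0)


# Hypothetical trace: frontend -> checkout -> (payments, then inventory).
spans = [
    Span("a", None, "frontend", 0, 120),
    Span("b", "a", "checkout", 10, 110),
    Span("c", "b", "payments", 20, 100),
    Span("d", "b", "inventory", 100, 105),
]

print(end_to_end_latency(spans))  # 120
print(bottleneck(spans))          # ('payments', 80.0)
```

The root span gives the latency the customer experienced, while self time shows that most of it was spent inside the payments call rather than in the services that wrap it.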

Prometheus

Any container-focused Kubernetes operator worth their salt is familiar with Prometheus. Prometheus is an open source instrumentation framework that can absorb massive amounts of data every second, making it well suited to complex workloads. Make sure you take advantage of Prometheus to monitor your servers, VMs and databases, and use that data to analyse the performance of your applications and infrastructure.
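Prometheus collects data by scraping HTTP endpoints that expose metrics in a plain-text exposition format of `# HELP` and `# TYPE` lines followed by labelled samples. The sketch below renders one gauge in that format to show its shape. The `render_gauge` helper and the `db_connections_open` metric are hypothetical; in practice you would normally use an official Prometheus client library, such as `prometheus_client` for Python, rather than formatting the text by hand.

```python
def escape_label_value(value: str) -> str:
    """Escape backslashes, double quotes and newlines, as the text format requires in label values."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str, samples: list[tuple[dict[str, str], float]]) -> str:
    """Render one gauge metric family in the Prometheus text exposition format."""
    # HELP text escapes backslashes and newlines (but not quotes).
    escaped_help = help_text.replace("\\", "\\\\").replace("\n", "\\n")
    lines = [f"# HELP {name} {escaped_help}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels.items())
        # Samples without labels omit the braces entirely.
        lines.append(f"{name}{{{label_str}}} {value}" if label_str else f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric for two database instances.
print(render_gauge(
    "db_connections_open",
    "Open database connections (hypothetical metric).",
    [({"instance": "db-1"}, 12), ({"instance": "db-2"}, 7)],
), end="")
# # HELP db_connections_open Open database connections (hypothetical metric).
# # TYPE db_connections_open gauge
# db_connections_open{instance="db-1"} 12
# db_connections_open{instance="db-2"} 7
```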

It doesn’t take much to achieve observability for a Kubernetes environment. Remember, MELT, by its very nature, draws on a set of disparate tools.

In modern IT shops, these tools can sprawl across multiple teams and sources. Consolidating all of this data into a single observability platform allows you to organise your toolbox and quickly find the right tool for every job. Applying MELT and including available Prometheus telemetry data allows you to bring your Kubernetes environment into an overall observability solution. In doing so, your team can be confident that it can properly manage and instrument assets across the organisation. Full Kubernetes observability enables IT teams to fix issues quickly and proactively, and gives them a better understanding of how to prevent further issues from occurring. This means better software performance overall for colleagues and customers.