Green coding - Cohesity: Less is more, why & how a lean data diet is tastier

This is a guest post for the Computer Weekly Developer Network written by Mark Molyneux in his role as EMEA CTO at data security and data management company Cohesity.

Molyneux writes in full as follows…

The ‘greed is good’ era of the eighties and nineties is far behind us now, but some legacy practices have remained because it has been easier to throw more resources (and money) at problems than to tackle them in earnest.

In this new age of sustainability awareness, we have a responsibility to respond to what happened between the turn of the millennium and today. While the age of plenty may be over in cultural terms, we have lived through an era of cheap compute and storage that has since nestled up against the AI renaissance brought about by generative AI and the rise of Large Language Models (LLMs). These technologies have, in turn, fed off the increasing adoption of hybrid cloud and a new, much wider distributed computing landscape.

All of which has led us to the point where data volumes have reached dizzying heights.

Dizzy & drowning

That dizzying effect, caused by the mushroom cloud of data storage, sees companies starting to drown in their own data reserves (hint: it’s called a data lake for a reason; the unstructured morass is murky and deep), with so much data unknown, unclassified or, at best, under-utilised. There is of course a direct environmental impact here, and the climate emergency lobby has the IT department’s number.

It’s now time for a data diet, and organisations need to learn that they can (still very competently) do more with less. This is a two-step programme.

Step #1

Our first step on the data diet sees organisations consolidate data onto a common platform, eliminating separate data silos and preventing new ones from proliferating. They achieve this through focused indexing and classification of data based on its content, its value to the company and its predefined records management strategy. That process allows firms to see the wood for the trees (or the fish for the water, the data for the lakes), giving them the ability to centralise only the data they need rather than everything. It also enables a business to harness ‘sister’ techniques in this space, including data deduplication and compression. A leaner beast emerges: consolidation means firms can achieve data reduction rates approaching 96 percent while also optimising storage resources, saving money and boosting operational efficiency from the start.
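To make deduplication and compression concrete, here is a minimal Python sketch, assuming fixed 4 KiB blocks, SHA-256 fingerprints and zlib compression. The function name `deduplicate_and_compress` and the block size are illustrative assumptions, not Cohesity's implementation; production platforms typically use variable-length chunking and their own storage formats, and real reduction rates depend heavily on how repetitive the data is.

```python
import hashlib
import zlib

# Hypothetical fixed block size; real systems often use variable-length chunking.
BLOCK_SIZE = 4096


def deduplicate_and_compress(datasets: list[bytes]) -> tuple[dict[str, bytes], float]:
    """Store each unique block once (compressed) and report the percentage reduction.

    Returns a mapping of SHA-256 fingerprint -> compressed block, plus the
    reduction as a percentage of the original (logical) size.
    """
    store: dict[str, bytes] = {}
    logical_bytes = 0
    for data in datasets:
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            logical_bytes += len(block)
            fingerprint = hashlib.sha256(block).hexdigest()
            # Deduplication: a block already seen in any silo is not stored again.
            if fingerprint not in store:
                # Compression: only unique blocks are compressed and kept.
                store[fingerprint] = zlib.compress(block)
    if logical_bytes == 0:
        return store, 0.0
    physical_bytes = sum(len(compressed) for compressed in store.values())
    return store, 100.0 * (1 - physical_bytes / logical_bytes)


if __name__ == "__main__":
    # Three silos holding the same log file: a typical pre-consolidation pattern.
    log = b"2024-01-01 INFO backup completed\n" * 1000
    store, reduction = deduplicate_and_compress([log, log, log])
    print(f"{len(store)} unique blocks stored, {reduction:.1f}% data reduction")
```

The reduction is computed as 1 minus physical bytes over logical bytes. Highly repetitive data such as this log shrinks far more than already-compressed media would, which is why headline reduction figures vary so much between data sets.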

Step #2

The second part of the data diet (and this is not just for now; it’s an ongoing commitment) sees the business start to harness AI against its lean data mountain. AI produces valuable insights from the data and lets people query it in natural language, at pace, enabling businesses to flourish. Valuable data becomes even more valuable because we have worked out how much value it materially has. As the firm now predominantly holds only valuable, useful, profitable data (or data that can be used for the greater good of society), it can start to reduce overall data volumes. By automating the classification process, it can make smart decisions about what data to keep and what to delete, including retaining data for its defined period, no longer and no shorter. This also helps organisations in any industry vertical align with compliance regulations and minimise unnecessary storage costs.
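A minimal sketch of classification-driven retention, in Python. The labels, retention periods and keyword rules in `classify` are hypothetical placeholders: in practice the classifier would be an AI/ML model and the periods would come from the organisation's records policy and applicable regulations. Each record is kept until its retention period expires and flagged for deletion from that day on, which is the "no longer and no shorter" rule expressed as code.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention policy: classification label -> how long to keep.
RETENTION: dict[str, timedelta] = {
    "financial": timedelta(days=7 * 365),
    "hr": timedelta(days=6 * 365),
    "transient": timedelta(days=30),
}


@dataclass
class Record:
    name: str
    created: date


def classify(record: Record) -> str:
    """Toy keyword classifier standing in for an AI/ML classification model."""
    name = record.name.lower()
    if "invoice" in name or "ledger" in name:
        return "financial"
    if "payroll" in name or "contract" in name:
        return "hr"
    return "transient"


def retention_decision(record: Record, today: date) -> str:
    """Return 'keep' while the retention period runs, 'delete' once it has expired."""
    expires = record.created + RETENTION[classify(record)]
    return "keep" if today < expires else "delete"


if __name__ == "__main__":
    today = date(2024, 6, 1)
    records = [
        Record("invoice_2016.pdf", date(2016, 3, 1)),
        Record("payroll_2023.xlsx", date(2023, 4, 1)),
        Record("scratch_export.csv", date(2024, 1, 2)),
    ]
    for record in records:
        print(record.name, classify(record), retention_decision(record, today))
```

A real policy would also have to honour legal holds, which override deletion; that is deliberately left out of this sketch.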

As we move through the second half of this decade and beyond, organisations need to take a more conscientious approach to computing as they shoulder an amplified responsibility for their operational data footprint.

Bye-bye data duplicates

The processes that define this responsibility are clear. Organisations should say goodbye to obsolete data, systems rife with duplicates, data repositories populated with orphans, and outdated test systems that serve no practical or functional purpose. The cumulative result of these actions can be massive: data mountains crumble, data lakes dry up (or at least shrink somewhat) and entire data ecosystems settle into a new order with less chaos and fewer instances of unpredictability.
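A minimal Python sketch of the kind of audit that surfaces duplicates and orphans. It assumes a hypothetical in-memory inventory in which each dataset carries its content and the system that owns it; an orphan is taken here to mean a dataset whose owning system is no longer in the list of active systems (for example, a decommissioned test environment). Real tools would hash files in chunks on disk rather than holding their content in memory.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    path: str
    owner_system: str  # hypothetical field: the system that created or owns the data
    content: bytes


def audit(datasets: list[Dataset], active_systems: set[str]) -> tuple[list[list[str]], list[str]]:
    """Return (groups of duplicate paths, orphaned paths)."""
    paths_by_hash: dict[str, list[str]] = defaultdict(list)
    for dataset in datasets:
        paths_by_hash[hashlib.sha256(dataset.content).hexdigest()].append(dataset.path)
    # Any content hash seen at more than one path forms a duplicate group.
    duplicates = sorted(sorted(paths) for paths in paths_by_hash.values() if len(paths) > 1)
    # Datasets whose owning system is no longer active are treated as orphans.
    orphans = sorted(d.path for d in datasets if d.owner_system not in active_systems)
    return duplicates, orphans


if __name__ == "__main__":
    inventory = [
        Dataset("/finance/q1.csv", "erp", b"revenue,100\n"),
        Dataset("/shared/copy_of_q1.csv", "fileshare", b"revenue,100\n"),
        Dataset("/uat/fixtures.json", "uat-2019", b"{}"),
    ]
    duplicates, orphans = audit(inventory, active_systems={"erp", "fileshare"})
    print("duplicates:", duplicates)
    print("orphans:", orphans)
```

The audit only reports; deciding which copy to keep, or whether an orphan is still needed, remains a policy decision.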

Cohesity’s Molyneux: Get on the data diet, you’ll feel like a new you… and your company too.

As a business sees its data universe take on a new form, it gains a unique opportunity: during cyber events, it can identify compromised, encrypted or stolen data, and that capability becomes more robust when backed by strong data intelligence powered by AI and the Machine Learning (ML) that drives the models and engines being deployed. Data insights can also accelerate legal, regulatory, criminal and audit timelines.
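One widely used heuristic for spotting data that may have been encrypted by ransomware is Shannon entropy: encrypted bytes look close to random, approaching 8 bits of entropy per byte. The Python sketch below is illustrative only; it does not describe how Cohesity detects encryption, and the 7.5 threshold is an assumption. Already-compressed files (ZIP archives, JPEGs, video) also score high, so a check like this would need to be combined with other signals, for instance how much a file's entropy has changed since an earlier backup.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, up to 8.0 for uniformly distributed bytes."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((count / total) * math.log2(count / total) for count in Counter(data).values())


def looks_encrypted(data: bytes, threshold: float = 7.5) -> bool:
    """Flag data whose entropy is close to that of random bytes (hypothetical threshold)."""
    return shannon_entropy(data) >= threshold


if __name__ == "__main__":
    plain_text = b"Quarterly report: revenue up, costs down.\n" * 200
    uniform = bytes(range(256)) * 64  # every byte value equally often, as ciphertext roughly is
    print(f"plain text: {shannon_entropy(plain_text):.2f} bits/byte, flagged={looks_encrypted(plain_text)}")
    print(f"uniform bytes: {shannon_entropy(uniform):.2f} bits/byte, flagged={looks_encrypted(uniform)}")
```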

Leaner (and keener)

The medium- to longer-term effects of embracing a data diet extend beyond cost savings and operational efficiency. Organisations can reduce total data volumes, optimise storage resources and improve their sustainability score.

As a business carries this data lifestyle change into the future, it should prioritise data quality over data quantity. With this new, ‘healthier’ (data) lifestyle under its belt, the business is ready to streamline its operational fabric, achieve compliance, reduce costs and become a greener organisation as a whole. By classifying, indexing and consolidating data, then embracing AI and ML for intelligent data insights and maintenance, while also recognising that we need to adopt responsible data management practices, we can move to a more sustainable, productive and intelligent data landscape.