Antony Adshead/R

Cloud the rising star of storage: Eight years of text analysis

Tracking word frequency in nearly a decade’s worth of storage content shows how cloud has come of age, how flash has gone mainstream and how we have stopped talking about virtualisation

Has cloud storage achieved previously unseen levels of maturity and development?

It looks that way. Cloud storage has broken out of use cases restricted to secondary storage, such as backup, and become a viable option for production storage workloads.

That’s one conclusion we can draw by carrying out text analysis on nearly a decade’s worth of ComputerWeekly.com storage content.

By looking at the frequency of keywords in several thousand articles written since the beginning of 2011, we’ve been able to spot some patterns.

First, there are the frequencies of particular words themselves. Some clear cases emerge of stars that have risen and then fallen to relative obscurity (server virtualisation), and of others that have shone very brightly for a short while (NVMe).

Second, by tracking word frequency, it is possible to hypothesise about relationships between technologies that emerge when we visualise the data gained from text analysis.

More on that in a moment.

The ups and downs of storage trends

The chart below shows the big picture of word frequency for selected keywords in ComputerWeekly.com’s storage content since 2011.

Word frequency for selected keywords in ComputerWeekly.com’s storage content since 2011

Going back to 2011, storage concerns look quite different from today (see the word cloud below for a snapshot).

Backup was a major topic in ComputerWeekly.com’s storage content, particularly in relation to data deduplication, which had then only recently been productised, and to virtual servers, which were sweeping through datacentres at the time. The cloud was also beginning to be talked about more frequently. Disk is prominent, while flash is nowhere to be seen among the more highly ranked words.

Storage keyword frequency in 2011

Looking at the line chart, we can see that from 2011 to 2014, virtualisation and backup track each other quite closely in word frequency. That could indicate a relationship in the content, reflecting the fact that many IT departments were getting to grips with the knock-on effects of virtualisation and dealing with vital but mundane tasks such as data protection.

At the same time, it is noticeable that mentions of cloud storage also mirror the ups and downs of backup. This is likely because, for some time, concerns over latency and availability meant cloud storage was mostly viewed as reliable only for secondary storage use cases.
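Saying that two keywords "track" or "mirror" each other is a judgement made by eye from the chart. One way to put a number on it, assuming each keyword's yearly mention counts are available as equal-length lists, is the Pearson correlation coefficient, sketched here in Python. The yearly figures below are invented for illustration and are not ComputerWeekly.com's counts, and a high correlation between word frequencies says nothing by itself about cause.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances; the 1/n factors cancel in the ratio,
    # so they are left out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical yearly mention counts for 2011-2014 (illustrative only).
backup = [900, 1100, 1000, 1200]
virtualisation = [800, 1000, 950, 1150]
print(round(pearson(backup, virtualisation), 3))  # → 0.984
```

A value close to 1 means the two series rise and fall together; close to -1 means they move in opposite directions; around 0 means no linear relationship. With only eight or so yearly data points, such figures are best read as a rough guide.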

The fast rise of flash storage

The rapidly rising star in that period is flash storage, which went from 586 mentions in 2012 to 1,804 in 2014, roughly a threefold increase.

We can see this in the word cloud for 2014, in which flash dominates while cloud storage, backup and virtualisation still retain some prominence and disk has dropped away noticeably.

Storage keyword frequency in 2014

From 2016, backup and virtualisation decline as topics in the content. On the one hand, this is possibly the result of many organisations having completed infrastructure improvements around virtualisation. On the other, this is also when we see the rise of hyper-converged infrastructure, with its virtualisation-in-a-box approach. Meanwhile, cloud storage loses some prominence alongside backup and virtualisation.

Another noticeable trend is that, by 2018, flash storage has declined in word frequency to below its 2012 level.

This is likely to be partly the result of flash storage, in its initial incarnations at least, going mainstream, with the early-phase explainers and case studies in our content dying away. At the same time, a new and interesting manifestation of flash storage began to rise in prominence.

That was NVMe flash, which we started to hear a lot about in 2017. It achieved huge prominence that year, going from nowhere to 600 mentions in ComputerWeekly.com content. In 2018, however, we heard less about NVMe, which may reflect initial excitement subsiding as suppliers failed to settle on a standard way of productising the technology.

Cloud storage the latest star

Finally, the real big hitter of the past year or so has been cloud storage. At the beginning of 2018, we started to hear far more about it, and in ways that seemed more diverse and mature than previously. That coincides in the chart with cloud storage diverging from the ups and downs of backup during 2017 and setting off on its own course in 2018.

Storage keyword frequency in 2018

Whereas for some years the cloud was only really viable as secondary or archive storage, we started to hear more about production workloads being run in the cloud, or at least about the potential to do so.

Phrases such as cloud bursting entered the lexicon, while the key cloud providers began to offer well-developed native file, block and object storage. Meanwhile, array makers now offer the cloud as a tier and have virtual cloud versions of their products, while a range of suppliers now offer file and object storage that can span on-premises and cloud locations.

More than a snapshot

It could be argued that text analysis of one section of one publication’s content can’t give a full picture of reality. Furthermore, there will undoubtedly have been an element of selection by the relatively few people involved in commissioning and writing that content, so there is certain to be a “meta” element to this analysis.

But nearly a decade’s worth of words does provide a decent sample size, and the conclusions drawn here do seem to accord with how we’ve seen storage and backup develop since 2010.


Word clouds and chart generated in the statistical programming environment R, using the tm package and its tm_map function.
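The R code behind the analysis isn't reproduced here. As a rough illustration of the counting step only, the sketch below tallies keyword mentions per year in Python rather than R, assuming each article is available as a (year, text) pair. The function name, the toy articles and the keyword list are all hypothetical, and a simple case-insensitive, word-bounded regular-expression match stands in for the corpus clean-up that tm would normally handle.

```python
import re
from collections import Counter, defaultdict


def keyword_frequency_by_year(articles, keywords):
    """Count occurrences of each keyword (single word or phrase) per year.

    articles: iterable of (year, text) pairs (hypothetical input format).
    keywords: list of keywords or phrases to track.
    Returns {year: Counter({keyword: count})}.
    """
    # One case-insensitive, word-bounded pattern per keyword, so "flash"
    # does not match inside "flashy" and phrases match as a whole.
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    counts = defaultdict(Counter)
    for year, text in articles:
        for kw, pattern in patterns.items():
            counts[year][kw] += len(pattern.findall(text))
    return dict(counts)


# Hypothetical toy corpus, not ComputerWeekly.com data.
articles = [
    (2011, "Backup and deduplication for virtualisation; disk backup is key."),
    (2014, "Flash storage is everywhere. All-flash arrays and cloud storage."),
    (2014, "Hybrid flash arrays cut the cost of flash."),
]
keywords = ["backup", "virtualisation", "flash", "cloud storage", "disk"]

freq = keyword_frequency_by_year(articles, keywords)
print(freq[2014]["flash"])  # → 4
```

Raw counts like these rise and fall with the number of articles published each year, so dividing by article or word totals is one way to compare years on a more even footing.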
