Unstructured data storage – on-prem vs cloud vs hybrid
We look at storage for unstructured data on-premise, in the cloud and across multiple locations. There are advantages to a hybrid approach, but there can be hidden costs, too
Businesses face the need to store ever-larger volumes of information, across a growing number of formats.
Business data is no longer confined to structured data in orderly databases or enterprise applications. Instead, businesses may need to capture, store and work with documents, emails, images, videos, audio files and even social media posts. All contain information that has the potential to improve decision-making.
But this presents challenges for IT systems that were designed with structured rather than unstructured data in mind.
That is because technologies that efficiently store databases, for example, are not well suited to the larger file sizes, data volumes and long-term archival needs of unstructured data.
Industry analysts IDC and Gartner estimate that about 80% of new enterprise data is now unstructured. Clearly, there is a business benefit in being able to keep and analyse that data, and in some cases long-term storage is mandated for compliance reasons.
But traditional storage technologies were not designed for either the volume or variety of such data.
As Cesar Cid de Rivera, international VP of systems engineering at supplier Commvault, points out, differing file sizes alone – say a video file versus a text document – present issues for storage. And enterprises must also deal with what he describes as “dark pools of data”: data generated or moved automatically, for example from a central system to an end-user’s device.
Also, data is generated by systems outside conventional IT, such as software-as-a-service (SaaS) applications, internet of things (IoT) endpoints, or even machine learning and artificial intelligence (AI) systems. This data, too, needs to be found, indexed and stored.
This puts pressure on storage infrastructure. And enterprises are increasingly finding that a single approach to storage – all on-premise or all-cloud – fails to deliver the cost, flexibility and performance they need. This is leading to growing interest in hybrid solutions or even technologies, such as Snowflake, that are designed to be storage agnostic.
“The criteria to consider are the volume, the data gravity – where it is being generated, where it is being used, computed or consumed – security, bandwidth, regulations, latency, cost, change rate, transfer required and cost,” says Olivier Fraimbault, a board director at SNIA EMEA.
“The main issue I see is not so much storing massive amounts of unstructured data, but how to cope with the data management, rather than the storage management of it.”
Nonetheless, firms need to consider conventional storage performance metrics, especially I/O and latency, as well as price, resilience and security for each possible technology.
Managing unstructured data on-site
The conventional approach to storing unstructured data on-site has been through a hierarchical file system, delivered either through direct-attached storage in a server, or through dedicated network-attached storage (NAS).
Enterprises have responded to growing storage demands by moving to larger, scale-out NAS systems. The on-premise market here is well served, with suppliers Dell EMC, NetApp, Hitachi, HPE and IBM all offering large-capacity NAS technology with different combinations of cost and performance.
Generally, applications that require low latency – media streaming or, more recently, training AI systems – are well served by flash-based NAS hardware from the traditional suppliers.
But for very large datasets, and the need to ease movement between on-premise and cloud systems, suppliers are now offering local versions of object storage.
The large cloud “hyperscalers” even offer on-premise, object-based technology so that firms can take advantage of object storage’s global namespace and data protection features, with the security and performance benefits of local storage. However, as SNIA warns, these systems typically lack interoperability between suppliers.
The main benefits of on-premise storage for unstructured data are performance, security, compliance and control: firms know their storage architecture, and can manage it in a granular way.
The disadvantages are cost, particularly upfront capital cost; limited scalability, since even scale-out NAS systems hit performance bottlenecks at very large volumes; and a lack of redundancy and, possibly, resilience.
Moving to the cloud?
This has led firms to look at cloud storage, for reasons of lower initial costs and its ability to scale.
For object storage – and almost all cloud storage is object-based – there is also the ability to handle large volumes of unstructured data efficiently. The global namespace, and the separation of metadata from the data itself, improve resilience.
Also, performance is moving closer to that of local storage. In fact, cloud object storage is now good enough for many business applications where I/O and especially latency are less critical.
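The object model described above, a single flat namespace with metadata kept apart from the data, can be illustrated with a toy in-memory sketch. The class `ToyObjectStore`, its method names and the SHA-256 integrity check are illustrative assumptions loosely modelled on S3-style put, get, head and list operations; this is not how any particular supplier implements its service.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ObjectRecord:
    """Metadata entry; the payload itself lives in a separate data store."""
    size: int
    digest: str  # SHA-256 of the payload, doubling as the data-store key
    user_metadata: dict = field(default_factory=dict)


class ToyObjectStore:
    """Hypothetical, illustrative object store: flat namespace, metadata apart from data."""

    def __init__(self):
        self._metadata = {}  # object key -> ObjectRecord
        self._data = {}      # digest -> payload bytes (content-addressed)

    def put(self, key, body, **user_metadata):
        digest = hashlib.sha256(body).hexdigest()
        self._data[digest] = body
        self._metadata[key] = ObjectRecord(len(body), digest, dict(user_metadata))
        return digest

    def head(self, key):
        # Metadata queries never touch the payload store.
        return self._metadata[key]

    def get(self, key):
        record = self._metadata[key]
        body = self._data[record.digest]
        # Re-hash on read to detect silent corruption of the payload.
        if hashlib.sha256(body).hexdigest() != record.digest:
            raise IOError(f"integrity check failed for {key!r}")
        return body

    def list(self, prefix=""):
        # There are no real directories: "folders" are just shared key prefixes.
        return sorted(k for k in self._metadata if k.startswith(prefix))


store = ToyObjectStore()
store.put("media/2023/launch.mp4", b"\x00" * 1024, content_type="video/mp4")
store.put("mail/inbox/0001.eml", b"Subject: hello", content_type="message/rfc822")
print(store.list("media/"))
print(store.head("mail/inbox/0001.eml"))
```

A small email and a large video are both just keys with metadata here, so the same calls serve both. Real services layer replication or erasure coding, access control and tiering on top of this basic model.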
Cloud storage cuts the (up-front) cost of hardware and allows for potentially unlimited long-term storage. Nor do firms need to build redundant systems for data protection. This can be done within the cloud provider’s services or, with the right architecture, by splitting data across multiple suppliers’ clouds.
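One way to protect data across multiple suppliers’ clouds is simple replication: write each object to several providers and read from whichever copy is reachable. The sketch below assumes that approach; `FakeProvider` is a hypothetical in-memory stand-in for a real provider’s SDK, and the write-quorum rule is an illustrative choice. Splitting data into erasure-coded fragments spread across clouds is an alternative that lowers storage overhead at the cost of extra complexity.

```python
class FakeProvider:
    """Hypothetical in-memory stand-in for one cloud provider's object store."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self._objects = {}

    def _check(self):
        if not self.online:
            raise ConnectionError(f"{self.name} is unavailable")

    def put(self, key, body):
        self._check()
        self._objects[key] = body

    def get(self, key):
        self._check()
        return self._objects[key]  # raises KeyError if this provider lacks the copy


class ReplicatedStore:
    """Write each object to every provider; succeed if at least `write_quorum` copies land."""

    def __init__(self, providers, write_quorum):
        if not 1 <= write_quorum <= len(providers):
            raise ValueError("write_quorum must be between 1 and the number of providers")
        self.providers = providers
        self.write_quorum = write_quorum

    def put(self, key, body):
        written = 0
        for provider in self.providers:
            try:
                provider.put(key, body)
                written += 1
            except ConnectionError:
                continue  # tolerate an unreachable provider
        if written < self.write_quorum:
            # A real system would also need to repair or clean up the partial copies.
            raise RuntimeError(f"only {written} of {len(self.providers)} copies written")
        return written

    def get(self, key):
        # Try providers in order and return the first copy that is reachable.
        for provider in self.providers:
            try:
                return provider.get(key)
            except (ConnectionError, KeyError):
                continue
        raise KeyError(key)


clouds = [FakeProvider("cloud-a"), FakeProvider("cloud-b"), FakeProvider("cloud-c")]
store = ReplicatedStore(clouds, write_quorum=2)
store.put("archive/contract.pdf", b"%PDF-1.7 ...")
clouds[0].online = False  # simulate an outage at one provider
print(store.get("archive/contract.pdf"))  # served from cloud-b instead
```

Full replication is the easiest scheme to reason about, but it multiplies both storage and cross-cloud transfer charges by the number of copies.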
Because data is already in the cloud, it is relatively straightforward to relink it to new systems, such as in a disaster recovery scenario, or to connect it to new client applications via application programming interfaces (APIs). With Amazon S3 the de facto standard for object storage, business applications are easier than ever to connect to cloud data stores.
And with data in the cloud, users should see little or no practical performance hit as they move around their organisation or work remotely.
Disadvantages of cloud storage include lower performance than on-premise storage, especially for I/O-heavy or latency-intolerant applications, potential management difficulties (anyone can spin up cloud storage) and potential hidden costs.
Even though the cloud is often viewed as a way to save money, hidden costs such as data egress charges can quickly erode cost savings. And, as SNIA EMEA’s Fraimbault cautions, although it is now fairly easy to move containers between clouds, this becomes harder when they have their own data attached.
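The effect of egress charges shows up quickly in back-of-the-envelope arithmetic. The sketch below uses made-up round-number prices (0.02 per GB-month for storage, 0.09 per GB for egress). These are placeholders, not any provider’s published rates, which vary by region, tier and volume. Under these assumptions, reading back 30% of a 100 TB archive each month makes egress more than half of the monthly bill.

```python
from dataclasses import dataclass


@dataclass
class PriceSheet:
    """Illustrative per-GB prices; placeholders, NOT any provider's actual rates."""
    storage_per_gb_month: float
    egress_per_gb: float
    free_egress_gb: float = 0.0  # some plans include a free egress allowance


def monthly_bill(stored_gb, egress_gb, prices):
    """Return (storage charge, egress charge) for one month."""
    storage = stored_gb * prices.storage_per_gb_month
    billable_egress = max(0.0, egress_gb - prices.free_egress_gb)
    return storage, billable_egress * prices.egress_per_gb


prices = PriceSheet(storage_per_gb_month=0.02, egress_per_gb=0.09)
stored_gb = 100_000  # a 100 TB archive (decimal units)

for read_back in (0.01, 0.10, 0.30):  # fraction of the archive downloaded per month
    storage, egress = monthly_bill(stored_gb, stored_gb * read_back, prices)
    share = egress / (storage + egress)
    print(f"{read_back:.0%} read back: storage {storage:,.0f}, egress {egress:,.0f} ({share:.0%} of bill)")
```

The storage charge stays flat while the egress charge scales with how often data is pulled back out. For that reason, access patterns matter as much as capacity when comparing cloud and on-premise costs.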
Hybrid options
As a result, a growing number of suppliers now offer hybrid technologies that can combine the advantages of local, on-premise storage with object technology and the scalability of cloud resources.
This attempt to create the best of both worlds is well suited to unstructured data because of its diverse nature, varied file sizes, and the way it might be accessed by multiple applications.
A system that can handle relatively small text files, such as emails, alongside large imaging files, and make them available to business intelligence, AI systems and human users with equal efficiency is very appealing to CIOs and data management professionals.
Organisations also want to future-proof their storage technologies to support developments such as containers. SNIA’s Fraimbault sees the shift in hybrid cloud from virtual machines to containers as a key driver for storing unstructured data in object storage systems.
Hybrid cloud offers the potential to optimise storage systems according to their workloads, retaining scale-out NAS, as well as direct-attached and SAN storage, where application performance needs demand it.
But lower-performance applications can access data in the cloud, and data can move to the cloud for long-term storage and archiving. Eventually, data could move seamlessly to and from the cloud, and between cloud providers, without either the application or the end-user noticing.
This is already happening through data storage technologies such as Snowflake, which makes use of local and cloud storage and last year upgraded its product to support unstructured data.
Meanwhile, other suppliers are increasing their support for hybrid storage, such as Microsoft through its Azure Data Factory data integration service.
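The workload-based placement described in this section can be expressed as a simple tiering rule. The sketch below is hypothetical: the three tiers, the `latency_sensitive` flag and the 30-day and 365-day thresholds are illustrative assumptions, not settings from any supplier’s product. Under this rule, latency-sensitive or recently used files stay on local flash NAS, warm data moves to cloud object storage, and long-untouched data goes to an archive tier.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class Tier(Enum):
    LOCAL_FLASH_NAS = "on-premise flash NAS"
    CLOUD_OBJECT = "cloud object storage"
    CLOUD_ARCHIVE = "cloud archive tier"


@dataclass
class FileInfo:
    path: str
    last_access: datetime
    latency_sensitive: bool  # e.g. feeds AI training or media streaming


def choose_tier(f, now, hot_days=30, cold_days=365):
    """Hypothetical placement rule; thresholds are illustrative, not recommendations."""
    age = now - f.last_access
    if f.latency_sensitive or age <= timedelta(days=hot_days):
        return Tier.LOCAL_FLASH_NAS
    if age <= timedelta(days=cold_days):
        return Tier.CLOUD_OBJECT
    return Tier.CLOUD_ARCHIVE


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
files = [
    FileInfo("training/images/batch-07.tar", now - timedelta(days=90), latency_sensitive=True),
    FileInfo("reports/q3.docx", now - timedelta(days=10), latency_sensitive=False),
    FileInfo("video/webinar.mp4", now - timedelta(days=120), latency_sensitive=False),
    FileInfo("mail/2021/archive.pst", now - timedelta(days=800), latency_sensitive=False),
]
for f in files:
    print(f"{f.path:32} -> {choose_tier(f, now).value}")
```

Production tiering tools typically base decisions on richer signals, such as access frequency, cost and compliance tags. The point here is only that placement can follow workload characteristics rather than a single all-local or all-cloud policy.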
Best of all worlds?
However, the idea of truly location-neutral storage still has some way to go, not least because cloud business models rely on data transfer charges. This, the Enterprise Storage Forum warns, can lead to bloated costs.
Indeed, a recent survey by supplier Aptum found that almost half of organisations expect to increase their use of conventional cloud storage. As yet, there is no one-size-fits-all technology for unstructured data.