
NAS vs object: Which one for large volumes of unstructured data?

Both NAS and object storage offer highly scalable file storage for large volumes of unstructured data, but which is right for your environment?

Object storage is a fashionable topic, boosted by its massive scale-out capability and its related ability to handle very large amounts of unstructured data – object technology now underpins much cloud storage, for example.

However, file-based network-attached storage (NAS) remains widely used and sees continued development, notably with the advent of clustered NAS, and it too is targeted at use cases that involve large amounts of unstructured data.

So how do you differentiate and choose between the two? Will everything trend towards object storage, or are there application areas where NAS will remain supreme? Or is this a false dichotomy, with object storage and NAS merely being two views on the same thing?

We are seeing the two overlap more and more. Many object storage systems also offer file (and block) interfaces, while high-end NAS employs many of the same infrastructure elements that make object storage possible, most notably scale-out technology. We even have systems, such as NetApp’s latest iteration of StorageGRID, that allow you to write data as a file and read it back as an object.
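To make that concrete, here is a minimal sketch of what dual file/object access can look like in practice. The mount path, endpoint, bucket and credentials are hypothetical, and how a given product maps file paths to bucket and object names varies, so treat this as an illustration rather than any supplier's documented behaviour.

```python
import boto3  # AWS SDK for Python; works against S3-compatible endpoints

# Hypothetical setup: an NFS mount and an S3-compatible endpoint exposed
# by the same storage system. Names and paths are illustrative only.

# 1. Write the data as a file over the NAS (NFS/SMB) interface.
with open("/mnt/grid/reports/q3.csv", "w") as f:
    f.write("region,revenue\nemea,1200000\n")

# 2. Read the same data back as an object over the S3 interface.
s3 = boto3.client("s3", endpoint_url="https://grid.example.com")
obj = s3.get_object(Bucket="reports", Key="q3.csv")
print(obj["Body"].read().decode())
```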

Indeed, there are strong grounds to argue that object storage is merely file storage done right. After all, the original NAS file systems were something of a bodge and still have issues, despite being upgraded and updated over the years.

For example, even though we have migrated from the 8.3 filenames format imposed by MS-DOS to the flexible formats allowed today, we can still fool computers into running malware by giving a file a different extension.

Some in the business have even suggested the proponents of object storage did it a disservice by giving it a new name. Had they instead called it an enhanced file system, it would have looked a lot less scary and unfamiliar to many potential users. Of course, it might also have looked a lot less innovative and intriguing to others.

NAS vs object: Balancing the scales

The first scale-related aspect to consider is that the larger, older and more unstructured your data store is, the more likely it is to be suited to object storage. Conversely, NAS may be a simpler and better-performing option for fast-changing data or small stores.

Object storage enables enterprises and service providers to manage multi-petabyte secondary storage with relative ease. It does not directly compete with traditional file and block storage when it comes to serving frequently accessed data and transactional workloads, however.

In addition, when we refer to storage performance we usually think in terms of speed, latency and throughput in the datacentre. This is very different to the cloudy world of distributed applications and clients, where mobile devices typically access data over long distances and from widely disparate locations.

The second differentiator is geographic scale. In the distributed world we need distributed storage performance and throughput. This is something that distributed object storage architectures can supply effectively, thanks to a combination of fast and reliable object streaming, load balancing and caching mechanisms that together support multitudes of concurrent clients. Add Rest-based protocols such as Amazon S3, and object becomes particularly efficient as storage for remote devices.
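A small hedged illustration of why that suits remote clients: with an S3-compatible store, a server can hand a device a time-limited presigned URL, after which the device needs nothing more than plain HTTPS to fetch the object. The bucket and key names here are hypothetical.

```python
import boto3

# Hypothetical bucket/key; assumes credentials are configured as usual.
s3 = boto3.client("s3")

# Generate a time-limited URL that a remote or mobile client can use to
# fetch the object with an ordinary HTTPS GET. No storage protocol
# stack, VPN or SDK is needed on the device itself.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "media-archive", "Key": "videos/demo.mp4"},
    ExpiresIn=3600,  # valid for one hour
)
print(url)  # hand this to the client
```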

There is no doubt, however, that deploying scale-out NAS for very large volumes of data is thoroughly achievable. Indeed, in many cases it is now the primary option for holding huge volumes of file data in a highly scalable clustered file system.

Scale-out NAS offers significant advantages over traditional, or scale-up, NAS. Traditional NAS is based on discrete file system instances and is limited in terms of hardware scalability. Scale-out NAS, by contrast, expands its parallel file system across clusters of hardware nodes, with the ability to grow capacity and performance independently, often to petabyte scale.

An object lesson in fault-tolerance

Scale-out capability, therefore, keeps NAS competitive for larger data volumes. Of course, scale-out is also the norm for object storage, although object platforms such as Ceph and Scality operate in somewhat different ways from scale-out NAS.

Where NAS uses Raid to stripe and mirror data for protection, object platforms instead distribute and replicate objects (file data plus associated metadata) across the storage nodes available to them, using fault-tolerant techniques such as erasure coding, a form of forward error correction (FEC).
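The idea is easier to see with a toy example. The sketch below uses single XOR parity, the simplest possible erasure code; real platforms such as Ceph typically use Reed-Solomon codes with several parity shards, but the recovery principle is the same.

```python
# Toy erasure coding: two data shards plus one XOR parity shard can
# survive the loss of any single shard. Real systems use k data + m
# parity shards (Reed-Solomon), tolerating m simultaneous losses.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

data = b"unstructured data!"            # 18 bytes, split evenly below
shard1, shard2 = data[:9], data[9:]     # data shards, on nodes 1 and 2
parity = xor_bytes(shard1, shard2)      # parity shard, on node 3

# Node 1 fails: rebuild its shard from the two survivors.
recovered = xor_bytes(parity, shard2)
assert recovered == shard1
print((recovered + shard2).decode())    # original object reassembled
```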

An issue for NAS and Raid is that as disk drives grow in capacity to meet the ongoing data explosion, the system’s ability to survive loss of drives becomes ever more tenuous. In the days when rebuilding a drive meant reassembling a few gigabytes of data, the time required was tolerable.

But with drives now in the terabytes, a rebuild can mean pulling several hundred times more data over an interface only 10 or 20 times faster than it was in the days of LVD parallel SCSI. As rebuild times grow, so does the risk of a second drive failure, and protecting against that also greatly increases the cost and complexity involved.
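A back-of-the-envelope calculation shows the scale of the problem. The drive sizes and sustained transfer rates below are illustrative round numbers, not measurements from any particular system.

```python
GB = 10**9  # decimal gigabytes, as drive suppliers quote capacity

# Late-1990s drive on LVD parallel SCSI: ~9 GB at ~40 MB/s sustained.
old_rebuild_secs = 9 * GB / (40 * 10**6)

# Modern nearline drive: ~8 TB at ~200 MB/s sustained.
new_rebuild_secs = 8_000 * GB / (200 * 10**6)

print(f"old rebuild: ~{old_rebuild_secs / 60:.0f} minutes")  # ~4 minutes
print(f"new rebuild: ~{new_rebuild_secs / 3600:.1f} hours")  # ~11 hours
```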

Object storage, in contrast, is generally less efficient in its use of physical storage capacity: replication-based configurations typically store each object three times for resilience, although erasure coding lowers that overhead. It distributes both the data (which can improve performance) and the risk across its nodes, and the data can reside on commodity storage, which brings down the overall cost.
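To put the capacity trade-off in rough numbers (the 10+4 profile below is just a commonly quoted example, not a universal default):

```python
# Usable capacity under three-way replication vs erasure coding.
def usable_fraction(data_shards: int, parity_shards: int) -> float:
    return data_shards / (data_shards + parity_shards)

# Three-way replication: every byte is stored three times.
print(f"3x replication: {1/3:.0%} usable (3.0x raw capacity needed)")

# Example erasure coding profile: 10 data shards + 4 parity shards.
k, m = 10, 4
print(f"{k}+{m} erasure coding: {usable_fraction(k, m):.0%} usable "
      f"({(k + m) / k:.1f}x raw capacity needed)")
```

On these figures, the erasure-coded layout needs well under half the raw capacity of triple replication for the same protected data.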

Data in the cloud

So, there are considerable attractions to an object infrastructure, even if you then use it to provide a file system interface – as indeed many cloud storage providers do.

Having said that, NAS is tried and tested. For smaller sites and data volumes, scale-up NAS will remain effective and simple to implement. Similarly, where you need outright performance and low latency in the datacentre, and of course for compatibility with today’s applications, which expect CIFS or NFS, scale-out NAS is likely to remain king. NAS is also a good option where you have frequently changing data, because object storage is built with relatively static data in mind.

But while scale-out NAS can provide high performance, it is limited to perhaps a few petabytes and it comes at a cost. In particular, there is networking complexity and expense, with some sites already implementing 40Gbps Ethernet or InfiniBand for storage traffic.

Once you hit scale, whether in terms of capacity, geographic coverage or both, object storage can provide many of the benefits more simply. It can also be more resilient – self-healing erasure coding is faster and more efficient than legacy technologies such as Raid – and will be more useful if you plan private cloud-type applications.

So, for many users a shift to cloud-oriented object storage, perhaps with a file-oriented overlay, will pay dividends for much of their unstructured data.
