Computational Storage - VAST Data: Practical truths for pragmatic file systems
In a follow-up piece to the Computer Weekly Developer Network (CWDN) series focusing on Computational Storage, CWDN continues the thread with some extended analysis in this space.
The commentary below is written by Andy Pernsteiner in his capacity as field CTO for North America at VAST Data, a company known for its branded ‘Universal Storage’ concept, a technology approach designed to simplify the datacentre and redefine how organisations interact with data.
Pernsteiner agrees with the suggestion that, on the face of it, Computational Storage Functions (CSF) and Computational Storage Disks (CSD) appear to be a great step forward. After all, they promise to improve performance and efficiency by bringing compute and storage closer together.
However, he advises, there are some considerations to bear in mind on the flip side:
- Applications must be re-factored, both for API compatibility and to partition data effectively in a ‘node-local’ context.
- Hardware for CSDs (so far) is niche and does not fit into the trend towards commodity components.
Pernsteiner writes in full as follows…
VAST Data is seeing a shift away from the ‘node-local’ model. As network and system bus speeds increase, compute (both CPU and GPU) gets cheaper and applications become more varied, storage architects are increasingly adopting a more disaggregated model, where all compute can access all storage remotely.
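To make the disaggregated pattern concrete, here is a minimal sketch (not a VAST-specific API; the endpoint, bucket and key names are invented) of any compute node reading any object over the network from a shared object store through the standard S3 interface, using boto3:

```python
# Minimal sketch of the disaggregated model: any compute node reads any object
# over the network from a shared object store via the standard S3 API.
# The endpoint, bucket and key below are hypothetical placeholders.
import boto3

s3 = boto3.client("s3", endpoint_url="https://objectstore.example.internal")

def read_object(bucket: str, key: str) -> bytes:
    """Fetch an object without assuming any node 'owns' the data locally."""
    response = s3.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()

# Every worker in the compute farm runs the same code against the same global
# namespace; there is no per-node data partitioning for the application to manage.
payload = read_object("analytics-data", "datasets/2024/part-0001.parquet")
```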
There are definitely use cases that can take advantage of CSD, such as:
- Encryption
- Compression
- Routine processing (such as transcoding)
However, keep in mind that the way a CSD accomplishes this is by shrinking the namespace addressable by the compute down to a single SSD, or a single controller attached to SSDs.
This may make it well suited to processing small silos of data (10s of TBs), but it does not lend itself to computing against a large data set (100s of TBs or PBs). Consider that Deep Learning often requires analysing large corpora [bodies] of data. CSDs don’t allow for this directly; rather, the task would have to be split into smaller units and applied across a large array of CSD silos.
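As a rough, hypothetical sketch of what that splitting means for the application (the silo list and per-silo function below are placeholders, not a real CSD API), the scatter/gather orchestration becomes the application's responsibility rather than the storage layer's:

```python
# Hypothetical sketch: scattering a job across many node-local CSD silos and
# gathering the partial results. SILOS and process_silo() stand in for whatever
# vendor-specific mechanism would actually run compute on each device.
from concurrent.futures import ThreadPoolExecutor

SILOS = [f"csd-node-{i:03d}" for i in range(64)]  # 64 separate namespaces

def process_silo(silo: str) -> int:
    """Run the computation against only the data visible to one CSD."""
    # Placeholder: in reality this would invoke the device's compute function
    # against its locally addressable SSD namespace.
    return len(silo)  # dummy partial result

# The application, not the storage layer, owns the scatter/gather logic.
with ThreadPoolExecutor(max_workers=16) as pool:
    partial_results = list(pool.map(process_silo, SILOS))

total = sum(partial_results)  # manual aggregation across all silos
```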
Silos equate to complexity
In general, creating more silos leads to more complexity. The vast majority of large-scale compute environments are leveraging distributed file systems and object stores which allow all compute to access all data. This consolidation has been increasing, rather than decreasing. Applications (and the communities that consume them) are accustomed to being able to access more data than can fit on a single server’s worth of capacity.
Now, because of their potential for lower latency and increased bandwidth, CSDs can be well suited to embedding into larger-scale systems, but only as a means to an end.
Put another way, CSDs may be deployable as a building block for a scalable file system, such that each CSD acts as a ‘node’ and another layer of logic allows the nodes to participate in a global file system or object store. A CSD does not seem suitable as an application-facing storage solution on its own. This is somewhat akin to the ‘SmartNIC’, where SoCs embedded onto a NIC allow more complex processing to occur as data passes through the network.
This can accelerate specific tasks (encryption, compression, etc.) and free the host CPU for more ‘interesting’ work; however, it only applies locally (to the data the device sees), not globally.
Refactoring with resilience
To make effective use of CSDs, not only must applications be refactored to interact with the appropriate (new) APIs, but they must also be constructed on the assumption that they will only interact with locally available data.
One example of this paradigm is the Hadoop Distributed File System (HDFS). The original goals of that filesystem (and of the MapReduce processing framework typically paired with it) were to:
- Provide resilience in case of disk or node failure, through triplication.
- Reduce latency and increase throughput by keeping the storage close to the compute, primarily because network fabrics of the day (1GbE) were not fast enough to satisfy the I/O requirements (a simple sketch of this locality-first placement follows below).
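As a rough illustration of that second goal (a simplified sketch, not actual Hadoop scheduler code; the block map and node names are invented), a locality-aware scheduler prefers to place each task on a node that already holds one of a block's three replicas, falling back to a remote read only when no such node is free:

```python
# Simplified sketch of HDFS/MapReduce-style locality scheduling: each block has
# three replicas (triplication) and the scheduler tries to place the task on a
# node that already stores one. Block IDs and hostnames are invented examples.

# block_id -> hosts holding a replica
replica_map = {
    "blk_0001": ["node-a", "node-b", "node-c"],
    "blk_0002": ["node-b", "node-d", "node-e"],
}

def place_task(block_id: str, idle_nodes: set[str]) -> str:
    """Prefer a node-local replica; otherwise accept a remote (network) read."""
    for host in replica_map[block_id]:
        if host in idle_nodes:
            return host            # data-local: no network transfer needed
    return next(iter(idle_nodes))  # fall back to reading over the slow network

print(place_task("blk_0002", {"node-a", "node-d"}))  # -> node-d (data-local)
```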
Breaking the bottleneck
But times have changed: networks have caught up with storage (even NVMe storage) in terms of bandwidth, and significant strides have been made to reduce latency. The network is no longer a primary bottleneck for moving data to compute in modern datacentres. Plus, with the push for all-flash datacentres, storage is also up to the task of feeding even the most demanding GPU and CPU applications with I/O, even when the data sits across the network.
Further advancements such as GPUDirect Storage, which allows remote data to bypass the host's main memory en route to the GPU, narrow the gap further.
As such, institutions and enterprises are migrating away from HDFS and deploying scale-out NAS and object storage systems to house their analytics data. This enables them to build compute farms without needing to size them for storage capacity.
Coupled with a resilient, highly available NAS or object store, it also allows administrators and application developers to worry less about data partitioning and protection and spend more time optimising and enhancing their applications… which has to be a good thing, right?