Panasas on data architecture - filesystem finesse in the CS Wild West
This is a guest post for the Computer Weekly Developer Network written by Curtis Anderson in his capacity as senior software architect at Panasas — a company known for its PanFS parallel file system on the ActiveStor ‘ultra turnkey’ appliance for HPC workloads.
Anderson is responsible for advanced projects and future planning at Panasas, concentrating on OSD (Object Storage Device) architecture and client strategy… and his comments come in the wake of Computer Weekly’s series and feature devoted to examining computational storage.
This discussion follows on from Anderson’s earlier discussion: Panasas on data architecture – haystack & hammers, choose your poison here.
Anderson writes as follows…
In the world of Computational Storage (“CS”), the metric that I like when comparing CS device architectures is the ratio between the bandwidth and latency available to the CPU cores inside the CS device and the bandwidth and latency available to the host CPU.
So for example, if the product can offer 10x higher bandwidth or 50% lower latency to onboard applications compared to host-CPU based applications, then the value is high enough that developers will take the trouble to port to the CS device.
If the ratio is only 2x, I’m not sure there’s enough value there to warrant the effort.
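As a rough illustration of the ratio test described above, the sketch below compares device-side and host-side bandwidth and latency and applies the rule of thumb of 10x bandwidth or 50% lower latency. The class and function names, the reading of the figures as bandwidth and latency to the stored data, and the sample numbers are all hypothetical; they show only the arithmetic and are not measurements of any product.

```python
from dataclasses import dataclass


@dataclass
class DataPath:
    """Bandwidth and latency seen by one set of CPU cores (hypothetical figures)."""
    bandwidth_gbps: float  # assumed: sustained bandwidth to the stored data, GB/s
    latency_us: float      # assumed: access latency to the stored data, microseconds


def advantage(device: DataPath, host: DataPath) -> tuple[float, float]:
    """Return (bandwidth ratio, fractional latency reduction) of device vs host."""
    bandwidth_ratio = device.bandwidth_gbps / host.bandwidth_gbps
    latency_reduction = 1.0 - device.latency_us / host.latency_us
    return bandwidth_ratio, latency_reduction


def worth_porting(device: DataPath, host: DataPath) -> bool:
    """Rule of thumb from the text: 10x bandwidth or 50% lower latency."""
    bandwidth_ratio, latency_reduction = advantage(device, host)
    return bandwidth_ratio >= 10.0 or latency_reduction >= 0.5


# Illustrative numbers only, not measurements.
host = DataPath(bandwidth_gbps=4.0, latency_us=100.0)
strong_device = DataPath(bandwidth_gbps=40.0, latency_us=80.0)
weak_device = DataPath(bandwidth_gbps=8.0, latency_us=90.0)

print(worth_porting(strong_device, host))  # True: 10x the host bandwidth
print(worth_porting(weak_device, host))    # False: only 2x bandwidth, 10% lower latency
```

Treating the two thresholds as an either/or test mirrors the wording of the example above; a real evaluation would presumably also weigh the cost of porting the application.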
The savings in the costs of energy and cooling are significant, especially when deployed at scale, but I believe that the decisions around adopting CS devices will be more driven by direct economics than indirect energy costs. Let me draw an analogy to GPUs in AI/ML.
The CS killer app
CS devices will take off like a rocket if a ‘killer app’ can be found, but they’ll languish if the primary advantage they offer is a cost reduction in energy consumption, while requiring significant investment in software to take advantage of them.
Looking wider at implementation issues, I don’t think I agree with the notion that media streaming is the killer application for CS functions.
Media is an extremely read-intensive application, which matches with CS device capabilities, but media is also a low-compute task.
Streaming is (just) shovelling data out the network pipe at metered intervals. Given the need for a host, since the CS devices don’t natively have Ethernet interfaces, you may as well use the host CPU to do the shovelling.
The 10x performance advantage of putting your application on the CS device is not usable in that scenario. The device is limited by its connection to the host, so the CPUs in the CS device will be idle unless you’ve got something computational for them to do.
I have not done the math[s], but I suspect that real-time transcoding of a master media image to match the end user’s display resolution might use the CS device’s CPUs, but (again without doing the math[s] here) I suspect you’re still better off storing pre-scaled copies of the media stream instead.
Wobble factor: co-processors & centralisation
The industry as a whole has wobbled back and forth between specialised attached co-processors and centralisation on the CPU for decades, based upon asymmetric advances in one technology pool versus the other.
I see no reason to think that that will change over the next decade.
The GPUs that completely dominate AI/ML today are ‘attached co-processors’ that enable algorithms that are practically impossible on general-purpose CPUs. Eventually, the CPU vendors will engineer responses to the dominance of GPUs and may be able to take back control of the market.
Remember, the only constant is change.
Two decades ago the industry had (I think) something like a dozen commercially significant operating systems; every system vendor had their own: Solaris, IRIX, NetWare, etc. At this point, I will argue that we are down to 1.5: Linux and some remaining use of Windows. As much as I like Apple products, does macOS play any role in the datacentre?
Filesystem finesse
As a filesystem designer, I agree completely that a filesystem knows orders of magnitude more about the data it is storing than a block device does. The type of data being stored can be inferred from the file name/extension, as can the access pattern (sequential versus random); the relationship between two files can be inferred from their presence in the same directory, and so on.
Well, ‘inferred’ is not a guarantee, of course, but is much better than the ‘I have no idea if those two disk sectors are related’ that you get with a block device. Those inferences are used by filesystems to improve performance.
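To make the idea of inference concrete, here is a minimal sketch of how a filesystem-like layer might guess a data type and likely access pattern from a file extension, and treat two files in the same directory as probably related. The extension table and function names are hypothetical and are not taken from PanFS; as noted above, these are hints, not guarantees.

```python
from pathlib import PurePosixPath

# Hypothetical hint table: extension -> (data type, likely access pattern).
EXTENSION_HINTS = {
    ".mp4": ("video", "sequential"),
    ".log": ("text log", "sequential"),
    ".db": ("database", "random"),
    ".sqlite": ("database", "random"),
}


def infer_hints(path: str) -> tuple[str, str]:
    """Guess (data type, access pattern) from the file extension alone."""
    suffix = PurePosixPath(path).suffix.lower()
    return EXTENSION_HINTS.get(suffix, ("unknown", "unknown"))


def probably_related(path_a: str, path_b: str) -> bool:
    """Treat two files as likely related if they share a parent directory."""
    return PurePosixPath(path_a).parent == PurePosixPath(path_b).parent


print(infer_hints("/projects/render/shot01.mp4"))  # ('video', 'sequential')
print(infer_hints("/projects/render/cache.db"))    # ('database', 'random')
print(probably_related("/projects/render/shot01.mp4",
                       "/projects/render/cache.db"))  # True
```

A block device sees none of these names or paths, which is why it cannot form even this kind of rough guess.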
The Panasas parallel filesystem product, PanFS, is based upon Object Storage Devices (OSD). They’re essentially an early generation of CS device that had a fixed-function firmware image preloaded rather than being programmable via downloadable Containers. That has allowed us to implement valuable performance optimisations for the filesystem as a whole, based upon the context that each OSD has for what it is storing and its ability to compute optimal strategies for storing and retrieving data.
There are, of course, negatives with computational storage: the network architecture required to run computational storage is inherently more complex, and software engineers will need to think about the number of Application Programming Interfaces (APIs) that will need to be informed.
That is a significant challenge, to be sure.
Remember the promises of ‘automatically parallelising compilers’ that would take existing source code and automatically spread the sequence of instructions across multiple cores, without the programmer needing to be aware of it? It took new, inherently parallel languages to achieve that goal; it was intractable with then-current programming languages. There are literally billions, if not trillions, of lines of source code written to the POSIX API standard. Do we think that we can transparently map those applications to use CS devices?
Kubernetes and the micro-services architectures come very close, I think, to what is needed and will be the foundation upon which CS devices are adopted and used.
Let’s stay Wild West a while
Overall, I agree that we’re early in the days of CS devices, but I think that standardisation should follow development, not precede it.
Standardisation is, inherently and unavoidably, a ‘design by committee’ proposition. I would like to see the Wild West of CS devices exist for a while longer so that our industry can explore all the possible innovations, then standardise on the architecture that works best.