An opportunity to redesign computer architectures

Over the last 70 years, business computers have evolved a standard model of computing based around a central processing unit (CPU), main memory (RAM) and data storage. In effect, data is fetched from storage and copied into RAM, where the CPU runs a program to manipulate it.

Copying data back and forth between RAM and storage is computationally inefficient, so system designers use memory caches to store frequently used data in RAM. This avoids unnecessary input/output (I/O) operations.

In the age of big data, in-memory databases attempt to overcome the I/O limitation by pushing as much data as possible into memory. But there are now a handful of companies looking at whether it makes more sense to go the other way, and have data processing run directly on storage devices. In doing so, these devices avoid having to tie up the CPU with the task of copying data back and forth to the storage device.
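To make the trade-off concrete, here is a toy model that counts how many bytes cross the link between storage and RAM when a filter runs on the host, compared with when it runs on the storage device. It is a sketch under stated assumptions, not a model of any real product: the 4,096-byte record size, the one-in-a-thousand match rate and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

RECORD_SIZE = 4096  # bytes per record (hypothetical figure for illustration)


@dataclass
class Transfer:
    matches: int      # records that satisfied the predicate
    bytes_moved: int  # bytes copied from storage into host RAM


def host_side_filter(records: Iterable[int],
                     predicate: Callable[[int], bool]) -> Transfer:
    """Conventional model: every record is copied to RAM, then the CPU tests it."""
    moved = 0
    matches = 0
    for record in records:
        moved += RECORD_SIZE  # the record crosses the link whether or not it matches
        if predicate(record):
            matches += 1
    return Transfer(matches, moved)


def device_side_filter(records: Iterable[int],
                       predicate: Callable[[int], bool]) -> Transfer:
    """Computational storage model: the device tests each record itself
    and only matching records are copied to the host."""
    matches = sum(1 for record in records if predicate(record))
    return Transfer(matches, matches * RECORD_SIZE)


def is_match(record: int) -> bool:
    # Hypothetical selective query: one record in every thousand matches.
    return record % 1000 == 0


host = host_side_filter(range(1_000_000), is_match)
device = device_side_filter(range(1_000_000), is_match)

print(f"host-side filter moved   {host.bytes_moved:,} bytes")
print(f"device-side filter moved {device.bytes_moved:,} bytes")
```

Both approaches find the same matches. In this toy model the saving is simply the inverse of the match rate, so the benefit shrinks as a query becomes less selective.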

This is not as radical an idea as it seems. Storage controller hardware does include firmware that runs tasks like encryption, decryption and deduplication. Offloading tasks from a CPU to a coprocessor is not a new idea either. Graphics processors (GPUs) are routinely used for data processing tasks that can take advantage of the highly distributed computing made possible through hundreds of processing cores on a modern GPU. This makes them ideal for applications like machine learning and AI inference engines.

Putting processing closer to data

Can a computational storage device (CSD) provide similar efficiency gains for data processing? Xilinx and Samsung have developed a smart SSD, which uses a field-programmable gate array (FPGA) chip to accelerate certain data processing functions. One use is anomaly detection, where computational storage scans 25 Tbytes of data in just 25 minutes.
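For a sense of scale, the quoted figure can be turned into an average scan rate. The short calculation below assumes decimal units (1 Tbyte = 1,000 Gbytes) and treats the number as an aggregate; the source does not say how many devices were involved, so it should not be read as a per-drive rate. The function name is illustrative.

```python
def scan_throughput_gb_per_s(terabytes: float, minutes: float) -> float:
    """Average scan rate in gigabytes per second, using 1 TB = 1,000 GB."""
    return terabytes * 1000 / (minutes * 60)


rate = scan_throughput_gb_per_s(25, 25)
print(f"{rate:.1f} GB/s")  # prints "16.7 GB/s"
```

That works out to one Tbyte per minute.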

Such functions are very application-specific. NGD Systems has developed a CSD that uses an ARM processor, which means it can run the Linux operating system. This makes it possible to program more general-purpose functions into a CSD.

It’s very early days in the evolution of computational storage. Existing enterprise storage providers do not see many indicators that it is set to take off. But data is exploding, and needs processing. Given the prospect that quantum computing will disrupt classical computing, CSDs may well offer a way to satisfy the voracious data appetite of a quantum computer.