Vast Data’s ‘Act 2’ builds data management layer on QLC flash
Vast Data offers QLC flash bulk storage with a rapid cache I/O layer. Now, it has embarked on offering big data, analytics and application access to highly available data stores
“The fastest-growing storage company in history.” That’s the claim made by Vast Data, which has announced it has gone from annual recurring revenue of $1m to $100m in three years.
Meanwhile, the company has embarked on what regional director for EMEA, Alex Raistrick, calls “Act 2” of its story, in which Vast plans to continue its growth by offering its own data layer to provide easy visibility for applications, databases and analytics tools (think Hadoop and Spark) and make data available “at exabyte scale”.
“Act 1” is where Vast started with the hardware architecture that underpins this, based on high-density quad-level cell (QLC) flash drives.
Flash technology has evolved from single-level and multi-level cell (SLC, MLC) NAND via triple-level cell (TLC) to quad-level cell flash storage. Each name indicates the number of bits stored per cell: one, two, three and four respectively. QLC’s four bits per cell require 16 distinct voltage levels, which is how it boosts capacity over previous generations.
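As a quick illustration of why each extra bit is costly, the number of voltage levels a cell must distinguish doubles with every bit added. The short sketch below simply computes 2 to the power of the bits per cell; the names `CELL_TYPES` and `voltage_states` are illustrative and not taken from any vendor documentation.

```python
# Bits stored per cell for each NAND generation mentioned above.
CELL_TYPES = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}


def voltage_states(bits_per_cell: int) -> int:
    """Return how many distinct levels a cell needs to encode the given bits."""
    return 2 ** bits_per_cell


states = {name: voltage_states(bits) for name, bits in CELL_TYPES.items()}

for name, bits in CELL_TYPES.items():
    print(f"{name}: {bits} bit(s) per cell, {states[name]} voltage states")
```

QLC therefore has to separate twice as many levels as TLC within the same cell, which is where the extra capacity and the extra sensitivity both come from.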
But there’s a catch. With all those voltage levels packed into smaller volumes of silicon, there is scope for more wear and more things that can lead to data corruption.
To get around this, Vast smooths out and optimises input/output (I/O) using Intel or Kioxia storage-class memory (SCM). It calls this “write-shaping”, in which the SCM handles reads and writes, and sends data to bulk storage in 1GB stripes as is optimal. This way, it guarantees a 10-year lifespan for QLC flash drives.
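The general idea of absorbing many small writes in a fast tier and passing them on only in large, fixed-size stripes can be sketched in a few lines. This is a toy model under stated assumptions, not Vast's implementation: the `WriteShaper` class and its fields are hypothetical, the 1GB stripe size is taken from the description above (decimal gigabytes assumed), and the model only counts bytes rather than storing data.

```python
from dataclasses import dataclass

STRIPE_BYTES = 1000**3  # 1GB stripe, per the article; decimal GB is an assumption


@dataclass
class WriteShaper:
    """Toy model: buffer writes in a fast tier, flush only whole stripes."""

    stripe_bytes: int = STRIPE_BYTES
    buffered: int = 0         # bytes currently held in the fast (SCM-like) tier
    stripes_flushed: int = 0  # whole stripes sent to the bulk (QLC-like) tier

    def write(self, nbytes: int) -> None:
        # Absorb the write, then move any complete stripes to bulk storage.
        self.buffered += nbytes
        full, self.buffered = divmod(self.buffered, self.stripe_bytes)
        self.stripes_flushed += full


# Demonstration with a tiny 1,000-byte stripe so the numbers are easy to follow.
shaper = WriteShaper(stripe_bytes=1000)
for _ in range(25):
    shaper.write(100)  # 25 small writes of 100 bytes each

print(shaper.stripes_flushed, shaper.buffered)
```

In this model the bulk tier sees two large sequential writes instead of 25 small ones, with the remaining 500 bytes waiting in the fast tier until a full stripe is available.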
But, says Raistrick: “We’re a software company using commodity hardware. We add value with software, and use software to drive down the price of hardware. What we are aiming at is giving customers the ability to deploy 30PB, for example, and to be able to gain insight from that data and consume it.”
Backup data stores
That insight could be for use in long-term backup data stores, as a repository for AI/ML and big data analytics, or for security functionality – in other words, secondary data stores, but with requirements for occasional rapid access and/or throughput.
Per-enclosure capacities can be 338TB, 675TB or up to 1.3PB, with QLC drive sizes of up to 30TB.
“Often it is less about latency and more about bandwidth,” says Raistrick. “A large percentage of our customers run GPU compute for HPC.” The average sale is more than $1m and the average deployment is over 1PB.
Data for analysis
The core idea of Vast Data’s “Act 2” is that lots – and it means lots, up to 100-plus PB – of varied data held in Vast Data storage can be made available to applications and analysis.
Its Element Store is where a potentially unlimited volume of files and objects – the system is multiprotocol – is kept along with their metadata.
Here, the data is indexed by the company’s “Vast Catalog” over a large range of attributes, and made available to applications, databases and analytics engines via its Vast Data Platform.
The key benefit here, says Raistrick, is that Vast Data Platform makes data easily available and useable to all big data environments and gets around the tendency for it to live in silos.
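To make the notion of indexing stored items by their attributes more concrete, here is a minimal sketch of an attribute index that can answer queries across files and objects without scanning them. It is purely illustrative: the `Catalog` class, its methods, the attribute names and the paths are all hypothetical and do not represent the Vast Catalog's actual interface or internals.

```python
from collections import defaultdict


class Catalog:
    """Toy attribute index: maps (attribute, value) pairs to item paths."""

    def __init__(self) -> None:
        self._index = defaultdict(set)

    def add(self, path: str, **attrs) -> None:
        # Record the item under every attribute/value pair it carries.
        for key, value in attrs.items():
            self._index[(key, value)].add(path)

    def find(self, **attrs) -> set:
        # Return the items that match all of the requested attributes.
        sets = [self._index.get((k, v), set()) for k, v in attrs.items()]
        return set.intersection(*sets) if sets else set()


catalog = Catalog()
catalog.add("/logs/a.parquet", owner="alice", kind="parquet")
catalog.add("/logs/b.csv", owner="alice", kind="csv")
catalog.add("/img/c.jpg", owner="bob", kind="jpeg")

matches = catalog.find(owner="alice", kind="parquet")
print(matches)
```

Because every query is answered from the index, it does not matter which protocol or application wrote the item, which is the sense in which a shared catalogue works against silos.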
“Open file formats come with certain trade-offs that can restrict simplicity,” says Raistrick. For example, Parquet can impact performance, CPU usage and compression efficiency of systems that use it.
“Also, Parquet does not support ACID transactions, so users often opt for other file formats like Iceberg to overcome its limitations,” he says. “VAST offers millions of transactions per second with ACID support, so it eliminates the need for users to make an upfront decision on partitions.”
What’s on the horizon for Vast? There’s a cloud story to be told, says Raistrick. Though it’s not suited to all customers doing intensive work with large amounts of data, there is demand for the ability to work across on-premises and cloud environments, and for collaboration across locations. What’s likely to emerge is the idea of “data that exists everywhere”.