Vast Data Platform aims at storage everywhere for AI/ML workloads
Vast Data to offer storage with data lake and warehouse functionality built in natively, in anticipation of a huge surge in AI/ML workloads and a need for ever-larger data stores
Vast Data aims to get vaster with the launch of the Vast Data Platform, which will provide customers with AI/ML-focused storage infrastructure intended to replace existing database, data lake and data warehouse functionality.
In terms of scale and outcomes, it aims at “global learning” and “constant realisation” from the AI systems it will support.
The Vast Data Platform marries its QLC flash-and-fast-cache storage subsystems with database-like capabilities at native storage input/output (I/O) level, plus AI compute functionality focused on continual learning and the ability to link numerous instances into a global grid. It comprises:
Vast Data Store: The company’s storage subsystem range, which has been shipping for some years. It is built on high-density QLC flash in enclosures that run to petabyte (PB) capacity, with storage-class memory (SCM) used for what Vast calls “write-shaping”. Here, the SCM handles reads and writes to bulk storage in 1GB stripes to guarantee a 10-year lifespan for its QLC drives.
Vast DataBase: This brings database-like functionality to the fundamentals of how data is stored in Vast. As described by technical sales and marketing lead Jeff Denworth, this “next generation database” adds a tabular access method, with metadata describing how blocks of data on storage media are organised into files, objects and tables. According to Vast, this allows for rapid ingest of data as well as large volumes of query requests.
Vast Data Engine: A containerised, Python-based layer that adds AI processing on top of storage functionality. The engine stores and handles functions and triggers that can, for example, rewrite queries based on the results of existing AI processing. With this, Vast wants to offer a platform for AI learning that acts on the basis of what it has already learned.
Vast Data Space: The extensive geographical grid that brings a customer’s Data Platform instances together in a single namespace across on-premise sites and all of the big three public clouds (AWS, Azure and GCP). The idea is to create a mesh of computational resources (CPUs, GPUs and DPUs) that can move data to compute, or compute to data, according to the gravity of either.
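To make the “write-shaping” idea in the Vast Data Store description more concrete, here is a minimal Python sketch of the general technique: small writes are collected in a fast buffer and only passed to bulk storage in fixed-size stripes. It is not Vast’s implementation. The `WriteShaper` class, its fields and the 8-byte stripe size are all hypothetical, chosen so the behaviour is easy to follow, whereas the article cites 1GB stripes.

```python
from dataclasses import dataclass, field


@dataclass
class WriteShaper:
    """Hypothetical model of write-shaping, not Vast's actual code."""

    stripe_size: int  # bytes per stripe (the article cites 1GB; the demo uses 8)
    buffer: bytearray = field(default_factory=bytearray)  # stands in for SCM
    stripes: list = field(default_factory=list)  # stands in for QLC flash

    def write(self, data: bytes) -> None:
        """Absorb a write in the buffer and emit any full stripes."""
        self.buffer.extend(data)
        while len(self.buffer) >= self.stripe_size:
            self.stripes.append(bytes(self.buffer[: self.stripe_size]))
            del self.buffer[: self.stripe_size]

    def flush(self) -> None:
        """Write out whatever remains as a final, shorter stripe."""
        if self.buffer:
            self.stripes.append(bytes(self.buffer))
            self.buffer.clear()


# Four writes of uneven size reach bulk storage as uniform stripes.
shaper = WriteShaper(stripe_size=8)
for chunk in (b"abc", b"defgh", b"ijklmnopqr", b"st"):
    shaper.write(chunk)
```

In this toy model the bulk store only ever receives large sequential writes, which is the general reason such buffering is used to reduce wear on flash media.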
Vast’s initiative is a bold one. In offering QLC-based flash storage aimed at secondary data it is not alone, with the likes of Pure Storage and NetApp also in the market. It is the addition of an AI-focused layer of functionality and services on top that is Vast’s push for a distinguishing feature.
“What we aim for is to roll database capability, data lakes, warehouses and storage all into one system,” said Denworth. “Vast DataBase adds a tabular data structure and an SQL interface to what customers are already buying,” he added, referring to functionality that has existed in Vast products since early 2023 but has only now been given a public airing.
Vast expects structured data derived from AI-driven analysis of unstructured data to be retained and accessed via this tabular format. It also hopes to replace complex AI pipelines made of multiple products with its single stack.
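As a rough illustration of that pattern, the Python sketch below keeps structured results derived from unstructured files in a table and retrieves them with SQL. It uses the standard library’s `sqlite3` module purely as a stand-in and does not show Vast’s SQL interface. The table name, column names and rows are invented for the example.

```python
import sqlite3

# In-memory SQLite database standing in for a tabular store (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE image_labels (object_path TEXT, label TEXT, confidence REAL)"
)

# Made-up output of an AI model that has labelled unstructured image files.
rows = [
    ("bucket/cam1/0001.jpg", "forklift", 0.97),
    ("bucket/cam1/0002.jpg", "pallet", 0.88),
    ("bucket/cam2/0007.jpg", "forklift", 0.91),
    ("bucket/cam2/0008.jpg", "forklift", 0.42),
]
conn.executemany("INSERT INTO image_labels VALUES (?, ?, ?)", rows)

# Find the files a model confidently labelled as containing a forklift.
matches = conn.execute(
    "SELECT object_path FROM image_labels "
    "WHERE label = ? AND confidence >= ? ORDER BY object_path",
    ("forklift", 0.9),
).fetchall()
```

The point of the sketch is only that the derived records sit alongside references to the original files, so later queries can locate unstructured data by what the analysis found in it.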
“Consolidation is happening, and you can see Snowflake, Databricks etc.,” said Denworth. “We’re building in integration at the storage layer. We’ve spotted a gap in the market. There’s a dichotomy between where these companies started and deep learning today, and they don’t have the capability to deal with unstructured data.
“We’re not trying to be the new OpenAI. We want to be the glue between the hardware and the application layer.”
Of the product elements announced, only Data Engine is not yet available; it is planned for release in 2024. “We don’t have the business model fully sorted yet,” he said.