VMware’s vSAN 8 gets a rebuild for the solid-state storage era
At VMware Explore 2022, we interview VMware’s John Gilmartin about the new, faster vSAN, plans to use DPUs, the vSAN+ cloud offering, and VMware Ransomware Recovery
Key among the announcements made at VMware Explore 2022 was the launch of version 8 of vSphere and vSAN. With that backdrop, ComputerWeekly.com’s French sister website, LeMagIT, met with John Gilmartin, VMware’s senior vice-president and general manager in its cloud infrastructure group, responsible for storage product updates.
vSAN is an extension of vSphere that virtualises server disk into a global storage pool for use by virtual machines. In this way, it forms the basis for a type of hyper-converged infrastructure.
In this interview, Gilmartin explains VMware’s enhancements to vSAN and the functionality customers should expect, as well as answering questions about other launches at the event, including Cloud Flex Storage and VMware Ransomware Recovery.
You claim vSAN 8 is faster than previous versions. Can you explain the technical changes behind this?
Gilmartin: We have fundamentally redesigned the way data is handled by the vSAN engine, and that has brought significantly better performance – almost 4x faster. There is also greater efficiency, with better compression and better snapshots. Altogether, it’s a great step forward in version 8.
vSAN was conceived of in an era when data would go on hard drives because SSDs just didn’t have the capacity. That initial architecture optimised performance with cache disks, which weren’t actually that fast either. All the genius lay in our algorithms for sharing data between different tiers and caches.
Now, SSD is the standard and 70% of datacentres have nothing but SSD. We optimised vSAN for solid state over time, but it wasn’t natively conceived as such – we were still working with a system of upstream cache and downstream capacity.
In version 8, there is no cache. All the disks – ie, all the SSDs – are at the same level. In other words, the whole pool of storage is the cache and access is via the NVMe protocol.
With this reset, we have also redesigned the underlying data structure. Access is now resolved upstream through metadata held in system logs. For example, it is possible to choose the extent of compression and encryption of data per VM [virtual machine], while maintaining global performance for all applications, not just for one.
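The design Gilmartin sketches – writes appended sequentially, reads resolved through log-derived metadata – resembles a log-structured store. As a purely illustrative toy (this is not VMware code; all names here are hypothetical), the idea can be shown in a few lines of Python:

```python
# Toy log-structured store: writes are append-only, reads go through a
# metadata index. Illustrative sketch only, not VMware's implementation.

class LogStructuredStore:
    def __init__(self):
        self.log = []      # append-only sequence of (key, value) records
        self.index = {}    # metadata: key -> position of latest record

    def write(self, key, value):
        # Writes are sequential appends, which suits SSDs: no in-place
        # updates and no separate cache tier to stage data through.
        self.index[key] = len(self.log)
        self.log.append((key, value))

    def read(self, key):
        # Reads consult the metadata index rather than scanning the log.
        pos = self.index[key]
        return self.log[pos][1]

store = LogStructuredStore()
store.write("block-7", b"old data")
store.write("block-7", b"new data")   # supersedes the earlier record
assert store.read("block-7") == b"new data"
```

Because old records are never overwritten in place, per-object policies (such as compression or encryption settings) can be attached in metadata without disturbing other objects’ data paths.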
Elsewhere, we have integrated a real-time snapshot function, which is consolidated gradually by the file system itself. That means it is no longer necessary to pause a VM to back up its contents elsewhere in the storage pool. Mirroring of data happens at the moment of writing. That also means you can restore very recent data. And none of this penalises application performance, because the snapshot system is not separate but part of the file system itself.
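Snapshots that live inside the file system typically work copy-on-write: taking a snapshot freezes the current block map, and only subsequent writes diverge from it. As a minimal sketch of that general technique (again, not VMware’s actual implementation):

```python
# Toy copy-on-write snapshot: a snapshot records the current block map,
# so no data is copied and nothing needs to pause. Illustrative only.

class CowVolume:
    def __init__(self):
        self.blocks = {}      # live block map: block number -> data
        self.snapshots = []   # each snapshot is a frozen block map

    def snapshot(self):
        # Copies only the mapping, not the data blocks themselves,
        # so snapshots are cheap regardless of volume size.
        self.snapshots.append(dict(self.blocks))

    def write(self, block_no, data):
        # New writes change only the live map; earlier snapshots
        # continue to see the old contents.
        self.blocks[block_no] = data

vol = CowVolume()
vol.write(0, "v1")
vol.snapshot()        # instant: no data copied, no VM pause
vol.write(0, "v2")    # the snapshot still sees "v1"
```

This is why an in-filesystem snapshot costs little at write time: the old version is simply retained until the snapshot is consolidated away.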
In addition to the datacentre version, vSAN 8 is already available as part of the VMware Cloud on AWS offer.
Will vSAN use DPU acceleration as vSphere 8 does?
Gilmartin: No, we’re not doing that at this time. For now, VMware’s work with DPUs is in network security. Of course, we are looking at the possibility of offloading storage functions to DPUs, and it looks as if it would be interesting to do that for third-party external storage connected via NVMe-over-Fabrics. In short, there are lots of possibilities on the table and we are studying them.
Similarly, will vSAN benefit from the new Aria monitoring services announced this week?
Gilmartin: In vSAN monitoring, there is nothing fundamentally new. vSAN is a key component for datacentres. It is currently monitored in vRealize Operations as a resource – in terms of cost, activity and automation – in large multicloud deployments, and will still be in Aria Operations. The aim of the Aria tools is to show and visually map the connections between resources.
The SaaS monitoring tools delivered with vSAN already perform very well. They can monitor storage traffic and provide proactive alerting. That said, we are working with the Aria teams to expose vSAN data more proactively in their tools and so anticipate more situations.
You are marketing vSAN 8 by subscription with the vSAN+ offer. What is this?
Gilmartin: The key interest here is to entrust VMware tools with vSAN maintenance. In vSAN+, your environment in your datacentre is connected to cloud services that aggregate metrics and connect your vSAN to backup tools, to lifecycle management and data restore processes. It hugely simplifies the connection between datacentre and cloud services and it is less costly than subscribing to each product separately.
Talking of storage in the cloud, as part of vSAN 8, you have announced Cloud Flex Storage. Is this just another cloud NAS offering?
Gilmartin: This service on VMware Cloud on AWS is based on the same file system as our disaster recovery service, VMware Cloud Disaster Recovery (CDR). This file system comes from technology we acquired with Datrium two years ago. What they did was to build an architecture that exploited EC2 and S3 in a way that minimises storage costs and gets high-performance NAS for the cost of object storage.
However, we want to leave the choice to our customers. We have already announced a partnership with NetApp for similar technology, which works directly with our ESXi hypervisor. Our aim is to work in the cloud with all the storage suppliers’ ecosystems exactly as we do in the datacentre.
The ecosystem is really important to what we do. NetApp has been an excellent first partner for us. It is very important for us to continue to gain new partners, above all in storage, where there has never been a sole product that responds to all needs. To have choice is the best way for customers to be sure that we make our platform global, and that our cloud services are as attractive as possible.
Why have you developed the new service VMware Ransomware Recovery? Is Cloud Disaster Recovery not sufficient?
Gilmartin: First of all, there has been very good adoption of Cloud Disaster Recovery, but all the conversations we’ve had with customers on the subject were less about disasters and more about ransomware. The question is, how do you protect efficiently against ransomware? The answer is not only to do backups, but to treat the problem as an end-to-end one.
To restore activity after a ransomware attack is a very difficult process that involves many tasks. For example: how to choose the best backup copy, how to recover a protected environment and analyse its health, and how to redeploy it to your infrastructure. All these workflows need numerous tools and products. What we’ve announced with VMware Ransomware Recovery is a complete workflow in the form of a cloud service. And that’s very practical when you’re in an emergency situation.
If you use on-site infrastructure and you want to use this service, we replicate data and snapshots in the cloud as you would with as-a-service backup in Cloud Disaster Recovery. So, if you have a ransomware incident on site, you can run workloads in the cloud. That’s likely the first thing you’d have to do anyway because your datacentre will be shut down by the authorities to stop the spread of the infection and for investigations to take place.
So, our service restores your applications in the cloud, but in an environment partitioned and protected at the network level by NSX. After your applications and data are restored, our service analyses their behaviour. It allows you to discover which snapshots are infected, but also to choose which applications to run to moderate the use of cloud resources until you can restore everything in your datacentre.
And, concerning storage, we use the Datrium technology, which puts capacity drawn from S3 services into production. So, costs are moderated compared with other ransomware recovery systems.
Finally, can you tell us about the next projects you are working on?
Gilmartin: We are working on projects in many areas connected to storage, but our main objective is to improve vSAN. Our main line of research concerns how to further disaggregate datacentre storage. I think that hyper-converged infrastructure has positively improved many operating models around storage.
Nevertheless, the fact that storage capacity and compute capacity are linked is a limiting factor. To resolve this problem, two years ago we began working on something called HCI Mesh. For a number of reasons, we have revised the model. We will soon announce a new evolution of this product with much better performance.