Virtualisation and disaster recovery: DR features in VMware
The key areas of functionality in VMware that can help with disaster recovery – including VMware backup, migration, high availability and replication
In a previous article on virtualisation and disaster recovery, Computer Weekly looked at some of the aspects that enable disaster recovery (DR) in a virtual server environment.
As a brief reminder, the key measurements of disaster recovery are: Recovery Point Objective (RPO), that is, the amount of acceptable data loss; and Recovery Time Objective (RTO), that is, the time taken to restore systems back to a normal operating state.
A disaster recovery plan looks to match the RTO/RPO requirements of the application to the recovery capabilities of the technology. In this article Computer Weekly will look at the options available to do that with VMware vSphere virtual machines.
VMware vSphere is a comprehensive virtualisation platform that comprises the ESXi hypervisor, vCenter Server for management and a suite of tools covering data protection, availability and systems management.
Data protection is implemented in vSphere through a number of features and products, each of which adds a layer of protection for the application. With a wide range of capabilities, vSphere offers RTO/RPO choices for all types of data, including production, test and development environments. The disaster recovery features in vSphere are complemented by a range of third-party systems that integrate with the vSphere management tools and hypervisor.
VMware backup
Virtual machine backups in vSphere are enabled through the use of VMware vStorage APIs for Data Protection (VADP), a standard feature of all vSphere versions. VADP provides an API that allows VMware and third party backup software suppliers to safely take full and incremental backups of virtual machines, while maintaining both operating system (OS) and application level data integrity.
VADP implements backups through the use of hypervisor based snapshots. An API call to VADP suspends I/O in an orderly fashion for the OS and any supported application running on the guest (through either VSS or VMware Tools support). VADP then takes a snapshot and makes it available for backup. Once the backup process has completed, the snapshot is deleted.
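The sequence described above (quiesce, snapshot, back up, delete) can be sketched in a few lines of Python. This is an illustration only: `FakeVM`, `backup_snapshot` and `run_backup` are hypothetical names invented for the example, not part of VADP or any VMware SDK, and the step that resumes I/O straight after the snapshot is an assumption about where that would happen.

```python
from contextlib import contextmanager


class FakeVM:
    """Hypothetical stand-in for a guest VM; not a VMware API object."""

    def __init__(self, name):
        self.name = name
        self.snapshots = []
        self.log = []  # records the order of operations

    def quiesce(self):
        # Stands in for suspending guest I/O in an orderly fashion.
        self.log.append("quiesce")

    def resume(self):
        # Assumption: I/O resumes as soon as the snapshot exists.
        self.log.append("resume")

    def create_snapshot(self):
        snap = f"{self.name}-snap-{len(self.snapshots) + 1}"
        self.snapshots.append(snap)
        self.log.append("snapshot")
        return snap

    def delete_snapshot(self, snap):
        self.snapshots.remove(snap)
        self.log.append("delete")


@contextmanager
def backup_snapshot(vm):
    """Quiesce, snapshot, hand the snapshot to the caller, then delete it."""
    vm.quiesce()
    try:
        snap = vm.create_snapshot()
    finally:
        vm.resume()  # always resume I/O, even if the snapshot fails
    try:
        yield snap
    finally:
        vm.delete_snapshot(snap)  # always clean up once the backup is done


def run_backup(vm, copy_fn):
    """Run copy_fn against a temporary snapshot and return the operation log."""
    with backup_snapshot(vm) as snap:
        copy_fn(snap)
    return vm.log
```

The context manager is used so that the snapshot is removed even when the copy step fails, which mirrors the point that the snapshot exists only for the duration of the backup.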
VMware offers a backup application called vSphere Data Protection (VDP) that can be used to take VM backups using VADP. In the current (6.1) release, a VDP virtual appliance can support up to 8TB of backup disk space capacity, sufficient for around 150 to 200 VMs. A single instance of VMware vCenter can support up to 20 VDP appliances. VDP can use any NFS or SAN-attached storage and provides data deduplication capabilities based on EMC Avamar.
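As a rough sizing aid, the figures quoted above can be turned into a short calculation. The sketch below assumes the conservative end of the 150 to 200 VMs-per-appliance range; the function names are invented for this example, and real sizing would depend on VM size, change rate and deduplication ratios, which are not modelled here.

```python
import math

# Figures taken from the article text for VDP 6.1.
APPLIANCE_CAPACITY_TB = 8
MAX_APPLIANCES_PER_VCENTER = 20
VMS_PER_APPLIANCE_LOW = 150  # conservative end of the 150-200 range


def appliances_needed(vm_count, vms_per_appliance=VMS_PER_APPLIANCE_LOW):
    """Estimate how many VDP appliances a given number of VMs would need."""
    if vm_count < 0:
        raise ValueError("vm_count must not be negative")
    return math.ceil(vm_count / vms_per_appliance)


def fits_in_one_vcenter(vm_count, vms_per_appliance=VMS_PER_APPLIANCE_LOW):
    """True if the estimate stays within the per-vCenter appliance limit."""
    return appliances_needed(vm_count, vms_per_appliance) <= MAX_APPLIANCES_PER_VCENTER


def max_backup_capacity_tb():
    """Upper bound on backup disk capacity behind a single vCenter instance."""
    return APPLIANCE_CAPACITY_TB * MAX_APPLIANCES_PER_VCENTER
```

On these assumptions, 1,000 VMs would need seven appliances, and a single vCenter instance tops out at 160TB of backup disk capacity across 20 appliances.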
Third party support for VADP is available from a wide range of traditional backup suppliers (such as Symantec and Commvault) as well as newer companies like Veeam and Acronis. These companies add value with features that include starting VMs directly from the backup media without a restore; extracting application-specific data from a VM backup (Active Directory objects, for example); and integrating the backup process with the snapshot features of external storage providers.
Integration with external hardware suppliers is an important feature that can help to mitigate the performance impact of taking snapshots through the hypervisor. HP recently released Recovery Manager Central (RMC) which allows the snapshot overhead to be offloaded to an HP 3PAR storage array, while retaining the data integrity features of the vSphere hypervisor-based snapshot.
VM migration
VMware vSphere has provided the capability to move virtual machines around the infrastructure for some time. Although migration isn’t strictly part of DR, the ability to move live VMs between hardware platforms can help reduce the impact of some hardware failure scenarios.
Virtual machines in vSphere can be moved around the infrastructure using vMotion and Storage vMotion. vMotion works in environments with shared storage, while Storage vMotion allows VMs to be moved between physical storage systems, including locally connected disk.
With the introduction of vSphere 6.0, VMware significantly enhanced the capabilities of vMotion. The changes included the ability to migrate VMs between vCenter Server instances, support for Long Distance vMotion – with up to 150ms of round trip time latency – and replication of data over a routed layer-3 network (although the VM network still needs to be a stretched layer-2 subnet).
At VMworld in August 2015 VMware demonstrated the capability to live migrate a VM from an on-premises vCenter infrastructure into VMware’s public cloud platform, vCloud Air. The demonstration was a technology preview of Project Skyscraper and is expected to appear in a future vSphere release. Although cloud-based VM migration is being positioned as a tool for optimising and balancing workloads, the technology does provide business continuity capabilities.
High availability/fault tolerance
VMware provides both high availability (HA) and fault tolerance (FT) features in vSphere.
VMware vSphere High Availability automatically restarts virtual machines when a physical server goes down and so provides a degree of business continuity in the event of hardware failure. VMware extended HA functionality with vSphere App HA, which introduced policy-based application monitoring and recovery that applies recovery decisions at application rather than VM level.
VMware vSphere Fault Tolerance extends the capability of HA by deploying a secondary “ghost” machine or shadow instance on another physical server. The secondary image is always powered on and kept up to date with the primary, so no outage occurs in the case of a hardware failure. The failover process is handled automatically, including the creation of a new shadow VM image to re-protect the failover copy. Both HA and FT have some technical restrictions and are licensable features.
VM replication
Virtual machine replication can be achieved in a number of ways, including at the hypervisor, in an external storage array or through a VM that provides storage functions. VMware offers vSphere Replication, which is asynchronous data replication built into the hypervisor and operating at VM level. The product delivers RPOs from 15 minutes to 24 hours and can be integrated into VMware vSphere Site Recovery Manager (SRM) to provide automated failover capability.
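Matching an application's RPO requirement to that 15-minute to 24-hour window is a simple range check. The following sketch shows one way to express it; the function name and the choice to return `None` for requirements tighter than 15 minutes are assumptions made for this example, not product behaviour.

```python
# RPO window quoted in the article for vSphere Replication, in minutes.
VR_MIN_RPO_MINUTES = 15
VR_MAX_RPO_MINUTES = 24 * 60


def vsphere_replication_rpo(required_rpo_minutes):
    """Return an RPO setting (minutes) that satisfies the requirement.

    Returns None when the requirement is tighter than the 15-minute
    minimum, meaning another replication method would be needed.
    """
    if required_rpo_minutes <= 0:
        raise ValueError("required RPO must be positive")
    if required_rpo_minutes < VR_MIN_RPO_MINUTES:
        return None
    # A looser requirement than 24 hours is capped at the maximum setting.
    return min(required_rpo_minutes, VR_MAX_RPO_MINUTES)
```

An application that can tolerate an hour of data loss maps to a 60-minute setting, while one that needs a five-minute RPO falls outside the window altogether.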
Array-based replication can be used to provide better RPOs than vSphere Replication, where the hardware supports synchronous replication. But array-based systems aren’t application- or hypervisor-aware and so will appear on recovery as “crash copies” of the virtual machine. In addition, a single LUN or volume on an external array can contain many VMs, which creates problems when only a partial or selective VM failover is needed. Although not supported by any supplier today, array-based replication of VVOLs will fix the issue of VMs being aggregated into groups in a single LUN, providing more granular array-based failover.
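The granularity problem is easy to see with a small model. In the sketch below, the LUN layout and VM names are made up for illustration, and the function simply reports which LUNs would have to fail over and which other VMs would be carried along when only some VMs are wanted.

```python
def vms_affected_by_failover(lun_contents, wanted_vms):
    """Return (LUNs to fail over, extra VMs dragged along with them).

    lun_contents maps a LUN name to the list of VMs stored on it.
    """
    wanted = set(wanted_vms)
    # Any LUN holding at least one wanted VM must fail over as a whole.
    luns = {lun for lun, vms in lun_contents.items() if wanted & set(vms)}
    # Every other VM on those LUNs fails over too, wanted or not.
    dragged = {vm for lun in luns for vm in lun_contents[lun]} - wanted
    return sorted(luns), sorted(dragged)


# Hypothetical layout: three VMs share lun1, one VM sits alone on lun2.
layout = {
    "lun1": ["db01", "web01", "web02"],
    "lun2": ["app01"],
}
```

Failing over only `db01` in this layout means failing over all of `lun1`, taking `web01` and `web02` with it, whereas `app01` can be moved on its own because it has a LUN to itself.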
There are also systems from third party suppliers that implement VM based replication and protection. These systems (such as Hypervisor-Based Replication from Zerto) sit in the data path for each VM, allowing updates to be tracked and replicated to a remote site where the changes are applied to a duplicate copy of the virtual machine. Hypervisor based replication (whether in the hypervisor kernel, or in a VM) is a useful tool for testing the validity of virtual machine backup images and can be more flexible than array based systems in this respect.
To manage the entire failover process, VMware's SRM provides the orchestration layer to automate the failover between a primary and secondary site, and is used in conjunction with either hypervisor based replication or array based replication from an external storage provider.
To further increase resilience, VMware offers vSphere Metro Storage Cluster (vMSC). This feature implements a highly-available stretched storage and hypervisor cluster between two local datacentres using synchronous array-based replication. Many storage suppliers (including HP 3PAR and EMC) can provide support for vMSC configurations today.