Choosing the right disaster recovery for your business
We look at the various options available when implementing disaster recovery, and how much they’re worth
Historically, building and maintaining a disaster recovery (DR) site, while critical to ensure business continuity, was often too costly and complex for most companies.
As Rajiv Mirani, chief technology officer at Nutanix, points out: “It simply wasn’t feasible for many enterprises to pay for the upfront costs and ongoing expenses of maintaining a second site, to be used only in the event of a disaster.”
From a DR perspective, the starting point for most large enterprises is their core IT infrastructure, which is often based on a primary on-premise datacentre or private cloud.
This is then supported by a secondary DR site at a separate geographic location, where core systems and data are backed up and replicated, ready for activation should the primary site fail and no longer be able to serve the business adequately.
Datacentre replication
Clearly, replicating an entire datacentre, with all its equipment, management, cooling and power needs, is a huge expense. Some businesses may require a hot standby, where the disaster recovery site takes over the moment the primary site goes down. This is the most expensive option: data is kept synchronised between the two sites, so disruption is minimal in the event of a failure. Others run a warm standby, which involves a delay before the backup site is fully operational.
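To make the difference between the two modes concrete, here is a minimal Python sketch, not any vendor's implementation: a hot standby acknowledges a write only once both sites have committed it, while a warm standby acknowledges immediately and ships the change later. The site names and write paths are invented.

```python
class Site:
    """A stand-in for one datacentre's storage layer (hypothetical)."""
    def __init__(self, name):
        self.name = name
        self.records = {}

    def commit(self, key, value):
        self.records[key] = value  # persist the write

primary = Site("primary")
standby = Site("dr-site")
replication_queue = []  # changes waiting to be shipped to the warm standby

def synchronous_write(key, value):
    """Hot standby: acknowledge only after BOTH sites have committed,
    so nothing is lost on failover, but every write pays the round trip."""
    primary.commit(key, value)
    standby.commit(key, value)  # blocks until the DR site confirms
    return "ack"

def asynchronous_write(key, value):
    """Warm standby: acknowledge once the primary has committed; the DR
    site catches up later, so the most recent writes can be lost."""
    primary.commit(key, value)
    replication_queue.append((key, value))  # shipped in the background
    return "ack"

synchronous_write("order-1001", "confirmed")
asynchronous_write("session-17", "active")
```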
In July 2019, analyst IDC forecast that public cloud spending would grow by 22.3% annually, from $229bn in 2019 to nearly $500bn in 2023.
The analyst firm noted that infrastructure-as-a-service (IaaS) spending, comprising servers and storage devices, will be the fastest-growing category of cloud spending, with a five-year compound annual growth rate of 32%.
These figures illustrate that cloud computing is increasingly becoming a mainstay of enterprise IT. Feifei Li, president and senior fellow of database systems at Alibaba Cloud Intelligence, explains: “Organisations around the world are embracing a range of disaster recovery solutions to protect themselves against hardware and software failures and ensure zero downtime for their business-critical applications, although these solutions can be costly.
“Cloud-native DR provides a cost-effective option for customers to back up data in case of a disaster with a pay-per-use pricing model.”
One size doesn’t fit all
However, one size doesn’t fit all when it comes to computing architecture, cloud and disaster recovery. It is not simply a matter of moving cloud-ready business-critical applications and data to the cloud and hoping that the existing storage architecture provides the right set of services to support them.
Enterprise compute environments differ vastly, and many environments, whether for technical reasons or because of data movement regulations, cannot be hosted or backed up in a cloud environment.
Despite the promise of attractive IT economics and easy accessibility, Mirani says: “Cloud-based DR services come with their own challenges.”
IT companies claim technology is now advanced enough, given the right architecture, for them to offer zero-time recovery. In reality, though, this depends on a number of factors, all of which must be assessed. But first and foremost, organisations must define DR plans that prioritise their mission-critical data and applications, and how much downtime they can sustain before the business begins to suffer.
Enterprise maturity
Recovery point objectives (RPO) and recovery time objectives (RTO) are used in the industry to measure business continuity.
“RPO describes how often I need to protect my data to ensure that I can get back to what I had when disaster or data corruption struck – or as close to that as possible,” says Tony Lock, a principal analyst at Freeform Dynamics.
“It relates to how fast my data changes. RTO is how quickly I must make the recovered data available should disaster strike or a request come in from a user, an auditor or even a regulator.”
Lock says that answering these fundamental questions is simple enough for small numbers of different data sets. However, it becomes complicated very quickly when you have lots of different data sets of varying business importance, all of which may have very dissimilar protection and recovery needs.
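One simple way to keep track is to record each data set’s objectives alongside what the protection schedule actually delivers, and flag any mismatch. The Python sketch below illustrates the idea; the data sets and figures are invented.

```python
from dataclasses import dataclass

@dataclass
class DataSet:
    name: str
    rpo_minutes: int       # maximum tolerable data loss
    rto_minutes: int       # maximum tolerable time to restore
    backup_interval: int   # how often it is actually protected (minutes)
    restore_estimate: int  # measured or estimated restore time (minutes)

# Invented examples: each data set has different protection needs
datasets = [
    DataSet("orders-db",  rpo_minutes=5,    rto_minutes=30,
            backup_interval=5,    restore_estimate=20),
    DataSet("hr-records", rpo_minutes=60,   rto_minutes=240,
            backup_interval=120,  restore_estimate=90),
    DataSet("web-assets", rpo_minutes=1440, rto_minutes=480,
            backup_interval=1440, restore_estimate=60),
]

for ds in datasets:
    rpo_ok = ds.backup_interval <= ds.rpo_minutes    # can we lose that little?
    rto_ok = ds.restore_estimate <= ds.rto_minutes   # can we restore that fast?
    if not (rpo_ok and rto_ok):
        print(f"{ds.name}: RPO met={rpo_ok}, RTO met={rto_ok} - revisit plan")
```

Run against these figures, the check flags hr-records, which is backed up only every two hours against a one-hour RPO.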
Enterprises may be at different stages of cloud maturity. Some are at the planning stage, while others, having deployed workloads in the public cloud and become comfortable managing their on-premise and public cloud environments, may be on a path to multicloud and hybrid datacentre transformation.
Whatever the level of cloud maturity, there are good reasons to use the cloud as an emergency backup in the event of a disaster.
In fact, according to the Gartner Magic Quadrant for the backup and disaster recovery market, published in October 2019, backup and recovery providers are consolidating many features such as data replication, (cloud) disaster recovery automation and orchestration, and intercloud data mobility on a single platform.
In addition, Gartner’s study reported that backup providers are adding data management functionality to their backup platforms to support analytics, test and development, and/or ransomware detection on cloud copies of backup data.
By bundling these additional services, the backup and disaster recovery providers are looking to deliver a higher return on investment in data protection.
Cloud-based disaster recovery services can be hybrid in nature; they move infrastructure and services between the cloud and on-premise datacentres.
Traditional hardware companies now offer capacity-on-demand and flexible consumption-based pricing, providing managed services wrapped around their cloud or pay-as-you-go offerings.
Traditional data backup hardware (tape backup) and software companies are also building out scalable DR cloud platforms, giving their customers a two-or-more-tier approach to their DR.
Architecturally, the customer’s business continuity and disaster recovery system is located on-premise in the primary and/or secondary datacentre.
Through a managed service offering, the disaster recovery provider manages the backup and replication of data locally on the customer’s hardware and, as part of the service, sends copies to the cloud. There, a mirror copy of virtual servers, applications and data is kept, ready to spin up in the event of a disaster that takes out the customer’s primary and/or secondary on-premise systems.
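In outline, such a managed service can be pictured as the loop below: snapshot each protected virtual machine locally, then ship a copy to cloud object storage, ready to spin up. Every function and name here is a hypothetical stand-in; real providers expose their own tooling for this.

```python
import time

# Hypothetical stand-ins for a DR provider's tooling
def take_local_snapshot(vm_id):
    """Snapshot the VM on the customer's own hardware first."""
    return f"snap-{vm_id}-{int(time.time())}"

def upload_to_cloud(snapshot_id, bucket):
    """Ship the snapshot to cloud object storage, the mirror copy."""
    print(f"uploading {snapshot_id} to {bucket}")

def replication_cycle(protected_vms, bucket="dr-mirror", interval_seconds=900):
    """Back up locally, then replicate to the cloud, on a fixed cycle."""
    while True:
        for vm_id in protected_vms:
            upload_to_cloud(take_local_snapshot(vm_id), bucket)
        time.sleep(interval_seconds)  # the cycle time drives the achievable RPO
```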
Hyperscale cloud platform providers, such as Amazon Web Services, Microsoft, Google and Alibaba, have security, redundancy and recovery measures in place that make it very unlikely they will lose your data. But they are not infallible, says Bola Rotibi, research director of software development at CCS Insight, who warns that many organisations falsely assume data and information stored in cloud applications and services is safe from loss.
“Without a plan that actively addresses protecting critical data stored in the cloud through software-as-a-service solutions in operations, that comfort blanket could just as easily smother an organisation when the light gets turned off,” she says.
Advanced data management
Beyond the basic requirements of using cloud-based, on-premise or a hybrid approach to DR, there are plenty of tools available that can help make general operational IT and DR run efficiently and cost-effectively. These need to be considered alongside the DR platform choices. Such tools are generally designed to help IT administrators responsible for data and backups understand, visualise and manage the lifecycle of all of the data across the organisation, as well as the relationships between different datasets, systems and people that require access.
With an effective data and information management policy and supporting toolset, enterprises gain a clearer view of which data should remain on-premise or on private cloud platforms, and which can reside in public cloud systems.
Building on this data governance framework, backup, replication and disaster recovery policies can then be applied to critical and less critical data and their associated applications.
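As a rough illustration of how such policies can be encoded, the sketch below maps classification labels to placement and protection rules. The labels and rules are invented; in practice they would come from the organisation’s own governance framework.

```python
# Hypothetical policy: classification label -> placement and protection rules
POLICY = {
    "restricted":   {"placement": "on-premise",    "replication": "synchronous",  "backup_hours": 1},
    "confidential": {"placement": "private-cloud", "replication": "asynchronous", "backup_hours": 4},
    "internal":     {"placement": "public-cloud",  "replication": "asynchronous", "backup_hours": 24},
    "public":       {"placement": "public-cloud",  "replication": "none",         "backup_hours": 168},
}

def protection_plan(dataset_name, classification):
    """Look up the DR treatment a data set should receive from its label."""
    rules = POLICY[classification]
    return (f"{dataset_name}: keep {rules['placement']}, "
            f"{rules['replication']} replication, backup every {rules['backup_hours']}h")

print(protection_plan("customer-pii", "restricted"))
```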
Data discovery
Along with effective data and information management, data discovery can be used to help an organisation understand and regain control over its data.
Data discovery not only helps organisations avoid costly regulatory penalties, but also gives IT administrators better insight into the organisation’s data, which helps with cost optimisation.
Data discovery lets IT departments see where their data resides across disparate geographic systems or locations, and classify the criticality of that data. The discovery process can check to ensure the data is compliant with legal or regulatory requirements and corporate data governance policies.
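In principle, the compliance pass of a discovery tool boils down to comparing where data was actually found against where policy allows it to live. The regions, categories and findings below are invented for illustration.

```python
# Hypothetical residency rules: data category -> regions where it may be stored
ALLOWED_REGIONS = {
    "eu-personal-data":  {"eu-west", "eu-central"},
    "financial-records": {"eu-west", "us-east"},
    "marketing-content": {"eu-west", "us-east", "ap-south"},
}

# What a discovery scan might have found (invented examples)
discovered = [
    ("crm-export.csv", "eu-personal-data",  "us-east"),
    ("ledger-2019.db", "financial-records", "eu-west"),
]

for name, category, region in discovered:
    if region not in ALLOWED_REGIONS[category]:
        print(f"NON-COMPLIANT: {name} ({category}) found in {region}")
```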
While it may not be seen as part of DR, data discovery has its place alongside data retention policies, cyber security and data loss prevention initiatives, as part of a firm’s data stewardship.
As is the case across many aspects of IT, artificial intelligence (AI) and machine learning (ML) also have their place in DR and business continuity.
Thanks to advances in AI and ML, many routine administration tasks that were previously people-intensive can now be fully automated. “Automation, performance, high availability and security are key differentiators when choosing DR solutions,” says Li.
“For example, many customers prefer virtual machine snapshot backup with high snapshot quota and a flexible automatic task strategy, which helps reduce the impact on business I/O [input/output].”
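The snapshot quota Li mentions can be pictured as a rolling window: take a new snapshot on schedule and retire the oldest once the quota is reached. Below is a minimal sketch with invented names, not Alibaba Cloud’s implementation.

```python
from collections import deque

class SnapshotScheduler:
    """Keeps at most `quota` snapshots, discarding the oldest first."""
    def __init__(self, quota):
        self.quota = quota
        self.snapshots = deque()

    def take_snapshot(self, vm_id, timestamp):
        if len(self.snapshots) >= self.quota:
            expired = self.snapshots.popleft()  # retire the oldest snapshot
            print(f"deleting {expired}")
        snap = f"{vm_id}@{timestamp}"
        self.snapshots.append(snap)
        return snap

sched = SnapshotScheduler(quota=3)
for t in range(5):
    sched.take_snapshot("vm-42", t)  # after three, each new snapshot evicts one
```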
Disaster recovery automation can run tests on applications, for example, or recover data to another environment for testing or development. Typically, AI is used to track metrics relevant to data backup and recovery, such as performance statistics, rates of change, speed of access and bottlenecks.
The AI dynamically adjusts the DR system where it needs to be optimised, re-prioritised or modified to meet a desired business outcome, such as a service-level agreement on recovery time after a system failure.
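Stripped of the “AI” label, such a feedback loop amounts to watching the metrics, comparing them against the objective and adjusting. The sketch below tightens the backup schedule when the rate of data change threatens the RPO; the thresholds and heuristic are invented.

```python
def adjust_backup_interval(change_rate_mb_per_min, rpo_minutes, current_interval):
    """Shorten the backup interval when data is changing fast enough that a
    failure between backups would exceed the tolerable loss (the RPO)."""
    # Invented heuristic: at high change rates, back up more often
    if change_rate_mb_per_min > 100 and current_interval > rpo_minutes // 2:
        return max(1, rpo_minutes // 2)  # tighten the schedule
    if change_rate_mb_per_min < 10 and current_interval < rpo_minutes:
        return rpo_minutes               # relax back towards the RPO bound
    return current_interval

interval = adjust_backup_interval(change_rate_mb_per_min=150,
                                  rpo_minutes=30, current_interval=30)
print(f"new backup interval: {interval} minutes")
```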
In essence, the AI ensures the disaster recovery is running optimally and matches the requirements of the business.
Additional reporting by Cliff Saran.