Data protection, backup and replication in the age of the cloud
A data protection and backup strategy has to take account of the expansion of the sphere of IT, from the datacentre to the public cloud to devices and locations at the edge
Backup and data protection are more important than ever as we move towards an IT ecosystem that is increasingly dispersed in the era of the cloud.
But as in any IT landscape, a solid backup strategy has to be based on an audit of the data that needs to be protected and of the processes that can be used to secure it.
In this article, we look at how to do that in the current IT landscape – shaped as it is by increasing use of the cloud – and how current products and services address the data protection conundrum.
Data protection scope
Data protection covers a wide range of scenarios that include:
- Corruption – Software or applications inadvertently changing content
- User error – Users accidentally deleting data
- Hardware failure – Media issues, server crashes and other related issues
- Hardware loss – Outages caused by fire, flood or theft
- Malicious loss – Deliberate acts of data deletion or access denial, such as ransomware
While most of these scenarios relate to private datacentres, the increasing use of public cloud means these environments have to be protected too.
So, for data protection we have to consider on-premises applications as well as those running on software-as-a-service platforms such as Office 365 or Salesforce.com.
Public cloud services like these do not back up data by default other than to recover from system failure, so getting emails back after deletion is the data owner's responsibility and must be included in a data protection plan.
With so much infrastructure accessible over the public internet, IT organisations also need to think about DLP – data loss prevention or, perhaps more accurately, data leakage protection. We return to this below when we discuss security.
Meeting service levels
Data protection looks to meet the needs of the business by placing service level objectives on data protection and restore. In other words, recovery objectives drive protection goals.
The two main measures are RTO (Recovery Time Objective) and RPO (Recovery Point Objective). RTO determines how quickly data and applications can be restored to operation, while RPO defines the amount of data loss tolerable. An RPO of zero, for example, means all data up to the point of failure must be restored.
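To make this concrete, here is a minimal, hypothetical sketch (not tied to any product) that checks whether a given backup interval and restore throughput can meet stated RPO and RTO targets:

```python
from datetime import timedelta

def meets_objectives(backup_interval: timedelta,
                     data_to_restore_gb: float,
                     restore_rate_gbph: float,
                     rpo: timedelta,
                     rto: timedelta) -> bool:
    """Check a backup schedule against recovery objectives.

    Worst-case data loss is the full interval between backups
    (a failure just before the next backup runs); restore time
    is driven by data volume over restore throughput.
    """
    worst_case_loss = backup_interval
    worst_case_restore = timedelta(hours=data_to_restore_gb / restore_rate_gbph)
    return worst_case_loss <= rpo and worst_case_restore <= rto

# Nightly backups fail a 4-hour RPO; hourly backups pass it.
print(meets_objectives(timedelta(hours=24), 500, 200,
                       rpo=timedelta(hours=4), rto=timedelta(hours=8)))  # False
print(meets_objectives(timedelta(hours=1), 500, 200,
                       rpo=timedelta(hours=4), rto=timedelta(hours=8)))  # True
```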
Replication vs backup
Implementing data protection means using a range of available techniques.
Replication describes the process of creating multiple redundant copies of data, with the aim of at least one copy surviving any disaster scenario.
Typically, replication has meant array-based synchronous or asynchronous copying, but now we can also copy at the hypervisor and application layers. Database platforms have long offered replication via tools such as log shipping, while NoSQL platforms offer eventually consistent replication.
Remember that replication alone is not enough to provide full data protection. Synchronous replication, for example, will faithfully replicate data corruption, while asynchronous replication may not be fully up to date with recent application updates.
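A toy model, purely illustrative, makes both caveats concrete: the synchronous replica faithfully mirrors a corrupt write the moment it happens, while the asynchronous replica can be missing recent writes when the primary fails:

```python
import queue

class Replica:
    def __init__(self):
        self.data = {}

primary, sync_replica, async_replica = Replica(), Replica(), Replica()
ship_queue = queue.Queue()   # async writes waiting to be shipped

def write(key, value):
    primary.data[key] = value
    sync_replica.data[key] = value   # synchronous: applied immediately
    ship_queue.put((key, value))     # asynchronous: applied later

def ship_async(max_items):
    """Background shipping of queued writes to the async replica."""
    for _ in range(min(max_items, ship_queue.qsize())):
        key, value = ship_queue.get()
        async_replica.data[key] = value

write("order-1", "valid record")
ship_async(max_items=1)                 # async replica catches up
write("order-1", "CORRUPT!!")           # corruption replicates synchronously
write("order-2", "recent record")       # not yet shipped when primary fails

print(sync_replica.data["order-1"])     # 'CORRUPT!!' -- corruption copied
print(async_replica.data["order-1"])    # 'valid record' -- but stale
print("order-2" in async_replica.data)  # False -- recent write lost
```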
Eventual consistency
Eventual consistency also applies to dispersed storage like object stores.
Where low latency isn't essential, data can be geographically dispersed using erasure coding algorithms and asynchronous replication that occurs in the background – so-called eventual consistency.
One of the main benefits of using erasure coding is the ability to recover data from a subset of the protected content without having to create full additional copies.
An erasure coding scheme could, for example, protect against the loss of any one datacentre across four locations while using only a third more storage capacity than the raw data (one parity chunk for every three data chunks). This enables multi-cloud object storage protection schemes that span multiple suppliers and on-premises datacentres at the same time.
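To sketch the principle with the simplest possible code – a single XOR parity chunk across three data chunks, where production systems would use Reed-Solomon coding across many more – any one of the four resulting chunks can be lost and rebuilt from the surviving three:

```python
def xor_bytes(*chunks: bytes) -> bytes:
    """XOR equal-length byte strings together."""
    result = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            result[i] ^= b
    return bytes(result)

# Split data into three chunks and compute one parity chunk (3+1 scheme).
data = b"datacentre-object-payload!!"   # length is a multiple of 3
size = len(data) // 3
chunks = [data[i * size:(i + 1) * size] for i in range(3)]
parity = xor_bytes(*chunks)             # stored in the fourth location

# Simulate losing datacentre 1: rebuild its chunk from the survivors.
lost = 1
survivors = [c for i, c in enumerate(chunks) if i != lost]
rebuilt = xor_bytes(parity, *survivors)
assert rebuilt == chunks[lost]
print(b"".join(chunks[:lost] + [rebuilt] + chunks[lost + 1:]) == data)  # True
```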
Snapshots and CBT
Most modern backup platforms use some kind of snapshot or changed block tracking (CBT) solution.
Snapshots offer a point-in-time image of application data, taken on a pre-determined schedule. Snapshots are usually co-ordinated with a pause in application I/O to guarantee data integrity.
Changed block tracking provides access to a stream of changed data at source, rather than requiring an entire data set to be copied and deduplicated on the backup appliance or software.
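As a simplified illustration of the idea (not any vendor's API – real CBT is maintained by the hypervisor or storage layer as writes happen, rather than by rescanning), the sketch below hashes fixed-size blocks and copies only those whose hash has changed since the last backup:

```python
import hashlib

BLOCK_SIZE = 4096

def block_hashes(volume: bytes) -> list:
    """Hash each fixed-size block of a volume."""
    return [hashlib.sha256(volume[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(volume), BLOCK_SIZE)]

def changed_blocks(volume: bytes, previous_hashes: list) -> dict:
    """Return only the blocks that differ from the last snapshot."""
    changed = {}
    for idx, digest in enumerate(block_hashes(volume)):
        if idx >= len(previous_hashes) or previous_hashes[idx] != digest:
            changed[idx] = volume[idx * BLOCK_SIZE:(idx + 1) * BLOCK_SIZE]
    return changed

volume = bytearray(BLOCK_SIZE * 8)       # an 8-block volume
baseline = block_hashes(bytes(volume))   # full backup establishes baseline

volume[BLOCK_SIZE * 3] = 0xFF            # application writes to block 3
delta = changed_blocks(bytes(volume), baseline)
print(sorted(delta))                     # [3] -- only one block to copy
```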
The most obvious implementation is in virtual server deployments, where platforms such as VMware vSphere give access to a stream of data changed since the last snapshot image was taken. The backup software can then create synthetic backup images (ones built from full and partial backup copies) for future restores.
CBT solutions also work well with hyper-converged infrastructure, where the stream of changed data is provided by the integrated storage layer. Nutanix Acropolis, for example, provides this capability for block storage attached to virtual machines and for data stored on Nutanix Files.
CBT and snapshots offer much more efficient processing of backup data, especially across a backup network. Public cloud providers offer snapshot capabilities within their platforms for block devices attached to virtual instances, with snapshot image data moved to cheaper storage for long-term retention.
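On AWS, for example, a snapshot of an EBS block device can be triggered programmatically with boto3 – a brief sketch, where the region and volume ID are placeholders:

```python
import boto3

# Region and volume ID are hypothetical placeholders.
ec2 = boto3.client("ec2", region_name="eu-west-1")

# Create a point-in-time snapshot of a block device.
snapshot = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",
    Description="nightly backup",
)

# Block until the snapshot completes before relying on it.
ec2.get_waiter("snapshot_completed").wait(
    SnapshotIds=[snapshot["SnapshotId"]]
)
print("snapshot ready:", snapshot["SnapshotId"])
```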
Cloud
The public cloud brings some interesting challenges and opportunities for data protection.
As discussed above, applications running in the cloud can protect data using snapshots.
The public cloud can also be used as a target for data protection and for archiving snapshots. It makes a great target for backup data, with geographic accessibility and built-in protection against loss or failure. There's no need to think about scaling backup storage either, as the service provider ensures effectively unlimited scalability.
But there are downsides to using the public cloud for backup.
First, savings from data deduplication aren't passed on to the customer, so backup software needs to deduplicate data before writing it to the cloud.
Second, performance needs to be considered. Restore speed from the public cloud depends on available bandwidth and is often throttled by the provider.
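On the first point, client-side deduplication can be sketched in a few lines: data is split into chunks, each chunk is identified by its content hash, and only chunks the store has not already seen are uploaded. The object store below is simulated with a dictionary; a real implementation would also use variable-size chunking and encryption:

```python
import hashlib

CHUNK_SIZE = 1024 * 1024   # 1 MiB fixed-size chunks
cloud_store = {}           # simulated object store: hash -> chunk

def backup(data: bytes) -> list:
    """Upload only chunks not already present; return the recipe."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in cloud_store:   # deduplicate before the upload
            cloud_store[digest] = chunk
        recipe.append(digest)
    return recipe

def restore(recipe: list) -> bytes:
    """Reassemble a backup from its chunk recipe."""
    return b"".join(cloud_store[d] for d in recipe)

first = backup(b"A" * CHUNK_SIZE * 3)    # three identical chunks
second = backup(b"A" * CHUNK_SIZE * 3)   # nothing new to upload
print(len(cloud_store))                  # 1 -- one unique chunk stored
assert restore(second) == b"A" * CHUNK_SIZE * 3
```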
Most vendors now support the public cloud as a backup target. In many cases, software-based appliances (which complement the hardware appliances discussed below) can also be used to restore data back into the public cloud as primary storage. This means the public cloud can serve as a replacement for a traditional disaster recovery site.
Appliances
One solution that helps in using the public cloud as a backup repository is to deploy an on-premises appliance.
The appliance caches data locally while archiving older snapshots and backups to cheaper media, such as an on-premises or public cloud object store.
Because data recovery usually occurs from the most recent backup copy, archiving older data to the public cloud via an appliance is cost-effective and still allows IT organisations to meet service-level objectives.
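The tiering policy itself can be as simple as an age threshold. A hypothetical sketch, assuming a 14-day local retention window: recent backups stay on the appliance's cache for fast restores, while older ones are listed for archive to an object store:

```python
from datetime import datetime, timedelta

LOCAL_RETENTION = timedelta(days=14)   # assumed retention window

def tier_backups(catalogue: list, now: datetime) -> tuple:
    """Split backups into local cache vs cloud archive by age."""
    local, archive = [], []
    for name, taken_at in catalogue:
        if now - taken_at <= LOCAL_RETENTION:
            local.append(name)      # recent: the likely restore source
        else:
            archive.append(name)    # old: move to cheap object storage
    return local, archive

now = datetime(2018, 6, 1)
catalogue = [
    ("daily-2018-05-31", datetime(2018, 5, 31)),
    ("daily-2018-05-20", datetime(2018, 5, 20)),
    ("monthly-2018-04-01", datetime(2018, 4, 1)),
]
local, archive = tier_backups(catalogue, now)
print(local)    # ['daily-2018-05-31', 'daily-2018-05-20']
print(archive)  # ['monthly-2018-04-01']
```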
Ransomware and security
Application deployment is diversifying to the public cloud and the "edge", often with smaller datacentres or machine rooms running local services. With compute and data so dispersed, many more systems are exposed to ransomware.
In ransomware attacks, hackers install code on compromised servers or desktops that encrypts local data. Payment is then demanded before the data is unlocked and made accessible again.
Vendors are helping to protect against ransomware attacks with easy rollback of backed-up data, but also by helping to identify where an attack has occurred, using techniques such as tracking the volume of changed data during the backup window.
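One such technique can be sketched as a simple anomaly check (a hypothetical example, not any vendor's algorithm): compare each backup window's changed-data volume against the recent baseline and flag large spikes, since mass encryption rewrites far more data than normal activity:

```python
from statistics import mean, stdev

def looks_like_ransomware(history_gb: list, latest_gb: float,
                          threshold: float = 3.0) -> bool:
    """Flag a backup whose changed-data volume is an outlier.

    Mass encryption rewrites most of a volume, so the changed-data
    stream in the next backup window spikes well above the baseline.
    """
    baseline, spread = mean(history_gb), stdev(history_gb)
    return latest_gb > baseline + threshold * spread

# Typical nightly change rates in GB, then a suspicious spike.
history = [12.0, 15.5, 11.2, 14.8, 13.1, 12.9, 14.2]
print(looks_like_ransomware(history, 14.9))   # False -- normal variation
print(looks_like_ransomware(history, 480.0))  # True -- volume encrypted?
```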
Ransomware is only one part of a wider security awareness that modern data protection demands. Backup data must be encrypted and safely protected, with minimal, audited access restricted to those involved in recovery.
Building a strategy
How is data protection changing? What strategies should IT organisations now employ?
Where previously the focus of data protection revolved around shared storage and the hypervisor, today’s protection methods are much more diverse.
As a result, a joined-up protection strategy requires the implementation of multiple backup processes, and a joined-up view of the status of backups across on- and off-premises locations.
This needs to be tied to a strategy on data mobility, because future applications may become much more mobile and need to be restored cross-platform.
Perhaps the biggest challenge for storage administrators and data protection specialists in the coming years will be to ensure that wherever a workload is run, the application data can be made available or recovered in a timely fashion.
So, perhaps the future focus should be on the data itself, to ensure ongoing accessibility that is independent from the platforms on which we store it.