Storage technology explained: Replication vs snapshots and backup

Are replication and snapshots the same? Can you replace backup with replication or snapshots? We look into the key planks of data protection strategy, including in cloud storage

Backups, snapshots and replication are key methods of data protection. We look at how and why they should form part of a comprehensive enterprise data protection strategy.

In this article, we’ll look at replication – the different ways it can be done, how it differs from snapshots, and its various pros and cons. And while we’ll define and examine replication as it’s found in on-premise infrastructures, we’ll also look at the use of replication in cloud storage, where customers may want to specify their requirements.

Are replication and snapshots the same?

Replication produces, as you’d imagine, a replica of a defined set of stored data. It can be a replica of a drive, volume or logical unit number (LUN), for example. What you get with replication is an exact copy. The variants of replication differ in the mechanism by which the replica is created, and in whether that replica arrives almost immediately or only eventually.

A snapshot is quite different to a replica, because for snapshots to become a usable replica some sort of rebuilding process has to occur. Replication pretty much creates a usable copy there and then – with some caveats, as we’ll see.

Snapshots are literally that: a saved point-in-time record of the state of a given dataset. In practice, “the snapshot” typically comprises many of those recorded copies of the drive or volume, plus any updates made to it. That also includes deleted blocks, which must be reincorporated to create an accurate copy from a specified previous point in time.

Snapshots can be rebuilt and rolled back to pretty quickly. Meanwhile, replication creates replicas that exist as an alternative, usable copy of the source media. 
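To make the rebuilding step concrete, here is a minimal Python sketch of a copy-on-write style snapshot. It assumes a volume modelled as a dictionary of block numbers to contents. The `Volume` class and its methods are hypothetical illustrations, not any supplier's implementation. The snapshot holds only the blocks that were overwritten or deleted after it was taken, so the point-in-time image has to be rebuilt from the live volume plus those preserved blocks.

```python
class Volume:
    """Toy block volume with copy-on-write snapshots (illustrative only)."""

    _MISSING = object()  # marks a block that did not exist at snapshot time

    def __init__(self, blocks=None):
        self.blocks = dict(blocks or {})
        self.snapshots = []  # each snapshot: {block_no: original content}

    def take_snapshot(self):
        # A new snapshot starts empty: nothing is copied up front.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def _preserve(self, block_no):
        # Before a block changes, save its old content into every
        # snapshot that has not already saved that block.
        old = self.blocks.get(block_no, self._MISSING)
        for snap in self.snapshots:
            snap.setdefault(block_no, old)

    def write(self, block_no, data):
        self._preserve(block_no)
        self.blocks[block_no] = data

    def delete(self, block_no):
        self._preserve(block_no)
        self.blocks.pop(block_no, None)

    def rebuild(self, snap_id):
        # Start from the live volume, then put back every block the
        # snapshot preserved, including blocks deleted since.
        image = dict(self.blocks)
        for block_no, old in self.snapshots[snap_id].items():
            if old is self._MISSING:
                image.pop(block_no, None)
            else:
                image[block_no] = old
        return image


vol = Volume({0: b"boot", 1: b"data-v1", 2: b"logs"})
snap = vol.take_snapshot()
vol.write(1, b"data-v2")  # overwrite after the snapshot
vol.delete(2)             # delete after the snapshot
vol.write(3, b"new")      # block that did not exist at snapshot time

point_in_time = vol.rebuild(snap)
```

A replica, by contrast, would simply be a second full copy of `vol.blocks` that needs no rebuild before use.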

The simplest example of replication imaginable is a one-off case when, for example, a developer needs a test database to work on. In such a case they can clone an existing production database and do what they want with it in the test environment. That illustrates what a replica looks like, but it won’t reflect any further changes to the source copy, and is also limited in that it’s one specific dataset.

At the other end of the continuum is synchronous replication. In this case, data is written to two or more storage instances as near to simultaneously as possible. That provides a second working copy that can be used for almost immediate failover. Think mission-critical systems where the margin for error or delay is close to non-existent.

Obviously, synchronous replication is costly and demands the best in terms of technical infrastructure and networking.

Can replication replace backup?

Replication cannot replace backup – the two things are quite different, and they should both be used as part of a data protection strategy.

Replication will often be an almost continuous process that creates a near real-time copy. That means it will also make a replica of corrupted or infected files. So, you need backups to provide a version of your data to roll back to.
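The point can be illustrated with a small Python sketch. It is a toy model under stated assumptions: the `ReplicatedStore` class, the file name and the backup label are all hypothetical, and the "replica" is just a second dictionary updated on every write. It shows the replica faithfully copying a corrupted file, while an earlier backup still holds a clean version to roll back to.

```python
import copy


class ReplicatedStore:
    """Toy primary with a continuously updated replica and periodic backups."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.backups = []  # list of (label, full copy of the primary)

    def write(self, name, content):
        self.primary[name] = content
        self.replica[name] = content  # the replica follows every change

    def take_backup(self, label):
        # A backup is a separate point-in-time copy, untouched by later writes.
        self.backups.append((label, copy.deepcopy(self.primary)))

    def restore(self, label):
        for backup_label, files in self.backups:
            if backup_label == label:
                self.primary = copy.deepcopy(files)
                self.replica = copy.deepcopy(files)
                return
        raise KeyError(label)


store = ReplicatedStore()
store.write("ledger.db", "balance=100")
store.take_backup("monday")

# Corruption on the primary is replicated like any other write.
store.write("ledger.db", "ENCRYPTED-BY-RANSOMWARE")
replica_after_attack = store.replica["ledger.db"]

# Only the backup offers a clean version to return to.
store.restore("monday")
```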

The key here is cost and how quickly data needs to be accessed – see recovery point objective (RPO) and recovery time objective (RTO). Replication is probably the most costly form of data protection, so it may be that only certain datasets are replicated while everything is backed up.

What is synchronous and asynchronous replication?

In synchronous replication, data is written to the second location as soon as it hits cache in the primary site. When it is received, the second site sends an acknowledgement to the primary site and the host where the change originated. 

Synchronous replication is as close as you can get to writing multiple copies of data simultaneously.

Asynchronous replication acknowledges the host at the primary site when data is written. Then the write goes to the second site, and that is acknowledged back to the primary site. It therefore adds a stage in the process compared with synchronous replication. 

Replication latency increases by about one millisecond for every 100 miles between sites. For the most critical systems that puts a cap on the physical distance between sites, but the delay may be fine for other use cases.
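As a rough worked example, the following Python sketch applies the one-millisecond-per-100-miles rule of thumb quoted above. The constant is the article's approximation rather than a measured value. The function names, the sample distances and the two-millisecond budget are assumptions chosen purely for illustration.

```python
MS_PER_100_MILES = 1.0  # rule of thumb from the text, not a measured value


def added_latency_ms(distance_miles):
    """Estimate the latency added by replicating over a given distance."""
    return distance_miles / 100 * MS_PER_100_MILES


def max_distance_miles(latency_budget_ms):
    """Estimate how far apart two sites can be within a latency budget."""
    return latency_budget_ms / MS_PER_100_MILES * 100


for miles in (10, 100, 500, 3000):
    print(f"{miles:>5} miles: about {added_latency_ms(miles):.1f} ms added")

# A hypothetical application that tolerates 2 ms of added write latency
print(f"2 ms budget: sites up to about {max_distance_miles(2):.0f} miles apart")
```

On these assumptions, a synchronous pair 500 miles apart would add roughly five milliseconds to every acknowledged write, which is why distance matters most for synchronous replication.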

Synchronous replication has more impact on application performance because it demands acknowledgement before the next input/output (I/O) operation can take place.

Asynchronous replication acknowledges locally so the next change can take place, with movement of data delayed. 

The difference, of course, is that in asynchronous replication the two datasets will differ for a longer time than with synchronous. 
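The difference in acknowledgement order can be sketched in Python. This is a simplified, single-threaded model under stated assumptions: the `ReplicatedVolume` and `Site` classes are hypothetical, there is no real network, and the asynchronous queue is only shipped when `drain()` is called. It records the order of events so the two modes can be compared.

```python
from collections import deque


class Site:
    def __init__(self, name):
        self.name = name
        self.data = {}


class ReplicatedVolume:
    """Toy two-site volume that logs the order of acknowledgements."""

    def __init__(self, mode):
        if mode not in ("sync", "async"):
            raise ValueError("mode must be 'sync' or 'async'")
        self.mode = mode
        self.primary = Site("primary")
        self.secondary = Site("secondary")
        self.pending = deque()  # writes not yet shipped (async only)
        self.events = []

    def write(self, key, value):
        self.primary.data[key] = value
        self.events.append(f"primary stored {key}")
        if self.mode == "sync":
            # The host is acknowledged only after the secondary confirms.
            self._ship(key, value)
        else:
            # The host is acknowledged now; the write is shipped later.
            self.pending.append((key, value))
        self.events.append(f"host acknowledged {key}")

    def _ship(self, key, value):
        self.secondary.data[key] = value
        self.events.append(f"secondary acknowledged {key}")

    def drain(self):
        # Ship queued writes to the secondary in their original order.
        while self.pending:
            self._ship(*self.pending.popleft())

    def in_sync(self):
        return self.primary.data == self.secondary.data


sync_vol = ReplicatedVolume("sync")
sync_vol.write("a", 1)

async_vol = ReplicatedVolume("async")
async_vol.write("a", 1)
lagging_before_drain = not async_vol.in_sync()  # the two sites differ for a while
async_vol.drain()
```

In the synchronous volume the host acknowledgement is always the last event for a write. In the asynchronous volume it comes before the secondary's acknowledgement, and the two sites differ until the queue is drained.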

An enterprise data protection strategy would aim to combine synchronous replication for the most critical applications or datasets with asynchronous replication for less critical data. Snapshots could be in the mix too, with the whole thing underpinned by regular backups.

What is cloud replication?

So far, we have dealt primarily with synchronous and asynchronous replication in on-premise storage arrays and servers.

But many replication options are available for cloud storage. The big three hyperscalers – Amazon Web Services (AWS), Microsoft Azure and Google Cloud Platform (GCP) – all offer replication services for customers that store data with them.

AWS offers live replication that also copies metadata. Its replication options include cross-region, same-region and bi-directional replication, replication to different storage classes or to different owners, and a choice between replication within 15 minutes of writes and batch mode, ie when required.

Microsoft Azure offers similar services, with built-in disaster recovery functionality as part of the service.

Google Cloud Platform has its Turbo Replication offer, which is also a within-15-minutes replication service.

Cloud replication services allow data to be stored in multiple remote locations, potentially very distant from each other for reasons of disaster recovery or to enhance availability. 

Replication in the cloud is usually carried out via erasure coding, because most cloud storage is object storage and not suited to synchronous and asynchronous replication as described here for on-premise storage.
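For readers unfamiliar with erasure coding, here is a minimal Python sketch of its simplest form: a single XOR parity shard. It is an illustration only. Production object stores generally use stronger codes that tolerate more than one lost shard, and the function names, shard count and sample data here are assumptions. The idea shown is that data is split into shards plus redundancy, so a lost shard can be recomputed from the survivors rather than read from a full second copy.

```python
from functools import reduce


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data, k):
    """Split data into k equal shards and append one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = reduce(xor_bytes, shards)
    return shards + [parity]


def recover(shards):
    """Rebuild the single missing shard (marked None) from the survivors."""
    missing = [i for i, shard in enumerate(shards) if shard is None]
    if len(missing) != 1:
        raise ValueError("single parity can repair exactly one lost shard")
    survivors = [shard for shard in shards if shard is not None]
    repaired = list(shards)
    # XOR of all surviving shards (data and parity) equals the lost shard.
    repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired


original = b"replicate me across sites"
stored = encode(original, k=4)  # 4 data shards + 1 parity shard

damaged = list(stored)
damaged[2] = None               # simulate losing one shard
repaired = recover(damaged)

# Drop the parity shard and the padding to get the original bytes back.
restored = b"".join(repaired[:4])[:len(original)]
```

In a real deployment each shard would sit on a different device or in a different location, so the loss of any one of them does not lose the object.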
