AWS outage shows vulnerability of cloud disaster recovery

The recent outage at Amazon Web Services has highlighted the risks of using any public cloud for disaster recovery IT

Cliff Saran, Managing Editor

Published: 06 Mar 2017 9:30

The recent outage that affected Amazon Web Services (AWS) users highlights the risk of running critical systems in the public cloud.

The general consensus is that the public cloud is superior to on-premise datacentres, but AWS’s outage, caused by human error, shows that even the most sophisticated cloud IT infrastructure is not infallible.

Many organisations have reaped the benefits of the reliable, elastic IT infrastructure available from public cloud providers such as AWS and Microsoft Azure.

The sums add up in terms of paying for computing workloads only when required, and this has made infrastructure as a service (IaaS) a good candidate for companies to deploy their disaster recovery (DR) infrastructure. But the AWS outage raises doubts over this strategy.

Jon Forster, consulting senior IT adviser at Moray (Fitness First Group), said: “It is a very interesting situation to run everything in the cloud and your disaster recovery also runs in the cloud.”

Clearly, if both the live system and DR run in the public cloud and that service is offline, the organisation’s business continuity plan will fail.

One approach could be to invest in two completely separate cloud providers, with separate network links to each.

But Forster believes this is impractical. “The difficulty in running on two providers’ clouds is how easy is it to move from one to the other, because they run fundamentally different technology,” he said.

So realistically, to run the same application across AWS and Azure, an organisation needs to ensure that no services unique to either cloud platform are used.

The other approach is to go back to old-school, pre-cloud IT and run DR on-premise, reversing the usual cloud business continuity plan.

Forster added: “Failing to on-premise IT is a reverse of the classic way to do cloud DR. Normally, you would operate on-premise, then fail to one of the clouds.”

Backup networking

Forster said he had been working with Fitness First’s Australasia business to improve backups by using Azure. “We are paying for the ExpressRoute pipe from our Singapore site into Azure,” he said. “The bigger the pipe, the more you pay. But you don’t need to pay for it all the time. You will probably only use ExpressRoute for a two-hour backup window every 24 hours.”

He said the Australasia business had been working with its local telco provider, Telstra, on a means to throttle back the ExpressRoute connectivity outside the backup window. In effect, a script runs that provides bandwidth on ExpressRoute only during the backup window, which is a potentially huge cost saving for Fitness First.
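The article does not describe the script itself, but the scheduling decision behind it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the window start time, the bandwidth figures and the function names are all hypothetical, and the sketch deliberately makes no calls to any Azure or Telstra interface.

```python
from datetime import time

# Hypothetical values: the article gives a two-hour window but no start
# time and no bandwidth figures.
BACKUP_WINDOW_START = time(1, 0)
BACKUP_WINDOW_END = time(3, 0)
BACKUP_MBPS = 1000   # assumed bandwidth during the backup window
IDLE_MBPS = 50       # assumed throttled bandwidth outside the window


def in_backup_window(now: time,
                     start: time = BACKUP_WINDOW_START,
                     end: time = BACKUP_WINDOW_END) -> bool:
    """Return True if `now` falls inside the window [start, end)."""
    if start <= end:
        return start <= now < end
    # The window wraps past midnight, for example 23:00 to 01:00.
    return now >= start or now < end


def target_bandwidth_mbps(now: time) -> int:
    """Bandwidth the link should be set to at the given time of day."""
    return BACKUP_MBPS if in_backup_window(now) else IDLE_MBPS


def full_rate_fraction(window_hours: float = 2.0,
                       day_hours: float = 24.0) -> float:
    """Share of each day spent at the full backup rate."""
    return window_hours / day_hours
```

A scheduler would call `target_bandwidth_mbps` at each window boundary and hand the result to whatever provisioning mechanism the provider offers, which is not shown here. With the two-hour window Forster mentions, `full_rate_fraction()` works out at one-twelfth of the day at the full rate.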

There are many choices and configurations for hosting on-premise or using the cloud, but from a business continuity perspective, “best practice is to work through who your users are and where they will be using the applications”, said Forster.

Fitness First hosts its core system on-premise on Nutanix infrastructure hosted in an Equinix datacentre, but it uses Microsoft Dynamics on Azure for its software as a service (SaaS)-based sales application and runs DR in the cloud.

“The benefits are not just cost,” said Forster. “You have to look at latency. This is why we host the core systems on-premise. It runs over our network because it is very important that when someone enters a Fitness First gym, there is no delay. They swipe [their membership card] and enter.”

The company’s fitness app is Azure-hosted, which means it can be accessed by the public anywhere, any time. Access over an internet connection may be slow but, unlike the membership system, the app is not considered mission-critical, so the firm is happy to run it in the public cloud.
