AWS outage shows vulnerability of cloud disaster recovery

The recent outage at Amazon Web Services has highlighted the risks of using any public cloud for disaster recovery IT

The recent outage that affected Amazon Web Services (AWS) users highlights the risk of running critical systems in the public cloud.

The general consensus is that the public cloud is superior to on-premise datacentres, but AWS’s outage, caused by human error, shows that even the most sophisticated cloud IT infrastructure is not infallible.

Many organisations have reaped the benefits of the reliable, elastic IT infrastructure available from public cloud providers such as AWS and Microsoft Azure.

The sums add up in terms of paying for computing workloads only when required, and this has made infrastructure as a service (IaaS) a good candidate for companies to deploy their disaster recovery (DR) infrastructure. But the AWS outage raises doubts over this strategy.

Jon Forster, consulting senior IT adviser at Moray (Fitness First Group), said: “It is a very interesting situation to run everything in the cloud and your disaster recovery also runs in the cloud.”

Clearly, if both the live system and DR run in the public cloud and that service is offline, the organisation’s business continuity plan will fail.

One approach could be to invest in two completely separate cloud providers, with separate network links to each.

But Forster believes this is impractical. “The difficulty in running on two providers’ clouds is how easy is it to move from one to the other, because they run fundamentally different technology,” he said.

So realistically, to run the same application across AWS and Azure, an organisation needs to ensure that no services unique to either cloud platform are used.

The other approach is to go back to old-school, pre-cloud IT and run DR on-premise as a reverse cloud business continuity plan.

Forster added: “Failing to on-premise IT is a reverse of the classic way to do cloud DR. Normally, you would operate on-premise, then fail to one of the clouds.”

Failing to on-premise IT would not be an effective plan, said Forster. The business would be required to pay up-front for all the computing hardware, storage and software licences to run the application at potentially its full capacity.

“If you have an application that peaks around three days a month, you have to pay for it running at peak load for the whole month – and if it requires a database like SQL Server, you have to pay extra for the operating system and the maximum CPU usage of the server,” he said.
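Forster's arithmetic can be illustrated with a rough sketch. Every figure below (capacity units, day counts and the daily price per unit) is a hypothetical assumption chosen for illustration, not a number from Fitness First, and the model deliberately ignores licensing, up-front capital and any cloud price premium.

```python
def monthly_cost(peak_units, base_units, peak_days, days_in_month, cost_per_unit_day):
    """Compare paying for peak capacity all month with paying only for what is used.

    All inputs are hypothetical illustration values, not real pricing.
    Returns a tuple: (fixed_cost, elastic_cost).
    """
    # Fixed provisioning: peak capacity is paid for on every day of the month.
    fixed = peak_units * days_in_month * cost_per_unit_day
    # Elastic provisioning: peak capacity on peak days, base capacity otherwise.
    off_peak_days = days_in_month - peak_days
    elastic = (peak_units * peak_days + base_units * off_peak_days) * cost_per_unit_day
    return fixed, elastic


# Assumed workload: 10 units at peak for 3 days, 2 units for the other 27 days.
fixed, elastic = monthly_cost(
    peak_units=10, base_units=2, peak_days=3, days_in_month=30, cost_per_unit_day=1
)
print(f"fixed={fixed}, elastic={elastic}")
```

Under these assumed numbers, the fixed deployment pays for 300 unit-days against 84 for the elastic one. The real gap depends on actual pricing and on how spiky the workload is.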

Fitness First has been a customer of Nutanix hyper-converged infrastructure for over two years. Nutanix runs the company’s membership system, which supports payment and gym membership. However, Forster runs DR for this core business application on Microsoft Azure’s IaaS.

“When you go from an on-premise Microsoft environment to Azure, you can do so because you are fundamentally using the same technology,” he said. In fact, newer versions of the Windows Server operating system are built to support migration to Azure.

“I did a DR test of the payment system, copied it from our Nutanix environment to Azure and failed over to the cloud,” said Forster. “The business did not see any difference.”

Forster believes the most cost-effective and safest approach for companies is to run hybrid IT, where core business systems are engineered in a way that allows them to burst to a public cloud. “This way, you can have the cloud dialled down to run in the smallest footprint, and ramp up when you need it,” he said.

A big selling point of the cloud is that cloud-hosted virtual machines are spun up as and when required. This can be exploited for datacentre backups to the cloud.

Backup networking

Forster said he had been working with Fitness First’s Australasia business to improve backups by using Azure. “We are paying for the ExpressRoute pipe from our Singapore site into Azure,” he said. “The bigger the pipe, the more you pay. But you don’t need to pay for it all the time. You will probably only use ExpressRoute for a two-hour backup window every 24 hours.”

He said the Australasia business had been working with its local telco provider, Telstra, on a means to throttle back the ExpressRoute connectivity outside the backup window. In effect, a script runs that provides bandwidth on ExpressRoute only during the backup window, which is a potentially huge cost saving for Fitness First.
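The article does not describe how that script works. As a minimal sketch of the scheduling logic only, the following assumes a fixed daily start time and a two-hour window; the function names and bandwidth figures are hypothetical, and the call that would actually change the circuit's bandwidth with the provider is left out.

```python
from datetime import time

MINUTES_PER_DAY = 24 * 60


def in_backup_window(now, window_start, duration_minutes=120):
    """Return True if `now` falls inside the daily backup window.

    Works for windows that cross midnight. The default length of 120
    minutes reflects the two-hour window mentioned in the article.
    """
    now_min = now.hour * 60 + now.minute
    start_min = window_start.hour * 60 + window_start.minute
    # Minutes elapsed since the most recent window start, wrapping at midnight.
    elapsed = (now_min - start_min) % MINUTES_PER_DAY
    return elapsed < duration_minutes


def target_bandwidth_mbps(now, window_start, full_mbps, idle_mbps):
    """Pick the bandwidth to request: full inside the window, minimal outside.

    `full_mbps` and `idle_mbps` are hypothetical values set by the caller.
    """
    return full_mbps if in_backup_window(now, window_start) else idle_mbps


# Assumed schedule: backups start at 23:00 and run for two hours.
start = time(23, 0)
print(target_bandwidth_mbps(time(0, 30), start, full_mbps=1000, idle_mbps=50))
print(target_bandwidth_mbps(time(12, 0), start, full_mbps=1000, idle_mbps=50))
```

A scheduler could call `target_bandwidth_mbps` every few minutes and pass the result to whatever interface the network provider exposes for resizing the link.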

There are many choices and configurations for hosting on-premise or using the cloud, but from a business continuity perspective, “best practice is to work through who your users are and where they will be using the applications”, said Forster.

Fitness First hosts its core system on-premise on Nutanix infrastructure hosted in an Equinix datacentre, but it uses Microsoft Dynamics on Azure for its software as a service (SaaS)-based sales application and runs DR in the cloud.

“The benefits are not just cost,” said Forster. “You have to look at latency. This is why we host the core systems on-premise. It runs over our network because it is very important that when someone enters a Fitness First gym, there is no delay. They swipe [their membership card] and enter.”

The company’s fitness app is Azure-hosted, which means it can be accessed by the public anywhere, any time. Access over an internet connection may be slow but, unlike the membership system, the app is not considered mission-critical, so the firm is happy to run it in the public cloud.
