What cloud risks should businesses consider after Amazon's EC2 outage?

Amazon's recent outage has raised questions over the reliability of cloud services. If Amazon, with its global reach, cannot get IT right, how can CIOs expect other operators to cope? Jenny Williams investigates.


Online retailer Amazon has apologised after a network interruption brought down customer sites using its web hosting service, Amazon Elastic Compute Cloud (EC2). The service went offline following a routine upgrade on 21 April 2011. It took four days to return EC2 to normal operations.

Amazon's outage and customer downtime gave new life to old concerns about cloud security, data management and compliance. Cloud services are designed to be highly redundant, meaning they can cope with multiple failures and still maintain service levels. This is one of the key attractions of cloud computing for CIOs, as high levels of datacentre availability can cost millions of pounds to engineer and run.

Gartner recently predicted the cloud computing market will reach almost $150bn worldwide by 2014, expecting an acceleration in adoption by businesses.

The sites affected included Reddit, Quora and Foursquare. Foursquare said in a blog post: "Our usually-amazing datacentre hosts, Amazon EC2, are having a few hiccups [...], which affected us and a bunch of other services that use them."

Technical problems

In a technical statement, Amazon apologised for the outage, explaining that it was caused by a network change made as part of Amazon Web Services' scaling activities in the east of the US, which disconnected the primary and secondary networks. This also made some Relational Database Service (RDS) instances inaccessible.

"We will be making a number of changes to prevent a cluster from getting into a re-mirroring storm in the future," Amazon added.

But the outage has drawn attention to the disaster recovery, security and compliance risks associated with cloud services.

Compliance wake-up call

Dale Vile, managing director at analyst Freeform Dynamics, says organisations are often left with a gap between the service provider's service level agreement (SLA) and the business requirements.

"Cloud evangelists and purists have the idea that the cloud makes a lot of IT problems disappear. Some problems do. But disaster recovery, security, compliance and data management arguably become harder as it's spread across multiple sites," he said.

"The location of data is very important. Some organisations may be unable to sign-up to a service if the provider can't guarantee location details of its data to meet compliance needs. You can't always move everything to the cloud, which is why there aren't many large-scale cloud deployments."

So if a CIO needs to ensure data stays within the EU, for example to meet regulatory requirements, the cloud service provider is restricted in how it can mirror its datacentres for business continuity. A stalemate may ensue as the services will need to stay in the EU, but the service provider may not wish to build a back-up datacentre to support this requirement.

Vile says traditional hosting contracts often include disaster recovery as a managed service, but utility providers' business models often put the onus on the business to organise its own disaster recovery, back-up and other risk management measures.

Cloud concerns

Amazon's service has had a good track record, and few businesses would be able to match the high levels of reliability of EC2 unless they spent vast sums of money on infrastructure.

Andy Burton, chairman of the Cloud Industry Forum (CIF), says the issues are just as relevant for businesses running on-premise IT. He points out that Amazon customers who had subscribed to, and were using, the full flexibility of the platform were not affected by the outage.

"It is in the architecture, deployment and operation of the services where the disaster recovery capability has to be achieved, and could've been for these customers, had they chosen a different basis of implementation," Burton said.

That said, it is still early days for cloud computing. Few businesses are yet looking at techniques such as dual and triple redundancy, where multiple cloud service providers are used to limit a company's exposure to downtime, just as a network manager would build dual or triple redundancy into a wide area network.
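The arithmetic behind dual and triple redundancy can be sketched in a few lines of Python. This is an illustrative calculation only: the function names and the 99% availability figure are assumptions made for the example, not figures published by Amazon or any other provider, and the sum assumes that providers fail independently of one another, which will not always hold in practice.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def combined_availability(availabilities):
    """Chance that at least one provider is up at any moment.

    Assumes (hypothetically) that provider failures are independent,
    so the chance of all being down is the product of each one's
    chance of being down.
    """
    p_all_down = 1.0
    for availability in availabilities:
        if not 0.0 <= availability <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        p_all_down *= 1.0 - availability
    return 1.0 - p_all_down


def expected_downtime_hours(availabilities):
    """Expected hours per year with every provider down at once."""
    return (1.0 - combined_availability(availabilities)) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Illustrative figure only: 99% availability per provider.
    for count in (1, 2, 3):
        providers = [0.99] * count
        hours = expected_downtime_hours(providers)
        print(f"{count} provider(s): {hours:.4f} hours of downtime a year")
```

On these assumed figures, a single provider at 99% availability would be down for about 87.6 hours a year, while two independent providers would both be down for under an hour. The calculation leaves out the cost and complexity of keeping data and applications in step across providers, which is where much of the real effort lies.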

Amazon has now offered all customers a 10-day credit equal to 100% of their usage of Elastic Block Store (EBS) volumes, EC2 and RDS database services. While affected Amazon customers won't be out of pocket, the reputation of the cloud services firm has inevitably been dented.

Forrester Research analyst James Staten writes in a blog post: "You can't afford to pass up the opportunity cloud computing presents for turning IT from cost centre into revenue driver."

While cloud services offer cost-cutting opportunities, the potential gaps between an SLA and a company's own legal requirements underline the importance of disaster recovery and data back-up measures in surviving future service provider outages.

 

What lessons can UK businesses learn from the Amazon EC2 outage?
 
1. Public cloud is often too risky. Using high-risk public cloud infrastructure is not appropriate for organisations that require a business-grade service.

2. Get an SLA. In the cloud, brand names (even global leading brands) are no substitute for service level agreements.

3. Look for local service provision. What UK organisations want is a local provider with facilities that can be visited and touched, including UK datacentres, UK-based support, strong and relevant service level agreements and someone to talk to when they need help.

4. Pay the price. Although the cloud is unarguably better value than the alternative, UK businesses can still expect to get what they pay for.

5. Outline business targets. Work with the provider to define the most appropriate services, matching your cloud needs to a long-term delivery strategy designed to achieve your business goals.
 
Source: Ricky Hudson, CEO at IT services firm Star

 
