AWS outage: Datacentre power cut knocks ‘hundreds’ of internet services offline
A year on from Amazon’s S3 outage at its US-East-1 datacentre region, a power loss incident in the same place has caused a fresh round of service disruption for the cloud giant’s customers
A power outage affecting one of Amazon Web Services’ (AWS) largest US datacentre regions reportedly knocked hundreds of online services offline across the world on Friday 2 March.
The cloud services giant confirmed that one of the network peering facilities in its US-East-1 region suffered two separate power losses over a two-hour period, each lasting about 10 minutes.
As a result, organisations that rely on that region to host their applications and workloads “may have experienced internet connectivity issues”, said AWS in a statement on its services status page.
“Our network is designed to be fully redundant with multiple independent peering facilities in every region,” the statement continued. “Some customers experienced elevated latency and packet loss while the network rerouted affected traffic to these unaffected network peering facilities.
“Some packet loss was also observed as we restored traffic to the affected network peering facility.”
Computer Weekly contacted AWS for further details about Friday’s outage, but had not received a response at the time of publication.
According to an analysis of the incident by network monitoring company ThousandEyes, more than 240 "critical services" that run on AWS infrastructure were disrupted by the outage, including Slack, Twilio and Atlassian.
According to reports, the incident also blighted US-based users of Amazon’s voice assistant technology Alexa, as well as organisations that rely on the firm’s Direct Connect service to obtain a private connection between their datacentres and the AWS cloud.
“The AWS-East region is one of the first AWS [datacentre] regions and is, hands down, their largest, with at least five availability zones,” wrote Archana Kesavan, senior product marketing manager at ThousandEyes, in a blog post. “What started as a power outage impacting a small set of services quickly cascaded into a major event.”
News of the outage comes nearly a year to the day after Amazon's Simple Storage Service (S3) suffered an outage in the same AWS datacentre region that caused widespread disruption across the internet. That incident was triggered when an engineer incorrectly executed a command, knocking an unspecified number of servers offline.
This latest incident serves to highlight just how complex and interconnected the services that run in the public cloud are, said Kesavan.
“Outages and natural disasters in one part of the cloud can quickly ripple over into other areas,” she added. “Cloud vendors offer several ways to directly connect into their infrastructure. However, they do not make you immune from the external dependencies of the internet.
"While availability zones offer some level of redundancy, regional outages like these can quickly envelop entire clusters of datacentres."