Amazon services ‘recovering’ as Snapchat and banks among sites hit by outage - BBC

Many of the world’s largest websites, including Snapchat, Reddit and Roblox, were knocked offline on Monday after a huge Amazon Web Services (AWS) outage.

More than 1,000 apps and websites – including banks such as Lloyds and Halifax – were affected by problems at the heart of the cloud computing giant’s operations in the US, according to platform outage monitor Downdetector.

Downdetector said user reports of problems globally had soared to more than 6.5 million during the outage on Monday morning.

Amazon later said it had fixed the underlying problem, but issues for some services persisted, and experts said the outage demonstrates the perils of many companies relying on a single, dominant provider.

Amazon has not yet fully detailed what caused Monday’s outage or issued an official statement regarding it.

It said in an update on its service status web page that the issue “appears to be related to DNS resolution of the DynamoDB API endpoint in US-EAST-1”.

DNS, which stands for Domain Name System, is often likened to a phone book for the internet.

It effectively translates the website names people use (like bbc.co.uk) into numbers which can be read and understood by computers.

This process basically underpins the way we use the internet, and disruptions to it can leave web browsers unable to locate the content they are looking for.
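As a rough illustration of the lookup described above, the short Python sketch below asks the operating system's resolver to turn a name into IP addresses. The helper name `resolve` is hypothetical and chosen only for this example; it says nothing about how AWS or DynamoDB handle DNS internally. When resolution fails, the function returns an empty list, which is roughly the position a browser or app is in when it cannot find the server it is looking for.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the IP addresses the system resolver reports for hostname.

    Hypothetical helper for illustration only. An empty list means the
    name could not be resolved.
    """
    try:
        # getaddrinfo asks the operating system's resolver, which in turn
        # consults DNS (or local files such as /etc/hosts).
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Resolution failed: the caller has no address to connect to.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the IP address as text.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # The addresses printed depend on the network the script runs on.
    print(resolve("bbc.co.uk"))
```

An address that is already numeric, such as `127.0.0.1`, needs no lookup and is returned unchanged.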

Matthew Prince, chief executive of Cloudflare, told the BBC the AWS outage highlighted the power cloud services have over how the internet works.

“Everyone has a bad day, today Amazon had a bad day,” he said.

“There are amazing things about the cloud, it allows you to scale… but if you have an outage like this it can take down a lot of services we rely on.”

And Cori Crider, head of the Future of Technology Institute, told the BBC it was “a bit like a bridge collapsing”.

“An essential part of the economy has fallen to pieces,” she said.

And with so much of cloud computing relying on Amazon, Microsoft and Google – estimated at around 70% – she said the status quo was “unsustainable”.

“Once you have a concentrated supply in a handful of monopoly providers, when something like this falls over, it takes a huge percentage of the economy out with it,” she said.

“We should really look at trying to buy more local services, rather than relying on a handful of American monopoly platforms.

“That’s a risk to our security, our sovereignty and our economy and we need to look at structural separations to make our markets more resilient to these kind of shocks.”

One computer science expert says some of the responsibility rests with the companies that use AWS.

“Companies using Amazon haven’t been taking enough adequate care to build protection systems into their applications,” says Ken Birman, a computer science professor at Cornell University in New York.

Outages like the one on Monday occur frequently, although not always at this scale.

Birman tells the BBC that app developers should take care to invest in backing up mission-critical applications that live in the cloud.

“We know how to make these systems stronger, and we know how to do it securely,” Birman says.
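One common form such protection takes is failover: if the primary copy of a service does not respond, the application tries a backup. The Python sketch below is a minimal illustration under that assumption, not a description of Birman's own recommendations or of any AWS feature. The names `call_with_failover`, `primary_region` and `backup_region` are hypothetical.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(backends: Sequence[Callable[[], T]]) -> T:
    """Try each backend in order and return the first successful result.

    Hypothetical helper for illustration only. Raises RuntimeError if
    every backend fails.
    """
    last_error: Optional[Exception] = None
    for backend in backends:
        try:
            return backend()
        except Exception as exc:
            # Remember the failure and move on to the next backend.
            last_error = exc
    raise RuntimeError("all backends failed") from last_error


def primary_region() -> str:
    # Stand-in for a request to a service that is currently down.
    raise ConnectionError("primary region unreachable")


def backup_region() -> str:
    # Stand-in for the same request served from a second location.
    return "served from backup"


if __name__ == "__main__":
    print(call_with_failover([primary_region, backup_region]))
```

A real deployment would also need the data behind the service to be copied to the backup location, which is where much of the cost and effort lies.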

The question of responsibility could well land in the courts.

More than a year after the massive CrowdStrike outage, Delta Air Lines is still wrangling with the company to recover more than $500m in losses.

Even after CrowdStrike had fixed the issue, the airline said it had to manually reset 40,000 servers, leading to major flight delays over several days. - BBC
