Amazon’s cloud-computing outage on Wednesday was triggered by effort to boost system’s capacity

November 28, 2020

Within a few hours, the malfunctions began hitting customers of Amazon Web Services, the company’s cloud-computing unit. Customers of the Amazon-owned Ring security camera service couldn’t log in or watch video. Users struggled to operate their iRobot vacuum cleaners because the outage affected the iRobot Home app. And media companies, including The Washington Post (owned by Amazon founder and chief executive Jeff Bezos), experienced publishing system outages.

Amazon acknowledged that the system failure was exacerbated by the way its various services depend on one another. The company had been trying to add capacity to its Amazon Kinesis service, which customers use to process real-time data including video, audio and application logs. To resolve the issue, Amazon needed to restart a piece of its system it described as “many thousands of servers,” a lengthy process that had to be done gradually. But because other Amazon cloud services, including its Cognito authentication offering, rely on Kinesis, they failed as well.

And because Amazon uses Cognito itself to let customers know about the status of its cloud operations through its Service Health Dashboard website, it couldn’t immediately update that site. The company has a backup method to update the site, but said “it is a more manual and less familiar tool for our support operators.”

An Amazon spokeswoman didn’t respond Saturday to a request for comment about the outage. In a blog post about the outage, the company pledged to do “everything we can to learn from this event.”

The failure underscores the danger of having only a handful of vendors manage global cloud computing. Amazon held 45 percent of the global market in 2019, according to the market research firm Gartner. In addition to Ring and iRobot, Amazon’s customers include Netflix, BP and Capital One, all of which run significant pieces of their computing operations on AWS.

Source: WP
