Catchpoint detected an AWS outage earlier today, 25th November 2020. Enterprises running applications or services on AWS US East 1 were impacted, as were some of AWS's own products.
We noticed Amazon services such as Athena slowing down at 5 AM PST, followed by intermittent HTTP 500 errors that started around 5:15 AM PST.
By 5:30 AM PST, the fallout from the outage was evident. We noticed a drastic performance impact, and a major outage started at around 7 AM PST.
Amazon updated its status page to highlight all the Amazon services that were impacted by this outage.
Services relying on Amazon services could only wait for Amazon to resolve the issue.
The outage was not limited to Amazon services; it also impacted enterprises using AWS US East 1 as their data center. However, some were able to mitigate the issue with minimal damage to their end-user experience.
For example, the enterprise below, whose origin is AWS, was also impacted by the outage. The impact on its production environment was mitigated because end users were hitting a CDN service that rerouted traffic around the problem. Changing the origin mapping to an alternate origin to fetch the content helped maintain a better end-user experience.
This kept the service available for end users, although they did notice slower page load times.
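The origin remapping described above can be illustrated with a minimal sketch. It assumes an ordered list of origins and a caller-supplied `fetch` function; the host names, `OriginError`, `fetch_with_failover`, and `demo_fetch` are all hypothetical and are not taken from any particular CDN's API. A real CDN performs this switch at the edge, so treat this only as an outline of the idea.

```python
from typing import Callable, List, Sequence, Tuple


class OriginError(Exception):
    """Raised when an origin cannot serve a request (hypothetical error type)."""


def fetch_with_failover(
    path: str,
    origins: Sequence[str],
    fetch: Callable[[str, str], bytes],
) -> Tuple[str, bytes]:
    """Try each origin in order and return (origin, body) from the first success."""
    errors: List[str] = []
    for origin in origins:
        try:
            return origin, fetch(origin, path)
        except OriginError as exc:
            # Remember why this origin failed, then move on to the next one.
            errors.append(f"{origin}: {exc}")
    raise OriginError("all origins failed: " + "; ".join(errors))


def demo_fetch(origin: str, path: str) -> bytes:
    """Stand-in for a real HTTP request; simulates the primary origin being down."""
    if origin == "primary.example.com":
        raise OriginError("HTTP 500")
    return f"content of {path} from {origin}".encode()


if __name__ == "__main__":
    served_by, body = fetch_with_failover(
        "/index.html",
        ["primary.example.com", "backup.example.com"],
        demo_fetch,
    )
    print(served_by, body.decode())
```

Falling back to an alternate origin usually costs some latency, which is consistent with the slower page loads observed here.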
Several SaaS providers that were impacted by this outage notified their customers, passing along the updates they received from AWS. For instance, GONG, a revenue intelligence platform used by the Catchpoint Customer Success team, was also impacted by this incident and sent us this communication:
The AWS outage also impacted several ecommerce websites. For example, GameStop’s page load performance dropped because of third-party content that was being served from AWS. The scatterplot graph below illustrates the spike in load time during the outage.
The outage serves as a timely reminder for organizations to evaluate and verify their infrastructure setup, including monitoring and failover strategies, before the upcoming Black Friday:
- Do you have monitoring set up to detect such outages?
- What is your backup plan in a similar scenario?
- Are you relying on third parties that may become a single point of failure?
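
As a starting point for the first question, here is a minimal sketch of how a window of probe results might be classified. It mirrors the pattern seen in this incident, a slowdown followed by intermittent HTTP 500 errors. The `Sample` type, the `assess` function, and the thresholds (5% error rate, 2x baseline response time) are illustrative assumptions, not Catchpoint's alerting logic.

```python
from dataclasses import dataclass
from statistics import median
from typing import Sequence


@dataclass
class Sample:
    status: int      # HTTP status code returned to the probe
    seconds: float   # response time observed by the probe


def assess(
    samples: Sequence[Sample],
    baseline_seconds: float,
    max_error_rate: float = 0.05,   # assumed threshold, tune per service
    slowdown_factor: float = 2.0,   # assumed threshold, tune per service
) -> str:
    """Classify a window of probe samples as 'ok', 'slow', 'errors', or 'no-data'."""
    if not samples:
        return "no-data"
    # Server-side failures (5xx) are treated as the stronger signal.
    error_rate = sum(1 for s in samples if s.status >= 500) / len(samples)
    if error_rate > max_error_rate:
        return "errors"
    # Otherwise compare the median response time against the normal baseline.
    if median(s.seconds for s in samples) > slowdown_factor * baseline_seconds:
        return "slow"
    return "ok"


if __name__ == "__main__":
    window = [Sample(200, 1.4)] * 8 + [Sample(500, 0.2)] * 2
    print(assess(window, baseline_seconds=0.5))
```

Running the same check against third-party hosts as well as your own origin helps surface dependencies that could become a single point of failure.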