On June 8, 2021, many of us were left staring at blank screens or “Service Unavailable” errors when trying to access the internet. The panic was shared by millions of people around the world. Everything from Spotify, Amazon, and Reddit to Vimeo, Twitch, and Pinterest was inaccessible to users.
This major outage impacted any service using Fastly. Here is a quick rundown of what happened and why.
Internet Outage Timeline
The issues began at around 5:50 AM EDT on June 8 and lasted until 6:45 AM EDT, about an hour in total. Thousands of websites, applications, and services that relied on the CDN provider Fastly were inaccessible during that window.
The scatterplot below shows that content served by Fastly was unavailable globally due to 5xx errors that started at around 5:50 AM EDT.
Fig 1: Failures detected during outage.
Countless popular sites that relied on Fastly were impacted, including Twitch, Stack Overflow, Spotify, Pinterest, HBO Max, Hulu, Shopify, PayPal, Reddit, GitHub, HBO, Amazon, and many more.
Below is another look at some of the websites that suffered an outage. These sites were back up and running once Fastly applied the fix. However, it took time for the sites to recover completely, because error responses had been cached during the outage.
Fig 2: Some of the impacted websites.
End users saw 503 Service Unavailable errors and 502 connection failures. Some users saw a broken page because cached static content wasn’t loading as expected.
Update From Fastly
Fastly updated their status page in a timely manner, acknowledging the impact across the globe.
The Fastly status page was constantly refreshed to ensure their customers were aware of what was being done and knew when they could expect systems to be fully restored. Transparency was of the utmost importance to Fastly.
Fastly said it had identified a service configuration that triggered disruptions across its servers around the world. The company disabled that configuration to resolve the issue.
We can see that the impacted sites recovered once Fastly fixed the configuration issues.
Fig 3: Site performance after issue was resolved.
Why Employ a Multi-CDN Strategy?
Fastly was able to quickly rectify the situation while keeping their customers updated, but a lot of users were already impacted. This incident is another one in a long list of CDN-related outages that can be swiftly managed with a Multi-CDN strategy.
Failover options are a must to avoid situations like this, where your website is rendered inaccessible to users. Rerouting traffic to origin servers may help temporarily, but at the cost of performance. It would be better to switch CDNs to mitigate the impact of the outage on the user.
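The failover decision described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CdnHealth` record, the `pick_cdn` function, the CDN names, and the 5% error threshold are all hypothetical and not taken from Fastly or any monitoring product. The idea is to prefer the fastest CDN that is still healthy, and to fall back to the origin only when no CDN qualifies.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CdnHealth:
    """Latest probe summary for one CDN (hypothetical structure)."""
    name: str
    error_rate: float   # fraction of probes that returned 5xx, 0.0 to 1.0
    median_ms: float    # median response time of successful probes


def pick_cdn(candidates: Sequence[CdnHealth],
             max_error_rate: float = 0.05) -> Optional[str]:
    """Return the fastest CDN whose error rate is within the threshold.

    Returns None when every CDN is unhealthy; the caller could then
    route traffic to the origin servers, accepting slower responses.
    """
    healthy = [c for c in candidates if c.error_rate <= max_error_rate]
    if not healthy:
        return None
    return min(healthy, key=lambda c: c.median_ms).name


# Assumed example: "cdn-a" is failing almost every probe, "cdn-b" is slower but healthy.
probes = [CdnHealth("cdn-a", 0.98, 40.0), CdnHealth("cdn-b", 0.01, 65.0)]
selected = pick_cdn(probes)
```

In practice the selected name would feed a DNS or traffic-steering layer; how that switch is applied depends on the provider and is outside this sketch.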
For example, the graph below shows how the outage impacted eBay. When Fastly went down, eBay had performance issues but did not suffer a complete outage.
Fig 4: eBay site availability over different CDNs.
Conclusions and Advice
No Multi-CDN implementation is complete without a strategy for monitoring CDN providers and deciding when to switch between them. Constant monitoring is vital to catching such outages early on.
Moreover, the incident is a reminder that implementing any change without monitoring the impact of that change can be detrimental to your business. Proactively monitoring your vendors gives you a head start, so you can quickly deploy failover strategies to mitigate any resulting impact on end-user experience.
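As a rough illustration of proactive monitoring, the sketch below tracks the share of 5xx responses over the most recent probes and signals when a failover may be warranted. The `ErrorRateMonitor` class, the window size, and the 50% threshold are assumptions made for this example, not values from the incident or from any specific monitoring tool.

```python
from collections import deque


class ErrorRateMonitor:
    """Tracks the share of 5xx responses over the most recent probes."""

    def __init__(self, window: int = 20, threshold: float = 0.5) -> None:
        # Each entry is True when the probe returned a 5xx status.
        self.results = deque(maxlen=window)
        self.threshold = threshold

    def record(self, status_code: int) -> None:
        """Store whether this probe's HTTP status was a server error."""
        self.results.append(500 <= status_code <= 599)

    def error_rate(self) -> float:
        """Fraction of recorded probes that failed (0.0 if none recorded)."""
        if not self.results:
            return 0.0
        return sum(self.results) / len(self.results)

    def should_fail_over(self) -> bool:
        # Require a full window so a single bad probe does not trigger a switch.
        full = len(self.results) == self.results.maxlen
        return full and self.error_rate() >= self.threshold


# Assumed example: four probes, three of which return server errors.
monitor = ErrorRateMonitor(window=4, threshold=0.5)
for code in (200, 503, 503, 502):
    monitor.record(code)
alert = monitor.should_fail_over()
```

Waiting for a full window trades a slightly slower alert for fewer false alarms; a real deployment would tune both numbers against its own probe frequency.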