This summer has seen a series of outages and performance degradations from some of the world’s most widely used CDNs, including the June 8, 2021 Fastly outage (owing to DNS or configuration issues) and an Akamai outage on July 22, 2021 (also likely caused by DNS failure).
Performance Issues Creating Problems at Akamai
On Tuesday, August 31, we saw problems at Akamai caused by a performance issue. That's right: it was not an outage that showed failures or error pages to end users, but a problem that slowed down the websites Akamai serves. You can only imagine what the folks at Akamai were going through during the incident, along with everyone else affected. As we've seen this summer, however, it happens to all of us.
At Catchpoint, we're obsessed with user experience. As with any incident, we want to understand what happened, why it happened, and what its impact was, so that we can all learn from each other's mistakes and deliver better experiences on the Internet.
The Four Pillars of Digital Experience Monitoring
For any website, performance is as important as availability. That's why at Catchpoint we believe in the following as the four pillars of digital experience monitoring:
The Three Network Components When Using A CDN
When a customer is using a CDN, there are three network components involved:
- User to edge, which represents the connectivity between end users and the CDN network.
- CDN network, which represents the connectivity between edge servers within the CDN network.
- CDN edge to origin, which represents the connectivity between the CDN edge servers and the origin.
![Image showing the three network components in a CDN](https://assets-global.website-files.com/5babb9f91ab233ff5f53ce10/6130b21a3777b714c8b60456_Akamai image 2 .png)
While static resources are usually cached and served from the edge, dynamic requests are routed through all three network components. Most monitoring solutions can only provide visibility into performance or availability issues in the user-to-edge or edge-to-origin components. Catchpoint is the only synthetic monitoring solution that enables businesses to detect performance issues within the CDN network itself.
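To make the difference between cached and dynamic requests concrete, here is a minimal Python sketch of a simplified additive latency model. The `SegmentTimings` class, its field names, and the millisecond values are hypothetical illustrations, not Catchpoint or Akamai data; real requests also involve TLS handshakes, edge processing, and retries that this model ignores.

```python
from dataclasses import dataclass


@dataclass
class SegmentTimings:
    """Hypothetical per-request latencies (ms) for each network component."""
    user_to_edge_ms: float
    cdn_network_ms: float
    edge_to_origin_ms: float
    origin_processing_ms: float = 0.0


def response_time_ms(t: SegmentTimings, cache_hit: bool) -> float:
    """Approximate end-user response time under a simplified additive model."""
    if cache_hit:
        # Cached static content is served from the edge: only user-to-edge counts.
        return t.user_to_edge_ms
    # Dynamic requests (or cache misses) traverse all three components
    # and also wait for the origin to process the request.
    return (
        t.user_to_edge_ms
        + t.cdn_network_ms
        + t.edge_to_origin_ms
        + t.origin_processing_ms
    )


if __name__ == "__main__":
    normal = SegmentTimings(user_to_edge_ms=40, cdn_network_ms=30,
                            edge_to_origin_ms=20, origin_processing_ms=50)
    degraded = SegmentTimings(user_to_edge_ms=40, cdn_network_ms=900,
                              edge_to_origin_ms=20, origin_processing_ms=50)
    print(response_time_ms(normal, cache_hit=True))     # 40
    print(response_time_ms(degraded, cache_hit=True))   # 40
    print(response_time_ms(normal, cache_hit=False))    # 140
    print(response_time_ms(degraded, cache_hit=False))  # 1010
```

In this model, a slowdown inside the CDN network leaves cached responses untouched but inflates every dynamic request, which is why measuring only user-to-edge or edge-to-origin can miss it.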
Major Websites Hit By High Response Times
Starting at 6 AM ET on August 31, we noticed that various websites around the world were experiencing high response times. Affected sites included Expedia, the National Bank of Canada, AT&T, and Discover.
On further analysis, we found that the increase in response time was driven directly by high wait times (that is, time to first byte, or TTFB). High wait times commonly occur for the following reasons:
- High server think/processing times
- Issues in load balancers/gateways
- Large backend processes
- Server-side cache misses
- High server resource utilization
High wait times can also occur due to high latency within the CDN network or between the user and the edge.
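Wait time can be measured directly. The following is a minimal Python sketch, using only the standard library, that times the TCP connection and the gap between sending a request and receiving the first response byte. It assumes a plain `http://` URL (no TLS) and one request per connection; `measure_wait_time` is a hypothetical helper, not part of any monitoring product. Because `socket.create_connection` also resolves the host name, `connect_ms` includes DNS lookup time.

```python
import socket
import time
from urllib.parse import urlsplit


def measure_wait_time(url: str, timeout: float = 10.0) -> dict:
    """Measure connect time and wait time (TTFB) for one plain-HTTP GET.

    Sketch only: assumes an http:// URL (no TLS) and one request per connection.
    """
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles plain http:// URLs")
    host = parts.hostname
    port = parts.port or 80
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    t_start = time.perf_counter()
    # create_connection performs DNS resolution and the TCP handshake.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        t_connected = time.perf_counter()
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {parts.netloc}\r\n"
            "Connection: close\r\n\r\n"
        )
        sock.sendall(request.encode("ascii"))
        t_sent = time.perf_counter()
        first_byte = sock.recv(1)  # blocks until the server sends its first byte
        t_first_byte = time.perf_counter()

    if not first_byte:
        raise ConnectionError("connection closed before any response byte arrived")

    return {
        "connect_ms": (t_connected - t_start) * 1000,
        # Wait time: from request sent until the first response byte arrives.
        "wait_ms": (t_first_byte - t_sent) * 1000,
        "ttfb_ms": (t_first_byte - t_start) * 1000,
    }
```

For example, `measure_wait_time("http://example.com/")` returns the three timings in milliseconds. A high `wait_ms` alongside a normal `connect_ms` points at server-side or CDN-internal delays rather than at the user-to-edge network, although a single measurement cannot tell which of the causes listed above is responsible.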
![Graph showing how the increase in the CDN network response time impacts the end user.](https://assets-global.website-files.com/5babb9f91ab233ff5f53ce10/613091575757bdd7bc773fb3_Akamai image 3.png)
Performance graph showing how the increase in CDN network response time affects end-user response time.
![Graph showing the Akamai degradation specific to the CDN network.](https://assets-global.website-files.com/5babb9f91ab233ff5f53ce10/61308aa7098916bd9c2ebc78_Akamai image 4.png)
Performance graph showing that the degradation was specific to the CDN network, while the origin showed no problems.
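The comparison in these graphs, with CDN network time rising while origin time stays flat, can be expressed as a simple baseline check. The sketch below is an illustration under stated assumptions: the segment names, the 2x ratio, and the 50 ms floor are hypothetical thresholds chosen for the example, not values used by Catchpoint.

```python
from __future__ import annotations

from statistics import median


def degraded_segments(
    baseline: dict[str, list[float]],
    current: dict[str, float],
    ratio: float = 2.0,
    min_delta_ms: float = 50.0,
) -> list[str]:
    """Return the segments whose current latency is well above their baseline median.

    A segment is flagged only if it is both `ratio` times slower than its median
    and at least `min_delta_ms` slower, so tiny absolute changes are ignored.
    """
    flagged = []
    for segment, samples in baseline.items():
        typical = median(samples)
        observed = current[segment]
        if observed >= typical * ratio and observed - typical >= min_delta_ms:
            flagged.append(segment)
    return flagged


if __name__ == "__main__":
    baseline = {
        "user_to_edge": [38, 41, 40],
        "cdn_network": [28, 31, 30],
        "edge_to_origin": [19, 22, 20],
    }
    current = {"user_to_edge": 42, "cdn_network": 850, "edge_to_origin": 21}
    print(degraded_segments(baseline, current))  # ['cdn_network']
```

Requiring both a relative and an absolute increase keeps a fast segment from being flagged over a few milliseconds of jitter.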
503 Error Codes
We also observed drops in availability caused by 503 (Service Unavailable) errors returned by Akamai's edge (GHost) servers. In these cases, the request was on its way to the origin server but failed within the Akamai edge network.
![503 Error Code From Akamai Edge Server](https://assets-global.website-files.com/5babb9f91ab233ff5f53ce10/6130916d0c35885fa7173538_Akamai image 5.png)
503 Error Code From Akamai Edge Server
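When a synthetic test records an error, the response headers can help attribute it. The sketch below assumes, as is commonly observed but not guaranteed for every configuration, that responses generated by Akamai's edge carry a `Server: AkamaiGHost` header, while errors passed through from the origin carry the origin's own `Server` value. The function names and the rule that only 2xx and 3xx responses count as available are hypothetical choices for illustration.

```python
from __future__ import annotations


def attribute_failure(status: int, headers: dict[str, str]) -> str:
    """Classify a synthetic test result as ok, an edge-generated error, or another error.

    Assumes (hypothetically) that Akamai edge-generated responses carry a
    "Server: AkamaiGHost" header; verify this against your own configuration.
    """
    if 200 <= status < 400:
        return "ok"
    # HTTP header names are case-insensitive, so match "server" in any case.
    server = next((v for k, v in headers.items() if k.lower() == "server"), "")
    if "akamaighost" in server.lower():
        return "cdn_edge_error"
    return "origin_or_unknown_error"


def availability_pct(statuses: list[int]) -> float:
    """Percentage of test runs that returned a 2xx or 3xx status code."""
    if not statuses:
        raise ValueError("no test runs to evaluate")
    ok = sum(1 for s in statuses if 200 <= s < 400)
    return 100.0 * ok / len(statuses)


if __name__ == "__main__":
    print(attribute_failure(503, {"Server": "AkamaiGHost"}))  # cdn_edge_error
    print(attribute_failure(503, {"server": "nginx"}))        # origin_or_unknown_error
    print(availability_pct([200, 200, 503, 200]))             # 75.0
```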
Causes Of Latency In CDN Networks
It’s not yet clear what was behind the performance issues on Tuesday, but there are various potential causes for latency within a CDN network. These include:
- Routing issues between CDN edge servers, which are managed by the CDN provider. Such issues can increase the time it takes to process incoming requests.
- Overloaded edge servers or a surge in inbound traffic.
- An incorrect CDN configuration.
- Issues with third-party solutions used by the CDN.
Be Prepared To Act Quickly
With outages and latency spikes, it's not a question of "if" they will happen, but "when" and "how." Although you can't avoid them, you can be prepared for "what" to do when they occur.
You use a CDN to serve content as close as possible to your end users so that your site is fast regardless of their geography. To detect whether your CDN is affecting your users, you must monitor from locations close to where those users are, so that you can identify and act on any such performance issues.
This is where Catchpoint can help. We have the industry's largest global active monitoring network, which means we can provide the fastest detection of any issues your providers introduce that affect end-user experience.
For further information on major incidents in 2021, please check out our new report. You’ll find detailed analysis, as well as a checklist of best practices to prevent, prepare for, and respond to an outage.