Content Distribution Networks (CDNs) have been around for more than two decades, and according to Intricately's 2019 report, over a million companies worldwide use CDN services. The chart below compares four popular CDNs, but there are several other major CDN providers, such as Verizon Media (Edgecast) and Lumen.
Fig 1: Customer distribution among popular CDN providers.
Large enterprises such as LinkedIn, eBay, Walmart, and others have already implemented a multi-CDN approach to power all their applications. Recently, in an insightful webinar with eBay and Walmart, we discussed CDN best practices and performance tuning tips.
In Fig 1, around 10% of the total CDN users fall into the large enterprise category, which means that around 100K enterprises each spend anywhere from 100K USD to several million USD on CDN services. That is a lot of money.
Enterprises invest in CDN services for multiple reasons. Some of the key ones are:
- Maximum reachability, availability, speed, and reliability.
- Lower cost than building and running your own regional POPs.
- Better application performance and, in turn, better user experience.
- Capacity flexibility.
- Content customized and optimized for user location, device, speed, etc.
Enterprises go through a rigorous evaluation process while choosing the ideal CDN provider. Here is a blog that discusses some best practices to consider while evaluating a CDN service. Every CDN has its strengths and weaknesses, so benchmarking CDNs is an important step in the evaluation phase. Enterprises usually rely on two sources of data to validate ROI:
- The difference in performance with and without the CDN, measured using a synthetic monitoring tool.
- RUM telemetry, provided by either the CDN or a tool that the enterprise owns.
If everything goes well, they sign up and put a contract in place. The contract usually guarantees one thing: an uptime SLA. Some providers guarantee 100%, while others guarantee three or four 9s. If the provider fails to deliver the service, the customer is eligible for a service credit based on the contract terms and conditions.
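To make the difference between these guarantees concrete, the Python sketch below converts an availability percentage into the downtime it permits. The function name and the 30-day period are assumptions chosen for illustration; real contracts define their own measurement window and exclusions.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: int = 30) -> float:
    """Return the downtime, in minutes, that an availability SLA permits over a period."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    # The unavailable fraction of the period is whatever the SLA does not cover.
    return total_minutes * (1 - sla_percent / 100)


for sla in (99.9, 99.99, 100.0):
    print(f"{sla}% uptime -> {allowed_downtime_minutes(sla):.2f} minutes per 30 days")
```

Under these assumptions, three 9s leaves room for roughly 43 minutes of downtime in a 30-day month and four 9s for roughly 4 minutes, while a 100% SLA leaves none.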
So far, so good!
The enterprise can sleep better knowing that its performance is in good hands. Technology teams have now offloaded some of their compute and network resources to the CDN. CDNs have many advanced capabilities aimed at improving digital delivery, and they will track SLA breaches.
In case of an SLA breach, enterprises use logs provided by the CDN or some form of Real User Monitoring (RUM), either provided by the CDN or home-grown.
Most enterprises worldwide are satisfied with the value that CDNs bring and are not really aware of the value hidden in the unknown unknowns of CDN performance, which synthetic monitoring brings into the picture.
Uptime vs. Real Performance
Last year, one of the world’s largest online payment systems reached out to Catchpoint because its global partners and user base were complaining about poor user experience and latency, and about the revenue impact this was having on merchants.
When I looked at their data, this is what I noticed:
Fig 2: Performance scatterplot.
The 25th percentile of their application load time was 4x faster than the 90th percentile, which is abnormal.
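A spread like this can be checked on any set of load-time samples. The sketch below uses the nearest-rank percentile method and a made-up list of load times; the function names and the sample values are illustrative assumptions, not the customer's data or any particular tool's method.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def tail_ratio(load_times_ms, low=25, high=90):
    """How many times slower the high percentile is than the low percentile."""
    return percentile(load_times_ms, high) / percentile(load_times_ms, low)


# Hypothetical load times in milliseconds, for illustration only.
sample_ms = [1000, 1100, 1200, 1300, 1500, 2000, 3000, 4000, 4800, 6000]
print(tail_ratio(sample_ms))
```

A ratio that stays well above what is normal for the application suggests that a subset of users, often in specific regions, is getting a much slower experience than the rest.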
We then looked at their CDN mapping dashboard (Fig 3) for deeper analysis. We saw that an object that downloaded in under 100 milliseconds in Germany or the US was taking more than 200 milliseconds, and in places around 400-700 milliseconds, in other regions. All of these were assets served from the CDN's cache.
We then focused on the end-user mapping and noticed that users in India were being served from the US or Europe. There were similar patterns in other regions as well.
Fig 3: CDN Mapping Dashboard
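The kind of mapping problem described above can be flagged with a simple distance comparison. The sketch below is a minimal illustration that uses great-circle distance as a stand-in for network proximity, which is an assumption: real mapping decisions also depend on routing, peering, and capacity. The edge names, coordinates, and the `slack_km` tolerance are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (latitude, longitude) points."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def nearest_edge(user_location, edges):
    """Name of the edge location geographically closest to the user."""
    return min(edges, key=lambda name: haversine_km(user_location, edges[name]))


def is_suboptimal_mapping(user_location, serving_edge, edges, slack_km=500.0):
    """True if the serving edge is more than slack_km farther away than the nearest edge."""
    best = nearest_edge(user_location, edges)
    extra_km = haversine_km(user_location, edges[serving_edge]) - haversine_km(
        user_location, edges[best]
    )
    return extra_km > slack_km


# Hypothetical edge locations with approximate coordinates, for illustration only.
EDGES = {
    "mumbai": (19.08, 72.88),
    "frankfurt": (50.11, 8.68),
    "ashburn": (39.04, -77.49),
}
DELHI = (28.61, 77.21)

print(nearest_edge(DELHI, EDGES))
print(is_suboptimal_mapping(DELHI, "frankfurt", EDGES))
```

In practice the serving edge would come from the synthetic test results for each vantage point, and a flagged mapping is a prompt to investigate with the CDN vendor rather than proof of a misconfiguration.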
We asked the customer to share this data with their CDN vendor. The CDN provider used the data to look at their logs and noticed some configuration issues with mapping. Some changes were then implemented, and the results were astonishing:
- 30%+ performance improvements in India (Fig 4).
- 18%+ performance improvements in Brazil (Fig 5).
Fig 4: Performance improved in India.
Fig 5: Performance improved in Brazil.
The uptime SLA of 100% was met by the CDN vendor both before and after the changes were made. But the difference in performance was significant.
The story does not end here. The CDN infrastructure will continue to evolve, its configuration will continue to change, and so will the performance. To maintain performance and optimize continuously, an enterprise must monitor CDN performance 24/7/365.
But this is just one aspect of the overall CDN monitoring strategy. There are several other hidden opportunities that, when discovered, can help an organization improve its performance significantly without spending an additional penny.
Let’s take a look at another example, one that in most cases goes undetected because it occurs before the CDN logs or RUM record any data.
One of our customers noticed DNS failures across all properties mapped to their CDN provider.
Fig 6: DNS failures.
The customer escalated the issue to the CDN provider, and they came back with the response below (Fig 7) and eventually resolved the problem.
Fig 7: Response from CDN provider.
These failures are usually not included in the SLA calculation because they are unknown unknowns.
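A basic synthetic DNS check along these lines can be sketched in Python with the standard library. The function name `check_dns`, the injectable `resolver` parameter, and the 200 ms threshold are assumptions made for illustration. `socket.getaddrinfo` goes through the operating system's resolver, which may cache answers, so a dedicated monitoring agent would typically query the authoritative name servers directly.

```python
import socket
import time


def check_dns(hostname, resolver=socket.getaddrinfo, slow_threshold_ms=200.0):
    """Resolve a hostname once and report success, timing, and the addresses returned."""
    start = time.perf_counter()
    try:
        results = resolver(hostname, 443)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        elapsed_ms = (time.perf_counter() - start) * 1000
        return {"hostname": hostname, "ok": False, "elapsed_ms": elapsed_ms,
                "addresses": [], "error": str(exc)}
    elapsed_ms = (time.perf_counter() - start) * 1000
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP address.
    addresses = sorted({info[4][0] for info in results})
    return {"hostname": hostname, "ok": True, "elapsed_ms": elapsed_ms,
            "addresses": addresses, "slow": elapsed_ms > slow_threshold_ms, "error": None}
```

Running a check like this on a schedule from several regions, against every hostname mapped to the CDN, surfaces resolution failures that never show up in CDN logs or RUM because the request never reaches the edge.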
CDN Monitoring is Vital to Maintaining Performance
CDNs are a crucial part of every major enterprise’s digital strategy, so not monitoring them properly means leaving money on the table in several ways:
- Not maximizing the investments made in the CDN.
- Missing the opportunity to deliver a great end-user experience, which in turn impacts revenue and brand reputation.
- Not holding the vendor accountable from an SLA perspective.
The 4 pillars of end-user monitoring play a critical role in any CDN monitoring strategy.
Below is a comprehensive CDN monitoring checklist that we recommend:
- CDN DNS: The DNS resolution process becomes longer when a CDN is introduced, and a lot of things can go wrong here.
- CDN Mapping: Ensure the end users are being mapped to the nearest edge.
- Cache HIT vs Cache Miss: A higher cache miss rate means more requests are sent back to the origin, increasing your costs.
- End User to CDN Edge Latency: Detect latency between end users and a specific edge server or across all edge servers.
- Edge to Origin Latency: This usually goes unnoticed. If the enterprise has multiple origin data centers, it is crucial to ensure the edge is mapped to the most optimal origin DC.
- Multi CDN Monitoring: Ensure the traffic is being sent to the most optimal CDN provider across all regions and quickly steer traffic away from CDNs when things break or get slow.
- Image Optimization Monitoring: Ensure all images are compressed and optimized.
- Monitoring Edge Workers: Measure the time taken to execute an event handler.
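As an illustration of the cache HIT vs cache miss item in the checklist above, the sketch below computes a hit ratio from cache status values collected by synthetic tests. It assumes the CDN exposes the status in a response header (often `X-Cache` or similar) whose value contains the word HIT or MISS; header names and value formats vary by provider, and layered values from multi-tier caches would need provider-specific parsing. The function name is hypothetical.

```python
def cache_hit_ratio(cache_statuses):
    """Fraction of responses served from cache, or None if no status could be classified."""
    hits = misses = 0
    for status in cache_statuses:
        normalized = status.upper()
        if "HIT" in normalized:
            hits += 1
        elif "MISS" in normalized:
            misses += 1
        # Values that contain neither word are ignored rather than guessed at.
    counted = hits + misses
    if counted == 0:
        return None
    return hits / counted


# Hypothetical header values gathered from a batch of synthetic requests.
observed = ["HIT", "Hit from cloudfront", "TCP_MISS", "MISS", "unknown"]
print(cache_hit_ratio(observed))
```

Tracking this ratio per region and per content type makes it easier to spot when a configuration change or a purge starts sending more traffic back to the origin.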
Optimal and efficient utilization of the CDN is probably the quickest and easiest way to optimize application performance and to deliver a very reliable service. But it comes with a catch, and the catch is nothing but the problem of unknown unknowns.
Comprehensive Synthetic Monitoring for CDNs is the approach to measure and detect the unknown unknowns.
- Organizations with a reactive monitoring approach deploy a CDN and assume all is well.
- Organizations with a proactive monitoring approach not only focus on deploying a CDN but also focus on continuously detecting and optimizing the implementation.
- Organizations with a transformational monitoring approach deploy, monitor, optimize, and automate the whole process by steering their traffic to the CDN that delivers the best performance.
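The steering decision in the transformational approach can be sketched as a simple selection rule. The example below picks, for one region, the CDN with the lowest median latency among those whose error rate is acceptable. The function name, the 5% error threshold, the use of the median, and the CDN names are all assumptions for illustration; production traffic steering is usually handled by a DNS or load-balancing service with its own policies.

```python
from statistics import median


def pick_cdn(measurements, max_error_rate=0.05):
    """Choose the healthy CDN with the lowest median latency.

    measurements maps a CDN name to a list of latency samples in milliseconds,
    where None marks a failed test. Returns None if no CDN qualifies.
    """
    best_name, best_latency = None, None
    for name, samples in measurements.items():
        if not samples:
            continue
        successes = [s for s in samples if s is not None]
        error_rate = 1 - len(successes) / len(samples)
        if not successes or error_rate > max_error_rate:
            continue  # steer away from CDNs that are failing too often
        latency = median(successes)
        if best_latency is None or latency < best_latency:
            best_name, best_latency = name, latency
    return best_name


# Hypothetical synthetic results for a single region.
region_results = {
    "cdn_a": [120, 130, 125, 140],
    "cdn_b": [90, None, 95, None],
    "cdn_c": [110, 115, 105, 112],
}
print(pick_cdn(region_results))
```

Here the fastest CDN on paper is excluded because half of its tests failed, which is the kind of trade-off an automated steering policy has to encode explicitly.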
Follow the best practices recommended above and you will be surprised by the outcome of comprehensive CDN monitoring.