On March 2, 2018, one of our prospective customers engaged their crisis teams after they began receiving alerts that their services were down. The alerts came from their sole synthetic monitoring solution, provided by a known APM vendor with nodes only on AWS. The issue wasn’t with their application; it was caused by an outage at Amazon. They experienced firsthand the noise and false positives that can arise when you monitor only from cloud nodes. Since they were evaluating Catchpoint, we were able to show that, yes, their services were failing from our AWS nodes, but everything was OK from the end users’ perspective, where it matters most!
Response time and availability of backbone vs public cloud
One of the first steps in developing a synthetic monitoring strategy is to identify your objectives and goals. These determine what to monitor and where you need to monitor from. Companies use synthetic monitoring solutions for various use cases, including benchmarking, baselining, validating content, understanding the end-user experience, and monitoring infrastructure performance. If the purpose of synthetic monitoring has anything to do with measuring performance, availability, reachability, and reliability from the end user’s perspective, you need to monitor from locations that are in the path of service delivery. This means monitoring from backbone, broadband, last mile, mobile, and enterprise locations.
But it’s not just use cases related to the end-user experience. Other scenarios where synthetic monitoring from backbone, broadband, or last mile locations, as opposed to the cloud, is necessary include:
- Service Level Agreement (SLA) measurements for services on the same cloud, or services delivered to end users
- Measuring and validating the performance of content delivery networks
- Competitive benchmarking for consumer service delivery products
- Monitoring, alerting, and discovery of network and ISP peering and connectivity
- Availability, performance, and validation of DNS based on geo locations
The primary issue with using cloud nodes for end-user experience monitoring is that end users don’t access websites or applications from within those environments. AWS and Azure are not ISPs, nor do they have the footprint to simulate the latency between geographies and highlight the complexity of the network layer powering the internet.
Monitoring from cloud locations will not help you identify issues with ISPs or see how your application performs for end users.
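One way to act on this distinction when alerting is to group check results by node type before deciding whether to page anyone. The following is a minimal sketch, not Catchpoint's alerting logic: the `CheckResult` type, the node type labels, the node IDs, and the 50% threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class CheckResult(NamedTuple):
    node_type: str  # hypothetical labels: "cloud", "backbone", "last_mile", ...
    node_id: str    # hypothetical node identifier
    ok: bool        # True if the synthetic check succeeded


def failure_rate_by_type(results: Iterable[CheckResult]) -> Dict[str, float]:
    """Return the fraction of failed checks for each node type."""
    totals: Dict[str, int] = defaultdict(int)
    failures: Dict[str, int] = defaultdict(int)
    for result in results:
        totals[result.node_type] += 1
        if not result.ok:
            failures[result.node_type] += 1
    return {node_type: failures[node_type] / totals[node_type] for node_type in totals}


def classify(results: Iterable[CheckResult], threshold: float = 0.5) -> str:
    """Classify an incident by which kinds of vantage points are failing.

    The threshold is an assumed value; tune it for your own environment.
    """
    rates = failure_rate_by_type(results)
    failing = {node_type for node_type, rate in rates.items() if rate >= threshold}
    if not failing:
        return "healthy"
    if failing == {"cloud"}:
        # Only cloud nodes fail. If other vantage points exist and are healthy,
        # end users are probably unaffected; if cloud is all we have, we cannot tell.
        return "cloud-only" if len(rates) > 1 else "unconfirmed"
    return "user-impacting"
```

With results like those in the opening story (cloud nodes failing, backbone and last mile nodes healthy), `classify` returns `"cloud-only"`, which could be routed to a lower-severity channel instead of a crisis team.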
Synthetic monitoring from the cloud can provide some insights if your goal is to:
- Determine availability and performance related to an application or service from within the cloud infrastructure environment
- Perform first mile testing of your infrastructure without deploying agents in physical locations
- Test some of the functionality and content of your application
- Evaluate the latency of cloud providers back to your infrastructure
Catchpoint recently conducted a study of the performance of various websites from both cloud and backbone nodes to help customers see why synthetic monitoring from the cloud should be used only for certain use cases.
For this study, we looked at results from our nodes hosted on AWS. Amazon, the largest and oldest cloud computing provider, operates in 17 geographic regions across the world. Catchpoint offers synthetic monitoring nodes hosted on Amazon in most of these regions.
Approximately 10% of Alexa’s top 10,000 most visited websites and about 5% of the top 100,000 are hosted on AWS today. These websites are hosted in and served from one of the facilities where we have nodes. In addition, hundreds of third-party services are hosted on AWS, and this third-party content plays a major role in the end-user experience of these websites.
Considering the huge number of websites and services hosted on AWS or other cloud services, what would be a realistic synthetic monitoring approach to test these websites, applications, or internet services?
To determine this, we conducted a synthetic monitoring study in December 2017 for multiple websites. We fetched the base HTML of these websites from our Catchpoint AWS nodes as well as Catchpoint backbone nodes running in traditional datacenters like Equinix and connected to internet backbone providers like NTT, Level3, etc.
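For readers who want a feel for what "fetching the base HTML" measures, here is a rough sketch of timing a single base-document request with Python's standard library. It is not how Catchpoint nodes take measurements; the function names and the returned fields are assumptions made for illustration, and the result only reflects the network location of the machine that runs it.

```python
import time
import urllib.request
from typing import Callable, Dict, Union


def fetch_base_html(url: str, timeout: float = 10.0) -> bytes:
    """Download only the base HTML document (no images, scripts, or other assets)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def time_fetch(
    url: str, fetch: Callable[[str], bytes] = fetch_base_html
) -> Dict[str, Union[str, bool, int, float]]:
    """Time one fetch and report success, size, and elapsed milliseconds.

    `fetch` is injectable so the timing logic can be exercised without a network.
    """
    start = time.perf_counter()
    try:
        body = fetch(url)
        ok, size = True, len(body)
    except OSError:
        # urllib.error.URLError and socket timeouts are subclasses of OSError.
        ok, size = False, 0
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return {"url": url, "ok": ok, "bytes": size, "response_ms": elapsed_ms}
```

Running `time_fetch` for the same URL from a cloud instance and from a machine on a transit or broadband connection gives a small-scale version of the comparison described here.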
NBC Sports Response Time AWS vs Backbone – December 2017
ESPN Response Time AWS vs Backbone – December 2017
Instagram Response Time AWS vs Backbone – December 2017
In all of the above tests, the runs on AWS nodes were almost always faster than the runs on backbone nodes. In most cases the website appears faster by a significant margin, which is not a realistic reflection of the end-user experience.
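A comparison like this can be summarized with two numbers: the share of paired runs in which the cloud node was faster, and the gap between the median response times. The sketch below assumes the runs are already paired by time slot; the function name and the sample values in the usage note are made up for illustration and are not data from the study.

```python
from statistics import median
from typing import Dict, Sequence


def compare_paired_runs(
    cloud_ms: Sequence[float], backbone_ms: Sequence[float]
) -> Dict[str, float]:
    """Summarize paired response times (milliseconds) from cloud and backbone nodes.

    Element i of each sequence is assumed to come from the same time slot.
    """
    if not cloud_ms or len(cloud_ms) != len(backbone_ms):
        raise ValueError("expected two non-empty sequences of equal length")
    cloud_faster = sum(1 for c, b in zip(cloud_ms, backbone_ms) if c < b)
    cloud_median = median(cloud_ms)
    backbone_median = median(backbone_ms)
    return {
        # Fraction of paired runs where the cloud node was strictly faster.
        "cloud_faster_share": cloud_faster / len(cloud_ms),
        "cloud_median_ms": cloud_median,
        "backbone_median_ms": backbone_median,
        # How much lower the cloud median is, as a percentage of the backbone median.
        "median_gap_pct": (backbone_median - cloud_median) / backbone_median * 100.0,
    }
```

For example, with the invented samples `cloud_ms = [100.0, 120.0, 90.0, 200.0]` and `backbone_ms = [150.0, 110.0, 180.0, 260.0]`, the cloud node is faster in three of the four pairs, so `cloud_faster_share` is 0.75.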
To confirm our findings, we applied this methodology to several other major websites and this is what we found:
In more than 70% of the test instances, the tests that ran on AWS nodes were faster than the tests that ran on backbone nodes. This is because:
- Monitoring an application hosted on AWS from AWS is not entirely realistic because there is usually some form of dedicated network connection between data centers.
- Because of these dedicated connections, network operators can tune their routing policy to determine how traffic is sent and received through adjacent autonomous systems. This practice is known as BGP traffic engineering.
- When routing happens within the same entity (AWS), it is in most cases completely under the control of AWS.
The above theory can be confirmed by the traceroute results of ESPN.com shown below:
Example: Traceroute from Washington, D.C. on an AWS node
vs. a traceroute from Washington, D.C. on NTT
We noticed similar patterns across the many top websites that we monitored. We would be happy to share more granular information with anyone interested in the above findings.
As a company, we have decided to put the widest range of solutions in our customers’ hands to help them achieve their goals. To that end, Catchpoint allows customers to choose the type of nodes for their synthetic monitoring. Catchpoint has over 700 nodes across 75 countries, 200+ cities, and 223+ ISPs, including backbone, broadband, last mile, mobile, and cloud.
The success of a synthetic monitoring strategy that includes testing, measuring, validating, modeling, and monitoring depends on how the measurements are taken and from where!
Remember, improving the signal-to-noise ratio by reducing noise is critical to a successful monitoring strategy deployment.