Chapter 5

How to Choose the Best Synthetic Monitoring Tool

Catchpoint’s guide to synthetic monitoring

Synthetic monitoring simulates user engagement with a website or application to verify its reachability, availability, reliability, and performance at all times of the day or night. The term is often used interchangeably with “web monitoring” and with “digital monitoring”, a more modern term associated with digital transformation initiatives. The greatest value of synthetic monitoring lies in enabling organizations to discover problems before end users experience them. The field has evolved considerably over the last few years, while newcomers have entered the market with basic offerings. So, how do you choose the best tool? To help you answer that question, this article rounds up the most important features a synthetic monitoring tool should offer. We have grouped them into five categories: synthetic monitoring types, analytics capabilities, global reachability, platform features, and administration of your chosen synthetic monitoring tool. You can use this guide while navigating the different stages of your purchasing journey or when drafting a Request for Proposal (RFP) to solicit information from a vendor or service provider.

Synthetic Monitoring Types

You can use synthetic monitoring to analyze multiple aspects of your web environment, including the paths that your end users take to reach your applications. When monitoring a web application, it is important to break down the overall application response time into dozens of incremental steps, which helps you discover the reasons behind any performance issues. It is also important to account for the different browser versions in use and for how each behaves when rendering your application’s user interface. We recommend simulating a full business transaction, rather than stopping at the first loaded page, to gauge the complete user experience.

Use the following list to familiarize yourself with the types of monitoring you should expect from an advanced synthetic monitoring tool.

  • User Journey Transaction Testing: User journey transaction testing involves recording or scripting sequential steps to capture a complete web-based business transaction simulating real users, including login and SSO.
  • HTTP Monitoring: HTTP monitoring entails testing only the base page for availability using HTTP GET, or emulating a page load with and without executing its JavaScript.
  • Real Browser Testing: In order to truly emulate end user experience, web tests must be performed from various types and versions of browsers.
  • Mobile Simulation: This lets you simulate mobile users by platform (Android, iOS), browser (Safari, Chrome) or wireless speed (3G, 4G).
  • SaaS Monitoring: Synthetic tests of mainstream SaaS applications. The monitoring solution should provide out-of-the-box templates or custom scripts for monitoring different workflows.
  • API Monitor: Application Programming Interface (API) monitoring should cover both a single URI and a full multi-step transaction, and should include support for both local and global variables.
  • Ping Test: A ping is used to test the basic reachability of a website or IP address. Your monitoring solution should support ICMP Ping, UDP Ping, and TCP Ping.
  • TraceRoute Test: A traceroute tool shows you every hop sequentially and the total number of hops required. Segmented hop-by-hop network latency measurements should be supported via UDP, TCP and ICMP TraceRoute.
  • SMTP Test: To know if your email server is functioning correctly, you need support for testing of SMTP, POP3 and IMAP protocols.
  • DNS Monitoring: DNS testing needs to provide performance and error data for every step in the DNS resolution process, including IP address, SOA, MX, SRV and NS records, as well as root servers. DNS monitoring should also track all individual servers involved in DNS resolution to help mitigate a DNS attack or outage.
  • CDN Monitoring: Monitoring from as many geographic locations as possible is key to maintaining CDN performance. CDN monitoring will provide insight into server mapping and help identify whether incidents are isolated/regional or system-wide.
  • FTP Monitoring: FTP monitoring entails testing file transfer services or delivery of data via the FTP protocol on TCP/IP networks. A test of an FTP server must exercise user login and issue FTP requests that simulate real users accessing your web applications.
  • WebSocket Test: A WebSocket test checks the availability of your WebSocket service (typically used in chats, financial tickers, and gaming applications). WebSocket monitoring also offers the option to send a string message and verify its return data.
  • Test NTP Server: This type of test should verify that the NTP server and time services are available, that the public UDP port is reachable, and that the time offset is accurate.
  • SSL Test: SSL testing involves checking SSL certificate validity, protocol support, key exchange, and cipher strength.
  • MQTT Test: This type of testing allows you to monitor the performance and availability of IoT devices. MQTT testing requires publishing messages to the MQTT broker, listening for MQTT messages, and simulating connection issues.
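
As a rough illustration of the simplest check in the list above, the TCP ping, the following Python sketch times a single TCP connection to a host and port. The names `tcp_ping` and `PingResult` are hypothetical and do not come from any vendor's API. It is a minimal sketch under one notable assumption: when a hostname is passed, the measured time also includes name resolution, which a commercial tool would more likely report separately.

```python
import socket
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class PingResult:
    """Outcome of one TCP ping (hypothetical structure for illustration)."""
    host: str
    port: int
    reachable: bool
    connect_ms: Optional[float]  # None when the connection failed
    error: Optional[str] = None


def tcp_ping(host: str, port: int, timeout: float = 3.0) -> PingResult:
    """Time a single TCP connection to host:port."""
    start = time.perf_counter()
    try:
        # create_connection resolves the name (if any) and completes
        # the TCP handshake, or raises OSError on failure.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed_ms = (time.perf_counter() - start) * 1000.0
        return PingResult(host, port, True, elapsed_ms)
    except OSError as exc:
        # OSError covers timeouts, refused connections, and DNS failures.
        return PingResult(host, port, False, None, str(exc))
```

Calling `tcp_ping("example.com", 443)` (a placeholder target) returns a `PingResult` with the connection time in milliseconds, or an error string if the port could not be reached. A real monitoring tool would run such a check on a schedule and from many vantage points.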

Synthetic Monitoring Analytics

Once you have achieved full coverage of all the facets of your application environment with synthetic monitoring, it is time to analyze the collected data. To conduct effective analysis, it is important to be able to visualize data trends on a dashboard, set up alerts for latency and outage anomalies, and isolate any performance issues in the data path between your application and its end users. More advanced visualization options include charts such as histograms and scatterplots, as well as screenshots, or even a filmstrip, of the user interacting with the page. The more sophisticated your tool’s analytics, the shorter your mean time to detect an application performance problem, ideally before your customers have the chance to complain.

  • Trends Data: Performance monitoring vendors should offer the ability to compare data over different time periods so that you can easily visualize changes day over day or year over year in order to determine trends.
  • Waterfall of Granular KPIs: Waterfall data should capture standard HTTP metrics for all requests on a page, and include page-level metrics such as Render Start, DomComplete, and OnLoad.
  • Data Aggregation: Data aggregation allows you to roll up raw data into minimum, maximum, average, median and 95th percentile groupings so that you can pinpoint any variations over time.
  • Data Aggregation by Meta-Data: This feature allows users to group data by specific dimensions, such as node, ISP, user-defined tags, city, hour of day, error code and so on.
  • Benchmark Comparison: This functionality allows users to compare performance across multiple websites to analyze how they stack up against competitors and industry benchmark standards.
  • Grouping by JavaScript: Grouping by JavaScript lets you break down load latency by first party (your code), by third party (e.g. your site’s ad service), or by a tag that you designate.
  • Scatterplot: A scatterplot shows each sample measurement as a single point on a graph according to its value and collection time, which in turn helps identify outliers at a glance, especially when the points are color coded.
  • Cumulative Distribution Function (CDF): CDF helps determine the probability of a poor end user experience by latency (e.g. 33% of users experience latency of 10 seconds or more) and answers the question, “Where does my performance long-tail start and end?”
  • Histogram: A histogram or frequency distribution shows how often an end user may experience a long latency (e.g. 20% between 1 and 2 minutes) and consequently a slow page load.
  • Advanced Visualization: Data visualization features such as Bubble Chart, Grid Chart and Heat Maps allow you to better analyze and correlate data.
  • Dashboards: Any analytics dashboard should be customizable, support AI-based smart-boards, offer heatmaps, and be shareable via public links so that any relevant employee can access it.
  • Reports: A dashboard helps you in real time, while reports combine more than one test, helping you automate executive updates, perform trend analysis, or manage your SLAs.
  • Alerting: Monitoring your website and application must include a comprehensive alerting system that lets you review every alert event triggered in the system. Alerting on raw or aggregated data should be configurable by threshold, transaction step, percentage of failed checks, trend changes and/or transient incidents.
  • False Positive Reduction: False positive reduction entails the suppression of false alerts using techniques such as generating alerts only after a specified percentage of concurrent or sequential checks fail.
  • Fault Isolation: This feature allows you to isolate issues to any element of the service delivery chain, including the wireless provider, ISP, last mile provider, CDN provider and DNS provider...
  • Cached Versus Uncached: This feature lets you measure the delay experienced by end users when the elements being loaded have not been cached (e.g. by a CDN).
  • Service Level Agreement (SLA) Management: This critical feature analyzes your monthly service uptime and latency to determine whether your SLA was met outside of any pre-agreed maintenance windows so that you can hold your service providers to account.
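
To make the Data Aggregation and CDF items above more concrete, here is a minimal Python sketch that rolls up raw latency samples into minimum, maximum, mean, median and 95th percentile, and computes the share of samples at or above a latency threshold. The function names are hypothetical, and the nearest-rank percentile method is an assumption chosen for simplicity; monitoring tools may use other interpolation methods.

```python
import math
import statistics
from typing import Dict, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def aggregate(samples: Sequence[float]) -> Dict[str, float]:
    """Roll up raw samples into the groupings named in the list above."""
    return {
        "min": min(samples),
        "max": max(samples),
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "p95": percentile(samples, 95),
    }


def share_at_or_above(samples: Sequence[float], threshold: float) -> float:
    """Fraction of samples with a value at or above the threshold,
    e.g. the share of page loads taking 10 seconds or more."""
    if not samples:
        raise ValueError("samples must not be empty")
    return sum(1 for s in samples if s >= threshold) / len(samples)
```

For example, with the five samples `[0.2, 0.3, 0.4, 0.5, 12.0]` (in seconds), the median is 0.4 while the 95th percentile is 12.0, which shows how a single outlier can dominate the tail without moving the median.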

Synthetic Monitoring Reachability 

Guaranteeing the reachability of your digital services is one of the most important strategic aspects of synthetic monitoring. Your service may be perfectly reachable from AWS on the West Coast of the U.S., for instance, but not from a wireless network in Eastern Europe. The only way to ensure your service can be reached by your entire market is to choose a vendor that has invested heavily in deploying hundreds of nodes around the world as strategic vantage points. Nodes may be located in data centers owned by local Internet Service Providers (ISPs), within the wireless coverage of major mobile service providers, in the backbone of the Internet, or even in the Local Area Network (LAN) where your application is deployed. Your ability to isolate a performance fault depends on the number and location of nodes you measure from.

  • Public Cloud Nodes: Cloud is central to the application delivery chain and huge amounts of traffic flow to, from, and between cloud service providers. All synthetic monitoring providers implement vantage points, or nodes, on public clouds (e.g. AWS, Azure, Google Cloud). Some newer providers, however, rely on cloud nodes mostly because doing so makes it easy to claim quick global coverage without investing in other node types.
  • Internet Backbone Nodes: Backbone nodes allow you to monitor directly from ISPs, which carry the majority of the Internet’s global traffic, letting you measure performance excluding external network noise. They are critical in helping determine whether performance incidents stem from a problem with the edge or core of the Internet.
  • Last Mile Nodes: To isolate reachability problems, nodes are required in every segment of the service delivery chain, including the “last mile” access network. Synthetic monitoring from the on- and off-ramps of local ISPs to the backbone is essential for troubleshooting and localizing specific Internet problems.
  • Wireless Nodes: To isolate mobile access issues, your provider must have nodes co-located with all the major wireless providers in your end users’ regions.
  • International Nodes: For users in Asia, the Middle East, Africa, Australia and South America, you should ensure your provider has a robust set of international nodes with ISPs local to your market and your customer base.
  • On-Premise Nodes: On-premise nodes allow you to run the provider’s software on your own hardware, giving you your own nodes to monitor user experience behind the firewall and identify local issues specific to critical business applications.
  • Detailed Network Monitoring: Troubleshooting of performance outages and slowdowns always requires a breakdown of network latency hop by hop and by Autonomous System Number (ASN). Synthetic monitoring providers should offer the capability for detailed network monitoring.
  • Bandwidth Throttling: Bandwidth throttling allows the simulation of a mobile access point from a slow wireless network or an area with a weak wireless signal.
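
The point that fault isolation depends on the number and location of nodes can be sketched in Python as follows. This is an illustrative model only: the `NodeResult` fields, the region and node type labels, and the 50% failing threshold are all assumptions, not values taken from the guide or from any product. The sketch groups the results of one round of checks by region or node type and classifies the incident as regional or system-wide.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class NodeResult:
    """Result of one check from one vantage point (hypothetical fields)."""
    node: str        # node identifier
    region: str      # e.g. "us-west", "eu-east"
    node_type: str   # e.g. "cloud", "backbone", "last_mile", "wireless"
    ok: bool         # True if the check passed


def failure_rate_by(results: Iterable[NodeResult], key: str) -> Dict[str, float]:
    """Failure rate per group, where key is 'region' or 'node_type'."""
    totals: Dict[str, int] = defaultdict(int)
    failures: Dict[str, int] = defaultdict(int)
    for result in results:
        group = getattr(result, key)
        totals[group] += 1
        if not result.ok:
            failures[group] += 1
    return {group: failures[group] / totals[group] for group in totals}


def classify_scope(results: List[NodeResult], failing_share: float = 0.5) -> str:
    """Label an incident 'healthy', 'regional', or 'system-wide'.

    A region counts as failing when at least failing_share of its
    nodes failed (an assumed threshold for illustration).
    """
    rates = failure_rate_by(results, "region")
    failing = [region for region, rate in rates.items() if rate >= failing_share]
    if not failing:
        return "healthy"
    if len(failing) == len(rates):
        return "system-wide"
    return "regional"
```

Note that with nodes in only one region, every failure is labeled system-wide because there is nothing to compare against, which is exactly why broad node coverage matters for isolation.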

Synthetic Monitoring Platform

Some attributes of your synthetic monitoring tooling cannot be neatly categorized as monitoring types, analytics features, or reachability scope, yet they are equally important to the success of your monitoring strategy. For instance, the ability to invoke a test of your CDN manually and instantly at the push of a button may prove extremely handy when troubleshooting an outage and isolating the issue to a specific data path. Another useful capability is running simultaneous tests from multiple vantage points so that you alert your operations team only if, say, 75% of concurrent tests have failed, which helps you avoid false alerts caused by transient network issues. Below is a list of further features that are of value in a synthetic monitoring platform.

  • Test Frequency: The frequency of checks should be configurable, ranging from seconds to minutes to hours, to allow for maximum flexibility.
  • Data Retention: Storing raw data for a specific retention period (e.g. a set number of months or years) helps you compare data and determine trends over time.
  • Concurrent Monitoring: Your site needs to be simultaneously monitored from a customizable number of nodes to account for interference from short transient network delays and prevent false positives and unnecessary noise.
  • On-Demand Synthetic Test: The ability to run a synthetic test on demand (either via the UI or API) will help you identify critical problems during troubleshooting to improve web and application performance.
  • Geographic Selection: This useful feature will allow users to select testing checkpoints based on geography.
  • Screenshots and Filmstrips: Having the end user’s interaction with the webpage visually recorded throughout the transaction journey helps complement recorded data for additional context.
  • Custom Scripts: In addition to monitoring standard web protocols, vendors should support the ability to run your own code and monitor custom protocols/endpoints.
  • Script Editor: If you record a sequence in the UI, this feature lets you manually edit the script generated by the recorder so that you can further customize it.
  • Support for Selenium: Selenium is a powerful browser automation framework typically used in QA. Because its scripts are not tied to a single monitoring vendor, it helps you avoid vendor lock-in as you create a library of scripts.
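
The concurrent-monitoring rule described in this section, alerting only when a given share of simultaneous tests fail, can be sketched in a few lines of Python. The function name `should_alert` and the shape of its input are hypothetical; the 75% default mirrors the example figure used in the text above and is not a recommended setting.

```python
from typing import Mapping


def should_alert(results: Mapping[str, bool], min_failed_share: float = 0.75) -> bool:
    """Decide whether one round of concurrent checks warrants an alert.

    results maps a node name to True if the check from that node passed.
    An alert fires only when the share of failed checks reaches
    min_failed_share, so one node with a transient problem stays quiet.
    """
    if not results:
        return False  # no data collected, nothing to alert on
    failed = sum(1 for passed in results.values() if not passed)
    return failed / len(results) >= min_failed_share
```

With four nodes, a single failure (25%) or two failures (50%) would be suppressed, while three failures (75%) would trigger an alert. A stricter variant could additionally require the condition to hold over several sequential rounds.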

Synthetic Monitoring Administration 

Finally, do not underestimate the importance of managing and handling your synthetic monitoring tools. The features described in this section are meant to save you time as the tool’s administrator, such as the ability to receive notifications in Slack instead of email or the option to export data in a specific format, such as CSV. Administrators typically need access to a REST API, for instance, to programmatically integrate the performance monitoring tool with existing automated DevOps processes. Another helpful administrative feature is Single Sign-On (SSO), which integrates with your directory services and avoids laborious manual access management. You must also consider compliance and security measures for safekeeping your data, as well as appropriate training for your team of operations engineers.

  • Alert Notifications: Support for email or text notifications is a basic feature of most alerting tools while the option for native integration with DevOps tools like PagerDuty, Splunk and Slack differs across vendors.
  • Maintenance Windows: Maintenance windows allow you to stop monitoring and/or alerting during planned maintenance periods to avoid false alerts.
  • Support for Webhooks: Webhooks are HTTP callbacks that deliver messages formatted to be consumed programmatically. They may be used to extract raw sample data or to integrate generated alerts with other systems. Some vendors will provide support for them.
  • Export to Excel: Export to Excel provides a useful option if you want to slice and dice your data in a spreadsheet or if you simply wish to save a copy for future reference.
  • API Support: APIs are necessary to programmatically extract data from your provider’s platform or to integrate it into operational processes.
  • Security: Your vendor needs to be able to ensure a secure platform with encryption for critical business data, such as credentials used in simulation scripts.
  • Single Sign-on: For a large team, it is helpful to integrate your tool with your directory service enabling single sign-on to avoid the burden of separately managing user access.
  • User Teams: Your SSO integration works best if your monitoring tools support the notion of user teams at varied permission levels.
  • Access Privileges: Some users may only need access to view results, but not the ability to change integrations, receive alerts or view dashboards, for instance. Access privileges allow you to determine the appropriate level of access for each individual user.
  • Sub-tenants: If you have multiple business units or application owners accessing your performance monitoring tool, you can set them up as sub-tenants.
  • Mobile App: A mobile app allows you to log into your monitoring tool while you are out and about, so you do not lose valuable time before you can start diagnosing a website or application problem.
  • User Analytics: It is helpful for administrators to be able to see user analytics at a glance, for instance, to track which users are frequently visiting specific pages or to trace any configuration changes.
  • Customer Support: Your overall customer success depends on the vendor support team. Do they have deep domain expertise, and are they available 24x7?
  • Script Migration Support: If you are using existing scripts such as Selenium for QA testing, you will want to migrate them to your performance monitoring tool. Some vendors provide this kind of support.
  • Training: It is helpful for all training materials to be available online as a self-service option, but also be available as custom training for a group of operators.
  • Customer Success Manager (CSM): Customer success managers proactively help customers with regular implementation, updates and configuration reviews, in addition to providing any required platform support on an ongoing basis.
  • Certification: If you have a large team, it can help to certify your team on your performance monitoring tool, especially since turnover will likely impact your team’s knowledge base over time. Certification provides a useful record of training in addition to boosting enthusiasm and company buy-in to the tool.
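
The Maintenance Windows item above, together with the earlier point about judging SLAs outside pre-agreed maintenance windows, can be illustrated with a short Python sketch. All names here are hypothetical, and the model is deliberately simplified: windows are plain start and end timestamps, and uptime is computed as the share of passed checks rather than from measured downtime duration.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# A maintenance window as (start, end); the end is exclusive.
Window = Tuple[datetime, datetime]


def in_maintenance(ts: datetime, windows: Iterable[Window]) -> bool:
    """True if the timestamp falls inside any maintenance window."""
    return any(start <= ts < end for start, end in windows)


def alerts_to_send(alert_times: Iterable[datetime], windows: List[Window]) -> List[datetime]:
    """Drop alerts raised during planned maintenance to avoid false alerts."""
    return [ts for ts in alert_times if not in_maintenance(ts, windows)]


def uptime_outside_windows(
    checks: Iterable[Tuple[datetime, bool]], windows: List[Window]
) -> Optional[float]:
    """Share of passed checks, ignoring checks run during maintenance.

    Each check is (timestamp, passed). Returns None when every check
    fell inside a maintenance window, so no uptime can be computed.
    """
    counted = [passed for ts, passed in checks if not in_maintenance(ts, windows)]
    if not counted:
        return None
    return sum(counted) / len(counted)
```

Comparing the returned uptime against the agreed SLA target (for example 0.999) then shows whether the SLA was met for the period. Use timezone-aware timestamps consistently, since Python raises an error when comparing aware and naive datetimes.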

Conclusion 

The field of synthetic monitoring has attracted many newcomers in recent years who have typically avoided the heavy investment required to deploy a worldwide network of monitoring nodes across all types of networks. Yet this investment in a holistic monitoring strategy, one that encompasses a wide network of distributed vantage points, is essential for identifying performance issues in every segment of the path between your application platform and your end users. Many vendors in the synthetic monitoring space rely solely on public cloud providers to project an impression of global coverage while ignoring the Internet Service Providers (ISPs) used by your customers to access the Internet, the various points along the backbone of the Internet, and the wireless antennas of mobile operators.

The success of your performance monitoring strategy depends on the test types offered to address every use case, the sophistication of the analytics that can be applied to the raw data collected during monitoring, and the management features that will make your life simpler as a tool administrator. We recommend asking the hard questions of your synthetic monitoring tool upfront before finalizing your vendor selection.
