Blog Post

How to Monitor the Networks You Don’t Control

Published

October 17, 2019

Network engineering and operations teams are currently faced with a paradoxical dilemma. On the one hand, the capabilities for delivering content and services to end users are more robust and diverse than ever thanks to the explosion of third-party infrastructure and services. The adoption of new IT architectural structures such as multi-DNS, multi- and hybrid-cloud, multi-CDN, and others means that when everything is operating smoothly, end users are getting content and services delivered to them faster than ever before.

However, these developments in architecture and digital delivery come with a cost. Every additional layer of the delivery chain adds more complexity, introduces visibility gaps, and reduces these teams’ ability to understand how network health is affecting the end user experience. According to Gartner, “I&O (Infrastructure and Operations teams) is losing direct control of infrastructure and applications.” This unfortunately means that whenever there is a disruption in the infrastructure delivery chain, network teams are often left scrambling as they work alongside infrastructure and application teams to try to identify the root cause of the problem.

The only way to solve this problem – since a reversion to the simple IT architectures of the late 90s and early 2000s is not an option – is to deploy a monitoring strategy that provides complete visibility into the various network layers (as well as the infrastructure and application layers). This means ensuring that these widely distributed architectural components – DNS, CDNs, ISPs, wireless networks, cloud providers, BGP routers, and LANs – are all working together seamlessly to deliver content to end users.

Catchpoint’s Network Insights is designed to provide everything that network professionals need to keep their digital systems healthy. The suite comprises four primary capabilities:

DNS Monitoring
Traceroute Monitoring
BGP Monitoring
Endpoint Monitoring

Together, these capabilities provide visibility into different components of the delivery chain, enabling detection and triage of problems at every network layer from Layer 3 to Layer 7, across protocols such as TCP, UDP, and MQTT. IT teams can then proactively resolve networking issues before users experience service outages, or quickly rule out the network as the cause during triage.

To understand the full scope of Catchpoint’s Network Insights, let’s dive into each of these capabilities in more depth.

DNS Monitoring

Catchpoint offers two different types of DNS monitors: DNS Direct and DNS Experience.

DNS Direct tests query the name servers directly to provide availability data and verify that the responses they return are accurate.
DNS Experience tests run recursive queries to resolve DNS (just as recursive DNS servers do) to measure latency, performance, and availability of the various DNS servers in the pathway.
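
To make the distinction concrete, here is a minimal Python sketch of the wire-level difference between asking a name server directly and asking a resolver to do the lookup for you. It is an illustration only, not Catchpoint's implementation: the function names are hypothetical, and the packet layout follows the standard DNS message format. A direct query typically clears the "recursion desired" (RD) flag and is sent straight to a name server, while a query handed to a recursive resolver sets it.

```python
import struct


def build_dns_query(hostname: str, recursion_desired: bool, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (hypothetical helper)."""
    # RD is bit 8 of the 16-bit flags field; all other flags stay zero.
    flags = 0x0100 if recursion_desired else 0x0000
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def response_summary(packet: bytes) -> dict:
    """Pull the header fields a monitor would likely check from a DNS response."""
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", packet[:12])
    return {
        "authoritative": bool(flags & 0x0400),  # AA bit: answer came from an authoritative server
        "rcode": flags & 0x000F,                # 0 = NOERROR, 3 = NXDOMAIN
        "answers": ancount,
    }
```

Sending the packet over UDP port 53 and timing the round trip would give the latency and availability numbers; that part is left out so the sketch stays free of network access.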

Monitoring DNS servers lets I&O teams ensure reachability by detecting issues with their DNS provider(s) and then either switching to a backup provider (if one is set up) or working with the vendor to route traffic to different resolvers and/or name servers. DNS monitoring also allows quick detection of DDoS attacks, DNS hijacks, and DNS cache poisoning.

DNS histogram showing Response & Availability

DNS scatterplot showing individual response times

Traceroute Monitoring

Using Catchpoint Traceroute visualizations, IT teams can ensure network reliability with synthetic monitoring views of the network path, mesh, and cloud / multi-cloud / hybrid-cloud data. Network operations teams can collect data from every hop to detect issues and quickly pinpoint the root cause. Understanding which part of the path is introducing latency and/or packet loss helps network teams quickly solve router issues or peering problems and reduce end-user impact.

Catchpoint provides two different types of Traceroute visualizations: a logical Autonomous System (AS) view, and an IP/hop-by-hop view. These show the health of each router in the network path, and the underlying traceroutes can be run over various protocols such as ICMP, UDP, and TCP.
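
As a rough illustration of how per-hop data can point to the segment that introduces latency or loss, the sketch below works on an already-collected traceroute. The `Hop` structure, field names, and the 5% loss threshold are assumptions made for this example, not part of any Catchpoint API. One detail worth encoding: loss that shows up at a single hop but not at later hops is usually that router rate-limiting its replies, so only loss that persists to the destination is treated as real.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hop:
    ttl: int
    address: str               # router IP, or "*" if nothing replied
    rtt_ms: Optional[float]    # median round-trip time, None if every probe was lost
    loss_pct: float            # share of probes to this hop that went unanswered


def largest_latency_jump(hops: List[Hop]) -> Optional[Hop]:
    """Return the hop with the biggest RTT increase over the previous responding hop."""
    worst, worst_delta, prev_rtt = None, 0.0, 0.0
    for hop in hops:
        if hop.rtt_ms is None:
            continue  # skip silent hops; they carry no timing information
        delta = hop.rtt_ms - prev_rtt
        if delta > worst_delta:
            worst, worst_delta = hop, delta
        prev_rtt = hop.rtt_ms
    return worst


def persistent_loss_start(hops: List[Hop], threshold_pct: float = 5.0) -> Optional[Hop]:
    """Return the first hop from which loss continues all the way to the destination."""
    start = None
    for hop in hops:
        if hop.loss_pct >= threshold_pct:
            if start is None:
                start = hop
        else:
            start = None  # loss did not carry forward, so the earlier hop was not the cause
    return start
```

Run against several traces from different vantage points, the same two checks would hint at whether a problem sits in one router, one provider, or a shared peering point.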

AS Sankey diagram

IP / hop-by-hop diagram

Traceroute tests can be run as their own tests, or alongside other Layer 4+ tests like web browser, web transaction, API, DNS, etc.

BGP Monitoring

BGP (Border Gateway Protocol) is the protocol that manages how packets are routed across the internet through the exchange of routing and reachability information between edge routers. BGP directs traffic between autonomous systems (ASes) – networks managed by a single enterprise or service provider. However, because there’s little to no oversight of how the roughly 65,000 ASes that make up the internet are configured, the network is extremely vulnerable (check back next week for an in-depth blog series about how BGP works and the vulnerabilities that come with it).

Changes in BGP routes – whether caused by malicious attacks or human error (e.g. improperly configured ASes, bad filters, etc.) – can negatively affect service availability, directly impacting end users. We saw this in June 2019, when a BGP leak propagated from a local ISP to Verizon to Cloudflare’s CDN, where it then spread all over the world.

Catchpoint BGP monitoring tracks route health and detects issues such as:

Route hijacks
Policy configuration issues
Route flaps
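
To give a feel for what checks like these involve, here is a simplified Python sketch that scans a list of BGP updates for two of the issues above: announcements with an unexpected origin AS (a possible hijack) and prefixes that change too often in a short window (flapping). The update format, function names, and thresholds are all assumptions made for illustration; real feeds such as RouteViews or RIPE data need parsing first, and this is not a description of how Catchpoint does it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (timestamp in seconds, prefix, AS path); an empty path stands for a withdrawal.
Update = Tuple[float, str, List[int]]


def unexpected_origins(updates: Iterable[Update],
                       expected: Dict[str, Set[int]]) -> List[Tuple[str, int]]:
    """Flag announcements whose origin AS (last AS in the path) is not on the expected list."""
    alerts = []
    for _ts, prefix, path in updates:
        if path and prefix in expected and path[-1] not in expected[prefix]:
            alerts.append((prefix, path[-1]))
    return alerts


def flapping_prefixes(updates: Iterable[Update],
                      window_s: float = 300.0,
                      max_changes: int = 4) -> Set[str]:
    """Return prefixes with more than max_changes updates inside any sliding time window."""
    times = defaultdict(list)
    for ts, prefix, _path in updates:
        times[prefix].append(ts)
    flapping = set()
    for prefix, stamps in times.items():
        stamps.sort()
        left = 0
        for right in range(len(stamps)):
            # Shrink the window until it spans at most window_s seconds.
            while stamps[right] - stamps[left] > window_s:
                left += 1
            if right - left + 1 > max_changes:
                flapping.add(prefix)
                break
    return flapping
```

Note that the origin check only matches exact prefixes. A hijack that announces a more specific prefix would need a covering-prefix lookup, which is left out here to keep the example short.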

Unlike other monitoring vendors, Catchpoint’s BGP data is delivered in real time thanks to a combination of public open source data (via RouteViews and RIPE) and private peering sources. So rather than waiting 15 minutes to discover issues with their BGP routes, Catchpoint customers can now be alerted to route anomalies as they happen so that they can detect and remediate quickly.

BGP dashboard

BGP continent view

Endpoint Monitoring

With the explosion of business-critical SaaS applications such as Salesforce, Office 365, Slack, Okta, and hundreds (if not thousands) of others, monitoring them is vital to ensuring employee productivity and the flow of business throughout the day. Failure to do so can cripple not just remote employees, but also entire offices if an issue cannot be resolved quickly.

Catchpoint solves this with a dual approach. Enterprise monitoring nodes can be placed within the firewall to detect network, WiFi, and proxy problems that are causing performance or reachability issues with in-house SaaS applications, while Endpoint Monitoring provides telemetry from employees’ devices. Together, these provide true end-to-end visibility of the delivery chain, from the local network to wherever the application is hosted.

With this combination of synthetic and real user data, internal IT teams can detect issues before employees are forced to file a ticket, and determine whether the issue lies within the local network, the internet (i.e. external networks), or the SaaS vendor.
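
The three-way decision described above can be sketched as a small piece of logic. This is a deliberately simplified heuristic with hypothetical signal names, assuming you already have a pass/fail result from the employee's device, a check of the local gateway, and a synthetic test of the same SaaS application from outside the office network.

```python
from enum import Enum


class FaultDomain(Enum):
    NONE = "no fault detected"
    LOCAL_NETWORK = "local network"
    INTERNET = "internet (external networks)"
    SAAS_VENDOR = "SaaS vendor"


def localize_fault(saas_ok_from_endpoint: bool,
                   gateway_ok: bool,
                   saas_ok_from_outside: bool) -> FaultDomain:
    """Guess where a SaaS problem lies from three pass/fail signals (illustrative only)."""
    if saas_ok_from_endpoint:
        return FaultDomain.NONE
    if not gateway_ok:
        # The device cannot even reach its own gateway, so the problem is in-house.
        return FaultDomain.LOCAL_NETWORK
    if not saas_ok_from_outside:
        # External synthetic tests fail too, so the vendor itself is likely down.
        return FaultDomain.SAAS_VENDOR
    # Local network is fine and the vendor is up elsewhere: suspect the path in between.
    return FaultDomain.INTERNET
```

In practice each signal would be an aggregate over many tests rather than a single boolean, but the order of elimination stays the same.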

All Your Monitoring Needs in One Place

With the release of Network Insights, Catchpoint is now the only monitoring vendor that provides synthetic monitoring of the full stack from Layer 3 to Layer 7. By offering network data side-by-side with application data and endpoint data, Catchpoint lets customers detect issues within any layer of the network, identify the root cause(s), and fix them accordingly.

This is especially valuable for our customers with global footprints and complex architectures (e.g. multi-/hybrid-cloud, multi-CDN, etc.), as it provides insight into network health around the world. This is powered by the largest and most diverse synthetic testing infrastructure in the industry, with over 800 different locations across backbone/broadband networks, cloud providers, wireless networks, last mile networks, and enterprise locations. It enables our customers to detect micro-outages in specific geographies and see every layer of the network from the end users’ perspective, regardless of where those end users are located.

This network visibility is combined with additional monitors and over 20 different test types – including custom monitors to accommodate specific tools – as well as Catchpoint’s powerful analysis engine, which can slice and dice the data to reduce MTTD and MTTR.
