Traceroute InSession: A traceroute tool for modern networks

Published March 26, 2024

This is a follow-up to our previously published post announcing Traceroute InSession, where we provided extensive technical details about how it works. In this post, we explore the challenges InSession addresses in modern networks and compare it to other traceroute variants.

At Catchpoint, we pride ourselves on being traceroute experts. Why? We’ve run over 15 billion traceroutes over the last 15 years and more than 3 billion in 2023 alone! These traceroutes have originated from backbone locations connected to Tier 1 ISPs worldwide, cloud and wireless locations, enterprise nodes within private networks, and even laptops running our Workforce Experience agent.

Uncovering the issues

With standard traceroute, the results all too often look something like this:

[Figure: standard TCP traceroute path diagram, New York (Cogent) to Amazon]

This example shows a TCP traceroute to Amazon from one of our backbone nodes in New York, which is connected to Cogent Communications. As expected, Amazon uses a CDN to serve traffic from different edge locations, so the traceroute reaches five different destinations over time. This behavior is anticipated and normal.

The firewalls problem

However, what’s not expected? The red dots and all the crossed lines. The red dots show packet loss – but in reality, the Internet isn’t that broken. Rather than packet loss, this is likely a result of firewalls blocking some of the traceroute packets. We’ll call this the firewalls problem.

The paths problem

All the crossed lines indicate that this is a very complex router topology – except that’s not true either. We’ll call this the paths problem. In fact, standard traceroute cannot reveal the path that a packet takes! That’s super counter-intuitive: Traceroute is meant to trace the route – but in reality, what it tells you is that it encountered one router X hops away and another router Y hops away. It has no way to prove whether the routers are connected or what’s between them.

To explain the paths problem further, let’s look at an individual traceroute run:

Traceroute to www.amazon.com (18.164.107.218), 30 hops max 
1 gateway (38.101.106.65) [AS 174] 0 ms 0 ms 0 ms   
2  * * *   
3 te0-0-0-1.ccr31.jfk04.atlas.cogentco.com (154.54.5.177) [AS 174] 1 ms 1 ms 1 ms   
4 be3363.ccr42.jfk02.atlas.cogentco.com (154.54.3.125) [AS 174] 2 ms 1 ms 1 ms   
5 be3201.rcr52.ewr01.atlas.cogentco.com (154.54.90.62) [AS 174] 2 ms 2 ms 2 ms   
6 38.142.215.210 (38.142.215.210) [AS 174] 2 ms 2 ms 2 ms   
7  * * *   
8  * * *   
9  * * *   
10  * * *   
11 15.230.208.23 (15.230.208.23) 2 ms 2 ms 2 ms   
12 15.230.208.23 (15.230.208.23) 2 ms 2 ms 2 ms   
13 server-18-164-107-218.jfk50.r.cloudfront.net (18.164.107.218) [AS 16509] 2 ms 2 ms 2 ms

This output indicates that the traceroute went to the gateway, then to the jfk04 Cogent router, then to the jfk02 router, and so on, right?

WRONG!

What it actually shows is that one packet from the traceroute took a path through the gateway and found that it was one hop away from the source. Another packet took a path through jfk04 and found that it was three hops away from the source. A third took a path through jfk02 and found that it was four hops away from the source, and so forth. Each packet in a traditional traceroute operates independently. The traceroute output only reveals the distance of each router from the source, but not whether they are interconnected.

Another problem is that a single traceroute doesn’t tell you whether all possible routers have been identified. Fortunately, Catchpoint and others solved this problem long ago: since each traceroute identifies only some of the routers, the only way to find them all is to run lots of traceroutes. That’s how we got the diagram at the top: each traceroute found one router at each hop, and we ran enough traceroutes to be fairly confident that we had found all the routers.
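
Why “lots”? A back-of-the-envelope model helps. Suppose, purely as a simplifying assumption, that each run sees one of k routers at a given hop uniformly at random and independently of earlier runs (real load balancers are rarely this tidy). Then the expected number of runs before every router has been seen is the coupon-collector value k·H_k, where H_k = 1 + 1/2 + ... + 1/k. A small Python helper (the function name is ours) computes it exactly:

```python
from fractions import Fraction


def expected_runs_to_see_all(k: int) -> Fraction:
    """Expected number of runs before all k routers at one hop have been seen.

    Coupon-collector model: each run observes one of the k routers uniformly
    at random and independently. This is an assumption, not a measurement.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    harmonic = sum(Fraction(1, i) for i in range(1, k + 1))
    return k * harmonic


for k in (1, 2, 4, 8):
    print(k, round(float(expected_runs_to_see_all(k)), 2))
```

With eight equally likely routers at a single hop, that is already about 22 runs on average, and more if you want to be confident you have seen them all.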

Solving the firewall and path challenges with Traceroute InSession

We set out to solve the two remaining problems: firewalls causing so-called “packet loss” (the firewalls problem), and the mistaken assumption that routers at consecutive hops are adjacent when they might not be (the paths problem).

And we succeeded.

The result, Traceroute InSession, creates a diagram that looks like this for the example above:

[Figure: Traceroute InSession path diagram for the same New York to Amazon test]

Far fewer red dots (none in this example!) and much cleaner lines, which we can be relatively confident indicate that the connected routers really are adjacent.

How? We’ll tell you (and even link to the code we wrote and made open source), but first, let’s rewind to the beginning of traceroute. If you’re impatient, skip the background and dive straight to the solution.

A brief history of traceroute

Traceroute was developed as a diagnostic tool to trace a packet’s journey through an IP network and to assess the performance of that path. In the early days of the Internet, traceroute didn’t need to exist. Well, maybe it could have, but the network was simple enough that nobody needed such a tool. To illustrate, this is what the Internet looked like in 1984:

[Figure: map of the Internet in 1984]

Compare that to what the Internet looks like nowadays:

[Figure: visualization of the Internet today]

There’s a lot of complexity – and complexity necessitates tools to decipher it. This principle will be a recurring theme in our discussion.

In the 1980s, the Internet Protocol (IP), created in 1974 by Vint Cerf and Bob Kahn, was still a novel concept. Ping was introduced in 1983 to measure the latency between a client and a destination. Following this, in 1988, Van Jacobson wrote the now-famous ‘traceroute email’:

[Figure: Van Jacobson’s 1988 traceroute email]

The level of complexity had reached a point where the creation of a specialized tool became necessary.

How does traceroute work?

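Van Jacobson created the original traceroute in 1987 using the IP Time To Live (TTL) field, called Hop Limit in IPv6. This field limits the number of routers a packet can pass through: each router that forwards the packet decrements it by one, and when it reaches 0, the router discards the packet and sends back an error message (ICMP Time Exceeded, “TTL exceeded in transit”, for IPv4, or ICMPv6 Time Exceeded, “hop limit exceeded in transit”, for IPv6). Traceroute sends probes with a TTL of 1, then 2, then 3, and so on, so each router along the way reveals itself in turn until a probe finally reaches the destination.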
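
To make the TTL mechanism concrete, here is a minimal simulation in Python. It sends no real packets: the path is a hypothetical list of router names, and the helper names (`send_probe`, `traceroute`, `Reply`) are our own. Each probe starts with a TTL, every router decrements it, and whichever router takes it to zero is the one that answers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reply:
    ttl_sent: int   # TTL the probe started with
    responder: str  # who answered
    kind: str       # "time-exceeded" (a router) or "reached" (the destination)


def send_probe(path: List[str], destination: str, ttl: int) -> Reply:
    """Forward one probe along a hypothetical list of routers."""
    remaining = ttl
    for router in path:
        remaining -= 1  # every router decrements the TTL
        if remaining == 0:
            # This router drops the probe and returns ICMP Time Exceeded.
            return Reply(ttl, router, "time-exceeded")
    # The probe outlived every router, so the destination answers
    # (port unreachable, echo reply, or SYN-ACK/RST, depending on probe type).
    return Reply(ttl, destination, "reached")


def traceroute(path: List[str], destination: str, max_hops: int = 30) -> List[Reply]:
    """Classic traceroute loop: TTL 1, 2, 3, ... until the destination replies."""
    replies = []
    for ttl in range(1, max_hops + 1):
        reply = send_probe(path, destination, ttl)
        replies.append(reply)
        if reply.kind == "reached":
            break
    return replies


for r in traceroute(["gateway", "router-a", "router-b"], "server"):
    print(r.ttl_sent, r.responder, r.kind)
```

This prints one line per hop: the three routers answer probes with TTL 1 to 3, and the server answers the probe sent with TTL 4.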

Modern networks are more complex

The original algorithm implemented by Van Jacobson still works in many scenarios today, but the complexity of modern networks means it must be enhanced to work properly. Specifically, traceroute struggles with firewalls and load balancers.

The firewall challenge

Security is a perpetual cat-and-mouse game. Each technological advancement is met with exploitation by malicious actors, leading to further innovations designed to close those loopholes. This is exactly what happened with traceroute.

ICMP traceroute often fails in modern networks because ICMP messages are used for diagnostics rather than to carry application data, so many firewalls block them entirely.

UDP traceroute fares no better. Its probes don’t imitate any common UDP application, so they don’t look like application traffic, and firewalls often block them.

For a while, the answer was TCP traceroute. Every TCP connection starts with a three-way handshake, and the opening SYN packet carries no application data for a firewall to check, so firewalls let SYN packets through. Then malicious actors discovered and exploited SYN floods, and once firewalls started blocking SYN floods, TCP traceroute, which sends a stream of bare SYN packets, stopped working too. (Of course, this depends on how SYN flood detection is implemented, where SYN rate thresholds are set, and so on.)
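
To see why a burst of traceroute SYNs can trip SYN flood protection while a single connection does not, here is a toy per-source SYN rate limiter in Python. Every number in it is hypothetical: a limit of 20 SYNs per second, a TCP traceroute that sends 3 SYN probes for each of 30 hops 10 ms apart, and an InSession-style flow that sends one handshake SYN followed by data segments on the same connection. Real firewalls implement SYN flood detection in many different ways.

```python
from collections import deque


class SynRateLimiter:
    """Toy firewall rule: drop SYNs from a source beyond max_syns per window.

    Thresholds are hypothetical; real SYN flood protection varies by vendor.
    """

    def __init__(self, max_syns: int = 20, window_s: float = 1.0):
        self.max_syns = max_syns
        self.window_s = window_s
        self.recent = {}  # source -> deque of accepted SYN timestamps

    def allow(self, src: str, is_syn: bool, now: float) -> bool:
        if not is_syn:
            return True  # this toy rule only rate-limits SYNs
        times = self.recent.setdefault(src, deque())
        while times and now - times[0] >= self.window_s:
            times.popleft()  # forget SYNs that fell out of the window
        if len(times) >= self.max_syns:
            return False
        times.append(now)
        return True


def count_dropped(packets, limiter):
    """packets: iterable of (timestamp, src, is_syn) tuples."""
    return sum(not limiter.allow(src, is_syn, t) for t, src, is_syn in packets)


# TCP traceroute: 30 hops x 3 probes, every probe a bare SYN, 10 ms apart.
tcp_traceroute = [(i * 0.01, "probe-host", True) for i in range(90)]
# InSession-style: one handshake SYN, then 90 data segments on that connection.
insession = [(0.0, "probe-host", True)] + [
    ((i + 1) * 0.01, "probe-host", False) for i in range(90)
]

print(count_dropped(tcp_traceroute, SynRateLimiter()))  # 70
print(count_dropped(insession, SynRateLimiter()))       # 0
```

With these made-up numbers, 70 of the 90 traceroute SYNs are dropped (they would show up as `* * *` hops, or red dots), while the single-connection flow loses nothing.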

The load balancer challenge

The Internet is fragile. Load balancers, specifically link load balancers, are one of the technologies that help improve its resilience.

In this context, a load balancer can be considered a specialized router that decides where to send a packet, not by the packet’s destination network (that’s what every router does) but by various other criteria. For example, cost-related policies might direct important application traffic over more expensive links for reliability, or resilience considerations might reroute traffic via a backup link if the primary route is too slow. These decisions lead to scenarios such as the following:

[Figure: example network with a load balancer, Routers 1 to 5, and a server]

Looking at this diagram without any outside knowledge, it’s impossible to know which path a particular packet might take to reach the server. Maybe 1-2-5? Or 1-3-4? Or 1-2-4?

Load balancers are generally configured to balance traffic per flow: once a TCP connection is established over a particular path, the load balancer tries to keep every packet in that connection on the same path. The same principle applies to UDP flows.
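
Per-flow balancing is commonly implemented by hashing header fields such as the 5-tuple (source and destination address, protocol, and ports) and using the result to choose a next hop; vendors differ in the exact fields and hash. The Python sketch below assumes a 5-tuple hash, uses `zlib.crc32` as a stand-in, and uses documentation-range addresses; `FlowKey` and `pick_next_hop` are hypothetical names. It contrasts classic UDP traceroute, which starts at destination port 33434 and increments it for every probe (making each probe a new flow), with probes that all share one 5-tuple.

```python
import zlib
from typing import List, NamedTuple


class FlowKey(NamedTuple):
    src_ip: str
    dst_ip: str
    proto: int  # 6 = TCP, 17 = UDP
    src_port: int
    dst_port: int


def pick_next_hop(flow: FlowKey, next_hops: List[str]) -> str:
    """Per-flow balancing sketch: the same 5-tuple always gets the same next hop."""
    # crc32 instead of hash(): Python's str hash is randomized per process.
    digest = zlib.crc32(repr(tuple(flow)).encode())
    return next_hops[digest % len(next_hops)]


NEXT_HOPS = ["Router 2", "Router 3"]

# Classic UDP traceroute: a new destination port, and so a new flow, per probe.
classic = [FlowKey("192.0.2.10", "198.51.100.7", 17, 40000, 33434 + i) for i in range(12)]
# Probes that all share one flow, as probes inside a single TCP connection do.
same_flow = [FlowKey("192.0.2.10", "198.51.100.7", 6, 40000, 443)] * 12

print("classic:  ", [pick_next_hop(f, NEXT_HOPS) for f in classic])
print("same flow:", {pick_next_hop(f, NEXT_HOPS) for f in same_flow})
```

The first list usually mixes both routers, because each probe is a separate flow. The second line always shows exactly one router, because identical 5-tuples always hash to the same value.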

However, with traceroute, each “flow” is a single probe and its response. Each probe has different header fields (classic UDP traceroute, for example, changes the destination port for every probe), so each one may be load balanced to a different path.

This variability can lead to some very weird traceroute responses. For example, in the diagram above, Hop 1 will always be router 1. Hop 2 could be Router 2 or Router 3. Hop 3 could be Router 4 or Router 5. Hop 4 is the Server.

So, if you ran a traceroute and saw output like this:

1 (Router 1)
2 (Router 2)
(Router 3)
3 (Router 4)
(Router 5)
4 (Server)

How do you know what the actual path is? You could assume, for example, that one possible path is 1-2-4, but that path doesn’t exist.
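
Stitching the per-hop responders together shows the ambiguity. The Python sketch below assumes the links implied by the text (Router 1 to Router 2 or 3, Router 2 to Router 5, Router 3 to Router 4, and Routers 4 and 5 to the server; the real diagram may differ) and checks every path a reader could assemble from the output above.

```python
from itertools import product

# Assumed links for the example network (the text says 1-2-4 does not exist).
LINKS = {
    ("Router 1", "Router 2"), ("Router 1", "Router 3"),
    ("Router 2", "Router 5"), ("Router 3", "Router 4"),
    ("Router 4", "Server"), ("Router 5", "Server"),
}

# What classic traceroute reports: only the responders seen at each hop.
hops = [["Router 1"], ["Router 2", "Router 3"], ["Router 4", "Router 5"], ["Server"]]


def path_exists(path, links):
    """A path is real only if every consecutive pair is an actual link."""
    return all((a, b) in links for a, b in zip(path, path[1:]))


candidates = list(product(*hops))
real = [p for p in candidates if path_exists(p, LINKS)]

print(len(candidates), "candidate paths,", len(real), "real")
for p in candidates:
    print(" -> ".join(p), "(real)" if p in real else "(does not exist)")
```

Under this assumed topology, only 2 of the 4 candidate paths exist; 1-2-4 and 1-3-5 are artifacts of combining probes that took different routes.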

Navigating complexity: Introducing Traceroute InSession

The complexities introduced by firewalls and load balancers have caused the original traceroute to be less useful than it once was. To address these challenges, Catchpoint developed (and open-sourced) Traceroute InSession.

You can read our original blog post for a detailed explanation of how Traceroute InSession works. Below is a brief overview.

InSession establishes a real TCP connection to the destination server, so its probes look like ordinary application traffic and firewalls let them through. And because every probe is sent within that single connection, load balancers treat them all as part of the same flow and keep them on a consistent path.
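
The probe pattern can be sketched as follows. This is a simplified model under our own assumptions, not Catchpoint’s implementation (the open-source code is the reference for that): after the handshake, every probe reuses the connection’s ports, only the IP TTL changes, and each probe carries one byte with its own sequence number, placed past a deliberately skipped byte so it is never delivered to the server application. The names `Probe`, `plan_probes`, and `ttl_for_reply` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

SEQ_MOD = 2 ** 32  # TCP sequence numbers wrap at 2^32


@dataclass(frozen=True)
class Probe:
    ttl: int       # the only IP header field that changes between probes
    src_port: int  # identical for every probe: one connection, one flow
    dst_port: int
    seq: int       # distinct per probe so replies can be matched (assumption)
    length: int = 1


def plan_probes(src_port: int, dst_port: int, next_seq: int, max_ttl: int,
                gap: int = 1) -> List[Probe]:
    """Hypothetical InSession-style probe plan on one established connection.

    next_seq is the next byte the server expects. The first `gap` bytes are
    never sent, so every probe lands beyond a hole in the sequence space.
    """
    return [
        Probe(ttl, src_port, dst_port, (next_seq + gap + ttl - 1) % SEQ_MOD)
        for ttl in range(1, max_ttl + 1)
    ]


def ttl_for_reply(quoted_seq: int, probes: List[Probe]) -> int:
    """Match the sequence number quoted in an ICMP error back to its probe's TTL."""
    for probe in probes:
        if probe.seq == quoted_seq:
            return probe.ttl
    raise KeyError(quoted_seq)


probes = plan_probes(src_port=40000, dst_port=443, next_seq=1000, max_ttl=5)
print([p.seq for p in probes])      # [1001, 1002, 1003, 1004, 1005]
print(ttl_for_reply(1003, probes))  # 3
```

Matching by sequence number works because an ICMP Time Exceeded message quotes the original IP header plus at least the first 8 bytes of the original datagram (RFC 792), and for TCP those bytes include the ports and the sequence number.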

Here’s the result:

[Figure: standard traceroute output]

[Figure: Traceroute InSession output]

The first image shows many missing hops, with multiple routers returned for hops 5 to 9 and just one of three probes successfully reaching the destination (hop 13) because a firewall blocked the others.

Conversely, the InSession version in the second image shows a marked improvement: most hops return data, the path is consistent with only one responder per hop, and all three probes reach the destination.

What about Sidecar?

Sidecar is the algorithm that inspired InSession. It essentially sets up an actual TCP connection or sends data over an existing one. By doing this, it also bypasses firewall detection and causes load balancers to use a consistent path.

The problem with Sidecar is that it sends actual data over a connection to a real server application, which means the destination server has to process that data, and the data has to look real enough for the server to reply. With InSession, we rely on two TCP mechanisms to keep this extra load off the server: in-order delivery and SACK.

In-order delivery means that if there is a gap in the sequence numbers received by the server, the data won’t be delivered to the application until the gap is filled, and the gap remains unfilled because we put it there deliberately.

SACK, or selective acknowledgment, allows the server to tell us which sequence numbers made it and which didn’t. Without it, the server’s acknowledgments would all carry the same acknowledgment number (the start of the gap), so we couldn’t tell which probes had arrived.
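
The receiver’s side of this can be modeled in a few lines. The Python sketch below is a simplified TCP receiver (byte-granular, no window or SACK-block limits, no sequence wraparound; the class name is hypothetical): bytes beyond a gap wait in a reassembly buffer instead of going to the application, and every acknowledgment carries SACK blocks, which RFC 2018 defines as ranges from a left edge to the sequence number just past the block’s last byte.

```python
from typing import List, Set, Tuple


class ToyReceiver:
    """Simplified TCP receive side: in-order delivery plus SACK reporting."""

    def __init__(self, rcv_nxt: int):
        self.rcv_nxt = rcv_nxt           # next in-order byte expected
        self.buffered: Set[int] = set()  # bytes received beyond a gap
        self.delivered: List[int] = []   # bytes handed to the application

    def receive(self, seq: int, length: int) -> Tuple[int, List[Tuple[int, int]]]:
        """Accept a segment; return (ack_number, sack_blocks)."""
        for byte in range(seq, seq + length):
            if byte >= self.rcv_nxt:
                self.buffered.add(byte)
        # Only a contiguous run starting at rcv_nxt may go to the application.
        while self.rcv_nxt in self.buffered:
            self.buffered.remove(self.rcv_nxt)
            self.delivered.append(self.rcv_nxt)
            self.rcv_nxt += 1
        return self.rcv_nxt, self._sack_blocks()

    def _sack_blocks(self) -> List[Tuple[int, int]]:
        """Buffered bytes as half-open (left_edge, right_edge) ranges.

        RFC 2018 lists the most recently received block first; this toy
        sorts blocks by sequence number instead, for readability.
        """
        blocks: List[Tuple[int, int]] = []
        for byte in sorted(self.buffered):
            if blocks and blocks[-1][1] == byte:
                blocks[-1] = (blocks[-1][0], byte + 1)
            else:
                blocks.append((byte, byte + 1))
        return blocks


rx = ToyReceiver(rcv_nxt=1000)  # byte 1000 is the deliberate gap
print(rx.receive(1001, 1))      # (1000, [(1001, 1002)])
print(rx.receive(1003, 1))      # (1000, [(1001, 1002), (1003, 1004)])
print(rx.delivered)             # []
```

The acknowledgment number stays at 1000 for both probes; only the SACK blocks reveal which sequence numbers reached the server, and nothing reaches the application until byte 1000 arrives.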

Read all of the details with examples about how it works here.

What about Paris?

Catchpoint offered Paris traceroute as an option for about ten years and found that customers generally didn’t use it. When they did, more often than not it caused traceroute failures from locations that had previously worked. Consequently, we decided to replace the Paris traceroute option with InSession.

Paris traceroute shares a goal with InSession: making traceroute work through routers that load balance on packet header fields. In our New York to Amazon TCP traceroute scenario, this means the adjacent routers would be correctly identified.

However, Paris traceroute encounters two significant issues: Firstly, it relies on a stream of SYN packets similar to traditional TCP traceroute, so firewalls often block it. In our New York to Amazon TCP traceroute scenario, the red dots would still be there. Secondly, the algorithm is pretty complex because it manipulates different fields inside the packet header:
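
As an illustration of that kind of header manipulation, here is a rough Python sketch of the ICMP variant as the Paris traceroute paper describes it, to the best of our understanding. Per-flow load balancers may hash the first four bytes of the ICMP header (type, code, and checksum), so Paris varies the Sequence Number for each probe, as classic ICMP traceroute does, but also adjusts the Identifier so the checksum stays constant. The payload, identifier, target value, and function names are hypothetical.

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit ones' complement sum used by the Internet checksum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data for the computation only
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def icmp_echo(identifier: int, sequence: int, payload: bytes) -> bytes:
    """Build an ICMP Echo Request (type 8, code 0) with a valid checksum."""
    unsummed = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence) + payload
    checksum = ~ones_complement_sum(unsummed) & 0xFFFF
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


def paris_identifier(sequence: int, target: int) -> int:
    """Choose an Identifier so Identifier + Sequence equals target (ones' complement)."""
    ident = target + (~sequence & 0xFFFF)  # ones' complement subtraction
    return (ident & 0xFFFF) + (ident >> 16)


PAYLOAD = b"probe"
TARGET = 0x1234  # hypothetical constant for Identifier + Sequence

classic = {icmp_echo(0x4242, seq, PAYLOAD)[2:4] for seq in range(1, 31)}
paris = {icmp_echo(paris_identifier(seq, TARGET), seq, PAYLOAD)[2:4] for seq in range(1, 31)}
print(len(classic), "distinct checksums with classic numbering")    # 30
print(len(paris), "distinct checksum with Paris-style numbering")  # 1
```

With classic numbering, the 30 probes carry 30 different checksums and may be spread across paths; with the compensating Identifier, all 30 carry the same checksum.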

Imagine explaining to your security team how this application works and why it’s not malicious! In contrast, InSession simply transmits standard, unmanipulated TCP packets – albeit intentionally skipping one to create a sequence number gap.

While Paris traceroute is available for ICMP, UDP, and TCP, InSession operates solely over TCP. In our observation, firewalls that block standard traceroute block Paris ICMP and UDP traceroute as well. Because InSession sends its probes inside an established TCP session, it has proven more reliable in environments where firewalls drop suspected SYN floods, a weakness of regular TCP traceroute implementations, including Paris traceroute.

What about other traceroute variants?

There are several other traceroute variants. Dublin Traceroute is an add-on to Paris traceroute for NAT detection. Pamplona Traceroute attempts to identify IP addresses that belong to the same router, also known as aliases.

Ultimately, you should use the tool that solves your problem and allows you to navigate the complexity of your environment. We’re convinced that for most people, Traceroute InSession is the optimal choice.

What else should I know about InSession?

Well, it doesn’t solve the load balancing problem for per-packet load balancers – only for per-flow load balancers. Per-packet load balancers are rare in our experience, so it still solves most users’ issues. Furthermore, as far as we know, no algorithm has solved the per-packet load balancer problem.
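
A toy comparison makes the distinction concrete. The Python sketch below (hypothetical next hops and addresses, `zlib.crc32` as a stand-in hash) sends six packets from one TCP connection through a per-flow balancer, which hashes the 5-tuple, and through a per-packet balancer that simply round-robins. Keeping probes in one flow pins them to one path with the first and does nothing against the second.

```python
import zlib
from itertools import count

NEXT_HOPS = ["Router 2", "Router 3"]
FLOW = ("192.0.2.10", "198.51.100.7", 6, 40000, 443)  # one TCP connection's 5-tuple


def per_flow(flow):
    """Per-flow balancing: hash the 5-tuple, so one flow always gets one next hop."""
    return NEXT_HOPS[zlib.crc32(repr(flow).encode()) % len(NEXT_HOPS)]


class PerPacket:
    """Per-packet balancing: plain round-robin that ignores the flow entirely."""

    def __init__(self):
        self._counter = count()

    def __call__(self, flow):
        return NEXT_HOPS[next(self._counter) % len(NEXT_HOPS)]


round_robin = PerPacket()
flow_paths = [per_flow(FLOW) for _ in range(6)]
packet_paths = [round_robin(FLOW) for _ in range(6)]
print(flow_paths)    # the same next hop six times
print(packet_paths)  # ['Router 2', 'Router 3', 'Router 2', 'Router 3', 'Router 2', 'Router 3']
```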

Moreover, like all traceroute implementations, InSession doesn’t discover every possible path; it only reports the paths it finds. So, from a single traceroute output, it’s impossible to tell whether all possible paths have been discovered. The best way to find them is to run many traceroutes over time and from multiple vantage points, then look at all of the paths in a single network graph with a tool (like Catchpoint IPM) that lets you slice and dice the data to see what you need.
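
Once each run yields a consistent path, merging runs is simple. The Python sketch below assumes each run is an ordered list of hop responders with no gaps; the router names, runs, and the `merge_paths` helper are hypothetical. It unions consecutive pairs into one directed graph and counts how often each link was seen.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def merge_paths(runs: List[List[str]]) -> Tuple[Dict[str, Set[str]], Counter]:
    """Union consecutive hops from many runs into one directed graph."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    link_counts: Counter = Counter()
    for path in runs:
        for a, b in zip(path, path[1:]):
            graph[a].add(b)          # a link some run actually traversed
            link_counts[(a, b)] += 1  # how common that link is
    return dict(graph), link_counts


runs = [
    ["gw", "R1", "R2", "R5", "server"],
    ["gw", "R1", "R3", "R4", "server"],
    ["gw", "R1", "R2", "R5", "server"],
]
graph, counts = merge_paths(runs)
print(sorted(graph["R1"]))   # ['R2', 'R3']
print(counts[("R1", "R2")])  # 2
```

Applied to classic traceroute output, the same merge would also draw links between routers that were never adjacent, because consecutive hops may come from probes that took different paths; that is the paths problem again.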

Ready to explore it yourself? Access the code on GitHub, along with some pre-built binaries. We’re eager for your feedback, so please share your thoughts!

Curious about what other traceroute enhancements we’re working on?

We’ve got a lot in the pipeline! We recently discussed some of these developments at a NYNOG meeting and at RIPE 87. We’ll be diving deeper into other enhancements in future blog posts. Some enhancements are already available for you to explore (again, take a look), and others are on the horizon.