Webinar

How to Find Network Visibility Gaps: Strategies to Ensure Resilience and Performance

As IT infrastructures grow more complex, visibility and resilience have never been more critical.  

With hybrid IT, remote workforces, and distributed services, your network extends far beyond the data center or cloud—it spans the internet. Traditional monitoring tools leave blind spots that impact user experience and lead to costly downtime. To stay ahead, modern network performance monitoring (NPM) must evolve.  

Watch Catchpoint technical experts, Moiz Khan and Sheldon Pereira, as they explore how to bridge visibility gaps, leverage key network metrics, and enhance monitoring with modern observability techniques. You'll learn how Internet Performance Monitoring (IPM) fills the gaps left by traditional approaches and strategies to strengthen network resilience and performance.

Key takeaways:

  • Networking Landscape: The shift from traditional monitoring to IPM and its growing importance.  
  • External Visibility: Why monitoring DNS, Cloud, CDNs, and BGP is key for full-stack performance insights.  
  • Proactive Resilience: How IPM detects and resolves issues early, using automation and AI for predictive analysis.  
  • Key Metrics: Essential metrics for network reliability and evolving your monitoring practices.  
  • Collaboration & Integration: How real-time monitoring bridges gaps between DevOps, NetOps, and business teams.

Register Now

Video Transcript

Howard Beader

00:06 - 07:57

Alright. Why don't we kick this off today? Hey, everyone.

Thank you for joining us today. We're really excited to be talking to you about how to find network visibility gaps and provide you some guidance and strategies for how to improve resilience and performance.

So before we get started with the content, just wanna make sure you're all familiar with how to use this platform. We've got a chat console, which I believe is on your right.

And within that, feel free to ask us any questions throughout the presentation. There's also a q and a tab where you can ask questions as well.

Depending on the questions you ask, we may actually promote them into the q and a tab so that everybody can see the questions and answers. We also have a docs tab.

And in that docs tab, you'll find some handy documents that you can leverage around our capabilities, our network admin guide, as well as an ability to sign up for our LinkedIn newsletter for network folks. So with that, let's jump in.

So, I'm joined today by some experts here at Catchpoint who are deep in the platform and are working with some of the leading organizations around the globe. I'll have them introduce themselves when they start to present.

That way we can kinda make this a little easier. But I'm Howard Beader.

I lead product marketing here at Catchpoint, and really happy that you've all taken the time to join us today. We've got a pretty big agenda.

I'm gonna kick off with just a little update on Internet performance monitoring, and then Sheldon and Moiz are gonna jump into some details on what's changed in network monitoring, the different use cases that Catchpoint can be applied to and help with, and, ultimately, how to improve your performance and resilience overall. So let's talk about what we mean by Internet performance monitoring.

And what we mean, and why we're talking about this, is that the Internet has really become the core new application fabric today. And what that means is that when you think about modern applications today, they're distributed across multiple clouds, multiple APIs, multiple microservices, and all of them are pulled together in milliseconds by the Internet.

And so we're all depending now on the Internet, not just from a pipe standpoint, but to ensure that our applications really even just run and provide great user experience and have the ability to operate and transact within milliseconds. And that's where Internet performance monitoring comes in.

On the right, you can see what we call the Internet stack. And the Internet stack is really a collection of all of the capabilities from your front end user all the way back into your application.

You know, think of this as all of your core network services, your CDNs, your DNS, your BGP, your trace route all the way back. And that's what we'll talk about.

And we are able to monitor all of those different areas, whether it's your customer experience, your workforce, your applications and APIs, or your internal and external networks, leveraging the world's largest agent network. You can think of these almost like secret shoppers that we have in all the different layers of the Internet, throughout the globe.

We now have over 3,000 agents out there, and these give you the ability to monitor from where it matters for you. Right? You know, our chief product officer, who I'm presenting with all the time for our industry analysts, likes to say: imagine having a tsunami detector.

And that tsunami detector is in Tulsa, Oklahoma. Well, if I'm in Hawaii or Japan, a tsunami detector in Tulsa, Oklahoma really isn't gonna matter a whole lot.

You want to be monitoring from where it matters, from where your users are, so that you understand whether or not there are any issues and can literally jump in and fix them before they impact the business. We're gonna be talking today a lot about our network experience.

And network experience is a capability to ensure that your network, both internal and external, is resilient and reachable. And if there are issues, we wanna ensure that there's no finger pointing.

Right? I'm sure you've all been in the room where, you know, the application team is saying it's the network. The network team is saying it's IT ops, and you're really trying to get to the root cause.

Well, by leveraging Internet performance monitoring, we're gonna help you identify what that root cause is very, very quickly so that both, the network team, IT ops, your SREs, all are on the same page very, very quickly and can resolve the issue faster. And, you know, we've been doing this quite a while.

We are a recognized leader in the industry. Gartner released its Digital Experience Monitoring Magic Quadrant last October.

Catchpoint is one of the core four leaders in that Magic Quadrant. And interestingly enough, we are the only one in that Magic Quadrant with a global agent network, where we can monitor from over 3,000 different agents for you.

So, definitely very differentiated from everybody else. So with that, I'm gonna turn it over to Sheldon.

Let him introduce himself, and, we'll continue.

Sheldon Pereira

07:57 - 14:07

Thank you. Thank you, Howard.

And thank you everyone for joining this webinar today. My name is Sheldon Pereira, and I'm a value engineer with Catchpoint.

I've been with Catchpoint for about three years now, and the topic of network monitoring is of particular interest to me because I have worked in the past as a NOC engineer and a network engineer as well. So, yeah, I'm really looking forward to this discussion today.

To start off with, I wanted to quickly talk about how modern networks have evolved compared to those of the past. So if you see on the left here, we see a traditional network setup where your applications were generally hosted on premises, and the network devices that were used within a site or to connect to your applications were for the most part within your control.

Even if you had connectivity from one site to the other, it was generally established using private MPLS or leased lines, which again was good in getting end to end control between sites. What this meant is that visibility into the network was much easier, as there were fewer dependencies that you had to account for, and you had more control over the network devices and the applications.

But if we move to the network on the right side of the image, this is the network as we have it today. The modern network and application delivery chain is quite complex, as there are a lot of dependencies, as you can see in the image.

Take the example of an e-commerce company that's targeting end users who are shopping using their local ISPs from home. There's a heavy reliance on these local ISPs to ensure that they have good connectivity and that performance over the network is good, as well as on local DNS resolvers.

And again, many applications are increasingly becoming more SaaS based. So for example, something like Zoom, which is used heavily by remote workers for collaboration.

You have a heavy reliance on the Cloud platforms as well as CDN vendors, wherever applicable. Then you have third party APIs.

These days, most applications integrate with each other to give you the best offering of the application itself, and they rely heavily on third party APIs for this integration. Monitoring these APIs and making sure they are reachable is integral to ensuring a seamless application delivery and network experience.

Finally, more and more network solutions are moving to software defined platforms. SD-WAN is becoming increasingly popular, and the overlay/underlay split again causes a situation where you don't have complete visibility into the network.

So to monitor a network that has so many dependencies and has become increasingly complex, we need more effective monitoring solutions so that we can catch any issues that are beyond our control. I wanted to quickly touch on the growing needs of the network.

So what I've listed out here are the different types of network monitoring approaches, from the past, somewhere around the eighties and nineties, through to the monitoring approaches that we have today. SNMP was one of the earliest forms of network monitoring; it was used to notify you using SNMP traps, and it would give you vital information like device health.

So memory utilization, CPU utilization, device uptime, and even interface metrics. We then move on to flow based monitoring.

Now flow based monitoring like NetFlow or sFlow was used heavily to detect things like top talkers. So you could see what devices were consuming the most bandwidth over the interfaces.
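
The top-talker idea above is simple aggregation over flow records. As a rough illustration (the record fields and the sample traffic are made up for this sketch, not a real NetFlow/sFlow export format), a collector's core logic might look like:

```python
# Sketch: finding "top talkers" from flow records, the way NetFlow/sFlow
# collectors do. The record fields and sample data are illustrative.
from collections import defaultdict

def top_talkers(flows, n=3):
    """Aggregate bytes per source IP and return the n heaviest senders."""
    totals = defaultdict(int)
    for flow in flows:
        totals[flow["src"]] += flow["bytes"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

if __name__ == "__main__":
    sample_flows = [
        {"src": "10.0.0.5", "dst": "10.0.1.9", "bytes": 1_200_000},
        {"src": "10.0.0.7", "dst": "10.0.1.9", "bytes": 400_000},
        {"src": "10.0.0.5", "dst": "10.0.2.3", "bytes": 800_000},
        {"src": "10.0.0.9", "dst": "10.0.1.9", "bytes": 150_000},
    ]
    for ip, total in top_talkers(sample_flows, n=2):
        print(f"{ip}: {total} bytes")
```

A real collector would of course also key on destination, port, and protocol, but the bandwidth-ranking step is the same.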

And then we move on to packet capture and deep packet inspection. So tools like Wireshark and TCP dump.

So it was normal for a network engineer to configure a SPAN port or a tap appliance which sends mirrored traffic to a laptop running Wireshark. And then you look at the data at the packet level and you can understand what has gone wrong.

You can look at it in granular detail, which was again very useful for troubleshooting and fixing any issues which were caught with network monitoring. We then move on to network observability.

Now here's where we come into the realm of the network monitoring strategies that we have today. The idea of network observability is to monitor your network from an outside in approach.

What that means is you have vantage points across the globe hosted in either Cloud locations or in data centers. And from these vantage points, you test what the performance is like towards your network.

You look for packet loss, round trip time, and latency, and that helps you understand what users in these locations are experiencing when they are accessing the network. From network observability, we then move to digital experience observability, and this is where Catchpoint fits in.
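
The per-vantage-point math behind those metrics is straightforward. A minimal sketch (the sample RTT values and the "None means a lost packet" convention are illustrative assumptions, not any vendor's data model):

```python
# Sketch: summarizing one vantage point's probe results into packet loss
# and round-trip-time statistics. Sample values are illustrative.

def probe_summary(rtts_ms):
    """Summarize RTT samples in milliseconds; None entries count as lost packets."""
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    avg_rtt = sum(received) / len(received) if received else None
    return {
        "loss_pct": round(loss_pct, 1),
        "avg_rtt_ms": round(avg_rtt, 1) if avg_rtt is not None else None,
        "max_rtt_ms": max(received) if received else None,
    }

if __name__ == "__main__":
    samples = [42.1, 41.8, None, 44.0, 43.2]  # one of five packets lost
    print(probe_summary(samples))
```

Comparing these summaries across vantage points is what lets you say "loss is only visible from Singapore," rather than just "loss exists."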

So besides the outside in approach, where, as Howard mentioned earlier, with the vantage points that we have across the globe we can monitor your networks from various ISPs and various cities, we also employ an inside out approach.

So we look at how your network is performing when you are testing from your branch locations or from your data centers. This is performed using synthetic testing, combined with real user monitoring, tracing, and BGP data.

So the end result is not just network monitoring and getting a sense of how your network is performing, but a complete view of your entire digital experience. And that was a quick recap on the way networks have evolved over the last few years.

And we're gonna move on now to talk about how we can leverage network monitoring, and the types of use cases we can use it for, to capture issues or gaps in visibility that you may not be aware of. I'm going to pass it on to Moiz now, and he'll be walking you through some of the use cases.

Moiz Khan

14:07 - 20:10

Thank you so much, Sheldon. Thank you everyone for joining today.

My name is Moiz. I'm based out of Dubai.

I've been with Catchpoint for over eight years and have enjoyed this time. It's been kind of a journey, and I've loved how we have evolved.

Now Sheldon and Howard have actually created a very nice perspective of what I'm gonna talk about. That's kind of set the stage for me.

Now, Sheldon, towards the end of your talk, you mentioned something which I totally agree with you on: monitoring the network is definitely one of the most important pieces of your monitoring strategy. But in today's world, it is not enough if you just look at the network.

You have to look at the network from a digital experience observability or monitoring perspective, because everything is so interconnected; there is dependency on each other. I'm gonna give an example to explain what happens in those situations.

So you have applications that you own, and you have users from across the world who want to access your application. And they could be accessing that from a mobile device or a laptop, sitting at home, at the office, or in a coffee shop.

You have users who could be accessing applications internally, not on cloud but, let's say, on your own data centers, and they might be accessing those from their office, a coffee shop, or again from their home. You have applications which you might have installed on your cloud locations, different clouds.

You have multi cloud approach as well. So these are all the perspectives.

It is not a complete picture, but just to give you an idea of what is happening in today's world, I'll give another example so that it'll be a little clearer as to what I'm trying to get to.

So everybody has an understanding of APIs and microservices and how critical they are in today's application delivery. They're very, very important.

Literally, I don't see any application which can run without APIs or microservices. Now you have platforms like Apigee which provide a service so that you can manage your APIs better.

Now the way it works is, let's say you're using Apigee, and you have your own API and microservice endpoints, which you might be hosting on cloud environments like AWS. And then what happens is, when a user makes a request, they don't request your API endpoints directly, because it's a part of your application.

So you have your application. They access that application, and that makes a call to that API endpoint, which goes through Apigee, which is on Google Cloud.

And that request is then sent to your endpoint, which could be on any cloud, could be on GCP, could be on AWS. Now you can see there could be latency.

Now how would you identify where the latency is happening? Is it happening from the user to the GCP location where Apigee handles those requests, or is it happening from GCP to where your application is? You know how the structure is, but you need to have visibility to understand.

Okay, if the user is coming from this region, what kind of performance are they getting? And if latency is happening, is it because of the network part? Unless and until you have monitoring in the right places, in the right locations, it's very difficult to identify.

You may get to know there is latency, but you may not get to know why exactly that is happening unless and until you have that visibility. So this is the kind of, world we are living in today where it has become so complex.

It has made our lives easy, but people who are backstage know how complicated it is and what needs to be done to ensure that we have visibility into everything, not just the network. Great.

Moving on. So one of the biggest things that happens, from the delivery chain perspective, when a user is trying to access an application is DNS resolution.

Right? That's the first thing that happens. Without that, obviously, you don't get the IP.

You are not able to connect, to whatever, application that you're trying to connect to. And this is actually important from a delivery chain perspective, application monitoring perspective, but definitely it's a role and responsibility from a network team perspective because they are kind of, the the people who take care of the DNS, and it makes sure that it is resolving properly.

There is no the latency is under control. Because there are so many issues that can happen.

Is it being served by the DNS servers which are closer? Is the latency too high? Because if, let's say, a user is accessing your application from Singapore and the DNS resolution is happening from the US, that creates latency which can multiply, because multiple users can have problems, and that just adds up to your overall page load time. Now to identify that, you need to have DNS monitors in place.
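
The Singapore-user/US-resolver case above reduces to a per-region latency budget check. As a rough sketch (the 50 ms budget, the region names, and the sample timings are all illustrative assumptions, not Catchpoint defaults):

```python
# Sketch: flagging regions whose DNS resolution times suggest queries
# are being answered far away. Threshold and samples are illustrative.

DNS_LATENCY_BUDGET_MS = 50.0  # assumed per-region budget

def slow_dns_regions(resolution_ms_by_region, budget=DNS_LATENCY_BUDGET_MS):
    """Return the regions whose measured resolution time exceeds the budget."""
    return sorted(region for region, ms in resolution_ms_by_region.items()
                  if ms > budget)

if __name__ == "__main__":
    samples = {"us-east": 18.0, "eu-west": 24.0, "singapore": 210.0}
    print(slow_dns_regions(samples))
```

A region that consistently blows the budget is the signal to check whether its queries are being served by a distant resolver.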

And with Catchpoint, we do have a capability which I personally really like: you can not only run DNS tests, but you can also enable trace routes to see what path it is taking to get to that DNS server. Is there any kind of issue in that path? And what can be done to ensure that the latency is definitely under control?

Howard Beader

20:10 - 20:31

Moiz, sorry to interrupt. Let me just ask you a quick question.

So when looking at DNS and DNS monitoring, for most of our customers' websites, let's say, is there only one DNS call for the page, or are there multiple DNS calls that have to get made for all the different elements on the page?

Moiz Khan

20:31 - 33:56

Great question, Howard. That's how complicated our systems have become.

Right? So when we look at our pages, we have requests which are being made to things we have under our control, which we call first party. We own those host names.

But then you have pages which have a very high dependency on third parties, which also have to go through the same DNS resolution process. So not only are you responsible for taking care of your own properties from a DNS resolution perspective, you also need to monitor those third parties, because if those DNS resolutions fail, your application will not load properly.

And then it can increase the latency of your page load, and in some cases we have even seen the page break if that third party service is not loading. And it's not just about monitoring the DNS resolution.

You also need to check your name servers, and Catchpoint definitely has that capability: you can monitor your individual name servers. You can even look at the specific hosts which are being served by that name server.

So you can drill down to that level: this is the name server.

Is it responding with the right information? What is the latency? And if the latency is high, what is the reason behind it? And there are certain things which can also be complicated, like an anycast DNS provider. Because in today's world, your DNS providers are also outside of your control, in the sense that you use third party DNS providers.

And then you have to ensure that you're monitoring that, so that you have some SLAs which are defined and you are keeping an eye on those SLAs. Are they providing the right service? Is it under control or not? Because if you do not keep an eye on it, they might not give you the right information at times.

Which is why it is important, as a balance, that you are also monitoring that service, because somebody else is taking care of your DNS. And if it is anycast, that also becomes tricky, because you have just one IP address, or in some cases a few IP addresses.

And then there are multiple servers behind the scenes, and you do not have visibility into that. With NSID, which is a feature in Catchpoint, you can get to the actual server which is responding to the DNS queries, so you can visualize that and understand what is happening.

Is it a specific name server which is causing the latency or failures that you might be seeing? And continuous monitoring is very important for DNS, because the issues can be intermittent, but there could be a pattern to them. If you're not doing this continuously, it becomes a challenge, because when an issue happens, you should have the historical data to compare with what is happening at that point in time.

And then once the issue is, let's say, fixed: was it actually fixed? Do you see improvement in the performance, or is the outage taken care of? That's how critical it is, because if your DNS goes down, then obviously, irrespective of what your application is supposed to do, everything stops working.

Moving on. So this is an issue that we captured wherein the authoritative name servers failed to respond when the requests were originating from a specific subnet of a specific ISP.

Now this is a very interesting situation, wherein we always assume that, okay, the DNS failure happened.

Let's say, for example, it's happening for everybody. But right now, in this case, it was just happening for a specific ISP.

Now, you may not even identify this unless you are monitoring from where the users are coming from, which is why Catchpoint has this approach where we have dedicated nodes on specific ISP providers, which is what we call the backbone. Unless and until you have that visibility, this issue couldn't have been identified, and we had this visibility.

We were able to identify that, okay, this is not a global issue that our client is facing. It is just specific to an ISP.

Of course, it is an issue, and then they went back and got that fixed. But it's easier, and then it's very specific.

You know exactly where the problem is and what needs to be done. It's not like you're finding a needle in a haystack, but pinpointing exactly where the problem is.

Alright. In this puzzle, one of the pieces that we kind of miss out on is CDN.

Now, if you have to monitor a CDN, Catchpoint has a very extensive and detailed way of how it needs to be done. But from a network perspective, why do we need to monitor the CDN? It's not the responsibility of the network team, but when there is an issue, the question will very much come up: is it happening because of the network, or is it something else? So you need to monitor the CDN so that at least the network team is aware that the issue they are facing is not happening because of their network, or anything to do with the responsibilities that they're supposed to take care of.

It's happening because the CDN is not performing or not doing its job. So you need to have visibility into the DNS offload that happens to the CDN, because you have your own DNS provider and your own name servers.

But then if you are using a CDN, at some point the offload happens wherein resolution goes to the CDN. Is that DNS performing well? Once it enters the CDN's DNS network, are they doing the job properly? Do they have the right name servers associated? And what is the latency? Is it going through a lot of levels of resolution to get you the answer? We have seen these kinds of cases in many situations, and it increases the overall page load time because the DNS itself is taking a lot of time.

Right? And then you have the CDN, and where is the request going to? Even though this can be a part of the network team's job: the CDN is supposed to bring the content closer to the user, so if the user is in a specific location, let's say the US, and the content is being served from, let's say, Singapore, then that is not an optimal way of handling these requests. And is it happening on a regular basis? You need to have visibility across regions.

And is it happening across regions, with users being sent to regions which are quite far away, which will increase the latency? If you have visibility into that, then you can put the question to the CDN provider as to why this is happening, take remedial actions quickly, and hold them accountable. So from that perspective, CDN monitoring, even though technically it is not a part of the network team's remit, is very critical to how the application is served today.
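
The mapping check described above (a US user served from Singapore) is a simple comparison once each request is labeled with the user's region and the serving edge's region. A minimal sketch, assuming a "same region is optimal" rule and made-up region labels:

```python
# Sketch: spotting suboptimal CDN mapping, where an edge far from the
# user serves the content. Region labels and the same-region rule are
# illustrative assumptions.

def mismapped_requests(requests):
    """requests: list of (user_region, edge_region). Return the mismatches."""
    return [(user, edge) for user, edge in requests if user != edge]

if __name__ == "__main__":
    observed = [("us-east", "us-east"),
                ("us-east", "ap-southeast"),  # served from Singapore: bad
                ("eu-west", "eu-west")]
    for user, edge in mismapped_requests(observed):
        print(f"user in {user} served from {edge} - ask the CDN provider why")
```

A real check would use geographic distance or measured RTT rather than exact region equality, but the output is the same kind of evidence you would take to the CDN provider.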

So for that reason, what we have done is prepared this CDN monitoring checklist. Generally, we look at the data that is provided by the CDN vendor itself.

But then you need to have a third party view of how the CDN is performing, so that if you're questioning them, you have data to back it up. So you have DNS as the first stage of monitoring a CDN, but there are other things: monitoring the CDN mapping, checking the hit or miss ratio.

So there are a lot of other things. Again, we can share this later on.

I think it's available on our site. You can have a look at it, but it's something which is very critical from a delivery chain perspective.

Alright. Moving on.

This is something I wanted to talk about, and something interesting happened when I started working on it. The SSL part of the delivery chain, I think, is something which is highly ignored.

People don't take it seriously from a monitoring perspective. But if your certificates expire, obviously, your page is literally down.

And how many times have we seen in the past a big organization forget to renew their certificate, and the page was not available? And then they had to wait four, five, six hours, who knows how long, just to get their site up and running.

But a simple test would have alerted them: oh, okay, your certificate is gonna expire in thirty days or fifteen days. And then they could have taken action to renew it.

But now what has happened is, if you are going through the latest updates, they've planned to reduce the certificate lifetime. Right now, it's 398 days.

It is gonna get down to 200, and at some point it will get down to 47 days, or in some cases seven days as well, is what I've heard. So you need to be monitoring your certificates on a regular basis and taking those actions.

And it's not just about one or two certificates. Organizations, even small organizations, can have hundreds and hundreds of certificates.

It's not possible to keep an eye on all of them manually. I have known organizations where they employed one person just to keep an eye on the certificates, and they had about 40,000 plus certificates.

And there was just one person employed to do that, when you can set up monitors in Catchpoint which will do that for you.

Once you set it up, you can forget about it, and then you get the alerts when that happens. From a delivery chain perspective and from a network perspective, this is important.

Right? Without the SSL handshake happening, your application is literally not reachable. So from a reachability perspective, which is a part of your delivery chain and network, your DNS is important.

Your certificates and SSL are very, very important. And I think, with this change that we are going to see in some time, it becomes even more important to have that visibility.
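
The expiry check itself is just date arithmetic across the inventory. A minimal sketch (hostnames, dates, and the 30-day warning window are illustrative; a real monitor would read notAfter from the live TLS handshake rather than from strings):

```python
# Sketch: the expiry check a certificate monitor performs, applied to a
# whole inventory. Dates are parsed from strings so the example runs
# offline; all names and dates are illustrative assumptions.
from datetime import datetime

def expiring_soon(certs, today, warn_days=30):
    """certs: dict of hostname -> 'YYYY-MM-DD' expiry. Return those within warn_days."""
    now = datetime.strptime(today, "%Y-%m-%d")
    out = {}
    for host, expiry in certs.items():
        days_left = (datetime.strptime(expiry, "%Y-%m-%d") - now).days
        if days_left <= warn_days:
            out[host] = days_left
    return out

if __name__ == "__main__":
    inventory = {"shop.example.com": "2025-07-01",
                 "api.example.com": "2026-01-15"}
    print(expiring_soon(inventory, today="2025-06-20"))
```

Run daily over 40,000 certificates, this replaces the full-time person the speaker mentions with a single alert rule.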

Great. The other thing is benchmarking the throughput.

You have pages with different types of content: CSS files, JSON, images, JavaScript. And you have users who come from different types of networks.

Slow networks, fast networks. What needs to be done, as a good practice, is to look at the throughput, benchmark it, and see, okay, what kind of throughput are we getting if the user is coming from a 3G network, or 4G, or 5G.

And just ensuring that, okay, we are giving the optimal page structure to those users who have a slow network connection. Are we keeping an eye on those things or not? With this approach, what you will be able to do is ensure that your users have an optimally performing page, structured in a way that they do not face any challenges when accessing your pages. And it is something which you can easily enable in Catchpoint.

You can set it up. Like, benchmarking is something you can do for a lot of things.

You can benchmark yourself against your competitors. But benchmarking against the different types of networks, for the different content being delivered, helps you ensure that your pages are always structured and designed in a way that everybody gets decent performance and nobody faces any challenges.
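
The underlying arithmetic for that benchmark is page weight divided by network bandwidth. A rough sketch (the 3G/4G/5G downlink rates and the 2 MB page weight are illustrative assumptions, not measured values):

```python
# Sketch of the benchmarking arithmetic: estimated transfer time for
# page assets over different network classes. Bandwidth figures are
# rough illustrative values, not measurements.

NETWORK_MBPS = {"3g": 1.5, "4g": 20.0, "5g": 100.0}  # assumed downlink rates

def transfer_time_ms(size_kb, network):
    """Estimate the time to move size_kb kilobytes over the given network class."""
    bits = size_kb * 1024 * 8
    return round(bits / (NETWORK_MBPS[network] * 1_000_000) * 1000, 1)

if __name__ == "__main__":
    page_weight_kb = 2048  # ~2 MB of images, JS, CSS
    for net in ("3g", "4g", "5g"):
        print(f"{net}: ~{transfer_time_ms(page_weight_kb, net)} ms")
```

Real throughput benchmarking measures delivered bytes over time rather than assuming a rate, but the back-of-the-envelope version already shows why a 2 MB page that feels instant on 5G can take many seconds on 3G.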

And if there are any bottlenecks, you are ready with the information to get that fixed. So with this, I'm gonna hand it over to Sheldon.

He's gonna talk about some very interesting things that he has on his plate. Go ahead, Sheldon.

Sheldon Pereira

33:56 - 48:14

Thank you. Thank you, Moiz.

Yes. So I'm gonna be, talking.

Did we stop sharing? Let me share again. Yep.

Thank you. Alright.

So I'll take the next few slides to talk about network reachability, incorporating BGP as well as other networks, and how you can determine if there is any issue from a network standpoint. Network monitoring, at its core, is meant to help you understand network reachability across the globe.

That's where the outside-in approach comes in, which we were talking about earlier. On the map on the left side, those blue dots show the agents that are located across the globe.

Now, these agents are located at most of the tier one and tier two ISPs in various major cities across the globe. You're basically testing how your network is performing from the point of view of a user who is located in one of these cities and using one of these tier one or tier two ISPs.

And for the most part, the testing is gonna be traceroute and ping if you are looking more at network testing. But also, if you're hitting an application and you would like to see what path you're taking while hitting it, you use an application test and combine it with traceroute.

Now, when you perform testing this way, it can help you determine very quickly if a network issue is local, regional, or global. It can also help you detect if an incident is a micro incident or a long-duration incident.

It can also observe any traffic rerouting or failover patterns that take place during a network outage. The expected outcome is to ensure that you resolve issues faster. And because you are monitoring from ISPs across the globe, from various cities, it's very easy for you to pinpoint where an issue is originating from.

And it doesn't have to be that the network itself is down. Let's say you are monitoring one of your routers, probably an ingress or an egress router.

It's not necessarily that the router itself is down; it could be slowness, packet loss, latency, or jitter. Those are things that you can look at, and you can pinpoint that it's just happening from the Asia region or just happening from the America region.
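The local/regional/global triage described above can be sketched as a small classifier over per-vantage-point results. The region names and the decision rules here are illustrative assumptions, not the product's actual logic:

```python
def classify_scope(results):
    """results: {(region, city): reachable_bool} from ping/traceroute runs."""
    failing = [loc for loc, ok in results.items() if not ok]
    if not failing:
        return "none"
    all_regions = {region for region, _city in results}
    failing_regions = {region for region, _city in failing}
    if failing_regions == all_regions and len(all_regions) > 1:
        return "global"
    if len(failing) > 1:
        return "regional"
    return "local"  # a single city is affected

# Example: both APAC vantage points fail while North America is fine.
scope = classify_scope({
    ("NA", "New York"): True,
    ("NA", "Chicago"): True,
    ("APAC", "Tokyo"): False,
    ("APAC", "Singapore"): False,
})
```

A real system would also weigh metrics like loss and jitter per vantage point rather than a simple up/down flag, but the grouping idea is the same.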

And that's how network reachability monitoring can be used in Catchpoint. I'm gonna take the next few slides to talk about BGP.

You know, BGP is such an important protocol when it comes to networking and the Internet in general. So being able to stay on top of any BGP issues that may be taking place, and to be proactively notified, is very important.

So the first use case that I wanna talk about is BGP reachability, specifically BGP prefix reachability. Let's say you own an ASN and you have a prefix which you are advertising across the globe to your neighbors.

You need to ensure that your prefix is announced correctly and that all the peers you're peering with, and all your neighbors, have a path back to your prefix. We want to make sure that it's being announced all across the globe.

Maybe you have a website hosted in one of your IP spaces, and users are trying to access it. What if your prefix is not reachable? What if users do not have a path to your prefix? How can you be made aware of that? Right here in this example, we can see that the IPv6 and IPv4 prefixes in a particular region were unreachable.

And we were notified proactively that this prefix was not reachable. When we went in further, as we see in the image below, we noticed that there was a significant drop in the reachability of the prefix.

We saw a lot of withdrawals taking place from neighbors across the globe. And at the same time, we saw a lot of associated peers being marked as isolated, which means they do not have a path to your prefix.

For some time, this prefix was not reachable across the globe. But eventually we saw some announcements, and BGP rerouted the path to the prefix over a different path on the network.
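The reachability-drop and isolated-peer signals just described can be sketched as simple aggregations over per-peer BGP state. The peer names and the announced/withdrawn encoding are illustrative assumptions:

```python
def reachability(peer_state):
    """peer_state: {peer_name: "announced" | "withdrawn"} for one prefix.
    Percentage of monitored peers that still have a path to the prefix."""
    total = len(peer_state)
    have_path = sum(1 for s in peer_state.values() if s == "announced")
    return 100.0 * have_path / total if total else 0.0

def isolated_peers(peer_state):
    """Peers marked isolated: no path back to the prefix."""
    return sorted(p for p, s in peer_state.items() if s == "withdrawn")

state = {"peer-1": "announced", "peer-2": "withdrawn",
         "peer-3": "withdrawn", "peer-4": "announced"}
# reachability(state) -> 50.0; isolated_peers(state) -> ["peer-2", "peer-3"]
```

An alert on the reachability percentage dropping below a threshold is what turns this from a dashboard into proactive notification.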

So now, yeah, you can go into a router, run a show ip bgp, and figure out that maybe the prefix is not reachable.

But situations like that are reactive, or maybe a user complains that they're not able to access your website. And you may be looking at it from an application layer.

You don't really know what went wrong. But when you just look at BGP, you realize that, oh, you know what? My prefix is not being advertised anymore.

What went wrong? Right? In this particular example, what happened was that there was a power issue in the data center. Because of that, the routers went down and were no longer advertising the BGP prefix for that region, which caused loss of connectivity for users in that region. When you get proactive data like this, you are made aware in a proactive manner, rather than reacting and then digging through the BGP tables.

The other use case I wanna talk about is BGP blackholing. Now, this is a situation where you own a prefix.

Right? And let's say you have a /32, the .244/32 that you see on the slide. This belongs to the bigger /24 prefix.

And let's say that this IP is meant to serve content to users across the globe. Now, there are times when there might be a live event or a global event taking place, due to which there is a surge of traffic towards this virtual IP.

When this happens, an upstream transit AS thinks that it's a DDoS attack. It's genuine traffic, but the transit AS sees a surge in traffic.

It might be a DDoS attack. So what it does is trigger its DDoS mitigation system.

It basically creates a null route for just that /32 and sends it to a null interface, blackholing traffic towards that virtual IP. So what happens now is, as you can see, the path on the bottom is the actual path, which goes to the /24.

But instead of that, traffic is going along the path on the top, which is in red, and it's being blackholed. And users cannot access this prefix anymore.

Or rather, cannot access this IP anymore. Now, when this happens, again, you will be alerted reactively.

Users may say, like, hey. I was watching a live stream.

It's no longer available. What's going wrong? And it may take you some time to figure out that it is actually a BGP hijack, because another AS has now started announcing your /32 and sending it to a null interface.

So again, the idea of BGP monitoring is that you are alerted proactively that, hey, there is an origin AS mismatch. I see this prefix.

Its IP is being advertised, but it's advertised by a different AS.

It does not belong to me. So what's going on? And then you realize that, okay, we can take it up with the transit AS and tell them to stop advertising our prefix and remove the null route so that traffic is restored.
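The origin-AS mismatch check at the heart of this use case can be sketched in a few lines. The prefixes use documentation address space and the ASNs are made up; a real detector would consume feeds from route collectors:

```python
# Expected origin AS per owned prefix (RFC 5737 addresses, made-up ASNs).
EXPECTED_ORIGIN = {"203.0.113.0/24": 64500, "203.0.113.244/32": 64500}

def origin_mismatches(announcements):
    """announcements: [(prefix, as_path)] observed by route collectors.
    Flags prefixes whose origin AS (last hop in the AS path) is wrong,
    the signature of a hijack or an unwanted blackhole route."""
    alerts = []
    for prefix, as_path in announcements:
        origin = as_path[-1]
        expected = EXPECTED_ORIGIN.get(prefix)
        if expected is not None and origin != expected:
            alerts.append((prefix, origin))
    return alerts

observed = [
    ("203.0.113.0/24", [64496, 64500]),    # legitimate announcement
    ("203.0.113.244/32", [64496, 64666]),  # more-specific /32 from a foreign AS
]
# origin_mismatches(observed) -> [("203.0.113.244/32", 64666)]
```

Note that the foreign /32 wins by longest-prefix match even while your legitimate /24 is still announced, which is exactly why this situation is hard to spot from the application layer.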

So this is, again, a popular use case for BGP monitoring. I wanted to quickly talk about BGP private peers.

This is a new feature. So, you know, external visibility on BGP is great.

Monitoring your eBGP routes is great, but there are situations where we see companies that have a big ASN and a lot of iBGP routers. And these routers are responsible for basically routing traffic within the organization.

And at times they are meant to route traffic towards third-party services like Office 365 as well. Now, to ensure that this routing within your AS is happening the way that you expect it to, we can peer with your BGP private peers and get a sense of what the iBGP routers are doing, how they are advertising routes to each other, and whether there's any reason the iBGP routers are not advertising routes to each other.

Then you can go into it further, try to understand what the underlying protocols are doing, and troubleshoot from there. But at its crux, it's meant to help you understand how iBGP is functioning versus eBGP.

And then you get the full picture: oh, hey, this is my eBGP route, and this is what my iBGP looks like.

So that is one of the functionalities with BGP private peers. I wanted to move back to network monitoring.

I think we've discussed a lot about BGP. And, again, please feel free to put your questions in the chat.

We'll be very happy to answer any questions you may have around BGP monitoring. But I wanted to move on to L4S.

So what is L4S? L4S stands for Low Latency, Low Loss, Scalable throughput. And L4S is a new approach to building networks that are optimized for real-time performance.

An L4S network is meant to deliver minimal latency, high throughput, and minimal packet loss as well. And this is particularly useful for online gaming, video conferencing, or real-time communication.

Now, to achieve this, the way L4S works is that whenever congestion is experienced along a path, instead of dropping the packet, the L4S network uses a smart congestion management mechanism: it marks the packet as Congestion Experienced, and that signal makes its way back to the sender. So the sender would say, okay.

Someone along the path has told me that there is congestion on the network. I am gonna reduce my sending rate so that performance is not impacted.

So let's say a network claims to be L4S compliant. How do you verify that? How do you validate that? That's where we can use a test like Traceroute ECN.

Using this test type, we basically send a traceroute probe through the L4S-compliant network. We make sure that all routers along the path are preserving the ECN bits, which means that they support and honor L4S and will let us know if any congestion takes place.

But if for some reason any router along the path strips off the ECN bits, which is also called ECN bleaching, which we can see in the screenshot right here, it proves to us that this network is not honoring L4S. So any congestion signals which may happen further down the line will not be reported back.

So this is something we can take back and say, hey, there's a router along the path.

It's stripping the ECN bits. If there's any congestion being experienced, we are not gonna be made aware of it.

The Traceroute ECN test will help you validate that. The other use case would be from the sender itself.

We can manipulate the ECN bits and send traffic already marked as congestion experienced. So now all the routers along the path will pass on the CE flag, saying congestion is experienced, till it hits the destination.

Then we'll see how the destination is reacting. If the destination is sending back an acknowledgement message and telling us that, yep, I acknowledge the fact that experience congestion has been experienced along the path.

So, you know, sender, please reduce your congestion window, your sending transmission window.

So yeah. These are a couple of use cases for Traceroute ECN, and with L4S becoming more and more useful in real-time communication, a test like this can help you validate any L4S-compliant networks.
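The bleaching check described above boils down to walking the hop list and finding where the ECN bits go to zero. The hop data and bit encodings below are illustrative; a real Traceroute ECN test would supply the per-hop observations:

```python
ECT1, NOT_ECT = 0b01, 0b00  # L4S traffic is marked ECT(1); 00 means ECN stripped

def first_bleaching_hop(hops):
    """hops: [(hop_ip, ecn_bits_observed)] from a Traceroute ECN-style probe.
    Returns the first hop that zeroes the ECN bits, or None if the path
    preserves them end to end."""
    for hop_ip, ecn in hops:
        if ecn == NOT_ECT:
            return hop_ip
    return None

path = [("10.0.0.1", ECT1), ("198.51.100.7", NOT_ECT), ("203.0.113.9", NOT_ECT)]
# 198.51.100.7 bleaches ECN here: congestion marks applied beyond that
# router would never make it back to the sender.
```

Everything after the first bleaching hop is suspect, which is why identifying the specific router matters when you take the finding back to the network owner.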

We'll be ending the use cases by talking about SD-WAN monitoring. Again, this is something that we are seeing more and more: networks are moving to SD-WAN, a software-defined WAN setup.

So in an SD-WAN setup, we usually have the overlay network managed by the SD-WAN solution, and the underlay network managed by MPLS, DIA, or broadband. The challenge here is that visibility becomes harder because the overlay and the underlay are split.

It's not very easy to get a full picture of what is happening with your overlay and your underlay. That's where we can employ monitoring to make sure that we monitor our SD-WAN networks.

The best way to monitor an SD-WAN network is to place a monitoring agent right where the action is. Place it in your SD-WAN edge sites, or at the data centers or the hub locations.

Because these agents live inside your SD-WAN network, any tests run by these agents will reflect what an actual end user is experiencing. One use case would be running a test from your SD-WAN branch site towards the SD-WAN controllers to make sure that the controllers are reachable.

Do you wanna make sure your orchestration plane or the management plane is reachable? Then you run a test from the branch sites towards the controllers. Another strong point of SD-WAN is the fact that it can do application-aware routing.

For example, any traffic towards Salesforce or Office 365 should go via the local Internet breakout. But how can you test that? Right? Again, you run a test from the agent located at the SD-WAN edge site towards Salesforce, either a traceroute test or an application test, and you can see whether you're going via the local Internet breakout or via a different path.

Application-aware routing can also ensure that performance SLAs are met, so you can even make sure that the SLAs are within whatever you have configured in the application-aware routing. Another use case is site-to-site visibility.

You can run a test from branch site A to branch site B, from branch site A to the DC, to the hub, and all across. Basically, create a mesh and understand what your SD-WAN overlay performance is like, because you're running tests from one agent to the other.

That helps you understand how the overlay links are performing. And finally, what we spoke about earlier: the backbone agent presence across the globe.

You can even monitor the underlay. So if you're having an issue from branch site A to branch site B, you can run a test using the global agent network and try to understand if there is any slowness or latency across the public MPLS or the DIA.
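The mesh and the overlay-versus-underlay comparison just described can be sketched as follows. The site names, latency numbers, and the 2x slowdown factor are illustrative assumptions:

```python
from itertools import permutations

sites = ["branch-a", "branch-b", "dc-hub"]

def mesh_pairs(sites):
    """Every ordered (source, destination) pair for overlay mesh testing."""
    return list(permutations(sites, 2))

def suspect_overlay(overlay_ms, underlay_ms, factor=2.0):
    """Site pairs where the overlay tunnel is much slower than the underlay
    transport, pointing at the SD-WAN overlay rather than the carrier."""
    return [pair for pair, ms in overlay_ms.items()
            if pair in underlay_ms and ms > factor * underlay_ms[pair]]

overlay = {("branch-a", "branch-b"): 120.0, ("branch-a", "dc-hub"): 35.0}
underlay = {("branch-a", "branch-b"): 40.0, ("branch-a", "dc-hub"): 30.0}
# suspect_overlay(overlay, underlay) -> [("branch-a", "branch-b")]
```

The contrast is the point: overlay numbers come from the agents inside the SD-WAN, underlay numbers from the global backbone agents, and comparing them per pair tells you which layer to troubleshoot.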

These are some use cases that we can work with when building any kind of SD-WAN monitoring solution. We'll now move on to the way Catchpoint can monitor ISP or service provider issues in an automated fashion.

I'll pass it back to Moiz to take this part here. Moiz?

Moiz Khan

48:14 - 54:24

Thank you, Sheldon. Alright.

Let's share this. So this is one of the things that a lot of our clients had asked for.

Is there a way that we can see what is happening on the Internet? Like a weather report. Right? Can I get information about what's happening? Just for everything.

And that got us thinking, okay, what can we do? That's how Internet Sonar came into being.

What we are doing with Internet Sonar is, because of the presence that we have across the globe, we have so many agents which we can capitalize on and use for running tests across different providers, whether it's to monitor the cloud, DNS providers, SaaS providers, marketing tags. The list goes on.

With that, from a network perspective, the question was, can we also get visibility into what is happening in the ISP world? Now, as you know, Catchpoint has a good presence across the world with our backbone nodes, which utilize the tier one ISPs. And that's how we are able to get this information, because we are constantly monitoring the ISPs across the world.

And this helps the network team to know, if they are seeing challenges with their application, like an outage or latency, whether it is related to something happening with their ISP. Now, it could be a case wherein they have ISPs for their data centers, like a primary and a secondary ISP. Are they facing any challenges? And if they are, is it happening in a specific region of the country, or, since some ISPs have presence across the world, is it spread out? There is no easy way to identify that, and that's the problem we have solved with this feature, wherein with a single view you are able to see what is happening in the Internet world, in the ISP world.

Now, you don't have to always keep an eye on this. This is visibility that is provided, but everything is powered by alerts.

You can literally customize your alerts to know if something is happening, whether it's related to your application or to the ISP that you are dependent on. Great.

And whatever Sheldon just mentioned about all the other features, about TCP and BGP: everything is powered by alerts.

You can even pull the data using the REST API, or stream the data. So there are a lot of ways the data consumption can happen.

So this is a very nice option wherein, with a single view, a single pane of glass, you have a lot of visibility coming in. You are looking at the ISP, the network path, and the network map within Internet Sonar, and you get to know, okay, from this specific AS, are there any challenges happening for that specific ISP? And is it impacting your application? The good part is that we are not only looking at this data.

What we are also doing, behind the scenes, is kind of an AI, I would say: we're comparing this information with your tests, with the applications that you're running. And if we see that there is something happening, let's say with a specific ISP, and your application is facing some kind of latency, performance issue, or outage, is there a correlation? Yes.

There is a correlation, and we highlight that. So the most important aspect of it is not just looking at the data, but the correlation that happens.

And it reduces your troubleshooting time, without speaking to a lot of people. You know exactly where the problem is, and it helps you identify things which are sometimes very difficult to get to.
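The correlation idea can be sketched as matching failing application tests against ISP incident windows by time and by shared AS in the network path. Cogent really is AS174, but the timestamps, test names, and matching rule here are illustrative assumptions, not Internet Sonar's actual algorithm:

```python
def correlate(isp_incidents, failing_tests):
    """isp_incidents: [(asn, start_ts, end_ts)];
    failing_tests: [(test_name, failure_ts, asns_in_path)].
    Returns (test, asn) pairs where a failure falls inside an incident
    window AND the test's network path traverses the affected AS."""
    hits = []
    for test, ts, path_asns in failing_tests:
        for asn, start, end in isp_incidents:
            if start <= ts <= end and asn in path_asns:
                hits.append((test, asn))
    return hits

# Cogent is AS174; timestamps are arbitrary epoch seconds for illustration.
incidents = [(174, 1000, 2000)]
failures = [("checkout-page", 1500, {174, 3356}),
            ("login-page", 2500, {174})]  # fails outside the incident window
# correlate(incidents, failures) -> [("checkout-page", 174)]
```

A correlated hit is what lets you say "it's not me, it's the ISP" within minutes instead of convening a war room.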

And this is an example where an issue happened with Cogent, wherein there was an outage, and then there were multiple tests failing, specifically in North America. And there was a switch from Cogent to an alternate ISP because of the outage that happened.

There was a path change, and then a lot of tests, for that customer and across customers, started failing because they had a heavy reliance on Cogent. And this was immediately identified by the Internet Sonar tool that we have, and the customers got to know about it because of the alerts that they had.

But the best part was, within a few minutes, they were able to say, okay, it's not me.

It's not something happening across the board. It's specific to Cogent.

Now the question is, okay, is it me or something else? No.

It is not me. It is Cogent.

Now, you could be relying on Cogent, which means you can go and question them. And even if Cogent is not your direct ISP provider,

you still have that visibility. Okay, fine.

I can do something about this by making some changes. So it reduces the overall time taken to identify things, reduces war rooms, and means less impact on the network team, because the network team now has something which will tell them exactly where the problem is.

Alright. So with this, I'm gonna pass it back to Sheldon, and we'll take it from here.

Sheldon Pereira

54:24 - 56:03

Thanks, Moiz. Yeah.

We're going to be wrapping up by very quickly talking about visualizations. The idea of visualizations is that you have all these tests running, but it's cumbersome to go into each test and understand what performance is like.

So that's where we can leverage visualizations. This view is something we have a customer using to monitor their PoP locations across the globe.

So right away, this data has been sourced from all the tests, from all the vantage points across the globe. Right away, we get to know that there is some kind of packet loss in the UK.

There's some latency somewhere in France. But we see that availability across all the network devices, or rather all the PoP locations, is good.

So this is one way of visualizing data. And finally, this is something else, where we combine application and network data.

On the top, we are basically performing heartbeat checks towards the application, and that tells us that the application is up and reachable. On the bottom, we are sourcing data from our traceroute testing, which tells us that the application is reachable, but users in some regions cannot access it.

Combining the data, we can say, yep, the application is up. We see some parts of the world are able to access it, but some parts of the world aren't.

These are just two visualizations that I thought of highlighting right here, but there's a lot more you can do by looking at the data and presenting it in a way that gives you a view of how your network is performing. So that's all from me.

I'll pass it back to Howard, and then we can close out the webinar here. Thanks.

Thanks, everyone.

Howard Beader

56:03 - 58:19

Alright. Hey, everyone.

So that was a ton of information in a relatively short time. Feel free to jump in and ask any questions that you may have.

I'm gonna open a quick poll. The polls should be available now in the poll tab on the right.

And, you know, if you're looking for more information, or if you wanna chat with somebody, you can just click that, and we'll be back in touch. But, you know, we like to talk here at Catchpoint about what makes us unique.

And it's really about the ability to monitor what matters, and that's really the Internet stack, your customer experience, your user experience for your workforce. And so that you can monitor from where it matters for you, from where they're located.

Not just a random location in the cloud, but really where those people are. If they're in Boston, if they're in Dubai, if they're in Texas, if they're in Louisville, Kentucky, if they're in New York City, you're gonna be able to monitor from exactly where they are to really understand their path so that you can catch issues before they become incidents that impact the business.

You know, I'd like to thank you all for spending this time with us. If there's anything we can do to help, feel free to reach out: answer the poll, reach out to any of us on LinkedIn, email us directly, whatever works.

We're happy to chat. You know, I'd like to thank Moiz and Sheldon for the really insightful content they shared.

Thank you all so much for your time, for attending and joining us today. And as noted, you can also request a demo using the button at the top of your screen as well.

So with that, thank you all for attending. Thanks for joining.