Reduced MTTR Drives Business Success

Xandr is an advertising technology company that powers the buying and selling of digital advertising. Xandr connects buyers and sellers across the Internet on five continents, transacting more than 10 billion ad impressions per day and generating more than 150 terabytes of data at a peak rate of 5 million transactions per second.

Serving online ads on a global scale requires strict performance, availability, and reliability standards, and failure to meet them will result in loss of revenue via service level agreement penalties, potential client churn, and brand diminishment. Faced with this critical business need, Xandr partnered with Catchpoint to:

Collect both active observability and real user monitoring data for a complete understanding of the end-user experience.
Guarantee timely ad delivery and optimal global performance 24 hours a day.
Measure network performance around the world to understand its effect on their delivery.
Isolate performance issues for both their first-party applications and third-party networks.
Provide accurate and trustworthy third-party validation for their content delivery services.

Employees: 1,800
Revenue: $378 million
Headquarters: New York, NY
Industry: SaaS

Problem

The sheer amount of data that is processed on Xandr’s infrastructure creates serious performance challenges. Getting real-time insight into the actual client experience is the key to detecting and overcoming these performance obstacles.

Given the complexity of modern IT architecture in conjunction with Xandr’s global footprint, Xandr needed a digital experience monitoring platform that could quickly identify and isolate issues between the internet and their applications. Doing so required real-time performance data from both the network and application levels, as well as context for Internet performance on a global level.

Additionally, Xandr’s global service offering used content delivery networks (CDNs) around the world as proxy locations from which to serve ads. To evaluate these third-party vendors, the company needed a trusted and reliable observability tool to provide accurate digital intelligence on the different regional CDNs and their protocols.

Solution

Xandr first deployed Catchpoint’s observability solution in 2010 with 10-15 tests, and now uses Catchpoint as its primary observability solution across all core products. The company runs more than 700 regularly scheduled tests, and more than 500 employees use Catchpoint tools.

Xandr's global user base means that Catchpoint’s industry-leading node coverage is a critical differentiator for them. The ability to replicate and measure the actual end user experience regardless of geography allows Xandr to detect performance issues – whether they’re rooted in the application, a third party, or at the network level – before users are impacted. Catchpoint’s solutions provide critical value in several ways:

Catchpoint’s waterfall chart. This helps the company understand the performance of every page, as well as the behavior of every object on the page and how each one affects other objects. To do this, Xandr relies on many of the test types offered within Catchpoint’s active observability solution to get granular data about specific IT functionality. These tests include browser emulation, API, transaction, DNS, traceroute, and ping.
Enterprise observer nodes. Xandr has deployed these nodes in specific data centers behind the firewall, allowing IT to view and identify issues within the network and understand the application performance without internet noise muddying the results.
Real user measurement (RUM). The active observability tests are complemented by Catchpoint’s RUM tool, which captures and aggregates real-time user experience data. This helps IT visualize what actual client requests look like as they hit the Xandr platform from different parts of the world. Additionally, Catchpoint’s RUM solution allows Xandr to aggregate data based on custom metrics, which is something that other tools could not provide. The RUM solution also includes the “time to first interaction” metric. This is important because whenever an end user browses any of Xandr's apps, the app calls Xandr's API to download all the data. These calls need to be monitored so that Xandr can confirm that all the data is loading properly and that the page reaches visual completion once all the data from the request and sub-requests has been loaded.
Speeding data analytics. This has proven instrumental in increasing usage within the company, because onboarding a new user is fast and simple. Once they’re up to speed, Catchpoint’s powerful analytics tool allows Xandr employees to slice and dice the data however they’d like to identify the root cause of a problem. From there, sharing the information both internally and externally is easy, fast, and effective.
Advanced features. Many of Catchpoint’s advanced features have also proven invaluable. The available alerting options notify IT at different steps of a transaction and for different scenarios within the same test. Pertinent alerts are sent to different teams based on specific conditions. And by using the ‘bring your own visualization’ feature, Xandr can use custom metrics and visualizations right within the Catchpoint portal.

Finally, the critical Smartboard feature has played a significant role in reducing Xandr's mean time to resolution (MTTR). Smartboard gives the company a single place to look at data and do root cause analysis without clicking on several different links or exiting the portal. This allows them to quickly isolate performance issues based on factors such as geography, ISP, and application, and accelerate root cause discovery.

Results

Improving MTTR is vital to a company like Xandr because it correlates directly with revenue growth. A number of factors have gone into the MTTR reduction at Xandr. One is the ability to identify the root cause of issues faster than before. Smartboard plays a huge role in that, as do Catchpoint’s enterprise observer nodes, which allow the company to identify network issues in under five minutes.

Another factor is the advanced alerting capability, which accelerates troubleshooting by routing verified, reliable data to the correct team. Rather than first spending time verifying the alert and trying to isolate it between first- and third-party elements, the right engineers now get the alert in near real time, along with macros like the waterfall and scatterplot charts. This allows them to quickly pinpoint the nature of the issue so that they can begin fixing it right away.
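
For readers less familiar with the metric, MTTR is commonly computed as the average time between detecting an incident and resolving it. The sketch below illustrates that arithmetic only. The function name and the incident timestamps are hypothetical and are not Xandr or Catchpoint data, and the code does not use any Catchpoint API.

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(incidents):
    """Average (resolved - detected) duration over a list of incidents.

    `incidents` is a list of (detected, resolved) datetime pairs.
    This is a hypothetical helper for illustration only.
    """
    if not incidents:
        raise ValueError("need at least one incident")
    # Sum the individual resolution times, starting from a zero timedelta.
    total = sum(
        (resolved - detected for detected, resolved in incidents),
        timedelta(),
    )
    return total / len(incidents)


# Made-up incidents: (detected, resolved)
incidents = [
    (datetime(2024, 1, 5, 9, 0), datetime(2024, 1, 5, 9, 40)),      # 40 minutes
    (datetime(2024, 1, 12, 14, 0), datetime(2024, 1, 12, 14, 20)),  # 20 minutes
    (datetime(2024, 2, 1, 3, 30), datetime(2024, 2, 1, 4, 30)),     # 60 minutes
]

mttr = mean_time_to_resolution(incidents)
print(mttr)  # 0:40:00
```

Under this definition, anything that shortens the gap between detection and resolution, such as faster root cause isolation or alerts that reach the right team first, lowers the average.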