Outbrain previously relied on a third-party vendor for performance monitoring, but that service was no longer sufficient. As Nir Kriss, head of production engineering at Outbrain, says, “The vendor wasn’t evolving to support new browsers and provide more sophisticated functionality.” After running Catchpoint in parallel with other service monitoring frameworks for a while, Outbrain migrated to Catchpoint completely. Outbrain uses Catchpoint in a number of ways, including the following:
Validate platform performance. Catchpoint complements Outbrain’s use of Pingdom. The company uses Pingdom to run basic tests at the network level, and Catchpoint to monitor at the application layer. “With Catchpoint, we can confirm our services are up and running and APIs are connected. It also gives us a history of our uptime,” Kobrinsky says.
Existing partners already know that Outbrain is resilient, and when striking new partnerships the company can share its uptime history as evidence. “We can confidently say our service will never be totally down, but specific components might experience a performance degradation. We can use Catchpoint to verify that even if something is not working as expected, our platform still performs as needed,” Kobrinsky says.
Empower developers. Outbrain’s engineers use Catchpoint to pinpoint problems, making it easier to find the root cause of issues. To help determine whether a problem is associated with a domain name system (DNS) server, a content delivery network (CDN) provider, an agent proxy, an internal router in the Kubernetes cluster, or an application owned by R&D, Outbrain created four Catchpoint tests. These tests let the company monitor performance along the path from its data centers to site visitors. A graph for each test is displayed on a wiki page that Outbrain created for its on-call engineers. This lets them spot issues at a glance and understand the next steps to take, such as escalating to the relevant development team. “The on-call team includes people from different teams, not all of whom are familiar with our infrastructure. This makes it easy even for them to zero in on issues within a minute,” says Kriss. Going forward, Outbrain plans to map dependencies between different test results so that issues are routed automatically to the correct team based on thresholds, without relying on the wiki page.
Monitor APIs. Outbrain’s APIs behave differently depending on the business rules of the publisher’s site. The APIs might power a carousel layout on one site or page and a feed or text recommendations with thumbnails on another. As Outbrain grows, its APIs will become more complex, moving beyond a classic request-response model toward an increasingly feed-like one. Even then, the company wants to be sure it can monitor its services and application layer holistically. “We want to make sure that we can monitor our variety of APIs and can do that with Catchpoint,” says Kobrinsky.
Integrate with key tools. Catchpoint has become an essential tool in Outbrain’s digital experience monitoring efforts, helping the company ensure its platform works as intended. In addition, Outbrain gets more value from its other monitoring tools by integrating them with Catchpoint. With Catchpoint integrated with Slack, Outbrain’s on-call teams are notified of pressing issues that require their attention.
Meanwhile, seamless integration between PagerDuty and Catchpoint means Outbrain’s developer teams can easily pinpoint potential problems and confidently escalate to the appropriate team. “I appreciate the context from Catchpoint. With a better understanding of the reason for a problem, the on-call person doesn’t need to wake someone up in the middle of the night unnecessarily to try to troubleshoot,” Kriss explains.
According to Kobrinsky, uptime is a baseline expectation in Outbrain’s industry. The latest data from an industry DevOps and site reliability engineering (SRE) survey found that uptime/availability and response time/performance were tracked by 85 percent and 77 percent of respondents, respectively. Because the company is measured by strict KPIs and needs to deliver revenue and a good user experience, 100 percent uptime is imperative. Fortunately, Outbrain is very good at pinpointing problems quickly. “We almost take this capability for granted because of Catchpoint,” he says.
To that end, Outbrain communicates potential and known problems to relevant stakeholders as fast as possible. “Our business depends on the trust between us and our publisher and advertiser partners. We want to be proactive and fast about resolving issues, and monitoring with Catchpoint is one of the key ways we can do that,” continues Kobrinsky.
Outbrain also values its strong working relationship with Catchpoint. “Catchpoint’s support team is incredibly responsive. In the five-plus years we’ve been a Catchpoint customer, the support team has never failed us, and that’s important in a partnership like this,” concludes Kriss.