As an Engineering Manager, I don't get the chance to dig into code as much as I did when I was a developer. Catchpoint's semi-annual hackathon provided me that opportunity last month.
As an engineering team, we strive to write secure, maintainable, performant code. We drink our own champagne, so we know when our user experience falters. Sometimes the cause is external, but sometimes our application code has an error and a bug surfaces. Then we need answers, and we turn to our application logs. What's happening? What's gone wrong? Who is accessing what, and how is that affecting things? Too often, though, answering those questions is reactive because the problem has already happened. As a proactive team, we set out to change that.
For context, Catchpoint’s core infrastructure is built to reliably handle tens of millions of Synthetic tests, RUM page views, and Endpoint experience results every day. Catchpoint Symphony leverages React and ASP.NET Core to provide an improved user experience from configuration to analysis. Our unparalleled global reach makes troubleshooting and debugging production issues challenging. We rely on logging across our applications and microservices to reduce MTTR.
My reIMAGINE hackathon team set out to enhance our backend logging capabilities. Our goal was to cover more services, knowing that the added information would give us clearer answers to the questions noted above. In addition, we could use that data to build more dashboards with more insights for our users!
In order to achieve our goal, we had to keep a few things in mind:
We started by setting up watchers on our log directories and sending web server logs, OS-level event logs, and general server information to our log store. Parsing our application logs proved to be troublesome and noisy.
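To make the idea of a log-directory watcher concrete, here is a minimal sketch in Python. It is not our production setup: the `DirectoryWatcher` class, the `parse_line` helper, the `timestamp LEVEL message` line format, and the choice of which levels count as noise are all assumptions made for illustration. It polls a directory, reads only the bytes appended since the last poll, and drops lines that are malformed or too chatty.

```python
import os
import re
from typing import Dict, List, Optional

# Assumed (hypothetical) line format: "<timestamp> <LEVEL> <message>"
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$")

# Levels treated as noise in this sketch; a real deployment would tune this.
NOISY_LEVELS = {"DEBUG", "TRACE"}


def parse_line(line: str) -> Optional[dict]:
    """Return a structured record, or None for malformed or noisy lines."""
    match = LINE_RE.match(line.strip())
    if match is None or match.group("level") in NOISY_LEVELS:
        return None
    return match.groupdict()


class DirectoryWatcher:
    """Polls a directory and yields records for newly appended log lines."""

    def __init__(self, directory: str, suffix: str = ".log") -> None:
        self.directory = directory
        self.suffix = suffix
        self._offsets: Dict[str, int] = {}  # bytes already consumed per file

    def poll(self) -> List[dict]:
        records: List[dict] = []
        for name in sorted(os.listdir(self.directory)):
            if not name.endswith(self.suffix):
                continue
            path = os.path.join(self.directory, name)
            offset = self._offsets.get(path, 0)
            if os.path.getsize(path) < offset:
                offset = 0  # file was truncated or rotated; start over
            with open(path, "rb") as handle:
                handle.seek(offset)
                data = handle.read()
            end = data.rfind(b"\n")
            if end == -1:
                continue  # no complete line yet; wait for the next poll
            # Consume only up to the last newline so partial lines are retried.
            self._offsets[path] = offset + end + 1
            for raw in data[:end].split(b"\n"):
                record = parse_line(raw.decode("utf-8", errors="replace"))
                if record is not None:
                    record["source"] = name
                    records.append(record)
        return records
```

Calling `poll()` on a schedule and forwarding the returned records to a log store would complete the loop. Filtering at the watcher keeps noisy lines out of the store entirely, at the cost of losing them for later debugging, so the noise threshold is a trade-off rather than a fixed rule.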
Our first dashboard was simple and had only four widgets:
This initial dashboard answered basic questions about our users’ experiences. It also proved that we can configure widgets across multiple data sources using a single, unified dashboard. Using information from multiple data sources will provide further insights across our services and help us track down issues that require a holistic view of our system.
The one-week hackathon gave us a great start on what will be a long-running project that we can continue to build on. As a result of the week spent on this project, we now have less noise in our logs, plus metrics from our web servers, databases, and Redis instances.
Our expanded logs have already helped us figure out ways to make our data processing more efficient, and we have plans for more improvements to come. For example, we plan to enhance our internal application performance data, gather more client-side information, and correlate web logs with Orchestra logs (our in-house NoSQL engine). In addition, tracking services across releases during testing will provide actionable insights, so we can quickly address potential challenges before they get to production.
All of this came out of just one week, driven by our dedication to giving our users the best experience.
Want to find out how to join our team? We are hiring! Check out our open positions in Engineering and this insightful blog from our VP of Engineering, which gives a sneak peek into what the interview process at Catchpoint might hold.