Last month, Catchpointers from around the world met in Austin, Texas for our first Field Kick Off since 2019. Our sales, marketing, L&D, customer and value/solutions engineering teams, along with representatives from product and HR, came together to celebrate successes from the last year and look ahead to 2023, a year our CEO, Mehdi Daoudi, dubbed “the year of resilience.” It was an awesome few days spent together in person in my hometown, a mecca of music, food, and tech, hearing and learning from our GTM leaders, industry-renowned speakers, cherished customers, and one another.
I joined Catchpoint six months ago, and this was my first opportunity to meet some members of my team in person. The conversations you can have over a meal and a glass of wine are so important: getting to know people at a personal level, building connections, and talking informally. We are now a very geographically distributed team, so it’s not easy to have everyone in the same room, but it is a worthwhile investment.
Reasons for optimism
It was clear from the outset that, despite the challenges the wider world has faced over the last couple of years and the recent economic uncertainty, particularly in the tech industry, there are many reasons for optimism. Mehdi kicked off our FKO by framing 2023 as a journey to the top. Entering the hall from behind us wearing a hard hat, with a mountaineer’s pack on his back and a walking staff at his side, he was ready to motivate and inspire.
Back in 2010, Mehdi reminded us, the company had only 12 monitoring nodes. Today, at over 2,400 vantage points, we have the largest number in the industry and, thanks to the continuous efforts of our determined operations team, that number keeps growing across the Internet’s backbone, cloud, last mile, wireless, and Edge. We also offer the ability to install Enterprise Nodes within our customers’ private networks. By monitoring what matters where it matters and catching issues across the Internet Stack before they can impact the business of our customers, Catchpoint is instrumental in ensuring a resilient Internet for many of the world’s leading companies.
One of our customers, who works for a leading PC and server manufacturer, talked about the importance of global vantage points compared with monitoring only from the cloud: “We want to give the best experience to the customer [wherever they are], not to the switch or to the cloud.”
“We used Catchpoint’s real-time measurements to pinpoint and resolve Google Public DNS latency. Instead of a long process, we were able to turn around the problem in just minutes, when it ordinarily could have taken close to an hour.” Matthew White, Reliability Engineering Manager, Google
2023: the year of resilience
We’re seeing a corporate focus on resilience echoed in the world around us. McKinsey, which defines resilience as “the ability to not only recover quickly from a crisis but to bounce back better – and even thrive”, sees that leaders around the world are “thinking more than ever about building resilience.” At FKO, the theme surfaced repeatedly, not least from Adrian Cockcroft, technology strategy advisor to Catchpoint, Partner at OrionX.net and formerly cloud leader at AWS, Netflix and others, whom we were fortunate enough to have join us to discuss continuous resilience.
A move towards continuous resilience
Initially, Adrian told us, resilience in engineering terms was largely about disaster recovery, and most people didn’t test it. Today, thanks to a combination of cloud computing and chaos engineering, we are moving towards continuous resilience. That requires tooling that continuously looks at your systems, and testing your failover process is an important part of it. We probably need to start thinking about CI/CD + CT, so that continuous testing becomes a discipline for ensuring users experience the app as expected not only during development but also continuously and proactively in production.
Continuous resilience means that as you release from your dev pipeline, you test. Do your system and your team fail and recover in the ways you expect them to? You need to understand from the outset what will happen if a service slows down or goes down. This requires the right kind of observability to detect an issue when one happens, to pinpoint what the issue is, and to understand both what is and isn’t working. Further, you need to develop a mitigation plan and practice it. Failover exercises need to be a regular process that your team cares about running continuously, so that failover works when needed, not something done once a year because a banking client has made you do it to fulfil regulatory requirements.
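To make the idea of a repeatable failover exercise concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the `Backend`, `FailoverRouter`, and `failover_drill` names are assumptions, not part of any Catchpoint product or of anything Adrian presented. It simulates taking one backend down, checks that requests are still served by another, and then restores the original state.

```python
class Backend:
    """Hypothetical in-memory stand-in for a service instance."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


class FailoverRouter:
    """Tries each backend in order and returns the first successful response."""

    def __init__(self, backends):
        self.backends = backends

    def handle(self, request):
        for backend in self.backends:
            try:
                return backend.handle(request)
            except ConnectionError:
                continue  # this backend failed, try the next one
        raise RuntimeError("all backends are down")


def failover_drill(router, victim):
    """Take `victim` down, check that another backend answers, then restore it."""
    victim.healthy = False
    try:
        response = router.handle("health-probe")
        return not response.startswith(victim.name)
    except RuntimeError:
        return False  # nothing could serve the request: the drill failed
    finally:
        victim.healthy = True  # always restore the original state


primary = Backend("primary")
secondary = Backend("secondary")
router = FailoverRouter([primary, secondary])
print(failover_drill(router, primary))
```

A check like this could run on every release rather than once a year; a real drill would of course act on real infrastructure and also measure how long recovery takes.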
At Catchpoint, Adrian pointed out, we deliver resilience with our Internet Performance Monitoring (IPM) platform because of our unparalleled ability to look across all the dependencies within the Internet Stack of our customers. “You benchmark what normal looks like, and you reveal what’s happening when a cloud vendor or SaaS provider is having glitches or latency issues…”, said Adrian, “to help your customers determine how to route around such problems.” If you are building a service intended to stay resilient, this is essential. DNS will kill you even if everything else within your Internet Stack is resilient. How do you find out if it is? There are a lot of possibilities for failure. You need IPM to watch it, continuously, from every location where your customers are.
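As a rough illustration of "benchmarking what normal looks like", the sketch below builds a per-location baseline from historical latency samples and flags a new measurement that sits far above it. This is not how Catchpoint's platform works internally; the function names, the three-standard-deviation threshold, and the sample DNS timings are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def build_baseline(samples_ms):
    """Summarise historical latency samples as (mean, standard deviation)."""
    return mean(samples_ms), stdev(samples_ms)


def is_anomalous(latency_ms, baseline, threshold=3.0):
    """Flag a measurement more than `threshold` standard deviations above the mean."""
    mu, sigma = baseline
    if sigma == 0:
        return latency_ms > mu
    return (latency_ms - mu) / sigma > threshold


def anomalies_by_location(history, latest, threshold=3.0):
    """Return the sorted list of locations whose latest measurement is anomalous."""
    flagged = []
    for location, samples in history.items():
        baseline = build_baseline(samples)
        if is_anomalous(latest[location], baseline, threshold):
            flagged.append(location)
    return sorted(flagged)


# Hypothetical DNS resolution times in milliseconds, per vantage point.
history = {
    "austin": [20, 22, 19, 21, 20, 23],
    "london": [35, 33, 36, 34, 35, 37],
}
latest = {"austin": 21, "london": 120}

print(anomalies_by_location(history, latest))
```

With these made-up numbers only the London vantage point is flagged, which is the kind of location-specific signal that a single measurement taken from inside a cloud region would miss.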
Sometimes, Adrian shared, the only way for companies to realize this is to experience the pain directly. When he worked at Netflix, back in 2008, there was an outage that lasted for three full days. It was ultimately found to have been caused by a SAN virtualization layer that randomly started to corrupt storage. Coming out of it, management started to ask: do we know how to run resilient infrastructure? It took the outage for these questions to become top of mind. Netflix moved to the cloud shortly afterwards in order to be able to scale and, in the process, helped develop the concept of Chaos Engineering, with a team explicitly set up to break things to see what happens when things go wrong and to ensure the Netflix system had the resilience to absorb failures when needed. Similarly, said Adrian, you don’t care about security until you’ve experienced your first big breach. Companies that have experienced public downtime will always be more attuned to the criticality of resilience.
A resilient business is required to win
Ray Wang of Constellation Research also spoke on the same theme. His first slide was alarming: 61% of the Fortune 500 firms in 2000 are no longer with us. He then zeroed in on the success story of Domino’s Pizza, which, despite selling what he called “the worst tasting pizza in the world”, saw its stock go from $3 to $350 between 2000 and 2019. Why? Because “they got digitization down better than everybody else.” They created “better experience monitoring than anyone else in the pizza business”, which innovatively allowed customers to understand what was going on at every step of their pizza preparation, cooking, and delivery process.
In 2023, said Ray, every business faces a confluence of crises: inflation, invasion, infection, interest rates, and inventory challenges. However, it is worth remembering that the digital giants (among them, Facebook, Google, Netflix, Amazon, Tesla, Microsoft, and Apple) have quadrupled since 2017. What makes them continue to be successful? “Because they understand how to manage their experiences.” And despite the many recent layoffs and impending threat of recession, tech budgets are actually going up. When Constellation Research interviewed buyers in the fall of 2022, they found that businesses are continuing to bet on software. Why? Because they don’t have enough skilled people and therefore need to rely on smart software and automation to improve their operations.
“You, at Catchpoint, are at the heart of these digital businesses”, said Ray. “Why? Because your ability to give them insights on every aspect of their business from operations to security to what’s happening inside an eCommerce transaction to what’s happening with their performance or uptime, it’s all sitting here. And the most expensive thing for them is downtime.”
“Because of Catchpoint, we haven’t had a single incident in the past year and a half that we didn’t discover in less than five minutes. Seeing a drop from two hours to five minutes is a pretty big deal for us.” David Gonzalez, Director of Technical Operations, Qualtrics
Journey to the Top
In his presentation, my colleague and CRO Chip Salyards reminded the room that Gartner cites the average cost of downtime at $5,600 per minute (my colleague Mark Towler has pointed out this equates to $6,700 in today’s economy). The cost to businesses is obvious, and the challenges they face in minimizing downtime are growing daily. The Internet is growing exponentially in complexity, and this leads to heightened fragility and greater risk. Businesses have long turned to APM solutions to monitor their internal systems and prevent downtime through a combination of logs, code tracing, and metrics. Today, as fast, reliable user experience becomes a product in itself, one that must be delivered across the global Internet to both distributed employees and customers, the need for IPM has never been greater.
As CMO of Catchpoint at this exciting time in its fourteen-year history, I am proud to work at what is arguably the first IPM solution on the market. Without doubt, we offer the most advanced IPM solution today, with the deepest and broadest visibility and the most flexible, state-of-the-art analysis and diagnostics engine.
We have an exciting year ahead of us. Our Field Kickoff was the perfect way to align and excite the team to make the best of the incredible opportunity to help our customers achieve Internet Resilience. Our journey ahead is certainly to the top!
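To put the per-minute figure in perspective, here is a small worked calculation in Python. The $5,600 rate is the average cited above; the outage durations are hypothetical examples, and the `downtime_cost` helper is an illustrative name rather than anything from Gartner or Catchpoint.

```python
COST_PER_MINUTE_USD = 5_600  # average cost of downtime cited in the text


def downtime_cost(minutes, cost_per_minute=COST_PER_MINUTE_USD):
    """Estimated cost in USD of an outage lasting `minutes` minutes."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


# Hypothetical durations: a one-hour outage, and the saving from
# cutting a two-hour outage down to five minutes.
print(downtime_cost(60))                      # 336000
print(downtime_cost(120) - downtime_cost(5))  # 644000
```

At this average rate a single hour of downtime costs $336,000, which is why shortening an outage matters so much; actual costs will vary widely by business.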