Smartsheet’s SRE Team Takes Center Stage as It Hits the 8M User Mark

Smartsheet was founded in 2005 with the mission of helping companies simplify and streamline how work is managed. Over three quarters of the Fortune 500 rely on Smartsheet. Through its enterprise platform for dynamic work, Smartsheet aligns people and technology to help businesses move faster, drive innovation, and achieve more.

Catchpoint spoke with Jason Kehr, an Engineering Manager (formerly Senior Site Reliability Engineer, or SRE) at Smartsheet, to understand how he and his team foster the platform’s performance culture and maintain its reliability and availability. This work has been more important than ever over the last year, with the world relying on digital services to communicate and get work done at a distance.

_“Synthetic Monitoring is super critical because our platform isn't just the internal metrics that we see. Without it, we would not have a clear understanding of what our customers are doing and what they're experiencing. And a lot of these companies use Smartsheet to run their business.”_ Jason Kehr, Engineering Manager

7 Insights Into How Smartsheet's SRE Team Uses Catchpoint

Smartsheet relies on Catchpoint's Synthetic Monitoring for its frontend and on its API Monitoring to get in front of performance issues before its customers notice them. Here are seven insights you can take away from how Jason and his team of site reliability engineers use Catchpoint to better understand the experience of Smartsheet's global customer base.

Insight 1: Identify Your DEM Objective

Jason told us that initially at Smartsheet, there wasn’t an observability team, “just a group of site reliability engineers all sitting in a room with a bunch of monitors looking at metrics all day long, responding to alerts.” It was quickly determined they needed such a team. Once formed, the observability team evaluated the current monitoring tools in use and “landed on a much smaller list to focus on.”

At this critical moment, the SRE and observability teams worked together to define their digital experience monitoring (DEM) and observability objectives. They built a set of best practices, deciding which tool to use for which purpose and what the benefits of each one were. They determined that it was critical to “have good inside monitoring and, also, outside-in monitoring. That's where we use Catchpoint for the external.”

Catchpoint’s digital experience monitoring platform gives Smartsheet the needed insights across its entire service delivery chain to detect, identify, and resolve issues quickly.

Insight 2: Understand the Customer Workflow On Your Platform

When we asked Jason what his advice would be for folks in his shoes considering purchasing Catchpoint, his top-of-mind response was to identify all “your external endpoints,” make sure you have good health checks in place, and critically, ensure you understand the customer workflow on your platform.

By understanding the customer workflow on its platform, Smartsheet can now:

  • Benefit from a clear understanding of the complete customer journey.
  • Have critical insight into all their external endpoints.
  • Deploy the right monitoring tests to meet the right needs.

300%

Year over year, growth at Smartsheet has been 300% over the past six months with over 7 million automated actions processed in the last 30 days.

Source: Smartsheet Inc. (SMAR) Q4 2021 Earnings Call Transcript

Insight 3: Build Your Tests

After Smartsheet gained a clear understanding of its customer workflow, the next step was to build its tests. Jason advises other SRE and observability teams to “put some thought into the hierarchy of those tests and products,” breaking them down by service team or product category. “The more things you can categorize, the less work you’ll be doing on managing tests later,” he counsels.
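The categorization Jason describes can be pictured as a small inventory in which every test carries a product and an owning team. The following is only a minimal sketch under that assumption: the `SyntheticTest` class, the `group_tests` helper, and every test, product, and team name are hypothetical, and none of it reflects Smartsheet's actual setup or Catchpoint's API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SyntheticTest:
    name: str          # hypothetical test name
    product: str       # product category the test belongs to
    service_team: str  # team that owns the monitored service
    test_type: str     # e.g. "Web", "API", "DNS"


def group_tests(tests, key):
    """Group test names by one attribute, e.g. 'product' or 'service_team'."""
    groups = defaultdict(list)
    for test in tests:
        groups[getattr(test, key)].append(test.name)
    return dict(groups)


# Illustrative inventory; none of these names come from Smartsheet.
TESTS = [
    SyntheticTest("login-page", "web-app", "frontend", "Web"),
    SyntheticTest("sheet-load", "web-app", "frontend", "Web"),
    SyntheticTest("rows-endpoint", "public-api", "api-platform", "API"),
    SyntheticTest("apex-resolution", "edge", "infrastructure", "DNS"),
]

by_product = group_tests(TESTS, "product")
by_team = group_tests(TESTS, "service_team")
```

With the grouping decided up front, a change such as retuning every test owned by one team becomes a lookup in `by_team` rather than a search through a flat list.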

An advanced synthetic monitoring tool will provide you with an extensive list of different types of monitoring options. At Smartsheet, the SRE team has used the different synthetic test types partly to duplicate transactions their customers are performing. This helps them to gain “a true understanding of how much time it’s taking to get from point A to point B” in any given web transaction.

“Catchpoint has made my job a lot easier in a number of ways,” Jason shares. “When you need a particular test for a particular scenario that can occur - either because an ISP may be having an issue or a cloud provider is having an issue - we can create a test, disable it, and enable it when we need it. That’s been an immense benefit.”

80%

80% of performance and availability issues occur outside your firewall. This is why selecting the right monitoring test types is so important.
Catchpoint offers over 20 monitoring types, including:
  • Web
  • Streaming
  • DNS
  • SSL
  • API
  • FTP
  • Custom tests

Each test type is designed to help you understand how each component of your service delivery chain is impacting your end users. We believe in testing everything from everywhere because in today’s global digital landscape, your customers and employees are located around the world.

“The world is changing rapidly, and to thrive in the new hybrid work environment, companies must go beyond the pursuit of productivity gains... That’s why progressive leaders are improving corporate agility and unlocking value from more employees, realizing the benefits of what a dynamic platform for project execution, automation, and customer engagement can do for the enterprise.” Mark Mader, President and CEO of Smartsheet

Insight 4: Utilize the Catchpoint Support Team

When Jason and his team at Smartsheet got started with Catchpoint, they met with our award-winning support team on a regular basis. “Early on,” he says, “we met more frequently because our needs were greater.”

Catchpoint’s sales and account team helped Smartsheet bridge coverage when their points ran out before the end of their contract, ensuring they could still benefit from the monitoring they needed. Jason also says, “Early on, I spent a lot of time with the solutions engineers getting our tests to be meaningful.” This involved not only making sure they were testing “the right things,” but “making them [the tests] not only economical, but meaningful.”

Smartsheet’s SREs continue to frequently work with our account execs, sales, and engineering teams to ensure they have consistent, ongoing support as they need it.

_“The Catchpoint team has been critical in providing us support in introducing, establishing and getting to the point we are with Catchpoint.”_ Jason Kehr, Engineering Manager

Insight 5: Set Up Alerts and Integrations

From the start, the SRE team set up their alerts by priority to ensure they were always ahead of issues. In fact, Jason shares, “A lot of times before our customers are aware there's an issue, we’ve been able to get in front of it.”

Jason has leveraged the support of Catchpoint’s engineering team to refine their alerting strategy, in addition to creating and modifying tests “so they’re a little bit more effective.”

When Smartsheet got started with Catchpoint, they were simply using email alerting. They soon began to leverage our integrations, particularly PagerDuty and Slack. Catchpoint allows you to integrate with all the DevOps tools you already use, from Splunk to ServiceNow, and many more.
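As a rough illustration of alerting by priority, the sketch below maps each priority level to the channels that should receive it. It rests on assumptions: the `Priority` levels, the `ROUTES` table, and the `channels_for` helper are hypothetical, the channel names are plain labels rather than real integration identifiers, and the mapping is not Smartsheet's actual configuration.

```python
from enum import Enum


class Priority(Enum):
    CRITICAL = 1
    WARNING = 2
    INFO = 3


# Hypothetical routing table: the higher the priority, the more intrusive the channel.
ROUTES = {
    Priority.CRITICAL: ["pagerduty", "slack"],
    Priority.WARNING: ["slack"],
    Priority.INFO: ["email"],
}


def channels_for(priority):
    """Return the channels for a priority, falling back to email if unmapped."""
    return ROUTES.get(priority, ["email"])


critical_channels = channels_for(Priority.CRITICAL)
info_channels = channels_for(Priority.INFO)
```

Keeping the mapping in one table means a new integration can be added without touching the code that raises alerts.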

Source: Smartsheet

Q4 2020 saw the community of Smartsheet users exceed 8 million, and the average domain annualized contract value saw growth of 40% year over year.

In this brief video [1:18], you’ll understand how Smartsheet partners with Catchpoint support and how it uses the alerting and troubleshooting tools to get in front of an issue before its customers do.

Insight 6: Utilize API Monitoring For Full Visibility

Smartsheet uses Catchpoint not only for availability monitoring, but also to monitor the API component of its platform. The API is primarily used by Smartsheet’s mobile and tablet users, making it equally critical that it performs as expected.

In one instance, the observability and SRE teams were experiencing a blind spot in terms of how they were using their tests for alerting. “Initially, it was just availability,” says Jason, “we thought, hey if we’re getting four nines, fantastic.”

They quickly found out, however, that there was a condition in which the API appeared to be available, yet it wasn’t actually responding within an acceptable timeframe. The team then modified the alerting so that they would be notified any time the response time exceeded a predefined threshold.
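The blind spot is easy to reproduce on paper: if a probe only counts successful status codes, a run of very slow responses still scores as fully available. The sketch below is a minimal illustration under stated assumptions. The `ProbeResult` type, the helper functions, the sample timings, and the 1000 ms threshold are all hypothetical and do not come from Smartsheet or Catchpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    status_code: int   # HTTP status returned to the probe
    elapsed_ms: float  # total response time in milliseconds


def is_up(result):
    """Availability-only view: any 2xx or 3xx response counts as up."""
    return 200 <= result.status_code < 400


def availability(results):
    """Fraction of probes that returned a successful status."""
    return sum(is_up(r) for r in results) / len(results)


def classify(result, threshold_ms):
    """Add a latency check on top of the availability check."""
    if not is_up(result):
        return "down"
    if result.elapsed_ms > threshold_ms:
        return "slow"
    return "ok"


# Illustrative samples: every probe succeeds, but two are very slow.
RESULTS = [
    ProbeResult(200, 180.0),
    ProbeResult(200, 4200.0),
    ProbeResult(200, 210.0),
    ProbeResult(200, 5100.0),
]
THRESHOLD_MS = 1000.0  # assumed threshold, chosen only for the example

uptime = availability(RESULTS)
states = [classify(r, THRESHOLD_MS) for r in RESULTS]
```

Here `uptime` reports full availability even though half of the probes are classified as slow, which is the kind of condition a threshold-based alert is meant to surface.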

“API Monitoring was critical in us staying in front of issues that maybe internal monitoring wasn't capturing,” says Jason, and “making sure that we're providing the customers a good experience.”

In this brief video [2:02], you’ll gain insight into how Smartsheet uses Catchpoint’s API Monitoring to stay ahead of issues that its internal monitoring wasn’t capturing.

Insight 7: Remember DEM Is a Continuous Journey

Remember: digital experience monitoring is a journey, not a destination! Jason advises SRE teams to build growth and experimentation into their DEM budget. “I can guarantee that once you start using the platform, you’ll be wanting to make more tests,” he advises.

In terms of growth within the organization, Jason has found it “pretty easy to get other teams to adopt Catchpoint for their external monitoring.” He was excited to be able to “show an immediate benefit” to the service teams (in Smartsheet’s case, that was the developers developing the code for the platform). They recognized the “visibility gained into external performance, from simple to complex tests that allow them to see how our platform is performing.”

Further Resources

To find out more about how Smartsheet’s SRE team is taking center stage, read the full story here.
