For the past four years, Catchpoint and various partners have been running a yearly SRE Survey. This year, Blameless is excited to partner with Catchpoint for the fifth annual survey. We want to hear from you if you are in a DevOps or SRE role, or if you work on reliability under some other title. There is a tremendous amount to learn when we listen closely to practitioners. (Hint - you can fill out the survey at https://www.catchpoint.com/sre-survey)
We believe deeply in the importance of reliability for all digital services, and the people invested in delivering that reliability are those who practice resilience engineering principles. When the survey started, these were usually people with the title of SRE (or, at some companies, PE), but as resilience engineering and reliability have become more widely adopted, the practices have spread beyond these specific titles. To a certain extent, these practices have never belonged exclusively to specific titles, since there have always been people who were passionate about software reliability, long before Google coined the SRE moniker (circa 2003).
For the past 5-7 years, and especially over the last two years as the pandemic dramatically accelerated the importance and speed of digital transformation, SRE and related roles have consistently been amongst LinkedIn’s “Most Popular Jobs” listings. Companies have been learning just how important reliability is to their online reputation, even in cases where they have not made this a priority. This survey is a way to look at the lived experience of these practitioners and understand the state of SRE as a whole. Since resilience engineering is based on understanding the real “work as done”, this survey is also a way to keep in touch with the wider profession.
You can take the survey at https://www.catchpoint.com/sre-survey - it should take less than 10 minutes of your time. Please also share the link widely amongst your colleagues, friends and community. The greater the number of participants, the richer the analysis will be. We look forward to sharing our findings from the survey in the SRE Report 2022, to be released later this year.
Highlights from past SRE reports
2018 - The inaugural SRE Report focused on building a profile of what being an SRE entails and what someone looking to become an SRE expert could expect from the role. The report found that there was no one “typical” background or skill set for resilience engineers. Nonetheless, 64% of survey respondents had previously held a role as a SysAdmin, a surprising stat perhaps given that the majority of SREs report into the engineering department and not operations.
The Report also found that while “the majority of SREs felt their job directly contributes to one of their organization’s core business values, they did not feel that the role was well understood and/or respected throughout the organization.”
2019 - Continuing from the previous year, the 2019 SRE Report found that Site Reliability Engineering was still emerging as a practice. However, the survey and report concentrated on incident management, asking “What impact do incidents have on organizations and the people responding to them?”
While it was clear that organizations are focused on building resilient systems and recovering quickly, the report asked whether this focus extended to employee resilience and recovery from post-incident stress. Sadly, the survey found that the impact of incident management on employees is significant and highly stressful. Moreover, most companies do not effectively ameliorate these effects.
2020 - This survey was performed in two parts: one in January (pre-pandemic lockdown) and a follow-up in June (post-lockdown). In contrast to the 2018 findings (when only 20% were, or expected to be, remote), over half the respondents, after being forced into a remote arrangement, did not see a need to return to the office in order to carry out their duties.
Other key findings centered on the importance of designing observable systems to prevent service disruptions instead of purely reacting to outages and the need to continuously work to overcome the entropy (or quicksand) of reactive operational work.
2021 - With the growth of third-party reliance and of internal platform engineering teams that enhance developer productivity, allocating reliability engineering cycles to make these platforms reliable in themselves was shown to be an important trend. Also, connecting the resilience work that SRE teams do to direct business-valued capabilities was highlighted as a key way to demonstrate the value of reliability efforts.
2022 - This is your chance to contribute! This year’s questions range from how SREs are spending their time to the impact of the Great Resignation on SRE teams and efforts.
Contribute to the industry’s longest-running SRE report
Not only will you be actively contributing to the industry’s longest-running SRE Report, but you’ll also help provide actionable solutions for organizations worldwide. For every survey taken, Catchpoint and Blameless will donate $5 to the International Red Cross and $5 to Girls Who Code.
All contributions will be aggregated and kept confidential.
Participate here: https://www.catchpoint.com/sre-survey