This month featured our second live Ask Me Anything (AMA) event. After the success of our first AMA just a few months back, we decided to turn it up a notch by co-sponsoring the event with O’Reilly Media and featuring another expert lineup of panelists.
This month’s topic was DevOps vs. SRE. There seemed to be a lot of confusion about the difference between the two roles, their expectations and best practices, and how to fill each position to build the ultimate team of IT pros, so we decided to clear the air and finally put some rumors to bed. Our panel featured Chef CTO Adam Jacob, Google SRE Manager Liz Fong-Jones, and Charity Majors, Engineer and Co-founder of Honeycomb.
Andrew (Catchpoint Performance Engineer):
This is going to be a fairly interactive session. I’ll usually call on someone to answer the questions and some of the panelists might chime in. We’ll start with Charity. The first one is really going to be talking about defining SRE and DevOps.
Charity, are the distinctions between SRE and DevOps and operations just media-made? Are there valid differences between the two? What are they and why do they exist at all?
I thought it was hilarious that a solid quarter to two-thirds of the questions that anyone asked were a version of “What the hell is this?” “Define this.” “Is there a difference?” “Am I doing it right?” “Am I doing it wrong?” “What does the job need?” “Is this thing a part of DevOps?” “Is this thing a part of SRE?” The answer is yes to every option that you listed. There are specific heritages. Can you make that a plural? I’m not sure. There are different lines of … SRE was very much, Google started having this conversation ten plus years ago about how do we scale the human side of our software. Around the same time, Adam from Chef and Etsy and Jez Humble and a bunch of people were having a similar, but slightly different, conversation. Google has a very specific set of problems in some ways. DevOps is more open source and crowdsourced and “bottom-up” is the way Kellan from Etsy was describing it to me.
We can’t be operators anymore. We have to scale ourselves technically. We have to learn new skill sets. Practice in this field was taking a very heavy toll on the people, on the humans who are trying to run these systems. We’re all familiar with the signs. It’s a classic thing right now, the burnt-out people who are caring for systems. People started reaching the age of thirty and realized this just wasn’t going to work. Yes, I could ramble about this for a long time. I think that yes, the people who spend a lot of time talking about how these things are different or defined would say that DevOps is about empathy, it’s about practicing like adult methodology. People like Ben Treynor and all the amazing people at Google would say that SRE is about software engineering for humans. Liz, maybe you want to give a better definition of how Google would define this.
Sure. Basically, our perspective is that DevOps is, in class-inheritance phrasing, an interface that says, “Here are some principles, here are some things that you should consider doing.” There are many different concrete implementations of the DevOps philosophy. Some of them were independently invented, but we also agree on very similar philosophies within the DevOps organization at Etsy, within the SRE organization at Google, the production engineering organization at Facebook, we kind of all agree on these principles and then it’s a matter of what we’ve chosen to do to implement those principles and what additional things we’ve tacked on. If you’re deciding to spin up a DevOps or an SRE organization, look and see, “What traits do I want my organization to have? Who can I borrow from?” Then you go and create your own implementation of it. It’s very much a thing where we all inherit ideas from each other.
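For readers who think in code, the interface analogy above can be sketched roughly as follows. This is only an illustration of the phrasing, not anything the panelists presented: the class names, method names, and returned strings are all hypothetical, and the two "principles" are placeholders chosen for the example.

```python
from abc import ABC, abstractmethod


class DevOpsPrinciples(ABC):
    """Hypothetical interface: it names principles and implements nothing."""

    @abstractmethod
    def share_ownership(self) -> str:
        """How development and operations share responsibility."""

    @abstractmethod
    def automate_operations(self) -> str:
        """How operational work gets turned into software."""


class CentralReliabilityTeam(DevOpsPrinciples):
    """One made-up concrete implementation: a dedicated team."""

    def share_ownership(self) -> str:
        return "dedicated team partners with product engineering"

    def automate_operations(self) -> str:
        return "dedicated team writes the automation"


class EmbeddedEngineers(DevOpsPrinciples):
    """Another made-up implementation: engineers embedded in product teams."""

    def share_ownership(self) -> str:
        return "operations-minded engineers sit inside each product team"

    def automate_operations(self) -> str:
        return "each product team automates its own operations"


def describe(org: DevOpsPrinciples) -> list[str]:
    """Works with any implementation, because it relies only on the interface."""
    return [org.share_ownership(), org.automate_operations()]


if __name__ == "__main__":
    for org in (CentralReliabilityTeam(), EmbeddedEngineers()):
        print(type(org).__name__, describe(org))
```

The point of the sketch is that `describe` never needs to know which organization it is handed; the shared principles are the contract, and each organization fills in its own details.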
You also see this thing happening which drives me [crazy], where people who are trying to hire, they just use either of these as shorthand for ops engineers who can code, which should be basically everybody now. They think that by naming their team a thing, they get those qualities and it does not work that way.
You’re right, they don’t get those qualities. I flipped on this a couple…
I remember the keynote where you publicly flipped on this!
Maybe six months ago. I’m a systems administrator. I feel like I’ve been one my whole career, I feel like I still am. There are people who would argue with that. I don’t know that I do systems administrator work all the time. I don’t know that I needed to be told that I was a DevOps engineer or an SRE in order to feel like I was great at what I do. I’m a systems administrator, I’m great at it. That developed over time and I feel good about it.
The thing that happened with DevOps, where it lost all control, and maybe to a lesser extent with SRE, was that because it was this set of principles that we loosely agreed on and then everybody was willing to run off and build their own concrete implementation, there’s a marker that you’re setting when you use that word. If you’re using it with good intentions, there’s a marker that you’re saying that what you want are people with that cultural background or that you want people with that set of principles, and calling it that is as good as anything else.
I don’t know that it’d be better if you said, “I want you to join a DevOps organization as a systems administrator,” or just said, “I want you to be a DevOps engineer.” Whatever, man, just be a DevOps engineer, call it a day. I wasn’t a DevOps engineer, so when you call me one, I’m like, “I’m not sure that’s what I am.” If that’s what you need to call yourself, call yourself that.
It’s really interesting, the idea of whether DevOps or SRE are names of organizational styles or whether they’re people’s job titles. That’s something that people don’t really make a good distinction between. Another one that Tom Limoncelli gave a really great talk on last night at the New York Tech Talks I organize is basically, “Is DevOps or SRE about the technologies you’re using?” Is it about Git and Docker or is it about the philosophies behind it? [People] fixate on the technologies rather than on the principles. If we spend some time unpacking the principles, we realize that we’re very much on the same page regardless of what tools we’re using or what job titles we have. It’s the organizational model and the way we think about the work that we do.
Absolutely. I have a friend, actually, who is hiring for a startup and he has just posted the same exact job with both the DevOps engineer and SRE titles and just watched over time to see. DevOps is winning by ten or twenty percent, but the title just doesn’t matter, and you also don’t make your team magically have these qualities just by naming it that thing. Focusing on the name is wrong.
I think this transitions to our next question perfectly for Liz. We touched on something there. It’s a culture. It’s the idea of how the company is run just as much as it is actually the tools and the infrastructure in place. Liz, based on what you’ve seen, what kind of organizational layouts or reliability and performance goals work, and how do SRE and DevOps actually fit in?
Sure. There are a large number of ways to do it correctly. There’s no one-true-way model. It depends a lot on your own organization, but I think one of the key elements that’s essential to most structures is that the engineers you have working in SRE or DevOps need to be empowered equally with the people who are doing product development software engineering. The problem that we were trying to solve with the founding of SRE and the founding of DevOps was that you have this model where developers throw stuff over the wall and tell Ops, “Go make it work.” That was just a really [bad] situation for all the people involved.
The idea is that people need to be equally valued, and people need to have many of the same skill sets. Your product development software engineers need to understand how you do operations. Your operations engineers, of whatever form you choose, need to be able to write software and write automation. It’s a matter of making sure that you have the right mixture of people and you have a respectful culture, where people can have disagreements and can resolve them as equals, rather than saying “I am hiring you. You are going to do this.”
In terms of answering the concrete question of how you set up one of these organizations, there are many different models. The one that is most frequently used at Google is the model where you have a product engineering organization and you have a number of SRE teams, each of which is 12 people split between two geographic locations. With that being said, Google is a large company, we have multiple international sites, and this is something that we can do that not necessarily every other organization can do.
I know that, for instance, at Facebook and many other places they will embed one or two production engineers or DevOps engineers in each product engineering team with the idea that they will let their ideas assimilate into and pervade the entire culture of the team. The downside of that of course is that those engineers feel a little bit more isolated and not necessarily a part of the “AY” organization, but there’s also less friction. You have the identity of “I work on this team” rather than “I’m the SRE team for this product.”