AWS Cloud Enterprise Strategy Blog
Enterprise DevOps: Why You Should Run What You Build
“You build it, you run it” -Werner Vogels
It’s an all-too-common scenario: You’re spending time with your family and your phone suddenly steals your attention away. The dreaded airhorn alerts you to a SEV1 failure. Your application, one that periodically suffers from a memory leak that operations “fixed” by restarting it, is now exhausting server resources within minutes of coming online. The application is effectively unusable. The operations team isn’t equipped to do much other than restart or roll back, but the last good copy is months old. Who knows what else has changed since then? It’s up to you to fix the leak, but you’re miles away from the office and your computer.
Incidents like this are far too common in a traditional enterprise IT model, where development and operations sit on opposite sides of a wall. But it doesn’t have to be this way. DevOps is not just for startups. It can be used in the enterprise too. Like automation and customer service, “run what you build” can be an effective tenet for improving enterprise IT delivery using a DevOps model.
In the traditional setting, developers architect and engineer a solution, then hand it over to operations. Sometimes they’re kind enough to give some guidance on how to deal with production issues, and sometimes they have little to no knowledge of the production environment to begin with. When these teams remain separate entities, each has little information on how the other works and what it needs. The operations team often has runbooks, standard operating procedures (SOPs), or some other mechanism to address issues as they arise in production. These can be quite effective when you need a fix fast, but when the root causes aren’t identified and addressed, it’s like using chewing gum to patch a leaky boat. Eventually, you’re going to sink.
DevOps can provide a better way…
The cloud has helped tear down this wall because with it, your infrastructure starts to look a lot like software. The API-driven nature allows you to treat your infrastructure as code, which is something developers understand. Now that everyone is much closer to the infrastructure, operations naturally becomes more of a key requirement.
Meanwhile, software is increasingly sold as a service, and your customers are rightfully demanding constant improvements. They may tolerate a mistake here and there, but only if those mistakes are addressed right away and don’t keep happening. In order to keep up with these changes, you need to listen for clues and insights that your customers may not clearly communicate directly to you. Like you, they’re busy with other things, and when they call to give you feedback it’s likely they’re unsatisfied. While any customer interaction is a learning opportunity, you may be better off having them on your terms. These insights are much harder to find with a wall between development and operations — operations may be sweeping problems under the rug with quick fixes, and developers will have a lower bar for operational excellence if they have too much of a safety net.
All of these things are good reasons for moving away from the traditional IT model and towards a DevOps culture, where development and operations come together into a singular focus. I try to encourage executives to make “run what you build” a crucial tenet in their DevOps-driven organizations. Here are just a few benefits and behaviors that I’ve seen organizations reap:
- Design for production. Run what you build forces development teams to think about how their software is going to run in production as they design it. This can help your teams avoid the last-minute scrambling that often occurs when teams try to force-fit what they’ve built to a production environment to meet a deadline. I can’t remember how many times I’ve seen this materially hurt quality. You change something at deployment time to address something that’s different between production and development, run what you think are the relevant tests, and later discover that this change caused a bug somewhere else in the system.
- Greater employee autonomy. The run what you build mentality encourages ownership and accountability, which, in my experience, leads to more independent, responsible employees and even career growth in the organization. More on this in my previous post.
- Greater transparency. No one wants their personal time interrupted. Whoever is taking the calls will do everything they can to avoid getting them in the first place. Your teams will naturally want greater transparency in the environment and will implement proactive monitoring so they can identify issues or concerning patterns before they become widespread problems. In addition to finding problems before they happen, this sort of transparency should make it much easier to find root causes for issues that still make it through.
- More automation. Developers hate repeating manual tasks, so if they find they have to do something over and over in production to address an issue, they’re more likely to get to the root cause and automate things along the way.
- Better operational quality. Things like transparency and automation will make your teams more efficient, and will continue to raise the bar for operational excellence.
- More satisfied customers. Run what you build forces the entire IT team to understand more about the customer. That knowledge will no longer be limited to a product or sales team, and these insights can be incredibly useful when used as a feedback loop for constant product improvement.
I’m sure you can remind me of other benefits, and I’d love to hear your thoughts. Send them my way!
Keep building,
–Stephen
@stephenorban
orbans@amazon.com