The Case for Deployment Pipelines

Guest post by Keith Casey of Clarify.io

Deployment pipelines are like flossing. We all know we should do it. We really mean to do it. In fact, we promise that we’re going to do it before Next Time. And then reality sets in, we feel crunched for time, and it goes on the “should do but not really going to do” list.

At Clarify.io, we put pipelines on our “really going to do” list. By adopting a pipeline-based approach, we gained immediate, measurable returns: less downtime, faster release schedules, and more reliable fixes. Pipelines made our software more stable, easier to maintain, and safer to change.

In this post, we walk through the benefits of deployment pipelines, and we provide best practices for getting started.

Continuous Delivery

Deployment pipelines are a key part of an overall strategy known as Continuous Delivery. Continuous Delivery is based on automated systems that give you feedback on the production worthiness of your software as quickly as possible. Basically, it allows you to reduce the risk of your project and infrastructure by spreading the cost of your most complicated and painful processes — integration and deployment — across the entire life of your project.

At Clarify, we accomplish this through pipelines and artifacts. Whenever the code changes, we execute the relevant pipelines. This moves the software through each of the stages, including download, configuration, compilation, and deployment. We test the results, and if the software passes the tests, we know that we have an artifact that we can use as an input to the next pipeline.
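
As a rough illustration of that flow (not Clarify's actual tooling), the sketch below runs a list of stages in order and keeps the result as an artifact only if the tests pass. The stage functions, the `Artifact` type, and the `smoke_test` check are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# A stage takes the current build state and returns the updated state.
Stage = Callable[[dict], dict]


@dataclass(frozen=True)
class Artifact:
    """A tested build result that later pipelines may use as input."""
    component: str
    payload: dict


def run_pipeline(component: str, stages: Sequence[Stage],
                 tests: Callable[[dict], bool]) -> Optional[Artifact]:
    """Run every stage in order, then keep the result only if the tests pass."""
    state: dict = {"component": component}
    for stage in stages:
        state = stage(state)
    if not tests(state):
        return None  # no artifact: nothing downstream may depend on this build
    return Artifact(component=component, payload=state)


# Hypothetical stages, named after the ones listed above.
def download(state: dict) -> dict:
    return {**state, "source": "print('hello')"}


def configure(state: dict) -> dict:
    return {**state, "env": "staging"}


def compile_(state: dict) -> dict:
    return {**state, "binary": state["source"].encode()}


def deploy(state: dict) -> dict:
    return {**state, "deployed_to": state["env"]}


def smoke_test(state: dict) -> bool:
    return state.get("deployed_to") == "staging" and bool(state.get("binary"))


artifact = run_pipeline("api", [download, configure, compile_, deploy], smoke_test)
```

A run that skips a stage or fails its tests returns `None`, so a downstream pipeline never receives an untested input.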

When we started building for Clarify.io two years ago, using AWS was a given. Our bigger question was whether to plan for Continuous Delivery. After serious consideration, we focused on three main objectives that drove us then and still drive us now:

Reduce risk
Improve workflow
Improve confidence

In the remainder of this post, we explain each objective and how to achieve it.

Reducing Risk

Our first objective was to reduce risk. At an intellectual level, we all love new things. The new shiny is just so…well, shiny. But unfortunately everything new, whether tools, technologies, platforms, development practices, or even people, introduces risk. Each of those things adds one more unknown, one more layer of complexity, one more set of skills and understanding that you have to figure out and track. The theoretical “everyone is using it!” doesn’t mean everyone is using it the way you need to. Even worse, as a startup, many of our customers view us as yet another risk. In short, there are a ton of risks, and they compound over time. Pipelines allow us to address many of those risks.

Our architecture is based 100% on microservices. At the simplest level, that means that it’s based on the Unix philosophy of “do one thing and do it well.” When your system does only one thing, it’s easier to test and confirm it’s doing that one thing correctly. For example, at Clarify our required testing resembles unit testing, where we validate lower-level functionality before we add any complexity. And then we stop.

We stop because that’s our first pipeline. We start with a single component, and then we build and test it. Once it passes the tests, we keep it on hand as an artifact. Now we know that anything that depends on or includes this component will start in a “known-good” state that we can trust. Then we repeat the process to build the next pipeline for another component, or we combine previous artifacts to create a new pipeline. Each of these pipelines and resulting artifacts makes it a little safer to make changes and improvements now and in the future.
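
One way to picture the “combine previous artifacts” step is a store that refuses to build anything from inputs it has not already accepted. This is a minimal sketch under our own assumptions: `ArtifactStore`, the content digest, and the `tests_pass` flag are illustrative and do not describe Clarify's real system.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    name: str
    digest: str         # content hash, so a known-good build can be identified later
    inputs: tuple = ()  # digests of the artifacts this one was built from


def _digest(name: str, content: str, inputs: tuple) -> str:
    blob = json.dumps({"name": name, "content": content, "inputs": list(inputs)},
                      sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()


class ArtifactStore:
    """Holds only artifacts whose pipeline passed its tests."""

    def __init__(self) -> None:
        self._known_good: dict = {}

    def build(self, name, content, tests_pass, inputs=()):
        # Every input must itself be an artifact this store already accepted.
        missing = [a.name for a in inputs if a.digest not in self._known_good]
        if missing:
            raise ValueError(f"inputs are not known-good: {missing}")
        if not tests_pass:
            raise RuntimeError(f"{name} failed its tests; no artifact kept")
        input_digests = tuple(a.digest for a in inputs)
        artifact = Artifact(name, _digest(name, content, input_digests), input_digests)
        self._known_good[artifact.digest] = artifact
        return artifact


store = ArtifactStore()
cache = store.build("cache", "cache v1", tests_pass=True)
web = store.build("web", "web v1", tests_pass=True)
site = store.build("site", "web + cache", tests_pass=True, inputs=(web, cache))
```

Because `site` records the digests of `web` and `cache`, it is always possible to tell exactly which tested components a combined build started from.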

Improving Workflow

Pipelines give us integrity of the system as a whole. When the pipeline inputs are known-good components that we can configure, integrate, and deploy in well-defined ways, there are fewer variables and therefore fewer chances for things to go wrong. Or in more concrete terms, if a given component is compromised, we have an artifact that we can roll back to and processes built in to modify it and carry it forward into future pipelines. And this can be as simple as credential management.

Too often, credentials are set up on a machine and quickly forgotten. All it takes is a key accidentally being copied into an email, a blog post, or a Gist, and now it’s compromised. And dealing with compromised keys is painful because as soon as you invalidate one set of credentials, you have to switch all servers to your new set. Unfortunately, most providers don’t allow two active keys simultaneously for smooth handoffs. In short, it’s another problem that pipelines can solve.

First, pipelines mean your credentials are just another change to be managed and deployed. Even if the credentials are part of the configuration, we can make the change and let the pipelines do their job. The better approach is adding credentials as part of the pipeline itself. For example, at Clarify, once the pipelines reach the deploy stage, they connect to Amazon S3, decrypt their respective credentials, and set them up as necessary. This allows us to centrally manage all credentials and still get them to the right machines at the right time quickly and reliably.
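
The deploy-stage step described above might look roughly like the following. The post does not say how Clarify fetches or decrypts its credentials, so both operations are passed in as placeholder callables. The key layout, the function names, and the byte-flip "cipher" used for the dry run are assumptions made for illustration only.

```python
import json
from typing import Callable, Dict

# Both callables are placeholders for whatever storage and encryption a team
# actually uses; neither is a real library API.
FetchBlob = Callable[[str], bytes]  # e.g. download an object by key
Decrypt = Callable[[bytes], bytes]  # e.g. decrypt with a key the host holds


def load_credentials(component: str, fetch: FetchBlob, decrypt: Decrypt) -> Dict[str, str]:
    """Fetch this component's encrypted credentials and decrypt them."""
    encrypted = fetch(f"credentials/{component}.json.enc")  # hypothetical key layout
    return json.loads(decrypt(encrypted).decode("utf-8"))


def deploy_stage(component, fetch, decrypt, start_service):
    """Load credentials at deploy time and hand them to the service at start-up."""
    credentials = load_credentials(component, fetch, decrypt)
    return start_service(component, credentials)


# Stand-ins for a dry run: an in-memory "bucket" and a reversible byte flip.
# The byte flip is NOT encryption; it only marks where decryption would happen.
def fake_cipher(data: bytes) -> bytes:
    return bytes(b ^ 0xFF for b in data)


bucket = {"credentials/api.json.enc": fake_cipher(b'{"db_password": "example"}')}
started = deploy_stage("api", bucket.__getitem__, fake_cipher,
                       lambda name, creds: (name, sorted(creds)))
```

Rotating a key then means replacing one encrypted object and rerunning the pipeline, rather than logging in to each machine.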

Next, we can recover the system at any time. The idea of “move fast and break things” is a great platitude but terrifying in practice. That is, until you realize “move fast” really means “take numerous small and deliberate steps.” Pipelines give us that ability. Any given change, whether it’s a new feature, a fix, or a component upgrade, isn’t a single step but many small steps, each tested and deployed over time in non-critical locations. For failures, we can roll back to old instances or update and deploy new ones. At scale, Amazon, Twilio, Netflix, and numerous others deploy dozens or even hundreds of times per day with similar processes.

Finally, if we have to make radical changes, we have a process to do it. For example, a number of our customers are in Europe, and after the European courts invalidated the Safe Harbor pact in October 2015, we investigated cloning our infrastructure to AWS in Europe. It took about two hours. Not two hours to discuss it or evaluate it. But two hours to do it. It all came down to the fact that the target AWS Region was just another configuration option.
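
To make the “just another configuration option” point concrete, here is a small sketch in which the target location (a Region, in AWS terms) is a single field on the deployment configuration. The `DeployConfig` type, the Region names, and the instance count are illustrative assumptions, not Clarify's real values.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DeployConfig:
    service: str
    region: str  # where the pipeline deploys
    instance_count: int


def deploy_plan(config: DeployConfig) -> list:
    """Return the steps a pipeline run would take; nothing is actually provisioned."""
    return [
        f"provision {config.instance_count} instances in {config.region}",
        f"deploy {config.service} to {config.region}",
        f"run smoke tests against {config.service} in {config.region}",
    ]


# Region names and counts are illustrative assumptions.
us = DeployConfig(service="api", region="us-east-1", instance_count=3)
eu = replace(us, region="eu-west-1")  # the clone differs in exactly one field
```

Every other step is unchanged between the two plans, which is why the same pipelines can stand up a copy elsewhere without new scripts.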

And the result from all of this is simple: improved confidence.

Improving Confidence

Our systems have to work exactly the same every time. Odds are your deployment process requires one specific person to log in to various machines, apply different deployment scripts, and wait for the results. If you’re more advanced, the machine configures itself, and then you can test and add it to your infrastructure. Even then, any manual process is one more place where variance can slip in.

By taking humans out of the loop in as many places as possible, we know the system is configured, deployed, and tested exactly the same way every single time. There are simply fewer variables in play, which means fewer things change. We can be confident that we’re testing the one and only thing that we changed at any specific moment. Alternatively, if something fails — either due to poor code or malicious intent — we can roll back to known-good states or fix and roll forward using the pipeline.
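
The roll back or roll forward choice can be sketched as a small release helper. This is an illustration under our own assumptions: `Releaser`, the `deploy` callable, and the `healthy` check are hypothetical stand-ins for a real pipeline's deploy and test stages.

```python
from typing import Callable, List


class Releaser:
    """Deploys versions and remembers which ones passed their health check."""

    def __init__(self, deploy: Callable[[str], None], healthy: Callable[[str], bool]):
        self._deploy = deploy
        self._healthy = healthy
        self._known_good: List[str] = []

    def release(self, version: str) -> str:
        """Deploy `version`; return whichever version is live afterwards."""
        self._deploy(version)
        if self._healthy(version):
            self._known_good.append(version)
            return version
        if not self._known_good:
            raise RuntimeError(f"{version} failed and there is no known-good version")
        previous = self._known_good[-1]
        self._deploy(previous)  # roll back by redeploying the stored artifact
        return previous


deployed: List[str] = []
releaser = Releaser(deploy=deployed.append, healthy=lambda v: v != "v2-broken")
releaser.release("v1")
live = releaser.release("v2-broken")  # health check fails, so v1 is redeployed
fixed = releaser.release("v3")        # fix and roll forward
```

The rollback path reuses the same `deploy` call as a normal release, so recovering is no more manual than shipping.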

Start Small, Build Momentum

For most teams, deployment pipelines are intimidating at best. Teams often think they don’t have the right skills in-house, or their system is too complex, or there are just too many moving parts. It seems impossible, but it’s not. The most important concept to understand is that it doesn’t take complete Continuous Deployment to start getting value. There are many interim steps that will help your team sleep better at night.

The secret is that you can start with a single component. Your starting point can be a single web server, one load balancer, or even a basic caching system. Take your existing deployment scripts and wire them into your Git repository so that any commits to the master branch will trigger the process.
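
One common way to wire a script into Git is a server-side `post-receive` hook, which Git runs after a push and feeds one `<old> <new> <ref>` line per updated ref on standard input. The sketch below assumes that setup. The `./deploy.sh` path is a placeholder for your existing deployment script, and the function names are our own.

```python
import subprocess
import sys
from typing import Iterable, List

DEPLOY_SCRIPT = "./deploy.sh"      # hypothetical: your existing deployment script
TRIGGER_REF = "refs/heads/master"  # only pushes to master start the process


def pushed_commits(lines: Iterable[str], ref: str = TRIGGER_REF) -> List[str]:
    """Parse post-receive input ("<old> <new> <ref>" per line) for pushes to `ref`."""
    commits = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or parts[2] != ref:
            continue
        if set(parts[1]) == {"0"}:
            continue  # an all-zero new value means the branch was deleted
        commits.append(parts[1])
    return commits


def main(stdin: Iterable[str] = sys.stdin, run=subprocess.run) -> int:
    """Run the deploy script once per pushed commit; stop at the first failure."""
    for commit in pushed_commits(stdin):
        result = run([DEPLOY_SCRIPT, commit])
        if result.returncode != 0:
            return result.returncode
    return 0
```

To try it, you could save this as `hooks/post-receive` in the repository that receives pushes, add a Python shebang line and a final `sys.exit(main())`, and mark the file executable. A hosted CI service can play the same role if you do not control the Git server.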

As you refine and improve this one component, apply the lessons learned to your next component. At each step of the way, you’ll reduce the risk of deployment, improve system integrity, and gain confidence in both your process and the system as a whole.

AWS Startups Blog