Help us write a new chapter for GitOps, Kubernetes, and Open Source collaboration

Introduction

The Amazon Elastic Kubernetes Service (EKS) team sees the ecosystem around automated software deployment as a technology frontier ripe with potential for groundbreaking innovation.

Over the last twenty years, the way in which developers deploy and manage their applications has changed dramatically. Technology improvements in packaging, automation, and virtualization as well as shifts in operations culture have profoundly shaped the software deployment landscape.

Most recently, a set of best practices for automated delivery of code has emerged in parallel with the growth in popularity of Kubernetes. These practices, called GitOps, underpin the design of two projects in the open source ecosystem, Flux CD and Argo CD, which are joining forces as Argo Flux.

The EKS team is excited to participate in the Argo Flux collaboration and we’re asking the community to join us at KubeCon to learn, provide feedback and help us write a new chapter for GitOps and Kubernetes.

How did we get here?

Twenty years ago, it was common practice to deploy software by copying binaries from a desktop to a target physical server. This practice seems archaic by modern standards, but it demonstrates just how far the state of software deployment has evolved.

A slow wave of technical innovation improved how software was bundled together and how changes to deployed software could be controlled.

Linux distributions pioneered package management solutions that coupled well-defined package formats with a concept of a package repository. Package formats allowed developers to describe their software’s metadata, dependencies, basic configuration toggles, and initialization steps. Package repositories promoted an ecosystem of code distribution and discovery.

Similar innovation occurred in how computer resources were managed. Virtualization software provided an ability to slice physical hardware into smaller virtual servers, allowing greater resource utilization and abstracting away much of the complexity of physical hardware. Containerization followed in virtualization’s footsteps. While the concept of containers was not new, the combination of a new way to bundle software (the Docker image format) and the ease of use of Docker’s toolchain ignited the next wave of software deployment.

Virtualization and containerization expanded the size and scope of a target deployment. No longer was a software deployment confined to one or two large machines running in a server room. Now there were dozens of interrelated services spread across different storage, network, and compute layers, sometimes comprising thousands of host machines.

In the same way that technology has evolved to manage and automate software deployment, so too have our operational practices.

The concept of service teams and the DevOps mindset intentionally blurred the lines between development and operations. Operationally owning their code has resulted in developers becoming acutely aware of pain points in installation, deployment, monitoring, and change management — pain points that traditional IT operators and system administrators were all too familiar with.

This increased awareness of operational pain points led to the creation of automation software to address those pain points. Continuous Integration (CI) platforms like Jenkins were created to automate the building and testing of software. Configuration Management (CMDB) frameworks like Chef and Puppet were created to address pains around configuring large fleets of target hosts. Software to automate the continuous deployment (CD) of released artifacts followed soon after.

Rapid adoption of Kubernetes impacted this continuous deployment space in a few important ways.

Prior to Kubernetes, the orchestration of a containerized application’s deployment was a complex dance typically involving multiple systems (CI engines, CMDB servers, package repositories, and a healthy dose of human processes). This dance was intricate and prone to errors that caused outages. Kubernetes introduced a built-in mechanism that became a new standard for how a deployment was orchestrated. This resulted in a reduction in common errors occurring when rolling out an application.

Kubernetes embraced and promoted the idea of declarative application management. An underlying design principle of Kubernetes is to express the desired state of an application as a set of declarative definition files (called manifests) and have a controller continually drive the actual state of a system towards that desired state. Instead of a human instructing some system to launch a container on three host machines, wire up a load balancer, and connect those containers to the load balancer, Kubernetes asks the user to describe their desired application topology using those manifests, and Kubernetes’ controllers handle the complexity of staged code rollouts and service dependency management.
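To make the idea of a controller driving actual state towards desired state more concrete, here is a minimal sketch in Python. It is a toy model, not Kubernetes code: the `ClusterState` class and the `reconcile_once` and `reconcile` functions are hypothetical names invented for illustration, and a replica count stands in for the whole desired state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterState:
    """Toy stand-in for the actual state of a system (not a Kubernetes type)."""
    running_replicas: int


def reconcile_once(desired_replicas: int, actual: ClusterState) -> str:
    """Take a single step towards the desired state and report what was done."""
    if actual.running_replicas < desired_replicas:
        actual.running_replicas += 1
        return "started one replica"
    if actual.running_replicas > desired_replicas:
        actual.running_replicas -= 1
        return "stopped one replica"
    return "in sync"


def reconcile(desired_replicas: int, actual: ClusterState, max_steps: int = 100) -> List[str]:
    """Repeat single steps until actual matches desired or the step budget runs out."""
    actions = []
    for _ in range(max_steps):
        action = reconcile_once(desired_replicas, actual)
        if action == "in sync":
            break
        actions.append(action)
    return actions


if __name__ == "__main__":
    state = ClusterState(running_replicas=1)
    print(reconcile(3, state))      # ['started one replica', 'started one replica']
    print(state.running_replicas)   # 3
```

The caller only states the target (three replicas); the loop works out which steps are needed, which is the essence of the declarative approach.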

All these technology and cultural changes have led our industry to GitOps, a collection of best practices and principles for automated software delivery.

GitOps

GitOps — a term popularized by Weaveworks — is a distillation of best practices for managing the deployment of containerized applications as well as the cluster infrastructure upon which they run.

The principles of GitOps that the EKS team finds compelling are straightforward:

Everything in source control — configuration of both the infrastructure and the application is entirely represented in source control, which is the source of truth for the desired state of the system.
Declarative over imperative — describe the desired state of the infrastructure and application as opposed to executing installation instructions. Declarative definitions of configuration are easier to diff, easier to reason about, and easier to explain than a set of installation instructions.
Continually converge to desired state — embrace tools that promote self-healing and automated deployment.

That’s it. Simple, yet immensely powerful.

With GitOps, the git repository is the source of truth for a system’s configuration (configuration as code). This encourages the system’s configuration to be represented in a declarative fashion (declarative application management). An automated actor is responsible for deploying this desired state to a target Kubernetes cluster, and alerts should fire when the actual state of the system drifts from the desired state. Pull request-based workflows allow familiar change management with reviews, approvals, and automated tests. This naturally enables auditing and rollback of configuration changes via the source control system.

Kubernetes is an ideal tool for enforcing GitOps practices and principles. Kubernetes manifests can be stored in a git repository. These manifests, which contain Kubernetes object definitions (Deployments, Services, Ingresses, etc.), represent the desired state of the system. The `kubectl apply` command accepts one or more of the manifests and attempts to make the system match what is represented by the manifest.

What is a “GitOps Operator”?

In Kubernetes jargon, an Operator is a pattern of using custom resource definitions and a server-side controller that runs a reconciliation loop that continually drives actual state of objects to match the desired state of those objects. Sounds fancy, but at its heart, it’s actually fairly simple: users submit their desired state by creating instances of an object and the server tries to make the state of the world equal to that desired state.

So, what exactly is a GitOps Operator? Well, it’s a server component that reads the desired state of a system — as represented by Kubernetes manifests in a git repository — and continually tries to make the actual state of the system match those manifests.
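One way to picture the core of such a component is a function that compares the manifests found in git with the objects found in the cluster and produces a list of actions. The sketch below is an illustration under simplifying assumptions, not the algorithm used by Flux CD or Argo CD: `plan_sync`, the `prune` flag, and the plain-dictionary representation of resources are all hypothetical.

```python
from typing import Dict, List, Tuple

# A resource is identified by (kind, namespace, name); the value is its spec.
ResourceKey = Tuple[str, str, str]
Resources = Dict[ResourceKey, dict]


def plan_sync(desired: Resources, live: Resources, prune: bool = False) -> List[Tuple[str, ResourceKey]]:
    """Compare manifests from git (desired) with cluster objects (live)."""
    plan = []
    for key in sorted(desired):
        if key not in live:
            plan.append(("create", key))   # in git, missing from the cluster
        elif live[key] != desired[key]:
            plan.append(("update", key))   # the cluster has drifted from git
    if prune:
        for key in sorted(live):
            if key not in desired:
                plan.append(("delete", key))   # in the cluster, no longer in git
    return plan


if __name__ == "__main__":
    desired = {
        ("Deployment", "default", "web"): {"replicas": 3},
        ("Service", "default", "web"): {"port": 80},
    }
    live = {
        ("Deployment", "default", "web"): {"replicas": 2},
        ("ConfigMap", "default", "old"): {},
    }
    for action, key in plan_sync(desired, live, prune=True):
        print(action, key)
```

Running this function on a schedule, or whenever the repository changes, and applying the resulting plan gives the "continually tries to match" behavior. Deletion is opt-in here because removing live objects is the riskiest action in the plan.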

There are two open source projects that implement a GitOps Operator: Flux CD and Argo CD.

History of Flux CD

When Weaveworks were building their Weave Cloud service, they automated the process of building, testing and then pushing out code to their target platforms. A critical component of their deployment pipeline was the service that monitored for changes in the repositories storing configuration manifests and container images. This service, Flux CD (now a CNCF Sandbox project), is the watchful eye that ensures the actual state of what is deployed matches the desired state represented in those configuration manifests.

Flux CD actually originated as a daemon that monitored an image artifact repository for new versions of images referenced in Kubernetes manifests that were deployed in Weave Cloud. When it found versions of images that were different than what was deployed to their cloud, the Flux daemon would modify the Kubernetes manifests that referenced those images and write a new git commit to their configuration repository. The Flux daemon would then call `kubectl apply` on those manifests, which would cause Kubernetes to roll out a new deployment of the applications with the new images. This ability to monitor for changes to referenced images and automatically modify Kubernetes manifests in a configuration repository is something of a unique feature for Flux CD.
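The manifest-rewriting step described above can be sketched as a small text transformation. This is a simplified illustration, not Flux CD's implementation: `update_images` and the `latest_tags` mapping are hypothetical, the newest tag for each image is assumed to be known already, and only plain `image: repo:tag` lines are handled (no digests or quoted values).

```python
import re
from typing import Dict

# Matches lines such as "  image: example/web:1.0.0" or "  - image: example/web:1.0.0".
IMAGE_LINE = re.compile(
    r"^(?P<prefix>\s*(?:-\s+)?image:\s*)(?P<repo>\S+):(?P<tag>[^\s:/]+)\s*$"
)


def update_images(manifest: str, latest_tags: Dict[str, str]) -> str:
    """Return the manifest text with image tags bumped to the given latest tags."""
    out = []
    for line in manifest.splitlines():
        m = IMAGE_LINE.match(line)
        if m and m["repo"] in latest_tags and latest_tags[m["repo"]] != m["tag"]:
            line = f'{m["prefix"]}{m["repo"]}:{latest_tags[m["repo"]]}'
        out.append(line)
    result = "\n".join(out)
    # splitlines() drops a trailing newline, so restore it if the input had one.
    if manifest.endswith("\n"):
        result += "\n"
    return result


if __name__ == "__main__":
    manifest = (
        "spec:\n"
        "  containers:\n"
        "  - name: web\n"
        "    image: example/web:1.0.0\n"
    )
    print(update_images(manifest, {"example/web": "1.1.0"}))
```

In a real pipeline the rewritten manifest would then be committed to the configuration repository, so the change is recorded in git before it is applied to the cluster.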

History of Argo CD

At the same time that the Flux project was evolving, a team of engineers at Applatix saw an opportunity to bring the same benefits of a continuous deployment automation service to a more traditional enterprise IT audience.

The Applatix team had created the Argo Workflows project as a general-purpose workflow engine where steps in the workflow are executed as Kubernetes Jobs. Applatix was acquired by Intuit in January 2018. Knowledge acquired when building Argo Workflows was used to build Argo CD a few months after the acquisition, including how to expose the public configuration API of Argo CD using a set of Kubernetes custom resource definitions (CRDs).

Argo CD was initially designed for Intuit’s internal application development teams, and the roadmap and features of the project reflect the fact that Intuit is a big enterprise software (and SaaS) vendor. Traditional vendors like Intuit have thousands of application developers arranged in large organizations focused on a business segment. Argo CD needed to allow a centralized policy and authorization system while still making Argo CD relatively self-service. As such, Argo CD supports a comprehensive Role-based Access Control (RBAC) system and a single Argo CD server can support multiple target Kubernetes clusters and multiple source repositories.

Joining forces with Argo Flux

I had a chance to catch up with Mike Bowen, Director of Open Source Engineering at Aladdin Product Group, the engineering team at BlackRock, earlier this month. BlackRock engineers contribute to the Argo Project, including contributing the Argo Events subproject. BlackRock is a great example of a company that is gradually containerizing legacy workloads and moving to a more nimble deployment workflow.

BlackRock is a large financial services firm with a substantial existing footprint consisting of on-premises and virtualized workloads. These workloads, maintained by different development teams across multiple continents, comprise one of the most sophisticated and complex financial services platforms on the planet. The requirements of this platform with regards to security, reliability, and scale are massive, as Mike explained.

“At BlackRock, we operate hundreds of environments running thousands of workloads that yield millions of processes. This doesn’t even include the millions of processes running in our calculation estate. Evolving our platform without robust tooling to minimize the complexity of application teams adopting containerized workloads running on Kubernetes and a unified control plane to meet BlackRock’s operational scale requirements would be a non-starter.”

While BlackRock is currently only using on-premises Kubernetes in specific production environments, it has adopted containerized workloads and Kubernetes to facilitate its cloud-native strategy. The requirements around scale, brownfield environments, and team structure caused Mike and his team to evaluate multiple options in the continuous delivery space, and it turned out that they found different features and aspects of both Flux CD and Argo CD attractive for different use cases. The combination of the two tools was the clear recommendation for continuous delivery of containerized applications and cluster infrastructure.

Like the engineers at BlackRock, the Amazon EKS team sees the ecosystem around automated software deployment as the next great technology frontier, ripe with potential for groundbreaking innovation.

In this frontier, some pioneers have already carved out a space for themselves. As mentioned earlier, the Flux CD and Argo CD projects each provide a Kubernetes-native continuous deployment system that encourages GitOps best practices.

Customers have long been asking the EKS team for our recommended approach to containerized application deployment. Until now, we’ve not had a recommendation. Instead, we’ve pointed customers at various software projects, including Argo CD and Flux CD, and relied on customers to make their own decisions. Some customers provided feedback that they liked certain aspects of both Flux CD and Argo CD and wished there was a single solution that encompassed the best of both worlds.

With this feedback in hand, we started talking with the driving forces behind these projects, Weaveworks and Intuit, about potentially joining forces. Over the last few months, engineers from both companies collaborated on a long-term vision for a combined Argo Flux that represents this best-of-breed continuous deployment solution. Engineers from Weaveworks, Intuit, AWS and BlackRock, including myself and Mike Bowen, were able to attend a summit in London where we could discuss the various use cases and implementation differences between Flux CD and Argo CD and brainstorm on ways to expand the GitOps and cloud-native application delivery space.

“There are two paths to consider when starting any project. The first path is to build it all by our self ‘on an island’ and the second is to build in the open as part of a community. We’ve found through our experience in open source and specifically with the Argo project and Argo Events, the community collaboration approach to solving common challenges yields more well-rounded and diverse solutions. We are very excited to collaborate with AWS, Intuit, and Weaveworks on Argo Flux to help build a best-in-breed continuous deployment solution.”

-Mike Bowen, Director, Aladdin Product Group (APG), BlackRock’s engineering organization

A GitOps Engine

Since that summit, engineers from Intuit and Weaveworks have been working to extract code from both the Argo CD and Flux CD source code repositories into a new GitOps Engine repository, with a goal to have this engine powering the internals of proof-of-concept branches of both Flux CD and Argo CD. These proof-of-concept branches are designed to showcase to the Argo CD and Flux CD user communities how the contributor teams for Argo CD and Flux CD might eventually arrive at an Argo Flux joint codebase.

Initial functionality that has been placed into the new GitOps Engine repository includes:

Managing source git repositories
Creating sync “plans”
Reconciling Kubernetes resources
Caching Kubernetes resource trees
Generating Kubernetes manifests

We’re seeking input from the user community. Our top priority is ensuring users and contributors have a strong voice in the process. We want Flux CD and Argo CD users to have a smooth migration experience if they choose to embrace Argo Flux. If you are interested in the design of the GitOps Engine and would like to provide feedback on its direction, please check out the GitHub Issues.

Looking ahead

Of course, we realize that we’re only at the start of a long journey. The combination of Flux CD and Argo CD presents both technical and community challenges. However, these challenges represent an opportunity to both create something greater than the sum of its parts as well as push the boundaries of what is possible in the cloud-native application delivery space.

If we manage to align the Flux CD and Argo CD projects and advance a shared roadmap for a best-of-breed GitOps Operator, the next area to explore alignment is that of progressive delivery. Progressive delivery is all about rolling out an application deployment in a tightly controlled manner. The traffic to an application’s endpoints — for example, an API server — can be gradually shifted from one version of a deployment to another. This allows the new version of a deployment to be tested and validated before full traffic is passed to it.
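The gradual traffic shift can be sketched as a loop over increasing traffic weights with a health check after each step. This is a toy model and not how Weave Flagger or Argo Rollouts are implemented: `progressive_rollout`, the `steps` list, and the `is_healthy` callback are hypothetical, and real systems would drive a service mesh or ingress controller and query metrics instead.

```python
from typing import Callable, List


def progressive_rollout(steps: List[int], is_healthy: Callable[[int], bool]) -> List[int]:
    """Shift traffic to the new version step by step; return the weights applied.

    steps: increasing percentages of traffic for the new version, e.g. [10, 50, 100].
    is_healthy: check run after each shift (metrics, smoke tests, and so on).
    """
    applied = []
    for weight in steps:
        applied.append(weight)          # send `weight` percent to the new version
        if not is_healthy(weight):
            applied.append(0)           # roll back: all traffic to the old version
            break
    return applied


if __name__ == "__main__":
    # Every check passes, so the rollout reaches full traffic.
    print(progressive_rollout([10, 50, 100], lambda w: True))    # [10, 50, 100]
    # The check fails at 50 percent, so traffic returns to the old version.
    print(progressive_rollout([10, 50, 100], lambda w: w < 50))  # [10, 50, 0]
```

The important property is that a failed validation at a small weight affects only a small share of requests before traffic returns to the previous version.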

The Weave Flagger and Argo Rollouts projects provide Kubernetes-native progressive delivery solutions. We see a future where these two projects can be combined into a single best-of-breed progressive delivery system.

Conclusion

We’re thrilled to see the merging of two great software ecosystems and communities, and we feel that we’ll go further together. Come join us at KubeCon and AWS Container Day to learn about Argo Flux and meet the teams from Weaveworks and Intuit who have been collaborating over the last couple of months on this effort.

The keynotes at AWS Container Day on Monday all focus on continuous delivery and GitOps. You’ll hear from Bob Wise, GM of the EKS team, Alexis Richardson, CEO of Weaveworks, and Pratik Wadher, VP of Product Development at Intuit, about GitOps, Flux CD, Argo CD and our collaboration in this space with Argo Flux.

We’re excited to get feedback from you on the Argo Flux collaboration.

If you’re going to KubeCon in San Diego, you can attend either of these feedback-gathering sessions:

November 18, 5:30 – 7:00pm PT: Office hours at AWS Container Day at KubeCon, Hotel Wyndham San Diego Bayside, Harborside Room (First floor)
November 20, 4:30 – 5:30pm PT: KubeCon, San Diego Convention Center – In front of the Lobby East Starbucks

If you won’t be at KubeCon, we still want to hear from you! We’ve got a number of online events planned for the weeks after KubeCon:

November 25, 2019, 9:00am PT: Slack AMA – Join the #gitops channel on Kubernetes Slack
November 26, 10:00 – 11:00am PT: Weave Online User Group – Join the online call with members from the Flux CD and Argo CD projects
December 3, 10:00 – 11:00am PT: Weave Online User Group – Join the online call with members from the Flux CD and Argo CD projects