
Part 2: Multi-Cluster GitOps — Cluster fleet provisioning and bootstrapping

Introduction

This is Part 2 in a series that demonstrates how to build an extensible and flexible GitOps system, based on a hub-and-spoke model, to manage the lifecycles of Amazon Elastic Kubernetes Service (Amazon EKS) clusters, the workloads deployed to these clusters, and their dependencies on other AWS-managed resources. It’s recommended that you read Part 1 before proceeding. In this post, we dive into the mechanics of how Crossplane and Flux are used to implement a GitOps-based multi-cluster management strategy. We also present a solution to the challenge of secrets management in such a GitOps workflow.

Flux and Crossplane are both open-source CNCF projects built on the foundation of Kubernetes. Flux is a declarative, GitOps-based continuous delivery tool that can be integrated into any CI/CD pipeline. Notable features of Flux include the ability to manage deployments to multiple remote Kubernetes clusters from a central management cluster, support for progressive delivery, and multi-tenancy. Crossplane is an open-source Kubernetes add-on that enables platform teams to assemble cloud infrastructure resources without having to write any code. Through Kubernetes-style APIs, Crossplane allows users to manage the lifecycles of AWS-managed resources. Using these two tools together, organizations can effectively manage the lifecycles of these resources using the GitOps model. They can define their managed resources using Kubernetes-style declarative configurations and apply those artifacts to an Amazon EKS cluster alongside those that pertain to application workloads, thus unifying application and infrastructure configuration and deployment.

Let’s dive into the details.

Solution overview

The following diagram depicts the high-level architecture of a solution based on the hub-and-spoke model for provisioning and managing a fleet of Amazon EKS clusters.

  • Start off with an existing Amazon EKS cluster or provision a new one using one of the approaches outlined here. This will be used as the management cluster (i.e., the hub).
  • Install and bootstrap Flux in the management cluster, pointing to a Git repository that contains deployment artifacts relevant to the management cluster.
  • In the management cluster, using Flux:
    • deploy the core Crossplane controller
    • deploy the Crossplane AWS provider and Kubernetes provider
    • deploy Crossplane-specific custom resources that initiate the provisioning of an Amazon EKS cluster. This is the workload cluster (i.e., a spoke).
  • Bootstrap Flux and related tools on the workload cluster using Flux on the management cluster.
  • Manage application delivery to the workload cluster autonomously using GitOps.

EKS cluster provisioning workflow using GitOps with Flux and Crossplane

Key Crossplane concepts

A Managed Resource (MR) is the building block of Crossplane and is its representation of an infrastructure resource in a cloud provider. It’s an opinionated Kubernetes custom resource installed by a Crossplane provider. The Crossplane AWS provider packages several MRs, such as RDSInstance, Cluster, and Queue, that model AWS-managed resources such as an Amazon RDS instance, an Amazon EKS cluster, and an Amazon SQS queue, respectively. These MRs match the APIs of the corresponding AWS services as closely as possible and expose the same set of parameters provided by the corresponding API groups in the AWS SDK. For example, an Amazon EKS cluster may be created using the CreateCluster API, with the following JSON data in the request body.

{
    "name": "prod-cluster",
    "roleArn": "arn:aws:iam::012345678910:role/EksServiceRole",
    "resourcesVpcConfig": {
        "subnetIds": [
            "subnet-xxx",
            "subnet-yyy",
        ],
        "securityGroupIds": [
            "sg-xxx"
        ],
        "endpointPublicAccess": true,
        "endpointPrivateAccess": true
    }
}

The Custom Resource Definition (CRD) for the corresponding MR in the Crossplane AWS provider, namely Cluster.v1beta1.eks.aws.crossplane.io, is shown in the following code. It uses its forProvider field to expose the parameters shown in the preceding JSON, as well as others supported by the AWS API.

---
apiVersion: eks.aws.crossplane.io/v1beta1
kind: Cluster
metadata:
  name: prod-cluster
spec:
  forProvider:
    roleArn: 'arn:aws:iam::012345678910:role/EksServiceRole'
    resourcesVpcConfig:
      securityGroupIds:
        - sg-xxx
      subnetIds:
        - subnet-xxx
        - subnet-yyy
      endpointPrivateAccess: true
      endpointPublicAccess: true

Many AWS-managed resources have dependencies on other managed resources. For example, to create an Amazon EKS cluster, you must create a VPC, set up subnets and route tables, and optionally create an internet gateway and a NAT gateway. While it’s possible to manage the lifecycles of these managed resources individually using their Crossplane MR counterparts, a Crossplane Composite Resource (XR) allows platform teams to compose these MRs in an opinionated way. Crossplane introduces two special resources to define and configure an XR:

  • A CompositeResourceDefinition (XRD) that defines the schema for an XR, similar to a CRD
  • A Composition that encapsulates one or more MRs that compose an XR, along with their respective configurations

Composite Resources allow platform teams to create higher-level abstractions of managed resources, such as an Amazon EKS cluster, which depends on several other AWS-managed resources that collectively require dozens of parameters to be defined. A Composite Resource helps platform teams hide all of these details and expose just a few high-level parameters that are relevant to the development teams that consume these resources.
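To make this concrete, the following is a minimal sketch of an XRD for such an abstraction. The API group example.org, the kind XEKSCluster, and the fields region, k8sVersion, and nodeCount are hypothetical names chosen for illustration, not the schema used in this implementation. A matching Composition would map these few fields onto the many MRs (VPC, subnets, cluster, node group, and so on) that the abstraction hides.

---
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xeksclusters.example.org
spec:
  group: example.org                  # hypothetical API group, for illustration only
  names:
    kind: XEKSCluster
    plural: xeksclusters
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                region:               # the few high-level knobs exposed to consumers
                  type: string
                k8sVersion:
                  type: string
                nodeCount:
                  type: integer
              required:
                - region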

Relationship between Crossplane MRs, XRDs, XRs and AWS managed resources

Workload cluster provisioning with Crossplane and Flux

Let’s take a closer look at how this is done with Flux. First, the Flux GitOps Toolkit is installed on the management cluster, as outlined here, and is configured to point to the gitops-system Git repository as the source of truth. This bootstrapping process creates the directory hierarchy clusters/$CLUSTER_NAME starting at the top level of this config repository. Subsequently, Flux deploys any Kubernetes artifacts residing in this directory. These could be Kubernetes-native resources or Flux-specific custom resources, such as Kustomization, HelmRelease, and GitRepository. There are many different approaches to managing the directory structure of these artifacts in a repository. In this implementation, Flux deploys the manifests in the clusters/mgmt directory of the repository in the order listed in the kustomization file. The sequence of events, referring to the numbered steps in Figure 1, is as follows:

  1. Crossplane v1.10 is installed with Helm using a HelmRelease (a sketch of such a resource follows this list). The Crossplane core controller is responsible for installing cloud-specific provider controllers and CRDs through its own provider packaging mechanism.
  2. Crossplane provider packages are deployed next. In this implementation, we deploy both the AWS provider and Crossplane Kubernetes provider. The latter enables deployment and management of arbitrary Kubernetes objects on both the management and workload clusters.
  3. Deploying these provider packages triggers the core controller to install the provider-specific controllers.
  4. Next, the CompositeResourceDefinition (XRD) and Composition are deployed. At this point, the management cluster is ready to provision workload clusters.
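The following is a minimal sketch of the resources involved in steps 1 and 2. The namespace names, chart version, and provider package tag are illustrative assumptions; consult the Flux and Crossplane documentation for the versions that apply to your setup.

---
# Step 1: install Crossplane with a Flux HelmRelease from the stable chart repository
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: crossplane-stable
  namespace: flux-system
spec:
  interval: 1h
  url: https://charts.crossplane.io/stable
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: crossplane
  namespace: crossplane-system
spec:
  interval: 10m
  install:
    createNamespace: true
  chart:
    spec:
      chart: crossplane
      version: "1.10.1"               # illustrative; pin to the version you intend to run
      sourceRef:
        kind: HelmRepository
        name: crossplane-stable
        namespace: flux-system
---
# Step 2: deploy the Crossplane AWS provider package; the core controller then
# installs the provider-specific controllers and CRDs (step 3)
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-aws
spec:
  package: crossplane/provider-aws:v0.34.0   # illustrative package tag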

In a decentralized model for cluster management, the platform team owns the tasks of provisioning a management cluster and getting it ready to provision a fleet of workload clusters. The platform team then uses GitOps to allow application teams to provision and bootstrap clusters themselves. The following is how such a collaborative workflow proceeds.

  1. An application team creates a specification for the desired workload cluster using a Composite Resource (XR) that conforms to the XRD and uses the Composition, both deployed in step 4 (a sketch of such an XR follows this list). The template directory comprises the complete set of artifacts required for provisioning and bootstrapping a workload cluster. An application team can clone this directory using the helper script, specifying the desired name of the workload cluster, configure SSH keys for the repository to be used by the workload cluster using a SealedSecret resource, and create a Git pull request (PR). When the platform team merges this PR, Flux applies the XR to the management cluster.
  2. This triggers the provider controllers to provision the resources that the XR is composed of. The XR used in this implementation results in provisioning of the complete infrastructure for setting up an Amazon EKS cluster: VPC, subnets, internet gateway, NAT gateways, route tables, and the Amazon EKS cluster with a managed node group, as shown in Figure 2.
  3. After the workload cluster is ready, it’s bootstrapped remotely from the management cluster. This step is discussed in detail later in this post under the section Bootstrapping Flux on the workload cluster.
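Continuing the hypothetical schema sketched earlier, an application team’s cluster specification might look like the following. The kind, API group, and field names are the same illustrative assumptions; the actual XRD in this implementation defines its own schema.

---
apiVersion: example.org/v1alpha1      # hypothetical group from the XRD sketch above
kind: XEKSCluster
metadata:
  name: app-team-1-cluster
spec:
  region: us-west-2                   # illustrative values
  k8sVersion: "1.23"
  nodeCount: 3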

Secrets management

Secrets management is an important aspect of a GitOps workflow. The manifests for all resources are meant to be stored in Git, but these may include resources with sensitive data that cannot be stored in plain text. Sealed Secrets for Kubernetes is an open-source tool that is commonly used for secrets management in conjunction with GitOps. Sealed Secrets is composed of two parts: a cluster-side controller and a client-side utility named kubeseal. This utility uses asymmetric cryptography to encrypt Kubernetes Secrets and encode them into a Kubernetes custom resource called SealedSecret, which is safe to be stored in a Git repository. A SealedSecret can be decrypted only by the controller running in the target cluster. The public/private key pair used by the controller is referred to as sealing keys.
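As an illustration of the resulting artifact, a SealedSecret committed to Git looks like the following sketch. The name, namespace, and data key are hypothetical, and the encrypted value is a placeholder for the ciphertext that kubeseal emits.

---
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: app-db-credentials            # hypothetical secret name
  namespace: app-team-1
spec:
  encryptedData:
    password: AgBy3i4OJSWK...         # ciphertext produced by kubeseal (placeholder)
  template:
    metadata:
      name: app-db-credentials        # the Kubernetes Secret created after unsealing
      namespace: app-team-1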

Automating a GitOps workflow using SealedSecrets requires a strategy for safe storage and retrieval of the sealing keys. One approach is to store them as secrets in an external secrets store, such as AWS Secrets Manager, and have a controller in the cluster that reconciles secrets from this external store to the cluster. External Secrets Operator (ESO) is an open-source tool that’s used to implement this strategy. ESO uses a Kubernetes custom resource called ExternalSecret that defines where secrets live and how to synchronize them. Using the information from an ExternalSecret resource, the controller fetches secrets from an external store using provider-specific APIs and creates a Kubernetes Secret. If the secret in the external store changes, then the controller reconciles the cluster state and updates the Kubernetes Secret accordingly.
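A minimal sketch of such an ExternalSecret is shown below. It assumes a ClusterSecretStore named aws-secrets-manager has already been configured for ESO, and that the sealing keys live under a hypothetical AWS Secrets Manager secret named sealed-secrets/sealing-keys; all of these names are illustrative.

---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: sealing-keys
  namespace: sealed-secrets
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets-manager         # assumed pre-configured store backed by AWS Secrets Manager
  target:
    name: sealing-keys                # Kubernetes Secret created by the ESO controller
  data:
    - secretKey: tls.crt              # public sealing key
      remoteRef:
        key: sealed-secrets/sealing-keys   # hypothetical Secrets Manager secret name
        property: crt
    - secretKey: tls.key              # private sealing key
      remoteRef:
        key: sealed-secrets/sealing-keys
        property: key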

Deploying secrets with a hybrid approach, using both External Secrets operator and Sealed Secrets controller

The previous diagram illustrates how Sealed Secrets and External Secrets Operator are used in tandem for secrets management. Referring to the steps illustrated in the diagram:

  • A platform administrator generates a public-private key pair to be used as SealedSecrets sealing keys using a tool such as OpenSSL. An AWS Secrets Manager secret is created to store the sealing keys. The administrator also creates the Kubernetes manifest for an ExternalSecret that references this AWS Secrets Manager secret, and adds it to Git (steps 1–2).
  • During a GitOps workflow, Flux fetches the ExternalSecret manifest from Git and applies it to the target Amazon EKS cluster. The ESO controller runs within this cluster and makes use of the AWS Identity and Access Management (AWS IAM) Roles for Service Accounts to authenticate to the AWS Secrets Manager. The controller retrieves the AWS Secrets Manager secret using information from the ExternalSecret and creates a Kubernetes Secret that encapsulates the SealedSecrets sealing keys (steps 3–6).
  • In a separate workflow (steps A–D), an application developer encodes a Kubernetes Secret into a SealedSecret using kubeseal and commits it to Git. This could be a Secret used by applications deployed to the cluster. When Flux deploys this SealedSecret to the Amazon EKS cluster, the SealedSecrets controller running in the cluster unseals it into a Kubernetes Secret using the sealing keys.

Bootstrapping Flux on the workload cluster

Let’s take a closer look at how a workload cluster is bootstrapped by using the secrets management strategy discussed above. After a workload cluster is provisioned, the Crossplane provider creates a Secret resource in the management cluster that contains configuration data (kubeconfig) needed to connect to that cluster. By referencing this Secret in a Kustomization resource, Flux allows you to apply changes to a remote cluster when it runs its reconciliation loop. We use this capability to bootstrap a workload cluster with Flux and related tools, using Flux on the management cluster.
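The following is a minimal sketch of this pattern, assuming the Crossplane provider has written the workload cluster’s kubeconfig to a Secret named app-team-1-cluster-kubeconfig in the flux-system namespace; the Kustomization name, path, Secret name, and key are illustrative.

---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: app-team-1-bootstrap          # hypothetical name
  namespace: flux-system
spec:
  interval: 10m
  path: ./workload-cluster-bootstrap  # illustrative path in the gitops-system repository
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  kubeConfig:
    secretRef:
      name: app-team-1-cluster-kubeconfig   # connection Secret created by Crossplane
      key: kubeconfig                       # illustrative key within the Secret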

  • The bootstrapping process begins by deploying components needed for secrets management. The External Secrets Operator and an ExternalSecret resource are first synced to the workload cluster. This triggers the retrieval of sealing keys from AWS Secrets Manager and creation of a Secret that contains the sealing keys (steps 1–3). Following this, the SealedSecrets controller is synced to the workload cluster (step 4). This controller is now ready to unseal any SealedSecret resource deployed to the cluster.
  • The next step in the bootstrapping process is to deploy the Flux Toolkit and a GitRepository resource that configures Flux on the workload cluster to use the gitops-workloads Git repository as its source of truth (steps 5–6; a sketch of such a GitRepository follows this list).
  • The final step is to enable Flux access to its source-of-truth Git repository. This is done by syncing a SealedSecret resource to the workload cluster, which is then unsealed to create a Secret containing the SSH keys for that repository (steps 7–8). Note that the public key portion of this SSH key pair should have been added to the repository to allow read/write access. Subsequently, Flux on the workload cluster reconciles its state with artifacts from this repository using a GitOps workflow separate from that of the management cluster (steps A–C). Crossplane is installed on the workload cluster during this workflow in order to manage the lifecycle of any AWS-managed resources needed by the applications deployed to the workload cluster. The mechanics of how applications are subsequently onboarded to the workload cluster using a GitOps workflow are discussed in detail in Part 3.
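A minimal sketch of the GitRepository from steps 5–6 is shown below. The repository URL and Secret name are illustrative, and the referenced Secret is the one unsealed from the SealedSecret in steps 7–8.

---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: gitops-workloads
  namespace: flux-system
spec:
  interval: 1m
  url: ssh://git@github.com/example-org/gitops-workloads   # illustrative URL
  ref:
    branch: main
  secretRef:
    name: gitops-workloads-ssh        # Secret with SSH keys, unsealed from the SealedSecret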

This architecture pattern allows an organization to define a clear division of responsibilities between the platform and application teams. Platform teams own the task of provisioning and bootstrapping workload clusters. Once bootstrapped, workload clusters are managed autonomously by application teams, each reconciling its state from a separate Git repository.

Bootstrapping Flux on a workload cluster using Flux on the management cluster

Conclusion

In this post, we showed you how to use a decentralized, hub-and-spoke model to manage the lifecycle of Amazon EKS clusters using Crossplane and Flux. In this model, a platform team owns the task of provisioning an Amazon EKS management cluster that serves as the hub. The platform team also vends Crossplane Compositions that define opinionated, higher-level abstractions of AWS-managed resources, such as an Amazon EKS cluster. A development team consumes this abstraction to create specifications for Amazon EKS workload clusters (i.e., spokes). The platform team uses GitOps to allow development teams to provision and bootstrap clusters themselves through the familiar Git workflow of creating and merging pull requests.

The post also presented a secrets management strategy to fully automate the process of bootstrapping such workload clusters with the Flux Toolkit. This allows these clusters to be subsequently managed by the development teams in an autonomous manner, using their respective source-of-truth repositories to define cluster state. Part 3 of the series will discuss the details of onboarding applications to the workload cluster, using IAM Roles for Service Accounts (IRSA) to address security and multi-tenancy requirements.

Viji Sarathy

Viji Sarathy is a Principal Specialist Solutions Architect at AWS. He provides expert guidance to customers on modernizing their applications using AWS services that leverage serverless and container technologies. He has been at AWS for about 3 years. He has 20+ years of experience in building large-scale, distributed software systems. His professional journey began as a research engineer in high-performance computing, specializing in the area of Computational Fluid Dynamics. From CFD to cloud computing, his career has spanned several business verticals, all along with an emphasis on the design and development of applications using scalable architectures. He holds a Ph.D. in Aerospace Engineering from The University of Texas at Austin. He is an avid runner, hiker, and cyclist.

Islam Mahgoub

Islam Mahgoub is a Solutions Architect at AWS with 15 years of experience in application, integration, and technology architecture. At AWS, he helps customers build new cloud-native solutions and modernize their legacy applications leveraging AWS services. Outside of work, Islam enjoys walking, watching movies, and listening to music.

Sourav Paul

Sourav Paul is a Senior Solutions Architect with AWS UK. He helps ISVs innovate on AWS, likes to talk about containers, and enjoys contributing to open-source projects. In his spare time, he messes around with his Raspberry Pi and loves spending time with family.

Sheetal Joshi

Sheetal Joshi is a Principal Developer Advocate on the Amazon EKS team. Sheetal worked for several software vendors before joining AWS, including HP, McAfee, Cisco, Riverbed, and Moogsoft. For about 20 years, she has specialized in building enterprise-scale, distributed software systems, virtualization technologies, and cloud architectures. At the moment, she is working on making it easier to get started with, adopt, and run Kubernetes clusters in the cloud, on-premises, and at the edge.