
Part 3: Multi-Cluster GitOps — Application onboarding

Introduction

This is Part 3 in a series of posts that demonstrates how to build an extensible and flexible GitOps system, based on a hub-and-spoke model, to manage the lifecycles of Amazon Elastic Kubernetes Service (Amazon EKS) clusters, the applications deployed to these clusters, and their dependencies on other AWS managed resources. We recommend that you read Part 1 and Part 2 before proceeding. In Part 2, we discussed the mechanics of how Crossplane and Flux are used to implement a GitOps-based strategy for provisioning and bootstrapping a fleet of Amazon EKS workload clusters. This post focuses on how to onboard applications to workload clusters, which involves deploying the applications to the target cluster along with any AWS managed resources that they depend on.

For applications deployed on clusters that need to make API requests to AWS services, the recommended security practice is to use AWS Identity and Access Management (AWS IAM) roles for service accounts (IRSA) to grant the needed AWS IAM permissions. IRSA provides the benefits of least privilege, credential isolation, and auditability. Clusters also typically run tools such as monitoring agents and ingress controllers, and IRSA should be set up for any such tool that makes API requests to AWS services. In this post, we also discuss how to automate the setup of IRSA in a multi-cluster GitOps system.

Background

Deploying an application to a workload cluster involves applying manifests for Kubernetes resources (e.g., Deployment, Service, and ConfigMap) and for the Crossplane Managed Resources that correspond to the AWS services the application depends on. In a fully decentralized model, each application is deployed from a separate repository maintained by the respective application team. To fully automate this process using a GitOps workflow, Flux on the target workload cluster must be configured to reconcile cluster state from the application repository.

To configure an application to use IRSA, we must first create an AWS IAM OpenID Connect (OIDC) identity provider for the workload cluster. Second, to associate an AWS IAM role with a Kubernetes service account, the trust policy of the AWS IAM role must reference the cluster's OIDC provider. Both of these steps require the OIDC identity provider ID of the workload cluster, which is not known in advance. Automating the GitOps workflow requires a solution that addresses this issue.

Let's get into the specifics of how application onboarding is implemented and how the IRSA requirements are satisfied.

Solution overview

Application onboarding

In Part 2, we discussed how to bootstrap a workload cluster with Flux and configure it to reconcile initial cluster state from the gitops-workloads repository. In order for Flux to reconcile application-specific artifacts from a separate application repository, a GitRepository resource that references the application repository must first be applied to the cluster. Next, the SSH keys used to connect to this repository must be provided in the form of a SealedSecret resource. Finally, a Flux Kustomization that references the GitRepository resource must be applied to the cluster. The manifests for these resources are added to a cluster-specific folder under the gitops-workloads repository and synced to the workload cluster, as sketched below.
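
The following is a minimal sketch of what these onboarding manifests might look like, assuming recent Flux (v1) APIs; the application name (app1), repository URL, and secret name are hypothetical placeholders.

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: app1
  namespace: app1
spec:
  interval: 1m
  # Hypothetical application repository maintained by the application team
  url: ssh://git@github.com/example-org/app1-manifests
  ref:
    branch: main
  secretRef:
    name: app1-repo-auth  # SSH key material, delivered as a SealedSecret
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: app1
  namespace: app1
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: app1
  path: ./deploy  # directory in the application repository to reconcile
  prune: true
```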

Once Flux is configured to reconcile the new application repository, the application team has full control over what gets deployed to the workload cluster. The governance team, which owns the gitops-workloads repository, is involved only at onboarding time. This approach improves agility by taking the governance team out of the loop for application deployment activities. But how can we make sure that an application team does not change resources that belong to other applications on the same cluster, or make system-level changes to the workload cluster?

On multi-tenant clusters, Flux's support for role-based access control can be used to ensure that an application team deploys their application artifacts only to specific namespaces in the workload cluster. In the application onboarding flow, an application team creates a namespace and a service account, with a role binding that grants access only to that namespace. To enforce tenant isolation in the workload cluster, the Flux Kustomization responsible for reconciling the application repository is configured to impersonate this service account. This ensures that the reconciliation fails if an application team attempts to change objects in namespaces other than the one created for their application. The namespace, RBAC, and service account manifests are applied to the cluster using the familiar Git workflow: the application team creates a PR, and the governance team reviews and merges it.
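
A minimal sketch of these tenant-isolation manifests, again using the hypothetical app1 application, might look as follows. The RoleBinding grants the built-in admin ClusterRole, scoped to the app1 namespace only.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: app1
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app1
  namespace: app1
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app1-reconciler
  namespace: app1
subjects:
  - kind: ServiceAccount
    name: app1
    namespace: app1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin  # built-in ClusterRole; the RoleBinding confines it to the app1 namespace
```

The Kustomization shown earlier then needs only one additional field, spec.serviceAccountName: app1, for Flux to impersonate this service account during reconciliation; any attempt to write outside the app1 namespace causes the reconciliation to fail.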

The following diagram depicts the application onboarding flow:


Figure 1. Onboarding a new application whose manifests exist in a separate Git repo

The flow starts with the application team creating a pull request (PR) on the gitops-workloads repository with the manifests needed to onboard the application. To help with this, a template directory comprising the complete set of manifests required for application onboarding is included in the gitops-workloads repository. An application team can clone this directory using the helper script, specifying the application name, the target cluster name, the branch and directory in the application repository that need to be reconciled into the workload cluster, and the SSH keys for connecting to the application repository, and then create a PR. The governance team reviews the manifests in the PR, then approves and merges it. This triggers Flux to pull the application onboarding manifests from the cluster- and application-specific folder under the gitops-workloads repository (step 1) and apply them to the workload cluster (step 2). Next, Flux pulls the manifests from the application repository (step 3) and applies them to the workload cluster (step 4).

Automating AWS IAM roles for service accounts (IRSA) setup in GitOps

At a high level, IRSA configuration consists of two parts:

  1. IRSA prerequisite — an AWS IAM OIDC identity provider must be created for the workload cluster using its OIDC issuer URL. This is a one-time setup carried out as part of the workload cluster provisioning and bootstrapping flow.
  2. IRSA setup for an app or a tool — this involves creating an AWS IAM role with the required permissions and annotating a Kubernetes service account with the role's ARN so that pods using that service account can assume the role (see the sketch after this list).
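
To illustrate the second part, a service account configured for IRSA carries an annotation referencing the AWS IAM role, along the lines of the following sketch; the account ID and role name are hypothetical placeholders.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app1
  namespace: app1
  annotations:
    # Associates this service account with an AWS IAM role (IRSA).
    # The account ID and role name shown are placeholders.
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/app1-irsa-role
```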

For the IRSA prerequisite, the Crossplane Managed Resource (MR) OpenIDConnectProvider is used to create the AWS IAM OIDC identity provider. The Crossplane Composition used to provision the workload cluster encapsulates an instance of this MR. Crossplane allows you to patch from one composed resource in a Composition to another by using the Composite Resource (XR) as an intermediary. This feature is used to extract the OIDC issuer URL from the Cluster MR that maps to the workload cluster and use it to instantiate the OpenIDConnectProvider MR.
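
A simplified sketch of this patching pattern follows, assuming the classic Crossplane AWS provider (eks.aws.crossplane.io and iam.aws.crossplane.io API groups) and a hypothetical XR status field named oidcIssuerUrl; the actual Composition in the accompanying repository is more elaborate.

```yaml
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: eks-cluster-composition  # hypothetical name
spec:
  # compositeTypeRef omitted for brevity
  resources:
    - name: eks-cluster
      base:
        apiVersion: eks.aws.crossplane.io/v1beta1
        kind: Cluster
        # cluster spec omitted for brevity
      patches:
        # Copy the OIDC issuer URL from the cluster's status up to the XR...
        - type: ToCompositeFieldPath
          fromFieldPath: status.atProvider.identity.oidc.issuer
          toFieldPath: status.oidcIssuerUrl
    - name: oidc-provider
      base:
        apiVersion: iam.aws.crossplane.io/v1beta1
        kind: OpenIDConnectProvider
        spec:
          forProvider:
            clientIDList:
              - sts.amazonaws.com
            # thumbprintList omitted for brevity
      patches:
        # ...then copy it from the XR down into the OIDC provider resource.
        - type: FromCompositeFieldPath
          fromFieldPath: status.oidcIssuerUrl
          toFieldPath: spec.forProvider.url
```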

To set up IRSA for an application or a tool, the AWS IAM roles and policies are created using Crossplane MRs, namely Role and Policy, respectively. To implement this step in a generalized way, we must dynamically resolve the following parameters:

  1. the AWS account ID referenced by the Kubernetes annotation that associates the service account with an AWS IAM role
  2. the OIDC provider URL of the cluster, referenced by the trust policy of the AWS IAM role
  3. the AWS account ID referenced by the trust policy of the AWS IAM role

To fully automate the creation and configuration of the service account and the AWS IAM trust policy, these parameters are exposed through a Kubernetes ConfigMap in the workload cluster. To create this ConfigMap, we use the Crossplane Kubernetes provider, which enables deployment and management of arbitrary Kubernetes resources in clusters. The ConfigMap is one of the composed resources within the Composition used to provision the workload cluster, and Crossplane patches populate it with values from other composed resources. These values are then used to replace placeholder variables such as ACCOUNT_ID and OIDC_PROVIDER in the manifests for IRSA-related artifacts. This is done during the GitOps workflow using Flux variable substitution, a feature that provides basic templating for Kubernetes manifests from a map of key/value pairs holding the placeholder variables to be substituted into the final YAML.
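
As a sketch of how these pieces fit together, the Role MR below embeds a trust policy containing the placeholders, and the Flux Kustomization that applies it enables substitution from the ConfigMap. The ConfigMap name (cluster-info), the Kustomization name and path, and the application names are assumptions for illustration.

```yaml
# A Crossplane Role MR whose trust policy contains placeholders
# that Flux fills in at reconciliation time:
apiVersion: iam.aws.crossplane.io/v1beta1
kind: Role
metadata:
  name: app1-irsa-role
spec:
  forProvider:
    assumeRolePolicyDocument: |
      {
        "Version": "2012-10-17",
        "Statement": [{
          "Effect": "Allow",
          "Principal": {
            "Federated": "arn:aws:iam::${ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
          },
          "Action": "sts:AssumeRoleWithWebIdentity",
          "Condition": {
            "StringEquals": {
              "${OIDC_PROVIDER}:sub": "system:serviceaccount:app1:app1"
            }
          }
        }]
      }
---
# A Flux Kustomization applying the IRSA manifests, with
# variable substitution enabled from the ConfigMap:
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: app1-onboarding
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: gitops-workloads
  path: ./cluster1/app1  # cluster/application-specific folder (assumed layout)
  prune: true
  postBuild:
    substituteFrom:
      - kind: ConfigMap
        name: cluster-info  # assumed name; exposes ACCOUNT_ID and OIDC_PROVIDER
```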

IRSA prerequisites

The following diagram depicts how IRSA prerequisites are fulfilled as part of the workload cluster provisioning and bootstrapping flow:


Figure 2. Setting up IRSA prerequisites

First, Flux in the management cluster pulls the manifests from the gitops-system repository (step 1), which include the XR for provisioning a workload Amazon EKS cluster, and applies them to the management cluster (step 2). This triggers the Crossplane AWS provider to create the workload cluster (steps 3–4). Once the cluster is created, the AWS provider reconciles the OpenIDConnectProvider MR composed alongside it, which sets up the AWS IAM OIDC identity provider for the workload cluster (step 5). Finally, the Crossplane Kubernetes provider creates the ConfigMap in the workload cluster, exposing parameters such as the OIDC provider URL and the AWS account ID (step 6). This ConfigMap is later used to configure IRSA for applications deployed to the workload cluster.
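
For reference, the following is a minimal sketch of how the Kubernetes provider can wrap such a ConfigMap, assuming Crossplane's provider-kubernetes and its Object kind. The key names and values shown are placeholders that, in the real Composition, would be filled in by patches.

```yaml
apiVersion: kubernetes.crossplane.io/v1alpha1
kind: Object
metadata:
  name: cluster-info
spec:
  # providerConfigRef pointing at the workload cluster's kubeconfig omitted for brevity
  forProvider:
    manifest:
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-info  # assumed name
        namespace: flux-system
      data:
        # In the Composition, these values are patched in from other composed resources.
        ACCOUNT_ID: "111122223333"
        OIDC_PROVIDER: oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE
```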

IRSA setup for applications

Now, let's see how the application-specific part of IRSA is handled in the application onboarding flow. The following diagram depicts the onboarding flow when the application has dependencies on cloud resources running outside the Amazon EKS cluster (e.g., an Amazon DynamoDB table or an Amazon SQS queue).


Figure 3. Onboarding a new application that has dependencies on cloud resources running outside the Amazon EKS cluster

Crossplane is installed on the workload cluster as part of the cluster provisioning and bootstrapping flow, as discussed in Part 2. To create the AWS IAM artifacts required to configure IRSA for an application, the corresponding Crossplane MRs are included in the application onboarding manifests, pulled by Flux (step 1), and applied to the workload cluster (step 2). During reconciliation, Flux substitutes placeholders in the manifests, such as ACCOUNT_ID and OIDC_PROVIDER, with values from the ConfigMap. Crossplane in the workload cluster then creates the AWS IAM artifacts.

To provision any AWS managed resources that the application depends on, the manifests of the corresponding Crossplane MRs are added by the application team to the application repository, pulled by Flux (step 3), and applied to the workload cluster, along with standard Kubernetes resources such as Deployment and Service (step 4). Crossplane in the workload cluster then provisions the cloud resources, as in the sketch below.
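
As an illustration, an application that needs a DynamoDB table might include an MR like the following in its repository, assuming the classic Crossplane AWS provider's dynamodb API group; the table name, region, and attributes are hypothetical.

```yaml
apiVersion: dynamodb.aws.crossplane.io/v1alpha1
kind: Table
metadata:
  name: app1-orders  # hypothetical table for the app1 application
spec:
  forProvider:
    region: eu-west-1
    attributeDefinitions:
      - attributeName: id
        attributeType: S
    keySchema:
      - attributeName: id
        keyType: HASH
    billingMode: PAY_PER_REQUEST
```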

IRSA setup for tools

The IRSA configuration for Crossplane running on the workload cluster is a special case, as Crossplane is not yet available there to create the needed AWS IAM resources (a chicken-and-egg problem). To solve this, Crossplane in the management cluster is used to create the AWS IAM resources needed by Crossplane running in the workload cluster.

Conclusion

In this post, we showed you how application onboarding can be addressed in a multi-cluster GitOps system that supports a fully decentralized model, where each application team can bring their own repository and have it reconciled into a workload cluster. We showed how governance teams can control which application gets onboarded into which workload cluster through the Git PR process. We also demonstrated how to align with Amazon EKS security best practices when accessing AWS APIs by configuring IRSA using Flux and Crossplane.

In this 3-part series, we demonstrated how to build an extensible and flexible multi-cluster GitOps system based on a hub-and-spoke model that addresses the platform and application teams' requirements. The series covered use cases such as managing the lifecycle of Amazon EKS clusters, bootstrapping them with various tools, deploying applications to the provisioned clusters, and managing the lifecycle of associated managed resources such as Amazon SQS queues and Amazon DynamoDB tables, all while implementing security best practices. We highly recommend that you try the Amazon EKS Multi-cluster GitOps workshop for hands-on experience.

The full implementation of the solution outlined in this blog series is available in this GitHub repository.


Islam Mahgoub

Islam Mahgoub is a Solutions Architect at AWS with 15 years of experience in application, integration, and technology architecture. At AWS, he helps customers build new cloud-native solutions and modernize their legacy applications leveraging AWS services. Outside of work, Islam enjoys walking, watching movies, and listening to music.


Mike Rizzo

Mike Rizzo is an AWS Principal Solutions Architect based in London, UK, working with telecoms customers on media solutions. He has a keen interest in microservices architectures and how containers and serverless technology can be used to realize these in the cloud. In his spare time, you will find him running and cycling around the Suffolk countryside, cooking Maltese food, and playing Fortnite!


Sourav Paul

Sourav Paul is a Senior Solutions Architect with AWS UK. He helps ISVs innovate on AWS, likes to talk about containers, and enjoys contributing to open source projects. In his spare time, he messes around with his Raspberry Pi and loves spending time with family.


Viji Sarathy

Viji Sarathy is a Principal Specialist Solutions Architect at AWS. He provides expert guidance to customers on modernizing their applications using AWS services that leverage serverless and container technologies. He has been at AWS for about 3 years and has 20+ years of experience in building large-scale, distributed software systems. His professional journey began as a research engineer in high-performance computing, specializing in Computational Fluid Dynamics. From CFD to cloud computing, his career has spanned several business verticals, always with an emphasis on the design and development of applications using scalable architectures. He holds a Ph.D. in Aerospace Engineering from The University of Texas at Austin. He is an avid runner, hiker, and cyclist.