How to manage EKS Pod Identities at scale using Argo CD and AWS ACK
In today’s blog post, we’ll explore how to manage, at scale, the association of Kubernetes service accounts with IAM roles through Amazon Elastic Kubernetes Service (EKS) Pod Identity associations. We will use Argo CD, a popular GitOps delivery tool, and the AWS Controllers for Kubernetes (ACK) to automate the association, but as we will see, this introduces a critical challenge: the EKS Pod Identity API is eventually consistent, so we need to verify that the association is available before the application pods are deployed. Our focus will be on deploying correctly, addressing that challenge without causing side effects. We’ll demonstrate how to automate this process while maintaining the GitOps workflow.
What is GitOps?
GitOps is a modern approach to continuous delivery that uses Git as the single source of truth for declarative infrastructure and applications. Argo CD is a tool that enables teams to implement a powerful GitOps workflow that significantly enhances development and deployment for application teams. The GitOps approach enables organizations to achieve faster, more reliable deployments while maintaining a clear audit trail of all changes in Git history.
EKS Pod Identity: Simplifying IAM Permissions for Kubernetes Applications
Amazon EKS Pod Identity introduces a streamlined approach to managing AWS Identity and Access Management (IAM) permissions for applications running on Amazon EKS clusters. This feature addresses challenges in managing permissions across multiple EKS clusters, offering advantages over the IAM Roles for Service Accounts (IRSA) method, especially in the GitOps workflow.
EKS Pod Identity maintains the principle of least privilege by providing fine-grained control at the pod level. This improves security by reducing the potential attack surface compared to node-level permissions. The feature seamlessly integrates with EKS clusters running Kubernetes version 1.24 and above, working out-of-the-box with existing AWS services and SDKs.
By addressing key pain points in IAM management for Kubernetes applications, EKS Pod Identity offers a more efficient, secure, and user-friendly solution: you simply link your IAM role to your service account via the EKS Pod Identity API. As organizations scale their container deployments, this feature provides a valuable tool for simplifying access control and enhancing overall cluster management.
Overview of AWS Controllers for Kubernetes (ACK)
AWS Controllers for Kubernetes (ACK) is a set of custom controllers that lets you manage AWS resources directly through Kubernetes custom resources. It provides a Kubernetes-native way to manage AWS services, using Custom Resource Definitions (CRDs) to represent AWS resources. The controllers reconcile the desired state declared in Kubernetes with the actual AWS resources, and there is support for many AWS services such as Amazon EKS, Amazon Simple Storage Service (Amazon S3), and Amazon Relational Database Service (Amazon RDS). In this post, we use the ACK service controller for Amazon EKS to manage EKS Pod Identity associations, allowing pods to assume IAM roles through Kubernetes service accounts.
Let’s see how we can combine the usage of Argo CD and ACK to deploy applications while addressing the eventually consistent nature of the EKS Pod Identity API.
Prerequisites
- AWS Command Line Interface (AWS CLI) installed
- Kubernetes command line tool (kubectl) installed
- The jq tool installed
- Helm CLI installed
- Access to an existing EKS cluster with Argo CD installed. For this post, we used the ArgoCD on Amazon EKS Blueprints pattern with Terraform.
High-level overview of the steps
- Install Amazon EKS Pod Identity Agent
- Install the AWS Controllers for Kubernetes (ACK)
- Deploy the sample app with the possible inconsistent behavior
- Deploy the sample app with the job validating permissions
Install Amazon EKS Pod Identity Agent
The Pod Identity Agent runs as a Kubernetes DaemonSet on your nodes and only provides credentials to pods on the node that it runs on. For more information, you can visit the documentation. From a terminal that is configured to connect to your cluster, confirm you’re connected to the right cluster.
If you’re using the EKS blueprints, you can run the following commands:
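A sketch of those commands, assuming the blueprint’s default cluster name getting-started-gitops (adjust the cluster name and region to your own environment):

```shell
# Confirm kubectl is pointing at the intended cluster
kubectl config current-context

# Install the EKS Pod Identity Agent as an EKS add-on
aws eks create-addon \
  --cluster-name getting-started-gitops \
  --addon-name eks-pod-identity-agent

# Wait for the add-on to become active, then confirm the DaemonSet is running
aws eks wait addon-active \
  --cluster-name getting-started-gitops \
  --addon-name eks-pod-identity-agent
kubectl get daemonset eks-pod-identity-agent -n kube-system
```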
Note – You do not need to install the EKS Pod Identity Agent on EKS Auto Mode Clusters. This capability is built into EKS Auto Mode.
Install the AWS Controllers for Kubernetes (ACK)
For this post, we will install the ACK service controller for Amazon EKS along with its required role. Next, we will use the PodIdentityAssociation, a custom resource from this controller, to map an IAM role to a service account in the EKS cluster.
1. Creation of a role for our ACK controller and the EKS Pod Identity association
2. Install the ACK controller for EKS with Helm, edit region as needed
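A minimal sketch of the role creation, with hypothetical names (ack-eks-controller-role, the ack-system namespace, and the ack-eks-controller service account). The inline policy here is intentionally broad for illustration; ACK publishes a recommended policy per controller that you should use in production, and creating Pod Identity associations may additionally require iam:PassRole:

```shell
# Trust policy allowing EKS Pod Identity to assume the role
cat > ack-trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "pods.eks.amazonaws.com" },
    "Action": ["sts:AssumeRole", "sts:TagSession"]
  }]
}
EOF

aws iam create-role \
  --role-name ack-eks-controller-role \
  --assume-role-policy-document file://ack-trust.json

# Permissions for the controller to manage EKS resources (narrow for production)
cat > ack-eks-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{ "Effect": "Allow", "Action": ["eks:*"], "Resource": "*" }]
}
EOF
aws iam put-role-policy --role-name ack-eks-controller-role \
  --policy-name ack-eks-controller-policy \
  --policy-document file://ack-eks-policy.json

# Associate the role with the controller's own service account
ACK_ROLE_ARN=$(aws iam get-role --role-name ack-eks-controller-role \
  --query Role.Arn --output text)
aws eks create-pod-identity-association \
  --cluster-name getting-started-gitops \
  --namespace ack-system \
  --service-account ack-eks-controller \
  --role-arn "$ACK_ROLE_ARN"
```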
For detailed information on the ACK installation, refer to the documentation. The installation steps there demonstrate the ACK service controller for Amazon S3; set the environment variable SERVICE=eks to install the ACK service controller for EKS instead.
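The Helm installation might look like the following sketch, which follows the ACK documentation’s pattern (the chart lives in the ACK public Amazon ECR registry; region is a placeholder to edit, and jq is used to look up the latest release):

```shell
export SERVICE=eks
export AWS_REGION=us-east-1   # edit region as needed

# Look up the latest controller release tag
export RELEASE_VERSION=$(curl -sL \
  "https://api.github.com/repos/aws-controllers-k8s/${SERVICE}-controller/releases/latest" \
  | jq -r '.tag_name | ltrimstr("v")')

# Install the ACK service controller for EKS into the ack-system namespace
helm install --create-namespace -n ack-system \
  "ack-${SERVICE}-controller" \
  "oci://public.ecr.aws/aws-controllers-k8s/${SERVICE}-chart" \
  --version="${RELEASE_VERSION}" \
  --set=aws.region="${AWS_REGION}"
```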
Accessing Argo CD
In the next steps, we will deploy our application through Argo CD. While we won’t perform any actions in the UI, it can be useful for seeing what is happening. If you’re using the Argo CD from the EKS Blueprints, you can run the following command to set up a port-forward and retrieve the password for your Argo CD instance.
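With typical defaults, that looks roughly like this (the server Service name depends on how Argo CD was installed; adjust if yours differs):

```shell
# Retrieve the initial admin password
kubectl -n argocd get secret argocd-initial-admin-secret \
  -o jsonpath='{.data.password}' | base64 -d; echo

# Forward the Argo CD API server to localhost:8080
kubectl -n argocd port-forward svc/argo-cd-argocd-server 8080:443
```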
Open http://localhost:8080 in your browser and sign in with the credentials from the output; you’ll be able to find the application we deploy next.
Deploy the sample application highlighting the problem
1. Clone the code repository for the sample application project
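Assuming the repository is published under aws-samples (a hypothetical URL; adjust it to where the sample actually lives):

```shell
git clone https://github.com/aws-samples/sample-argocd-ack-eks.git
cd sample-argocd-ack-eks
```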
Browse the files in the sample-argocd-ack-eks project. Since our application deploys a few components and we want to control their ordering, we used the helm.sh/hook-weight annotation through a Helm chart in a declarative GitOps way. If you don’t want to use Helm, you can achieve the same ordering with Argo CD sync waves.
Resources annotated with lower wave numbers are deployed first by Argo CD. Notice that in both service-account.yaml and pod-identity-association.yaml the weight (or sync wave number) is set to -50. This means we want Argo CD to create the service account and PodIdentityAssociation resources in the same wave, which runs before any other resources are created (since the wave number is negative).
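For example, the metadata of those manifests carries annotations along these lines (a sketch; the resource name is illustrative):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sample-app
  annotations:
    # Helm hook weight: lower values are created first
    helm.sh/hook-weight: "-50"
    # Equivalent ordering with Argo CD sync waves
    argocd.argoproj.io/sync-wave: "-50"
```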
- application.yaml – Defines the Argo CD application using a custom resource definition
- application-with-job.yaml – Defines the Argo CD application using a custom resource definition plus the job validating the IAM role. It differs from the application.yaml above by a single parameter that enables the job in our Helm chart.
- The chart folder contains the details of our Helm chart.
- values.yaml – Default values for our chart
- Chart.yaml – Definition of our chart
- The chart/templates folder contains the manifest files for the Kubernetes resources that will be deployed.
- service-account.yaml – The service account definition
- pod-identity-association.yaml – The PodIdentityAssociation custom resource, which associates the service account with the IAM role. This custom resource is managed by the ACK controller for EKS. The association must be returned by the API before the pods start, or they will not have the necessary IAM permissions.
- deployment.yaml – The manifest for deploying the container application as a Kubernetes deployment.
- job.yaml – Defines a Kubernetes job that checks that certain environment variables are set, confirming that an IAM role has been successfully associated with the pod’s service account. If the environment variables are not set, the job fails with a non-zero exit code. Notice the Helm hook annotations: helm.sh/hook-weight: "50" in the job manifest and helm.sh/hook-weight: "100" in the deployment manifest, which means Argo CD will create the job and wait for its successful completion before attempting to create the application deployment.
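The core of the job’s check can be sketched as a small shell function (a simplified, hypothetical version of what job.yaml runs; the real job reads the expected role ARN from an environment variable and the actual identity from aws sts get-caller-identity):

```shell
# Extract the role name from an IAM role ARN,
# e.g. arn:aws:iam::123456789012:role/my-app -> my-app
expected_role_name() {
  echo "${1##*/}"
}

# Extract the role name from an STS assumed-role ARN,
# e.g. arn:aws:sts::123456789012:assumed-role/my-app/session -> my-app
assumed_role_name() {
  echo "$1" | cut -d/ -f2
}

# Succeed only when the assumed role matches the expected one;
# a non-zero exit makes the Kubernetes job (and its Argo CD sync wave) fail
check_identity() {
  if [ "$(expected_role_name "$1")" = "$(assumed_role_name "$2")" ]; then
    echo "role match"
  else
    echo "role mismatch"
    return 1
  fi
}
```

Inside the job, this would be invoked as `check_identity "$EXPECTED_ROLE_ARN" "$(aws sts get-caller-identity --query Arn --output text)"`, and Kubernetes retries the job’s pod until the association has propagated.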
2. Create the IAM role to be used by the application
There should be a trust policy allowing this role to be assumed by the service principal pods.eks.amazonaws.com. Copy the ARN of this IAM role for the next step.
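This step might look as follows (the role and policy names are illustrative; attach whatever permissions your application actually needs):

```shell
# Trust policy for the EKS Pod Identity service principal
cat > app-trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "pods.eks.amazonaws.com" },
    "Action": ["sts:AssumeRole", "sts:TagSession"]
  }]
}
EOF

aws iam create-role \
  --role-name sample-argocd-ack-eks-app \
  --assume-role-policy-document file://app-trust.json

# Grant the application its permissions, e.g. read-only S3 access
aws iam attach-role-policy \
  --role-name sample-argocd-ack-eks-app \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess

# Export the ARN for the next step
export ROLE_ARN=$(aws iam get-role --role-name sample-argocd-ack-eks-app \
  --query Role.Arn --output text)
```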
3. Open the application.yaml file; the following parameters can be replaced with your own:
- roleArn – Replace with the ARN of IAM role created in the step above (will be replaced via export)
- Change any parameter needed to match your cluster in Argo CD. Currently it will deploy to a cluster named getting-started-gitops in the sample-argocd-ack-eks namespace.
4. Create the Argo CD application
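One way to do this, assuming the role ARN was exported as ROLE_ARN per the parameter note above (envsubst ships with gettext):

```shell
# Substitute the role ARN and apply the Argo CD Application resource
envsubst < application.yaml | kubectl apply -n argocd -f -
```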
5. Confirm the role from inside our container
Due to the eventually consistent nature of the EKS Pod Identity API, coupled with the speed of automation, we may encounter failures in associating the role with the service account. In our setup, our pods can end up with the node group role (arn:aws:sts::123456789012:assumed-role/initial-eks-node-group-…). At scale, this can become a recurring problem and lead to multiple disruptions per day for your teams.
Note: the fallback role depends on your hop count setting; see the IMDSv2 configuration for more information.
Run aws sts get-caller-identity from inside the container.
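For example (the namespace is from the sample application; adjust the pod selection if you changed the defaults):

```shell
# Grab the first application pod and check which identity it holds
POD=$(kubectl get pods -n sample-argocd-ack-eks \
  -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n sample-argocd-ack-eks "$POD" -- aws sts get-caller-identity
```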
Let’s look at a solution that remains compatible with automated deployments.
Validate the correct role is passed via a job
Using a job, we can validate that the EKS Pod Identity Agent is handing out the credentials of the configured IAM role before starting the main application pods (here a Kubernetes deployment). To enable this, we change a parameter passed to the Helm chart.
This job compares the IAM role from the session credentials with the role from the environment variable. To enable the job:
- Remove the previously created deployment and delete the PodIdentityAssociation to reproduce the initial state.
- In the application-with-job.yaml file, the jobWorkaround parameter is changed and passed to Helm.
- Install the application by using the role created in the previous section (same role).
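A sketch of those steps, with hypothetical resource names (substitute the application name and namespace you actually used, and the exported ROLE_ARN):

```shell
# Reset to the initial state
kubectl delete -n argocd application sample-argocd-ack-eks
kubectl delete -n sample-argocd-ack-eks podidentityassociation --all

# Deploy the variant with the validation job enabled
envsubst < application-with-job.yaml | kubectl apply -n argocd -f -
```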
When deployment is completed, using the same command as before, we can confirm we got the expected role.
We can see that our job performed the task of verifying the IAM role was successfully associated with the service account (here a second job run was needed before the role was available):
Alternative Solution: Changing ARGOCD_SYNC_WAVE_DELAY
Argo CD allows you to control the deployment order of specific resources through sync waves. The default delay between waves is 2 seconds. We experimented with this delay and saw a reduction in failures by increasing its value. While this may work in some cases, it had the adverse effect of slowing down all deployments (even those that don’t use Pod Identity). So, while this approach is simpler, it may not fit everyone’s use case.
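With the standard Argo CD manifests, the delay can be raised via an environment variable on the application controller (10 seconds here is an arbitrary value; the controller restarts when the variable changes):

```shell
kubectl -n argocd set env statefulset/argocd-application-controller \
  ARGOCD_SYNC_WAVE_DELAY=10
```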
Cleanup
To clean up, follow these steps:
- Delete application from Argo CD
- Delete IAM role
- Helm uninstall the ACK controller for EKS and role
- Delete the EKS Pod Identity Agent add-on
- Destroy the rest of the Terraform stack
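A sketch of the cleanup with hypothetical resource names (substitute the names you actually used):

```shell
# Delete the application from Argo CD
kubectl delete -n argocd application sample-argocd-ack-eks

# Delete the IAM role (detach policies first)
aws iam detach-role-policy --role-name sample-argocd-ack-eks-app \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
aws iam delete-role --role-name sample-argocd-ack-eks-app

# Uninstall the ACK controller for EKS
helm uninstall -n ack-system ack-eks-controller

# Delete the EKS Pod Identity Agent add-on
aws eks delete-addon --cluster-name getting-started-gitops \
  --addon-name eks-pod-identity-agent

# Destroy the rest of the Terraform stack (from the blueprint's directory)
terraform destroy
```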
Conclusion
In this post, we explored how to effectively manage EKS Pod Identity associations at scale when using Argo CD and AWS Controllers for Kubernetes (ACK). We demonstrated the challenges that arise from the eventually consistent nature of the EKS Pod Identity API and presented two solutions to address this:
- Using a validation job that verifies the EKS Pod Identity association is properly configured before deploying the application workloads
- Adjusting the ARGOCD_SYNC_WAVE_DELAY parameter, though this may impact overall deployment speed
By implementing these solutions, you can reliably automate the deployment of applications that require specific IAM permissions while maintaining GitOps practices. The approach of using a validation job provides a more targeted solution that doesn’t impact other deployments, making it suitable for production environments.
We encourage you to try these approaches in your own environment and choose the solution that best fits your specific needs. To learn more about EKS Pod Identity, visit the AWS documentation or explore the sample code provided in this post’s repository.
About the authors
Mathieu Bruneau is a Containers Specialist Solutions Architect at Amazon Web Services Canada. He’s been bridging discussions between operations and developer teams since way before the term DevOps became popular. Math is located in Montreal, Canada and enjoys spending time with his wife and three boys, either playing video games or throwing frisbees around.
Ahmed Elhosary is a Senior Technical Account Manager (TAM) with AWS. He is a member of the Canada East Enterprise support team and the technical field community for containers.
Martin Guy Lapointe is a Senior Solutions Architect with AWS Canada.
Rishi Gera is a Senior Solutions Architect with AWS Canada.