
How to manage EKS Pod Identities at scale using Argo CD and AWS ACK

In today’s blog post, we’ll explore how to manage, at scale, the association of Kubernetes service accounts with IAM roles through Amazon Elastic Kubernetes Service (EKS) Pod Identity associations. We will use Argo CD, a popular GitOps delivery tool, and the AWS Controllers for Kubernetes (ACK) to automate the association, but as we will see this surfaces a critical challenge: the EKS Pod Identity API is eventually consistent, so we need to verify the association is ready to serve credentials before the application pods are deployed. Our focus will be on deploying correctly, addressing that challenge without causing side effects, and we’ll demonstrate how to automate this process while maintaining the GitOps workflow.

What is GitOps?

GitOps is a modern approach to continuous delivery that uses Git as a single source of truth for declarative infrastructure and applications. Argo CD is a tool that enables teams to implement a powerful GitOps workflow, significantly enhancing development and deployment for application teams. The GitOps approach enables organizations to achieve faster, more reliable deployments while maintaining a clear audit trail of all changes in Git history.

EKS Pod Identity: Simplifying IAM Permissions for Kubernetes Applications

Amazon EKS Pod Identity introduces a streamlined approach to managing AWS Identity and Access Management (IAM) permissions for applications running on Amazon EKS clusters. This feature addresses challenges in managing permissions across multiple EKS clusters, offering advantages over the IAM Roles for Service Accounts (IRSA) method, especially in the GitOps workflow.

EKS Pod Identity maintains the principle of least privilege by providing fine-grained control at the pod level. This improves security by reducing the potential attack surface compared to node-level permissions. The feature seamlessly integrates with EKS clusters running Kubernetes version 1.24 and above, working out-of-the-box with existing AWS services and SDKs.

By addressing key pain points in IAM management for Kubernetes applications, EKS Pod Identity offers a more efficient, secure, and user-friendly solution: you simply link your IAM role to your service account via the EKS Pod Identity API. As organizations scale their container deployments, this feature provides a valuable tool for simplifying access control and enhancing overall cluster management.

Overview of AWS Controllers for Kubernetes (ACK)

The AWS Controllers for Kubernetes (ACK) is a set of custom controllers that allow you to manage AWS resources directly through Kubernetes custom resources. It provides a Kubernetes-native way to manage AWS services using Custom Resource Definitions (CRDs) to represent AWS resources. The controllers reconcile the desired state declared in Kubernetes with the actual AWS resources, and many AWS services are supported, such as Amazon EKS, Amazon Simple Storage Service (Amazon S3), and Amazon Relational Database Service (Amazon RDS). In this post, we use the ACK service controller for Amazon EKS to manage EKS Pod Identity associations, allowing pods to assume IAM roles through Kubernetes service accounts.
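As a rough sketch, a PodIdentityAssociation managed by the ACK controller for EKS looks like the following (the names and role ARN are placeholders; refer to the controller’s CRD reference for the exact fields):

apiVersion: eks.services.k8s.aws/v1alpha1
kind: PodIdentityAssociation
metadata:
  name: my-app-pod-identity
  namespace: my-app
spec:
  clusterName: my-cluster                # EKS cluster hosting the workload
  namespace: my-app                      # namespace of the service account
  serviceAccount: my-app-sa              # Kubernetes service account to bind
  roleARN: arn:aws:iam::111122223333:role/my-app-role # IAM role the pods will assume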

Let’s see how we can combine the usage of Argo CD and ACK to deploy applications while addressing the eventually consistent nature of the EKS Pod Identity API.

Prerequisites

To follow along with this walkthrough, you need:

  • An Amazon EKS cluster (Kubernetes version 1.24 or later) with Argo CD installed, for example through the EKS Blueprints getting-started-gitops pattern
  • The AWS CLI, kubectl, Helm, and jq installed and configured, with IAM permissions to create roles and EKS Pod Identity associations

High-level overview of the steps

  1. Install Amazon EKS Pod Identity Agent
  2. Install the AWS Controllers for Kubernetes (ACK)
  3. Deploy the sample app and observe the possible inconsistent behavior
  4. Deploy the sample app with the job validating permissions

Install Amazon EKS Pod Identity Agent

The Pod Identity Agent runs as a Kubernetes DaemonSet on your nodes and only provides credentials to pods on the node that it runs on. For more information, you can visit the documentation. From a terminal configured to access your cluster, confirm that you’re connected to the right one.
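If you’re not using the EKS Blueprints, one way to point kubectl at the cluster and check the current context (the cluster name and Region used throughout this post; adjust as needed) is:

aws eks update-kubeconfig --name getting-started-gitops --region us-west-2
kubectl config current-context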

If you’re using the EKS blueprints, you can run the following commands:

eval "$(terraform output -raw configure_kubectl)"
# Updated context arn:aws:eks:us-west-2:123456789012:cluster/getting-started-gitops in /tmp/getting-started-gitops
kubectl get pods -A
# NAMESPACE NAME READY STATUS RESTARTS AGE
# argocd argo-cd-argocd-application-controller-0 1/1 Running 0 4d1h
...

Note – You do not need to install the EKS Pod Identity Agent on EKS Auto Mode Clusters. This capability is built into EKS Auto Mode.

Install the agent as an EKS add-on (adjust the add-on version as needed):

aws eks create-addon --cluster-name getting-started-gitops --addon-name eks-pod-identity-agent --addon-version v1.3.9-eksbuild.1
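You can confirm the add-on is active and that the agent DaemonSet is running (the add-on installs the DaemonSet in the kube-system namespace):

aws eks describe-addon --cluster-name getting-started-gitops --addon-name eks-pod-identity-agent --query 'addon.status'
kubectl get daemonset eks-pod-identity-agent -n kube-system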

Install the AWS Controllers for Kubernetes (ACK)

For this post, we will install the ACK service controller for Amazon EKS along with its required role. Next, we will use the PodIdentityAssociation, a custom resource from this controller, to map an IAM role to a service account in the EKS cluster.

1. Create a role for our ACK controller and the EKS Pod Identity association

cat <<EOF > trust-policy.json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "pods.eks.amazonaws.com"
            },
            "Action": [
                "sts:AssumeRole",
                "sts:TagSession"
            ]
        }
    ]
}
EOF

# Create the IAM role using the trust policy
export ACKROLE=sample-argocd-ack-eks-ackrole
aws iam create-role --role-name $ACKROLE --assume-role-policy-document file://trust-policy.json

# Let's add an inline policy to create/delete EKS Pod Identity associations, and pass role permissions
cat <<EOF > policy.json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ackekspodidentitypolicy",
            "Effect": "Allow",
            "Action": [
                "eks:CreatePodIdentityAssociation",
                "eks:TagResource"
            ],
            "Resource": "arn:aws:eks:us-west-2:123456789012:cluster/getting-started-gitops"
        },
        {
            "Sid": "ackekspodidentitypolicytag",
            "Effect": "Allow",
            "Action": [
                "eks:DescribePodIdentityAssociation",
                "eks:DeletePodIdentityAssociation",
                "eks:TagResource"
            ],
            "Resource": "arn:aws:eks:us-west-2:123456789012:podidentityassociation/getting-started-gitops/*"
        },
        {
            "Sid": "PassRoleToACKEksRole",
            "Effect": "Allow",
            "Action": [
                "iam:PassRole",
                "iam:GetRole"
            ],
            "Resource": "arn:aws:iam::123456789012:role/sample-argocd-ack-eks-role" # Name of the role that we will create for our app
        }
    ]
}
EOF
# Edit the Region and account ID in each Resource to match your cluster; the PassRole resource is the IAM role we will create later for the sample application
aws iam put-role-policy --role-name $ACKROLE --policy-name "ack-eks-podidentity-policy" --policy-document file://policy.json

# Associate the service account with the IAM role through an EKS Pod Identity association. Replace the role ARN, then run the following command:
aws eks create-pod-identity-association --cluster-name getting-started-gitops --namespace ack-system --service-account ack-eks-controller --role-arn arn:aws:iam::123456789012:role/sample-argocd-ack-eks-ackrole
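To confirm the association for the controller's service account was created, you can list the Pod Identity associations on the cluster:

aws eks list-pod-identity-associations --cluster-name getting-started-gitops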

2. Install the ACK controller for EKS with Helm, edit region as needed

export SERVICE=eks
export RELEASE_VERSION=$(curl -sL https://api.github.com/repos/aws-controllers-k8s/${SERVICE}-controller/releases/latest | jq -r '.tag_name | ltrimstr("v")')
export ACK_SYSTEM_NAMESPACE=ack-system
export AWS_REGION=us-west-2
aws ecr-public get-login-password --region us-east-1 | helm registry login --username AWS --password-stdin public.ecr.aws
helm install --create-namespace -n $ACK_SYSTEM_NAMESPACE ack-$SERVICE-controller \
oci://public.ecr.aws/aws-controllers-k8s/$SERVICE-chart --version=$RELEASE_VERSION --set=aws.region=$AWS_REGION
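Before moving on, you can check that the controller pod is running and that the PodIdentityAssociation custom resource definition has been registered (the CRD name shown is the one typically installed by the EKS controller chart):

kubectl get pods -n $ACK_SYSTEM_NAMESPACE
kubectl get crd podidentityassociations.eks.services.k8s.aws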

For detailed information on the ACK installation, refer to the documentation. The documented steps demonstrate installing the ACK service controller for Amazon S3; setting the environment variable SERVICE=eks installs the ACK service controller for EKS instead.

Accessing Argo CD

In the next steps, we will deploy our application through Argo CD. While we will not perform any actions in the Argo CD UI, it can be useful for seeing what is happening. If you’re using the Argo CD from the EKS Blueprints, you can run the following command to set up a port-forward and retrieve the password for your Argo CD instance.

terraform output -raw configure_argocd | bash&
# Updated context arn:aws:eks:us-west-2:123456789012:cluster/getting-started-gitops in /tmp/getting-started-gitops
# Context "arn:aws:eks:us-west-2:123456789012:cluster/getting-started-gitops" modified.
# 'admin:login' logged in successfully
# Context 'port-forward' updated
# ArgoCD Username: admin
# ArgoCD Password: <Password>
# Port Forward: http://localhost:8080
# Forwarding from 127.0.0.1:8080 -> 8080
# Forwarding from [::1]:8080 -> 8080
# Region and account id in the context would be as per your current cluster

By opening http://localhost:8080 in your browser and using the credentials from the output, you’ll be able to find the application we will deploy next.

Deploy the sample application highlighting the problem

1. Clone the code repository for the sample application project

git clone https://github.com/aws-samples/sample-argocd-ack-eks.git

Browse the files in the sample-argocd-ack-eks project. Since our application deploys a few components and we want to control the ordering, we used the helm.sh/hook-weight annotation through a Helm chart in a declarative GitOps way. If you don’t want to use Helm, it’s also possible to get the right ordering via Argo CD sync waves.

Resources annotated with lower wave numbers are deployed first by Argo CD. Notice that in both service-account.yaml and pod-identity-association.yaml the weight (or sync wave number) is set to -50. This means we want Argo CD to create the service account and PodIdentityAssociation resources in the same wave, and to run that wave before any other resources are created (since the wave number is negative).
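For illustration, here is a trimmed sketch of how the annotation appears on the service account template (see the repository for the full manifests):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: argo-ack-eks-static
  annotations:
    helm.sh/hook-weight: "-50" # created in the earliest wave, together with the PodIdentityAssociation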

  • application.yaml – Define the Argo CD application using a custom resource definition
  • application-with-job.yaml – Define the Argo CD application using a custom resource definition and the job validating IAM role. There is one parameter difference from the application.yaml above to enable the job in our Helm chart.
  • The chart folder contains the details of our Helm chart.
    • values.yaml – Default value for our chart
    • Chart.yaml – Definition of our chart
  • The chart/templates folder contains the manifest files for the Kubernetes resources that will be deployed.
    • service-account.yaml – The service account definition
    • pod-identity-association.yaml – The custom resource PodIdentityAssociation which associates the service account to the IAM role. This custom resource will be managed by ACK controller for EKS. The association must be returned by the API before the pods are started, or they will not have the necessary IAM permissions.
    • deployment.yaml – The manifest for deploying the container application as a Kubernetes deployment.
    • job.yaml – Defines a Kubernetes job which checks that certain environment variables are set, confirming that an IAM role has been successfully associated with the pod’s service account. If the environment variables are not set, then the job fails with a non-zero exit code. Notice the Helm hook annotation in the job manifest file, helm.sh/hook-weight: "50", and in the deployment manifest file, helm.sh/hook-weight: "100", which means Argo CD will create the job and wait for its successful completion before attempting to create the application deployment.

2. Create the IAM role to be used by the application

The role needs a trust policy that allows it to be assumed by the service principal pods.eks.amazonaws.com. Copy the ARN of this IAM role for the next step.

# First, create the trust policy JSON file (trust-policy.json)
cat <<EOF > trust-policy.json
{
    "Version": "2012-10-17",
    "Statement": [
    {
        "Effect": "Allow",
        "Principal": {
            "Service": "pods.eks.amazonaws.com"
        },
        "Action": [
            "sts:AssumeRole",
            "sts:TagSession"
        ]
    }
    ]
}
EOF

# Create the IAM role using the trust policy
aws iam create-role --role-name sample-argocd-ack-eks-role --assume-role-policy-document file://trust-policy.json
# Attach more permissions here as needed.
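For example, if the sample application needed read-only access to Amazon S3, you could attach an AWS managed policy (purely illustrative; attach only the permissions your workload actually needs):

aws iam attach-role-policy --role-name sample-argocd-ack-eks-role --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess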

3. Open the application.yaml file; the following parameters can be replaced with your own values:

  • roleArn – Replace with the ARN of the IAM role created in the step above (will be replaced via export)
  • Change any parameter needed to match your cluster in Argo CD. Currently it deploys to a cluster named getting-started-gitops, in the sample-argocd-ack-eks namespace.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: argocd-ack-eks
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: https://github.com/aws-samples/sample-argocd-ack-eks.git
    path: chart
    targetRevision: main
    helm:
      parameters:
        - name: "roleArn"
          value: "arn:aws:iam::123456789012:role/sample-argocd-ack-eks-role"
        - name: "clusterName"
          value: "getting-started-gitops"
        - name: "jobWorkaround"
          value: "disabled"
  destination:
    namespace: sample-argocd-ack-eks
    server: https://kubernetes.default.svc
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
    automated:
      prune: true
      selfHeal: true

4. Create the Argo CD application

export ROLEARN=<ARN from CreateRole above>
cat application.yaml | sed -e "s#arn:aws:iam::123456789012:role/sample-argocd-ack-eks-role#$ROLEARN#" | kubectl apply -f -
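You can follow the sync progress in the Argo CD UI, or check the Application resource and the resulting pods from the CLI:

kubectl get application argocd-ack-eks -n argocd
kubectl get pods -n sample-argocd-ack-eks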

5. Confirm the role from inside our container

Due to the eventually consistent nature of the EKS Pod Identity API, coupled with the speed of automation, we may encounter failures in associating the role with the service account. In our setup, our pods can end up with the node group role (arn:aws:sts::123456789012:assumed-role/initial-eks-node-group-…). At scale, this can become a recurring problem and lead to multiple disruptions per day for your teams.

Note: the fallback role depends on your IMDSv2 hop count setting; see the IMDSv2 configuration documentation for more information.
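As an illustration, you can inspect the hop limit configured on a node’s instance (the instance ID is a placeholder):

aws ec2 describe-instances --instance-ids <instance-id> --query 'Reservations[].Instances[].MetadataOptions.HttpPutResponseHopLimit'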

Run aws sts get-caller-identity from inside the container:

kubectl exec -ti -n sample-argocd-ack-eks deploy/argo-ack-eks-static -- aws sts get-caller-identity
# Example Output of the node role (instead of the EKS Pod Identity Role)
# {
# "UserId": "AIDACKCEVSQ6C2EXAMPLE:i-0bdba4fef68f7596d",
# "Account": "123456789012",
# "Arn": "arn:aws:sts::123456789012:assumed-role/initial-eks-node-group-20241113183043383700000003/i-0bdba4fef68f7596d"
# }

Let’s look at a way to address this that remains compatible with automated deployments.

Validate the correct role is passed via a job

Using a job, we can validate that the EKS Pod Identity Agent is giving us the credentials of the configured IAM role before starting the main application pods (here, a Kubernetes deployment). To enable this, we change a parameter passed to the Helm chart:

# Source: argo-ack-eks/templates/job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  labels:
    app: argo-ack-eks-static
  name: argo-ack-eks-static
  annotations:
    helm.sh/hook-weight: "50"
spec:
  template:
    spec:
      serviceAccountName: argo-ack-eks-static
      restartPolicy: Never
      containers:
        - name: argo-ack-eks
          image: {{ .Values.image }} # We need the aws-cli
          env:
          - name: EXPECTED_ASSUMED_ROLE_ARN
            value: {{ regexReplaceAll ":role/" .Values.roleArn ":assumed-role/" }}
          command:
            - sh
            - "-c"
            - |
              env | grep AWS_CONTAINER; if [ $? -ne 0 ]; then echo 'No env set'; exit 1; else echo 'Found'; env; fi
              if aws sts get-caller-identity --output text --query "Arn" | grep -qi "$EXPECTED_ASSUMED_ROLE_ARN"; then echo "Found"; else exit 1; fi
  backoffLimit: 4
  activeDeadlineSeconds: 60

This job compares the IAM role in the session credentials with the role in the EXPECTED_ASSUMED_ROLE_ARN environment variable. To enable the job:

  1. Remove the previously created deployment and delete the PodIdentityAssociation to reproduce the initial state.
kubectl delete -f application.yaml
# Finalizers will delete all resources via cascade
  2. In application-with-job.yaml, the jobWorkaround parameter has been changed and is passed to Helm.
- name: "jobWorkaround"
  value: "enabled" # Change this line
  3. Install the application by using the role created in the previous section (same role).
export ROLEARN=<ARN from CreateRole above>
cat application-with-job.yaml | sed -e "s#arn:aws:iam::123456789012:role/sample-argocd-ack-eks-role#$ROLEARN#" | kubectl apply -f -

When the deployment is complete, we can use the same command as before to confirm we got the expected role.

kubectl exec -ti -n sample-argocd-ack-eks deploy/argo-ack-eks-static -- aws sts get-caller-identity
# {
# "UserId": "AIDACKCEVSQ6C2EXAMPLE:eks-getting-st-argo-ack-e-01bf907c-2456-4bb0-b669-370d85255a3a",
# "Account": "123456789012",
# "Arn": "arn:aws:sts::123456789012:assumed-role/sample-argocd-ack-eks-role/eks-getting-st-argo-ack-e-01bf907c-2456-4bb0-b669-370d85255a3a"
# }

We can see our job performed the task of verifying that the IAM role was successfully associated with the service account (here, a second pod was needed before the role was available):

kubectl describe jobs -n sample-argocd-ack-eks
# <snip>
#Events:
# Type Reason Age From Message
# ---- ------ ---- ---- -------
# Normal SuccessfulCreate 93s job-controller Created pod: argo-ack-eks-static-27hdf
# Normal SuccessfulCreate 83s job-controller Created pod: argo-ack-eks-static-jf9q5
# Normal Completed 80s job-controller Job completed
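You can also inspect the PodIdentityAssociation resource managed by ACK; its status conditions indicate whether the controller has reconciled the association with the EKS API:

kubectl get podidentityassociations -n sample-argocd-ack-eks
kubectl describe podidentityassociations -n sample-argocd-ack-eks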

Alternative Solution: Changing ARGOCD_SYNC_WAVE_DELAY

Argo CD allows you to control the deployment order of specific resources through sync waves. The default delay between waves is 2 seconds. We experimented with this delay and saw a reduction in failures by increasing its value. While this may work in some cases, it had the adverse effect of slowing down all deployments (even those that don’t use Pod Identity). So, while this approach is simpler, it may not fit everyone’s use case.
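If you want to experiment with this approach, the delay is controlled by an environment variable on the Argo CD application controller. As a sketch, assuming the EKS Blueprints installation where the controller StatefulSet is named argo-cd-argocd-application-controller (adjust the name and value for your environment):

kubectl set env statefulset/argo-cd-argocd-application-controller -n argocd ARGOCD_SYNC_WAVE_DELAY=10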

Cleanup

To clean up, follow these steps:

  • Delete application from Argo CD
kubectl delete -f application-with-job.yaml
  • Delete IAM role
aws iam delete-role --role-name sample-argocd-ack-eks-role
  • Uninstall the ACK controller for EKS with Helm, then delete its Pod Identity association and role
export SERVICE=eks
export ACK_SYSTEM_NAMESPACE=ack-system
export AWS_REGION=us-west-2
helm uninstall -n $ACK_SYSTEM_NAMESPACE ack-$SERVICE-controller
aws eks delete-pod-identity-association --cluster-name getting-started-gitops --association-id <associationId>
aws iam delete-role-policy --role-name sample-argocd-ack-eks-ackrole --policy-name ack-eks-podidentity-policy
aws iam delete-role --role-name sample-argocd-ack-eks-ackrole
  • Delete the EKS Pod Identity Agent add-ons
aws eks delete-addon --cluster-name getting-started-gitops --addon-name eks-pod-identity-agent
  • Destroy the rest of the Terraform stack
terraform destroy

Conclusion

In this post, we explored how to effectively manage EKS Pod Identity associations at scale when using Argo CD and AWS Controllers for Kubernetes (ACK). We demonstrated the challenges that arise from the eventually consistent nature of the EKS Pod Identity API and presented two solutions to address this:

  1. Using a validation job that verifies the EKS Pod Identity association is properly configured before deploying the application workloads
  2. Adjusting the ARGOCD_SYNC_WAVE_DELAY parameter, though this may impact overall deployment speed

By implementing these solutions, you can reliably automate the deployment of applications that require specific IAM permissions while maintaining GitOps practices. The approach of using a validation job provides a more targeted solution that doesn’t impact other deployments, making it suitable for production environments.

We encourage you to try these approaches in your own environment and choose the solution that best fits your specific needs. To learn more about EKS Pod Identity, visit the AWS documentation or explore the sample code provided in this post’s repository.


About the authors

Mathieu Bruneau is a Containers Specialist Solutions Architect at Amazon Web Services Canada. He’s been bridging discussions between operations and developer teams since way before the term DevOps became popular. Math is located in Montreal, Canada and enjoys spending time with his wife and his 3 boys, either playing video games or throwing some frisbees around.

Ahmed Elhosary is a Senior Technical Account Manager (TAM) with AWS. He is a member of the Canada East Enterprise support team and the technical field community for containers.

Martin Guy Lapointe is a Senior Solutions Architect with AWS Canada.

Rishi Gera is a Senior Solutions Architect with AWS Canada.