Running stateful workloads with Amazon EKS on AWS Fargate using Amazon EFS

With Amazon Elastic Kubernetes Service (EKS), you can run Kubernetes pods on EC2 instances or on AWS Fargate. AWS Fargate, a serverless compute engine for containers, lets you run Kubernetes workloads without creating and managing servers, scaling your data plane, right-sizing EC2 instances, or dealing with worker node upgrades. Until now, Fargate has been ideal for running stateless containerized workloads securely and cost-effectively. It is secure because Fargate runs each pod in a VM-isolated environment and patches nodes automatically. It is cost-effective because you pay only for the compute resources you configure for your pod. The recently released native integration with Amazon Elastic File System (EFS) supplies the missing piece needed to run stateful Kubernetes workloads on Fargate.

With WordPress as the sample workload, this post shows you how to run stateful Kubernetes workloads on Fargate using Amazon EFS. WordPress is an open-source content management system (CMS) for building websites and blogs. Fargate support for EFS enables you to run applications that need to persist data outside of the container file system, like WordPress, without undertaking undifferentiated heavy lifting. Just as Fargate allows you to run serverless containers, EFS also offers highly available, durable, and petabyte-scale storage without servers.

Amazon EFS provides massively parallel shared access to a file system that automatically grows and shrinks as you add and remove files. Multiple containers and EC2 instances can read from and write to a shared EFS file system at the same time. Having a persistent storage layer for your pods makes Fargate suitable for Kubernetes workloads such as data analytics, media processing, content management, and web serving that need low latency, high throughput, and read-after-write consistency.

Stateful workloads in Kubernetes

While containers by themselves are ephemeral, Kubernetes supports running stateful workloads by attaching persistent volumes to pods. A pod with a persistent volume attached can store data that can outlive the pod itself. If the pod crashes or terminates, another pod attaches the volume and resumes the work without losing data.

The Container Storage Interface (CSI) helps you run stateful containerized applications on Kubernetes. CSI drivers implement a standard interface that lets Kubernetes clusters manage the lifecycle of persistent volumes. Amazon EKS makes it easier to run stateful workloads by offering CSI drivers for these three AWS storage services:

Amazon EFS (supports Fargate and EC2): a fully managed, scalable, and elastic file system well suited for big data analytics, web serving and content management, application development and testing, media and entertainment workflows, database backups, and container storage. EFS stores your data redundantly across multiple Availability Zones (AZ) and offers low latency access from Kubernetes pods irrespective of the AZ in which they are running.
Amazon EBS (supports EC2 only): a block storage service that provides direct access from EC2 instances and containers to a dedicated storage volume designed for both throughput and transaction-intensive workloads at any scale.
FSx for Lustre (supports EC2 only): a fully managed, high-performance file system optimized for workloads such as machine learning, high-performance computing, video processing, financial modeling, electronic design automation, and analytics. With FSx for Lustre, you can quickly create a high-performance file system linked to your S3 data repository and transparently access S3 objects as files.

Currently, Amazon EFS is the only one of these storage services that pods running on Fargate can use. Fargate automatically installs the Amazon EFS CSI driver for you, but if your cluster also has EC2 nodes, you have to install the EFS CSI driver on them yourself.

StatefulSets with Amazon EFS

Kubernetes lets you request persistent storage and associate it with pods using persistent volumes and persistent volume claims. A StatefulSet creates a persistent volume claim for each of its pods from a volumeClaimTemplate. With dynamic provisioning, the storage driver then creates a matching volume on demand as each pod is created. Without dynamic provisioning, you must create the persistent volumes manually before the StatefulSet can create its pods.

EFS support for dynamic provisioning is under development, and you can track the feature here. The feature will add dynamic provisioning via EFS access points: the EFS CSI driver will provision a new persistent volume by creating an access point on an existing EFS file system. Even though the EFS CSI driver doesn't support dynamic provisioning yet, you can still use EFS to provide storage for StatefulSets. You just have to create the volumes manually before you create the StatefulSet.
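To illustrate creating the volumes manually: Kubernetes names each claim it creates from a volumeClaimTemplate as <template>-<statefulset>-<ordinal> (for example, data-web-0). The sketch below is a hypothetical shell function, statefulset_efs_pvs, that prints one EFS-backed PersistentVolume per replica and pre-binds it to that claim name with claimRef. The StatefulSet name, template name, the 5Gi placeholder capacity, and the assumption that each replica gets its own pre-created EFS access point are illustrative only and are not part of this post's WordPress setup.

```shell
# Hypothetical helper: print one EFS-backed PersistentVolume per StatefulSet
# replica, each pre-bound (via claimRef) to the PVC name Kubernetes generates
# from the volumeClaimTemplate: <template>-<statefulset>-<ordinal>.
# Usage: statefulset_efs_pvs FS_ID STATEFULSET TEMPLATE NAMESPACE AP_ID...
# Assumption: one access point ID per replica, so replicas don't share a directory.
statefulset_efs_pvs() {
  fs_id=$1; sts=$2; tmpl=$3; ns=$4
  shift 4
  i=0
  for ap in "$@"; do
    cat <<EOF
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ${sts}-efs-pv-${i}
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  claimRef:
    namespace: ${ns}
    name: ${tmpl}-${sts}-${i}
  csi:
    driver: efs.csi.aws.com
    volumeHandle: ${fs_id}::${ap}
EOF
    i=$((i + 1))
  done
}
```

For example, piping statefulset_efs_pvs "$WOF_EFS_FS_ID" web data default fsap-aaaa fsap-bbbb into kubectl apply -f - would pre-create volumes for the claims data-web-0 and data-web-1 of a two-replica StatefulSet named web (the access point IDs are placeholders). For the claims to bind, the StatefulSet's volumeClaimTemplate would need to request the efs-sc storage class, the ReadWriteMany access mode, and no more than 5Gi.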

Solution

We will create an Amazon EKS cluster and create a Fargate profile that enables Kubernetes to run pods on AWS Fargate. Once the cluster is ready, we will use Helm to install WordPress, which will be exposed publicly using an Application Load Balancer.

The WordPress pods can run in any of the three AZs within the AWS Region. Pods in each AZ will mount the EFS file system using the local EFS mount target in that AZ. We will also use Amazon RDS for MySQL to create a MySQL database instance for the WordPress database.

This architecture doesn't include a caching layer (like ElastiCache for Memcached). You can follow this guide to learn about speeding up WordPress with Amazon ElastiCache for Memcached.

You will need the following to complete the tutorial:

Let’s start by setting a few environment variables:

WOF_AWS_REGION=us-west-2  # Change this to match your Region
WOF_ACCOUNT_ID=$(aws sts get-caller-identity --query 'Account' --output text)
WOF_EKS_CLUSTER=eks-fargate-stateful
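Later commands silently receive an empty argument if one of these variables is unset or misspelled, which usually surfaces as a confusing AWS CLI or eksctl error. A small guard such as the hypothetical require_vars function below, written for POSIX sh as well as bash, can catch that before you run a step.

```shell
# Hypothetical guard: report every named variable that is unset or empty and
# return non-zero if any are missing. Works in POSIX sh and bash.
# Usage: require_vars WOF_AWS_REGION WOF_ACCOUNT_ID WOF_EKS_CLUSTER
require_vars() {
  missing=0
  for name in "$@"; do
    # Only accept plain shell identifiers, since the name is passed to eval.
    case $name in
      ''|[0-9]*|*[!A-Za-z0-9_]*)
        echo "error: '$name' is not a valid variable name" >&2
        missing=1
        continue
        ;;
    esac
    # Indirect lookup: value becomes the contents of the variable called $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running require_vars WOF_AWS_REGION WOF_ACCOUNT_ID WOF_EKS_CLUSTER prints nothing and returns 0 when all three are set, and names each missing variable otherwise.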

Create an EKS cluster

Create a new EKS cluster without any EC2-based worker nodes. eksctl makes it easier to create an EKS cluster in which pods in the default and kube-system namespaces run on Fargate.

eksctl create cluster \
  --name $WOF_EKS_CLUSTER \
  --version 1.18 \
  --region $WOF_AWS_REGION \
  --fargate

With the --fargate option, eksctl creates a pod execution role and Fargate profile, and patches the coredns deployment so that it can run on Fargate.

Store the VPC ID and its CIDR block in environment variables:

WOF_VPC_ID=$(aws eks describe-cluster --name $WOF_EKS_CLUSTER --query "cluster.resourcesVpcConfig.vpcId" --region $WOF_AWS_REGION --output text)
WOF_CIDR_BLOCK=$(aws ec2 describe-vpcs --vpc-ids $WOF_VPC_ID --query "Vpcs[].CidrBlock" --region $WOF_AWS_REGION --output text)

Create an EFS file system

We need to create an EFS file system before we can create a persistent volume.

Create an EFS file system:

WOF_EFS_FS_ID=$(aws efs create-file-system \
  --creation-token WordPress-on-Fargate \
  --encrypted \
  --performance-mode generalPurpose \
  --throughput-mode bursting \
  --tags Key=Name,Value=WordpressVolume \
  --region $WOF_AWS_REGION \
  --output text \
  --query "FileSystemId")

Create an EFS access point:

WOF_EFS_AP=$(aws efs create-access-point \
  --file-system-id $WOF_EFS_FS_ID \
  --posix-user Uid=1000,Gid=1000 \
  --root-directory "Path=/bitnami,CreationInfo={OwnerUid=1000,OwnerGid=1000,Permissions=777}" \
  --region $WOF_AWS_REGION \
  --query 'AccessPointId' \
  --output text)

EFS access points are application-specific entry points into an EFS file system that make it easier to manage application access to shared datasets. An access point can enforce a user identity, including the user's POSIX groups, for all file system requests made through it. It can also enforce a different root directory for the file system, so that clients can access data only in the specified directory or its subdirectories. Here, the access point makes every request run as UID/GID 1000 and exposes /bitnami as the root directory, creating it with those owners and 777 permissions if it doesn't already exist. To better understand the EFS security model and how it works with containers, read Massimo Re Ferre's developer's guide to using Amazon EFS with Amazon ECS and AWS Fargate – Part 2.

Next we need a security group for the file system that allows inbound NFS traffic (port 2049):

WOF_EFS_SG_ID=$(aws ec2 create-security-group \
  --description WordPress-on-Fargate \
  --group-name WordPress-on-Fargate \
  --vpc-id $WOF_VPC_ID \
  --region $WOF_AWS_REGION \
  --query 'GroupId' --output text)
  
aws ec2 authorize-security-group-ingress \
  --group-id $WOF_EFS_SG_ID \
  --protocol tcp \
  --port 2049 \
  --cidr $WOF_CIDR_BLOCK \
  --region $WOF_AWS_REGION

Create EFS mount targets for the volume in all subnets used in the Fargate profile that eksctl created:

for subnet in $(aws eks describe-fargate-profile \
  --output text --cluster-name $WOF_EKS_CLUSTER \
  --fargate-profile-name fp-default \
  --region $WOF_AWS_REGION \
  --query "fargateProfile.subnets"); do
  aws efs create-mount-target \
    --file-system-id $WOF_EFS_FS_ID \
    --subnet-id $subnet \
    --security-groups $WOF_EFS_SG_ID \
    --region $WOF_AWS_REGION
done

In the cluster that eksctl created, EKS schedules Fargate pods across multiple AZs. Fargate pods in each AZ mount the EFS file system using the EFS mount target in that AZ.
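Each mount target starts in the creating state, so it may not be usable the moment create-mount-target returns. A minimal sketch of waiting for them, assuming the AWS CLI and the variables defined earlier, is the hypothetical function below, which polls describe-mount-targets until every mount target reports available. The retry count and interval are arbitrary defaults.

```shell
# Hypothetical helper: poll until every mount target of an EFS file system
# reports LifeCycleState "available". Returns non-zero after MAX_TRIES polls.
# Usage: wait_for_mount_targets FS_ID REGION [MAX_TRIES] [INTERVAL_SECONDS]
wait_for_mount_targets() {
  fs_id=$1; region=$2; tries=${3:-30}; interval=${4:-10}
  n=0
  while [ "$n" -lt "$tries" ]; do
    states=$(aws efs describe-mount-targets \
      --file-system-id "$fs_id" \
      --region "$region" \
      --query 'MountTargets[].LifeCycleState' \
      --output text) || states=""
    ready=yes
    [ -z "$states" ] && ready=no     # no mount targets listed yet
    for s in $states; do             # text output is tab-separated
      [ "$s" = available ] || ready=no
    done
    [ "$ready" = yes ] && return 0
    n=$((n + 1))
    sleep "$interval"
  done
  return 1
}
```

For example, wait_for_mount_targets "$WOF_EFS_FS_ID" "$WOF_AWS_REGION" polls up to 30 times, 10 seconds apart.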

Create a persistent volume

Create a persistent volume and a persistent volume claim using your EFS file system:

echo "
apiVersion: storage.k8s.io/v1beta1
kind: CSIDriver
metadata:
  name: efs.csi.aws.com
spec:
  attachRequired: false
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: wordpress-efs-pv
spec:
  capacity:
    storage: 100Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: $WOF_EFS_FS_ID::$WOF_EFS_AP
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wordpress-efs-uploads-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 25Gi   
" | kubectl apply -f -

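Because the persistent volume is created statically, the claim should bind to wordpress-efs-pv almost immediately: it asks for the same storage class (efs-sc) and access mode, and its 25Gi request fits within the volume's 100Gi. Amazon EFS is elastic, so these capacity values satisfy fields that Kubernetes requires rather than setting file system limits. The hypothetical helper below is one way to confirm the binding from a script before installing WordPress; kubectl get pvc shows the same information interactively.

```shell
# Hypothetical helper: wait until a PersistentVolumeClaim reports phase "Bound".
# Usage: wait_for_pvc_bound CLAIM_NAME [MAX_TRIES] [INTERVAL_SECONDS]
wait_for_pvc_bound() {
  claim=$1; tries=${2:-12}; interval=${3:-5}
  n=0
  while [ "$n" -lt "$tries" ]; do
    phase=$(kubectl get pvc "$claim" -o jsonpath='{.status.phase}') || phase=""
    if [ "$phase" = Bound ]; then
      echo "$claim is Bound"
      return 0
    fi
    n=$((n + 1))
    sleep "$interval"
  done
  echo "$claim is still '${phase:-unknown}'" >&2
  return 1
}
```

For example, wait_for_pvc_bound wordpress-efs-uploads-pvc checks the claim up to 12 times, 5 seconds apart.
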
Deploy the AWS Load Balancer Controller

We will use an Application Load Balancer to distribute traffic to the pods that run WordPress. We have to install the AWS Load Balancer Controller to use ALB as ingress for Kubernetes workloads. We also need to create an IAM role for the controller, so it has permissions to manage ALBs on your behalf.

Let’s associate an OIDC provider with the EKS cluster and create an IAM role for the controller:

## Associate OIDC provider
eksctl utils associate-iam-oidc-provider \
  --region $WOF_AWS_REGION \
  --cluster $WOF_EKS_CLUSTER\
  --approve

## Download the IAM policy document
curl -S https://raw.githubusercontent.com/kubernetes-sigs/aws-alb-ingress-controller/v2_ga/docs/install/iam_policy.json -o iam-policy.json

## Create an IAM policy
aws iam create-policy \
  --policy-name AWSLoadBalancerControllerIAMPolicy \
  --policy-document file://iam-policy.json
  
## Create a service account
eksctl create iamserviceaccount \
  --cluster=$WOF_EKS_CLUSTER \
  --region $WOF_AWS_REGION \
  --namespace=kube-system \
  --name=aws-load-balancer-controller \
  --override-existing-serviceaccounts \
  --attach-policy-arn=arn:aws:iam::$WOF_ACCOUNT_ID:policy/AWSLoadBalancerControllerIAMPolicy \
  --approve
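eksctl create iamserviceaccount links the service account to the IAM role by annotating it with eks.amazonaws.com/role-arn; that annotation is what lets the controller's pods assume the role through the OIDC provider. A quick check, sketched below as a hypothetical function, reads the annotation back and fails if it is missing.

```shell
# Hypothetical check: print the IAM role ARN that eksctl attached to the
# controller's service account; fail if the annotation is missing.
controller_role_arn() {
  arn=$(kubectl -n kube-system get serviceaccount aws-load-balancer-controller \
    -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}') || return 1
  [ -n "$arn" ] || { echo "error: no eks.amazonaws.com/role-arn annotation" >&2; return 1; }
  printf '%s\n' "$arn"
}
```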

The AWS Load Balancer Controller uses cert-manager to inject certificate configuration into its webhooks. Create a Fargate profile for the cert-manager namespace so that Kubernetes can schedule cert-manager pods on Fargate:

eksctl create fargateprofile \
  --cluster $WOF_EKS_CLUSTER \
  --name cert-manager \
  --namespace cert-manager \
  --region $WOF_AWS_REGION

Install the AWS Load Balancer Controller using Helm:

helm repo add eks https://aws.github.io/eks-charts

kubectl apply -k "github.com/aws/eks-charts/stable/aws-load-balancer-controller//crds?ref=master"

helm install aws-load-balancer-controller \
  eks/aws-load-balancer-controller \
  --namespace kube-system \
  --set clusterName=$WOF_EKS_CLUSTER \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller \
  --set vpcId=$WOF_VPC_ID \
  --set region=$WOF_AWS_REGION
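Fargate provisions capacity for each pod individually, so the controller's pods can take a minute or so to start. A hedged way to wait for the controller before moving on is the hypothetical check below, which compares the deployment's ready replicas with its desired replicas. It assumes the Helm release created a deployment named aws-load-balancer-controller in kube-system; kubectl rollout status serves the same purpose interactively.

```shell
# Hypothetical check: succeed only when every desired replica of the
# controller deployment is ready (jsonpath output such as "2/2").
lb_controller_ready() {
  counts=$(kubectl -n kube-system get deployment aws-load-balancer-controller \
    -o jsonpath='{.status.readyReplicas}/{.spec.replicas}') || return 1
  ready=${counts%%/*}     # text before the slash (empty if none are ready)
  desired=${counts#*/}    # text after the slash
  [ -n "$ready" ] && [ -n "$desired" ] && [ "$ready" = "$desired" ]
}
```

For example: until lb_controller_ready; do sleep 10; done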

Create a MySQL instance

WordPress requires MySQL version 5.0.15 or later (or any version of MariaDB) to store posts, comments, settings, and user information. See the WordPress documentation for an overview of the WordPress database schema.

Let’s create a MySQL instance for WordPress using Amazon RDS:

## Get VPC's private subnets
WOF_PRIVATE_SUBNETS=$(aws eks describe-fargate-profile \
  --fargate-profile-name fp-default  \
  --cluster-name $WOF_EKS_CLUSTER\
  --region $WOF_AWS_REGION \
  --query "fargateProfile.[subnets]" --output text | awk -v OFS="," '{for(i=1;i<=NF;i++)if($i~/subnet/)$i="\"" $i "\"";$1=$1}1')

## Create a DB subnet group
aws rds create-db-subnet-group \
  --db-subnet-group-name wp-mysql-subnet \
  --subnet-ids "[$WOF_PRIVATE_SUBNETS]" \
  --db-subnet-group-description "Subnet group for MySQL RDS" \
  --region $WOF_AWS_REGION
  
## Create database instance
aws rds create-db-instance \
  --db-instance-identifier wp-db \
  --db-instance-class db.t3.micro \
  --db-name wordpress \
  --db-subnet-group-name wp-mysql-subnet \
  --engine mysql \
  --master-username admin  \
  --master-user-password supersecretpassword \
  --allocated-storage 20 \
  --no-publicly-accessible \
  --region $WOF_AWS_REGION
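The awk expression that builds WOF_PRIVATE_SUBNETS turns the tab-separated subnet IDs printed by describe-fargate-profile into a comma-separated list of quoted strings, which the command then wraps in brackets to pass --subnet-ids as a JSON list. A pure-shell equivalent, shown as a hypothetical function, may be easier to read:

```shell
# Hypothetical helper: join whitespace-separated IDs into a comma-separated
# list of double-quoted strings, e.g. subnet-a subnet-b -> "subnet-a","subnet-b".
to_json_list() {
  out=""
  # $* is left unquoted on purpose so tab- or space-separated input
  # (such as AWS CLI text output) is split into individual IDs.
  for id in $*; do
    if [ -z "$out" ]; then
      out="\"$id\""
    else
      out="$out,\"$id\""
    fi
  done
  printf '%s\n' "$out"
}
```

For example, to_json_list subnet-aaa subnet-bbb prints "subnet-aaa","subnet-bbb". The AWS CLI also accepts list parameters as separate space-delimited values (--subnet-ids subnet-aaa subnet-bbb), which avoids the JSON quoting entirely.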

Database creation typically takes several minutes. You can check the status of the database using watch:

watch aws rds describe-db-instances \
  --db-instance-identifier wp-db \
  --region $WOF_AWS_REGION \
  --query "DBInstances[].DBInstanceStatus"

When the output says available, you can proceed to the next steps. The WordPress application will fail to initialize if it cannot connect to the database.
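Instead of watching the output, you can let the AWS CLI block until the instance is ready: aws rds wait db-instance-available polls the instance status for you and exits non-zero if it gives up first. The small wrapper below is a hypothetical convenience that defaults to the wp-db identifier used in this post and reads the Region from WOF_AWS_REGION.

```shell
# Hypothetical wrapper around the AWS CLI waiter. It returns once the DB
# instance is "available" and returns non-zero if the waiter times out.
# Usage: wait_for_db [DB_INSTANCE_ID]   (defaults to wp-db)
wait_for_db() {
  aws rds wait db-instance-available \
    --db-instance-identifier "${1:-wp-db}" \
    --region "$WOF_AWS_REGION"
}
```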

Once the database instance is available, store the RDS DB endpoint:

WOF_RDS_Endpoint=$(aws rds describe-db-instances \
  --db-instance-identifier wp-db \
  --region $WOF_AWS_REGION \
  --query "DBInstances[].Endpoint.Address" \
  --output text)

We recommend a Multi-AZ DB deployment in production environments. See modifying a DB instance to be a Multi-AZ deployment for the procedure. You can also use Amazon Aurora Serverless, which automatically starts up, shuts down, and scales database capacity based on your application's needs. It lets you run your database without managing servers, much like Fargate and EFS.

Authorize inbound MySQL traffic (port 3306) in the security group attached to the MySQL database instance:

## Get the security group attached to the RDS instance
WOF_RDS_SG=$(aws rds describe-db-instances \
  --db-instance-identifier wp-db \
  --region $WOF_AWS_REGION \
  --query "DBInstances[].VpcSecurityGroups[].VpcSecurityGroupId" \
  --output text)
  
## Accept MySQL traffic
aws ec2 authorize-security-group-ingress \
  --group-id $WOF_RDS_SG \
  --cidr $WOF_CIDR_BLOCK \
  --port 3306 \
  --protocol tcp \
  --region $WOF_AWS_REGION

Deploy WordPress

By default, the Bitnami WordPress image that we’re going to use stores the WordPress data and configurations at the /bitnami path of the container. Configuring /bitnami to point to a shared EFS file system allows us to scale the WordPress pods while running them in multiple Availability Zones simultaneously.

We will deploy WordPress using Helm. Let’s create a Helm values file that contains the WordPress configuration:

cat > values.yaml <<EOF
## Database Settings
externalDatabase:
  host: $WOF_RDS_Endpoint
  user: admin
  password: supersecretpassword
  database: wordpress

## Disable MariaDB
mariadb:
  enabled: false

## run multiple WordPress pods
replicaCount: 3

## Use EFS pvc
persistence:
  existingClaim: wordpress-efs-uploads-pvc

## Use a ClusterIP service instead of LoadBalancer, because the ALB exposes WordPress
service:
  type: ClusterIP

## Increase pod resources  
resources:
  requests:
    cpu: 1000m
    memory: 1024Mi
EOF
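Because the heredoc delimiter above is unquoted, the shell substitutes $WOF_RDS_Endpoint into values.yaml, and an unset variable would silently leave host: empty. The hypothetical check below is a sketch that assumes the host key appears only under externalDatabase, as in this file, and fails early in that case.

```shell
# Hypothetical sanity check: fail if the first "host:" key in a Helm values
# file has no value (for example, because $WOF_RDS_Endpoint was empty).
# Usage: check_values_host [FILE]   (defaults to values.yaml)
check_values_host() {
  file=${1:-values.yaml}
  host=$(sed -n 's/^[[:space:]]*host:[[:space:]]*//p' "$file" | head -n 1)
  if [ -z "$host" ]; then
    echo "error: externalDatabase.host is empty in $file" >&2
    return 1
  fi
  echo "database host: $host"
}
```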

Install WordPress:

helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm install myblog -f values.yaml bitnami/wordpress

Create an ingress so users external to the cluster can access WordPress:

kubectl apply -f - <<'EOF'
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: wordpress-ingress
  annotations:
    kubernetes.io/ingress.class: "alb"
    alb.ingress.kubernetes.io/scheme: "internet-facing"
    alb.ingress.kubernetes.io/healthcheck-path: "/index.php"
    alb.ingress.kubernetes.io/success-codes: "200,201,302"
    alb.ingress.kubernetes.io/target-type: "ip"
  labels:
    app: wordpress-ingress
spec:
  rules:
    - http:
        paths:
          - path: /*
            backend:
              serviceName: myblog-wordpress
              servicePort: 80
EOF

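The AWS Load Balancer Controller can take a few minutes to provision the ALB, and the ingress status shows no hostname until it does. The hypothetical helper below, which assumes the ingress name used above, polls the status and prints the ALB's DNS name once it appears; the name may take a little longer to resolve in DNS.

```shell
# Hypothetical helper: wait for the ingress status to report the ALB's DNS
# name, then print it. Returns non-zero after MAX_TRIES polls.
# Usage: wait_for_ingress_hostname INGRESS [MAX_TRIES] [INTERVAL_SECONDS]
wait_for_ingress_hostname() {
  ing=$1; tries=${2:-30}; interval=${3:-10}
  n=0
  while [ "$n" -lt "$tries" ]; do
    host=$(kubectl get ingress "$ing" \
      -o jsonpath='{.status.loadBalancer.ingress[0].hostname}') || host=""
    if [ -n "$host" ]; then
      printf '%s\n' "$host"
      return 0
    fi
    n=$((n + 1))
    sleep "$interval"
  done
  return 1
}
```

For example, echo "http://$(wait_for_ingress_hostname wordpress-ingress)/wp-admin/" prints the admin URL once the ALB exists.
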
This WordPress deployment listens for HTTP traffic only. You can add TLS encryption by creating a certificate with AWS Certificate Manager (ACM) and annotating the Kubernetes ingress with the certificate's ARN, as explained here.

Test data persistence

Three pods are currently running WordPress. Let's log in to WordPress and make some changes. After modifying the defaults, we will terminate all WordPress pods and recreate them to verify that the changes persist.

Get the DNS name of the ALB in front of WordPress and open the address in your web browser:

echo $(kubectl get ingress wordpress-ingress \
  -o jsonpath="{.status.loadBalancer.ingress[0].hostname}")/wp-admin/

Navigate to the address shown in the previous command's output to open the WordPress admin portal. Follow the steps on the website to complete the setup of your sample WordPress site.

Once you complete the installation, you can change the theme of the WordPress site. WordPress stores theme files in the /bitnami/wp-content/themes/ folder. After we set the theme, we'll delete the WordPress pods and recreate them. The new pods will still have access to the data from the old pods, so the site theme shouldn't revert to the default theme.

Log in to the WordPress dashboard with the credentials you used during the installation process. Click the name of the site in the top-left corner of the toolbar.

You will be taken to the site. Choose Themes from the site menu.

In the Themes menu, select a new theme and activate it.

Return to the site and verify that your site now uses the theme you selected. After that, delete the WordPress pods by scaling the WordPress deployment to zero:

kubectl scale deployment myblog-wordpress --replicas=0

If you refresh the page on your WordPress site, the ALB returns an error because there are no backend pods. Scale the deployment back to three pods:

kubectl scale deployment myblog-wordpress --replicas=3

Once the pods are running, refresh the page on your WordPress site, and you’ll see the site with the theme changes you applied. We can conclude that WordPress data and configuration survived pod termination.

Cleanup

Use the following commands to delete the resources created in this post:

helm delete myblog
kubectl delete ingress wordpress-ingress
helm delete aws-load-balancer-controller --namespace kube-system
eksctl delete iamserviceaccount --cluster $WOF_EKS_CLUSTER --name aws-load-balancer-controller --namespace kube-system --region $WOF_AWS_REGION
kubectl delete pvc wordpress-efs-uploads-pvc
kubectl delete pv wordpress-efs-pv
aws rds delete-db-instance --db-instance-identifier wp-db --skip-final-snapshot --region $WOF_AWS_REGION
## Wait until the database is deleted
aws rds wait db-instance-deleted --db-instance-identifier wp-db --region $WOF_AWS_REGION
aws rds delete-db-subnet-group --db-subnet-group-name wp-mysql-subnet --region $WOF_AWS_REGION
for mount_target in $(aws efs describe-mount-targets --file-system-id $WOF_EFS_FS_ID --region $WOF_AWS_REGION --query 'MountTargets[].MountTargetId' --output text); do aws efs delete-mount-target --mount-target-id $mount_target --region $WOF_AWS_REGION; done
## Give the mount targets time to be deleted
sleep 30
aws efs delete-file-system --file-system-id $WOF_EFS_FS_ID --region $WOF_AWS_REGION
aws ec2 delete-security-group --group-id $WOF_EFS_SG_ID --region $WOF_AWS_REGION
eksctl delete cluster --name $WOF_EKS_CLUSTER --region $WOF_AWS_REGION

Conclusion

The integration between AWS Fargate and Amazon EFS lets you run stateful workloads on both Amazon EKS and Amazon ECS. Amazon EFS enables thousands of pods or EC2 instances to read from and write to a shared volume simultaneously, which makes Fargate a fit for web hosting, content management, media processing workflows, and many other solutions.
