
Backup and restore your Amazon EKS cluster resources using Velero

September 9th, 2023: This post was originally published December 1, 2021. We’ve updated the walkthrough instructions of this blog post to support the latest EKS versions and changes to the Velero Helm chart.


Companies worldwide are adopting containers to encapsulate their microservices, and many of them choose Kubernetes for automating deployment, scaling, and managing their containerized applications. As the number of these microservices grows, it becomes increasingly important to have a centralized backup mechanism in place to:

  • Protect applications in case of physical and logical errors
  • Perform migrations between Kubernetes clusters
  • Replicate production clusters to development and testing environments

Velero is a popular open-source tool that can provide Kubernetes cluster disaster recovery, data migration, and data protection. Velero can back up Kubernetes cluster resources and persistent volumes to a supported external storage backend, either on demand or on a schedule.

AWS customers can leverage this solution to centrally back up and restore Kubernetes objects and applications from and to Amazon Elastic Kubernetes Service (Amazon EKS), our managed solution that helps you provide highly available and secure Kubernetes clusters and automates key tasks such as patching, node provisioning, and updates. This means that customers can also use Velero to migrate from self-hosted Kubernetes to Amazon EKS.

In this blog post, we will focus on how to use Velero to back up, restore, and migrate your Amazon EKS cluster resources and understand the backup options that Velero offers to decide which approach best suits your organization’s use case.

Overview of Velero

In this section, you will familiarize yourself with how Velero integrates with Amazon EKS, the customizations that this tool offers for backing up and restoring applications, and the backup-restore workflow.

Velero and Amazon EKS

An application-level backup in Amazon EKS targets two components:

  • Kubernetes objects and configurations stored in the etcd key/value store
  • Application data stored in persistent volumes

In Amazon EKS, the etcd key/value store is managed by AWS and is only accessible through the Kubernetes API Server. Velero leverages the Kubernetes API to retrieve this data from the key/value store. This approach provides more flexibility than accessing etcd directly because with API calls, you can easily filter resources by namespace, resource type, or label. For example, you could limit the scope of your backups to a specific application, filtering by labels, or save your current RBAC strategy, filtering by object type.

Velero also takes snapshots of the cluster’s persistent volumes and restores them alongside the cluster’s objects (details in next section).

Backup and restore operations are declared as Kubernetes Custom Resource Definition (CRD) objects and are managed by controllers that process these new CRD objects to perform backups, restores, and all related operations. When creating these backup and restore CRD objects, you can specify the following customizations:

  • Filter resources: restrict the scope of a backup or restore filtering by namespace, object type, or label. When restoring, you can also filter by excluding namespaces and object types.
  • Choose the backup type: create on-demand backups or set schedules to initiate backups automatically at recurring intervals.
  • Set retention times: indicate how long you want to retain backups.
  • Specify hooks: configure pre- and post-hooks to run custom commands in containers before and after a backup or restore operation.
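As a rough illustration of how these options combine, the sketch below writes a Velero Schedule manifest that backs up a single namespace every night and keeps each backup for seven days. The schedule name ghost-daily, the ghost namespace, the cron expression, and the file name are assumptions chosen for this example; the velero namespace matches where Velero is installed later in this post.

```shell
# Write a Schedule manifest (hypothetical name: ghost-daily).
# The heredoc delimiter is quoted so nothing inside is expanded by the shell.
cat > ghost-daily-schedule.yaml <<'EOF'
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: ghost-daily
  namespace: velero
spec:
  # Backup type: recurring, every day at 02:00 (cron syntax)
  schedule: "0 2 * * *"
  template:
    # Filter resources: only back up the ghost namespace
    includedNamespaces:
    - ghost
    # Retention: expire each backup after 7 days
    ttl: 168h0m0s
EOF

# Apply it once Velero is installed in the cluster:
#   kubectl apply -f ghost-daily-schedule.yaml
# A roughly equivalent CLI call would be:
#   velero schedule create ghost-daily --schedule="0 2 * * *" \
#       --include-namespaces ghost --ttl 168h
```

Keeping the schedule as a manifest makes it easy to store in version control next to the application; the CLI form is more convenient for one-off experiments.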

Backup and restore workflow

Velero consists of two components:

  • A Velero server pod that runs in your Amazon EKS cluster
  • A command-line client (Velero CLI) that runs locally

Whenever we issue a backup against an Amazon EKS cluster, Velero performs a backup of cluster resources in the following way:

  1. The Velero CLI makes a call to the Kubernetes API server to create a backup CRD object.
  2. The backup controller:
    1. Checks the scope of the backup CRD object, namely if we set filters.
    2. Queries the API server for the resources that need a backup.
    3. Compresses the retrieved Kubernetes objects into a .tar file and saves it in Amazon S3.

Backup workflow with Velero

Similarly, whenever we issue a restore operation:

  1. The Velero CLI makes a call to the Kubernetes API server to create a restore CRD object that will restore from an existing backup.
  2. The restore controller:
    1. Validates the restore CRD object.
    2. Makes a call to Amazon S3 to retrieve backup files.
    3. Initiates restore operation.

Restore workflow with Velero

Velero also performs backup and restore of any persistent volume in scope:

  1. If you are using Amazon Elastic Block Store (Amazon EBS), Velero will create Amazon EBS snapshots of persistent volumes in scope.
  2. For any other volume type (except hostPath), use Velero’s Restic integration to take file-level backups of the contents of your volumes. At the time of writing, Restic is in Beta, and therefore not recommended for production-grade backups.

In the next section, we will show how to back up an application in Amazon EKS and the related EBS volumes.

Walkthrough

The following sections will demonstrate how you can use Velero to back up an application in one cluster and restore the application in another. We will use the popular open-source Ghost publishing platform to demonstrate how to back up and restore not only an application definition but also its state stored on an EBS volume using a Persistent Volume Claim (PVC).

Prerequisites

To be able to follow along with the next steps, you will need to have the following prerequisites:

The two EKS clusters we used for this walkthrough are in the same account, but this is not a hard requirement for using Velero. If your clusters are in different accounts, you can still use this blog post as a guideline and adjust IAM and S3 bucket permissions accordingly.

Note that the commands in the following sections are written in Bash.

Install Velero

There are a few steps required to install Velero using EKS best practices. First, we will create an S3 bucket to store the backups. We will then use IAM roles for service accounts to grant Velero necessary AWS permissions to perform backup and restore operations. Lastly, we will install the Velero CLI to simplify how we interact with this tool.

Create an S3 Bucket to store backups

Velero uses S3 to store EKS backups when running in AWS. Run the following command to create an S3 bucket for Velero. Be sure to use a unique bucket name like <company-fqdn>-eks-velero-backups.

Replace <BUCKETNAME> and <REGION> with your own values below.

BUCKET=<BUCKETNAME>
REGION=<REGION>
aws s3 mb s3://$BUCKET --region $REGION

Although Amazon S3 stores your data across multiple geographically distant Availability Zones by default, compliance requirements might dictate that you store data at even greater distances. Cross-Region Replication allows you to replicate data between distant AWS Regions to satisfy these requirements.

IAM Policy

Velero performs a number of API calls to resources in EC2 and S3 to perform snapshots and save the backup to the S3 bucket. The following IAM policy will grant Velero the necessary permissions.

cat > velero_policy.json <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeVolumes",
                "ec2:DescribeSnapshots",
                "ec2:CreateTags",
                "ec2:CreateVolume",
                "ec2:CreateSnapshot",
                "ec2:DeleteSnapshot"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:DeleteObject",
                "s3:PutObject",
                "s3:AbortMultipartUpload",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "arn:aws:s3:::${BUCKET}/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::${BUCKET}"
            ]
        }
    ]
}
EOF

aws iam create-policy \
    --policy-name VeleroAccessPolicy \
    --policy-document file://velero_policy.json
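Because the heredoc above is unquoted, ${BUCKET} is expanded at the moment the file is written. If the variable was empty in that shell, the policy would silently reference arn:aws:s3:::/* instead of your bucket. The following optional check is a minimal sketch of how to catch that before creating the policy; the helper name check_velero_policy is hypothetical.

```shell
# Hypothetical helper: sanity-check the generated policy file.
# Arguments: <policy file> <expected bucket name>
check_velero_policy() {
  local file=$1 bucket=$2
  if [ -z "$bucket" ]; then
    echo "bucket name is empty; set BUCKET and regenerate $file" >&2
    return 1
  fi
  # The object-level statement must reference <bucket>/*
  if ! grep -qF "\"arn:aws:s3:::${bucket}/*\"" "$file"; then
    echo "object ARN for $bucket not found in $file" >&2
    return 1
  fi
  # The s3:ListBucket statement must reference the bucket itself
  if ! grep -qF "\"arn:aws:s3:::${bucket}\"" "$file"; then
    echo "bucket ARN for $bucket not found in $file" >&2
    return 1
  fi
  echo "$file references bucket $bucket"
}

# Usage:
#   check_velero_policy velero_policy.json "$BUCKET"
```

If the check fails, set BUCKET again and rerun the cat command before calling aws iam create-policy.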

Create Service Accounts for Velero

The best practice for providing AWS policies to applications running on EKS clusters is to use IAM Roles for Service Accounts. eksctl provides an easy way to create the required IAM role and scope the trust relationship to the velero-server Service Account.

Replace each <CLUSTERNAME> with the name of your Primary and Recovery EKS cluster, respectively.

PRIMARY_CLUSTER=<CLUSTERNAME>
RECOVERY_CLUSTER=<CLUSTERNAME>
ACCOUNT=$(aws sts get-caller-identity --query Account --output text)

eksctl create iamserviceaccount \
    --cluster=$PRIMARY_CLUSTER \
    --name=velero-server \
    --namespace=velero \
    --role-name=eks-velero-backup \
    --role-only \
    --attach-policy-arn=arn:aws:iam::$ACCOUNT:policy/VeleroAccessPolicy \
    --approve

eksctl create iamserviceaccount \
    --cluster=$RECOVERY_CLUSTER \
    --name=velero-server \
    --namespace=velero \
    --role-name=eks-velero-recovery \
    --role-only \
    --attach-policy-arn=arn:aws:iam::$ACCOUNT:policy/VeleroAccessPolicy \
    --approve

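The --name=velero-server and --namespace=velero flags scope the role’s trust relationship to the velero-server Service Account in the velero namespace, so only pods running under that Service Account can assume the role and use the permissions granted by the IAM policy created in the previous step.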

Install Velero in both EKS Clusters

The instructions below include the necessary steps to install Velero using the Helm chart. Note that the chart is pinned to version 5.0.2, which installs Velero version v1.11.1. If you want to install a newer Velero version, be sure to adjust the values files below, including matching the Velero AWS plugin version to the correct Velero version using the compatibility matrix.

helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts

cat > values.yaml <<EOF
configuration:
  backupStorageLocation:
  - bucket: $BUCKET
    provider: aws
  volumeSnapshotLocation:
  - config:
      region: $REGION
    provider: aws
initContainers:
- name: velero-plugin-for-aws
  image: velero/velero-plugin-for-aws:v1.7.1
  volumeMounts:
  - mountPath: /target
    name: plugins
credentials:
  useSecret: false
serviceAccount:
  server:
    annotations:
      eks.amazonaws.com/role-arn: "arn:aws:iam::${ACCOUNT}:role/eks-velero-backup"
EOF

cat > values_recovery.yaml <<EOF
configuration:
  backupStorageLocation:
  - bucket: $BUCKET
    provider: aws
  volumeSnapshotLocation:
  - config:
      region: $REGION
    provider: aws
initContainers:
- name: velero-plugin-for-aws
  image: velero/velero-plugin-for-aws:v1.7.1
  volumeMounts:
  - mountPath: /target
    name: plugins
credentials:
  useSecret: false
serviceAccount:
  server:
    annotations:
      eks.amazonaws.com/role-arn: "arn:aws:iam::${ACCOUNT}:role/eks-velero-recovery"
EOF
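The two values files differ only in the IAM role name in the Service Account annotation. As an alternative sketch, both files could be generated by one small function so the shared settings live in a single place. The function name write_velero_values is hypothetical, and it assumes BUCKET, REGION, and ACCOUNT are set as in the earlier steps.

```shell
# Hypothetical helper: write a Velero Helm values file for a given IAM role.
# Arguments: <output file> <IAM role name>
# Reads BUCKET, REGION, and ACCOUNT from the environment.
write_velero_values() {
  local outfile=$1 role=$2
  cat > "$outfile" <<EOF
configuration:
  backupStorageLocation:
  - bucket: ${BUCKET}
    provider: aws
  volumeSnapshotLocation:
  - config:
      region: ${REGION}
    provider: aws
initContainers:
- name: velero-plugin-for-aws
  image: velero/velero-plugin-for-aws:v1.7.1
  volumeMounts:
  - mountPath: /target
    name: plugins
credentials:
  useSecret: false
serviceAccount:
  server:
    annotations:
      eks.amazonaws.com/role-arn: "arn:aws:iam::${ACCOUNT}:role/${role}"
EOF
}

# Usage, matching the role names created with eksctl above:
#   write_velero_values values.yaml eks-velero-backup
#   write_velero_values values_recovery.yaml eks-velero-recovery
```

The output is intended to be identical to the two files written above, so either approach can be used with the helm install commands that follow.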

We need to install the Velero server twice: once in the Primary cluster and again in the Recovery cluster. You can use kubectl config (kubectl cheat sheet) or kubectx to view the contexts for both clusters and easily switch contexts.

For easier management of kubectl config, we add our clusters to kubeconfig with an alias:

PRIMARY_CONTEXT=primary
RECOVERY_CONTEXT=recovery
aws eks --region $REGION update-kubeconfig --name $PRIMARY_CLUSTER --alias $PRIMARY_CONTEXT
aws eks --region $REGION update-kubeconfig --name $RECOVERY_CLUSTER --alias $RECOVERY_CONTEXT

We can check that we have these new contexts with the following command:

kubectl config get-contexts

Checking new context with command

The “*” marks the context that is currently active.

Change the context to your Primary cluster and install Velero:

kubectl config use-context $PRIMARY_CONTEXT
helm install velero vmware-tanzu/velero --version 5.0.2 \
    --create-namespace \
    --namespace velero \
    -f values.yaml

Now change the context to your Recovery cluster and proceed to install Velero:

kubectl config use-context $RECOVERY_CONTEXT
helm install velero vmware-tanzu/velero --version 5.0.2 \
    --create-namespace \
    --namespace velero \
    -f values_recovery.yaml

We can check that the Velero server was successfully installed by running this command in each context:

kubectl get pods -n velero

Checking Velero server install
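Beyond listing the pods, it can help to confirm that the Velero deployment has finished rolling out and that the backup storage location is registered. The sketch below is one possible check; the helper name check_velero_ready is hypothetical, and the deployment name velero is an assumption based on the Helm release name used above.

```shell
# Hypothetical helper: verify the Velero install in the current kubectl context.
check_velero_ready() {
  # Wait until the Deployment created by the chart is rolled out (or time out).
  kubectl -n velero rollout status deployment/velero --timeout=180s || return 1
  # List the backup storage location resources created from the values file.
  # The status shown here should turn to Available once Velero can reach the bucket.
  kubectl -n velero get backupstoragelocations.velero.io
}

# Usage, once per cluster:
#   kubectl config use-context $PRIMARY_CONTEXT && check_velero_ready
#   kubectl config use-context $RECOVERY_CONTEXT && check_velero_ready
```

If the storage location does not become available, the Service Account annotation and the IAM policy are the first things worth rechecking.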

Install the Velero CLI

Velero operates by submitting commands as CRDs. To take a backup of the cluster, you submit to the cluster a backup CRD. These can be difficult to create by hand, so the Velero team has created a CLI that makes it easy to perform backups and restores. We will be using the Velero CLI to create a backup of the Primary cluster and restore to the Recovery cluster.

Installation instructions vary depending on your operating system. Follow the instructions to install Velero here.

Backup and restore an example application

With Velero installed, we will move forward with installing an application in our Primary cluster that we will back up and restore in our Recovery cluster. Customers will be able to follow the steps below to back up and restore their own applications in their own Amazon EKS clusters as well.

Install Ghost app (and create a post)

Ghost will serve as our sample application that we will back up on the Primary cluster and restore to the Recovery cluster. We will use the Bitnami Helm chart as it’s commonly deployed and well-tested. This chart depends on the Bitnami MySQL chart that will serve as the persistent data store for the blog application. The MySQL data will be stored in an EBS volume that will be snapshotted by Velero as part of performing the backup.

Now we switch to the Primary cluster’s context and install Ghost. You can ignore the message ERROR: you did not provide an external host that appears during the installation; the commands that follow resolve it:

kubectl config use-context $PRIMARY_CONTEXT

helm install ghost oci://registry-1.docker.io/bitnamicharts/ghost \
    --create-namespace \
    --namespace ghost

export APP_HOST=$(kubectl get svc --namespace ghost ghost --template "{{ range (index .status.loadBalancer.ingress 0) }}{{ . }}{{ end }}")
export GHOST_PASSWORD=$(kubectl get secret --namespace "ghost" ghost -o jsonpath="{.data.ghost-password}" | base64 -d)
export MYSQL_ROOT_PASSWORD=$(kubectl get secret --namespace "ghost" ghost-mysql -o jsonpath="{.data.mysql-root-password}" | base64 -d)
export MYSQL_PASSWORD=$(kubectl get secret --namespace "ghost" ghost-mysql -o jsonpath="{.data.mysql-password}" | base64 -d)

helm upgrade ghost oci://registry-1.docker.io/bitnamicharts/ghost \
    --namespace ghost \
    --set service.type=LoadBalancer \
    --set ghostHost=$APP_HOST \
    --set ghostPassword=$GHOST_PASSWORD \
    --set mysql.auth.rootPassword=$MYSQL_ROOT_PASSWORD \
    --set mysql.auth.password=$MYSQL_PASSWORD

We can check that the install was successful by running this command:

kubectl get pods -n ghost

Running ghost install command

Create a blog post to demonstrate backup and restore of persistent volume

After the Helm chart installation is complete, the Chart README will be displayed in the console. It includes:

  1. The Blog URL
  2. The Admin URL
  3. The default admin username
  4. Instructions to use kubectl to retrieve the password

You can optionally sign in to the Ghost Admin console (using the Admin URL displayed above) and create an example blog post that will be included in the backup and restore process. This will demonstrate that the backup includes not only the application deployment configuration but also the state of the blog database, which includes all of the posts.

To create a post, first select Posts in the left-hand navigation pane.

Creating a blog post in the Ghost console

Then select New Post in the top right-hand corner of the page.

Selecting New Post in Ghost console

You can add a post title and write some content. When you are ready to save your sample blog post, select the Publish dropdown menu item in the top right corner of the page and then choose the Publish button in the dropdown.

To view your blog with your newly added content, open a new browser tab and enter the blog URL. You will see the Ghost blog with the default theme along with your new blog post and a few other sample posts that are present in the default installation.

Viewing your new post in browser

Create Backup

Create a backup of the Primary cluster. Be sure your kubectl context is set to the Primary cluster before running the command below.

velero backup create ghost-backup

We can see what a Velero backup CRD looks like by using the -o yaml flag, which outputs the backup CRD as YAML without actually submitting the backup creation to the Velero server.

velero backup create test -o yaml

backing up namespaces

In the backup CRD, you can see that we are backing up all namespaces as the includedNamespaces array includes the star wildcard. Even though we are backing up the entire cluster, we can choose individual components of the cluster by using selectors. This gives us the ability to back up a single namespace, which may include a single application, for example.
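To make the selector options concrete, here is a minimal sketch of two narrower backups using flags of the Velero CLI. The function names, the backup names, and the label app.kubernetes.io/instance=ghost (a label Helm charts commonly set on their resources) are assumptions for illustration.

```shell
# Hypothetical helper: back up a single namespace and wait for the result.
# Arguments: <backup name> <namespace>
backup_namespace() {
  local backup_name=$1 namespace=$2
  velero backup create "$backup_name" \
      --include-namespaces "$namespace" \
      --wait
}

# Hypothetical helper: back up only resources that carry a given label.
# Arguments: <backup name> <label selector>
backup_by_label() {
  local backup_name=$1 selector=$2
  velero backup create "$backup_name" \
      --selector "$selector" \
      --wait
}

# Usage:
#   backup_namespace ghost-only ghost
#   backup_by_label ghost-app app.kubernetes.io/instance=ghost
```

The --wait flag makes the CLI block until the backup reaches a final phase, which is convenient in scripts.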

Validate that the backup was successful

Let’s check on the status of the backup and validate that the backup has been completed successfully.

velero backup describe ghost-backup

Look for the field Phase: in the output of the command. If the current Phase is InProgress, then wait a few moments and try again until you see the Phase: Completed. You can see additional details of the backup, including information such as the start time and completion time, along with the number of items backed up.

Additional backup details
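Instead of rerunning the describe command by hand, the phase can be polled in a loop. The following is a sketch under the assumption that Velero runs in the velero namespace, as installed above; the function name wait_for_backup and the default timeout and interval are hypothetical choices.

```shell
# Hypothetical helper: poll a backup until it completes, fails, or times out.
# Arguments: <backup name> [timeout seconds, default 600] [poll interval, default 10]
wait_for_backup() {
  local name=$1 timeout=${2:-600} interval=${3:-10}
  local elapsed=0 phase
  while [ "$elapsed" -lt "$timeout" ]; do
    # Read the phase straight from the Backup custom resource.
    phase=$(kubectl -n velero get backups.velero.io "$name" \
        -o jsonpath='{.status.phase}')
    case "$phase" in
      Completed)
        echo "backup $name completed"
        return 0 ;;
      Failed|PartiallyFailed|FailedValidation)
        echo "backup $name ended in phase $phase" >&2
        return 1 ;;
    esac
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "timed out waiting for backup $name (last phase: ${phase:-unknown})" >&2
  return 1
}

# Usage:
#   wait_for_backup ghost-backup
```

Any other phase, such as InProgress, simply causes another poll until the timeout is reached.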

We can also see the backup files created by Velero in the Amazon S3 bucket we previously created:

aws s3 ls s3://$BUCKET/backups/ghost-backup/

Backup files created by Velero
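To look inside the archive of Kubernetes objects, the compressed tarball can be streamed from S3 and listed without saving it locally. This sketch assumes the archive is stored as backups/<backup-name>/<backup-name>.tar.gz, which you can confirm against the listing above; the function name list_backup_contents is hypothetical.

```shell
# Hypothetical helper: list the files inside a backup archive.
# Arguments: <bucket> <backup name>
list_backup_contents() {
  local bucket=$1 backup=$2
  # "-" as the destination makes the AWS CLI write the object to stdout,
  # which tar then reads from stdin and lists without extracting.
  aws s3 cp "s3://${bucket}/backups/${backup}/${backup}.tar.gz" - | tar -tzf -
}

# Usage:
#   list_backup_contents "$BUCKET" ghost-backup | grep ghost
```

Each entry in the listing corresponds to one Kubernetes object that was retrieved through the API server during the backup.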

Restore the application into the Recovery cluster

Switch your kubectl context to your Recovery cluster.

kubectl config use-context $RECOVERY_CONTEXT

Use the following command to restore only the Ghost application into the Recovery cluster.

velero restore create ghost-restore \
    --from-backup ghost-backup \
    --include-namespaces ghost
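The same backup can also be restored under a different namespace name, for example to place a copy next to an existing deployment. The sketch below uses the CLI's --namespace-mappings flag; the function name and the target namespace ghost-copy are hypothetical.

```shell
# Hypothetical helper: restore one namespace from a backup into another namespace.
# Arguments: <restore name> <backup name> <source namespace> <target namespace>
restore_into_namespace() {
  local restore_name=$1 backup_name=$2 source_ns=$3 target_ns=$4
  velero restore create "$restore_name" \
      --from-backup "$backup_name" \
      --include-namespaces "$source_ns" \
      --namespace-mappings "${source_ns}:${target_ns}"
}

# Usage:
#   restore_into_namespace ghost-restore-copy ghost-backup ghost ghost-copy
```

The rest of this walkthrough assumes the plain restore into the ghost namespace shown above.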

Validate the restore was successful

Let’s check on the status of the restore and validate that the restore has been completed successfully.

velero restore describe ghost-restore

Look for Phase: Completed in the output. If you see Phase: InProgress, then wait a few moments and run the command again. Then retrieve the URL of the LoadBalancer for the Ghost blog in the Recovery cluster:

kubectl -n ghost get svc ghost

Verify your blog has been restored by visiting the URL under EXTERNAL-IP. You should see the Ghost blog along with any example blog posts you created in previous steps.

Congratulations! You just successfully backed up your Primary cluster and restored your application in the Recovery cluster.

Note that in a production backup, restore, or disaster recovery operation, this is the point where you would move your production DNS records to point to the Recovery cluster, after validating that the service is working as expected.

Cleaning up

To avoid incurring future charges, delete the resources. If you used eksctl to create your clusters, you can use eksctl delete cluster <clustername> to delete the clusters.

We also need to delete the S3 bucket that we used to store our backups and the IAM policy used by Velero. The policy can only be deleted once it is no longer attached to any role.

aws s3 rb s3://$BUCKET --force
aws iam delete-policy --policy-arn arn:aws:iam::$ACCOUNT:policy/VeleroAccessPolicy
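If you keep your clusters, the IAM roles created with eksctl remain attached to the policy, and aws iam delete-policy fails while any role is attached. A minimal sketch of removing those roles first is shown below; the function name delete_velero_roles is hypothetical, and it assumes the roles were created with the eksctl create iamserviceaccount commands from this post.

```shell
# Hypothetical helper: remove the velero-server IAM role from each given cluster.
# Arguments: one or more EKS cluster names
delete_velero_roles() {
  local cluster
  for cluster in "$@"; do
    # Deletes the role that eksctl created for this Service Account.
    eksctl delete iamserviceaccount \
        --cluster="$cluster" \
        --name=velero-server \
        --namespace=velero || return 1
  done
}

# Usage, before running aws iam delete-policy:
#   delete_velero_roles "$PRIMARY_CLUSTER" "$RECOVERY_CLUSTER"
```

The role deletion may take a short while to finish, so if the policy is still reported as attached, wait a moment and retry the delete-policy command.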

Conclusion

There are a few different disaster recovery and migration strategies. In this blog post, we showed how Velero supports quick recovery from failures and disaster events as well as migrations of applications and cluster resources in Amazon EKS. We highlighted the options that this tool offers and walked through backing up a stateful application and restoring it to a new cluster. Similarly, customers can migrate or replicate their own applications and Kubernetes resources to other Amazon EKS clusters, or restore previous application states.

This approach enables you to centralize operations for disaster recovery or migration events as opposed to simply redirecting the CI/CD pipeline to deploy into the new EKS cluster. This is because CI/CD pipelines used to deploy and update applications in Kubernetes may perform actions that are not needed in these situations; moreover, one has to think about a separate approach to deal with data persistence. An alternative could be to create a specific CI/CD pipeline for such events.

In the case of self-managed Kubernetes clusters, customers can also use this open-source tool for a migration to Amazon EKS. To dive deeper into this use case, we suggest following the best practices described in this blog post.

If you have any comments or questions, please leave them in the comments section.

Aaron Miller


Aaron Miller is a Principal Specialist Solutions Architect at Amazon Web Services. He helps customers modernize, scale and adopt best practices for their containerized workloads. Prior to AWS, Aaron worked at Docker and Heptio helping customers move workloads to Kubernetes.

Federica Ciuffo


Federica is a Solutions Architect at Amazon Web Services. She is specialized in container services and is passionate about building infrastructure with code. Outside of the office, she enjoys reading, drawing, and spending time with her friends, preferably in restaurants trying out new dishes from different cuisines.