Containers
Migrating Amazon EKS clusters from gp2 to gp3 EBS volumes
Kubernetes (sometimes referred to as K8s) is an open-source container orchestration engine and a fast-growing project hosted by the Cloud Native Computing Foundation (CNCF). K8s has seen massive adoption on premises and in the cloud for running stateless and stateful containerized workloads. Stateful workloads require persistent storage. To support on-premises and cloud-provider-specific infrastructure like storage and networking, the Kubernetes source code originally included so-called "in-tree plugins." Storage and cloud vendors who wanted to add new storage systems or features, or just wanted to fix bugs, had to rely on the Kubernetes release cycle. To decouple the life cycle of Kubernetes from vendor-specific implementations, the community initiated the development of the Container Storage Interface (CSI), a standard for exposing arbitrary block and file storage systems to containerized workloads on container orchestration systems like Kubernetes. Customers can now benefit from the latest CSI driver without having to wait for new Kubernetes version releases.
AWS launched its managed Kubernetes service, Amazon Elastic Kubernetes Service (Amazon EKS), at re:Invent 2017. In September 2019, AWS released support for the Amazon Elastic Block Store (Amazon EBS) Container Storage Interface (CSI) driver in Amazon EKS, and in May 2021 AWS announced the general availability of this CSI plugin. The Amazon EBS-related in-tree storage plugin in Kubernetes (a provisioner of type "kubernetes.io/aws-ebs") only supports the Amazon EBS volume types io1, gp2, sc1, and st1, and it does not support volume snapshot-related features.
Our customers are now asking when and how to migrate an EKS cluster from the Amazon EBS in-tree plugin to the Amazon EBS CSI driver to make use of additional EBS volume types (like gp3 and io2) and take advantage of new features (like Kubernetes Volume Snapshots).
The Container Storage Interface (CSI) migration infrastructure has been in beta since Kubernetes v1.17 and is described in detail in this K8s blog post. It is important to understand that the migration will eventually remove the in-tree plugin from the Kubernetes source code and that all migrated volumes will be controlled by the CSI driver. This does not mean that migrated PVs gain the new features and attributes of the CSI driver: migration only covers features that are already supported by the in-tree drivers, as described here.
This blog post will walk you through a migration scenario and outline the necessary steps in detail.
Note: If you have already migrated from the in-tree provisioner to the EBS CSI driver, or you started with the EBS CSI driver using the default gp2-based StorageClass, have a look at the following blog post: "Simplifying Amazon EBS volume migration and modification on Kubernetes using the EBS CSI Driver".
Prerequisites
You need an EKS cluster with version 1.17 or newer and a corresponding version of kubectl. Make sure you are authorized to install the Amazon EBS CSI-related objects.
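As a quick check, the following commands print the client and cluster versions (the cluster name "my-cluster" is a placeholder for your own cluster):
$ kubectl version --short
$ aws eks describe-cluster --name my-cluster --query "cluster.version" --output text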
Kubernetes uses so-called feature gates to implement the storage migration. The CSIMigration and CSIMigrationAWS feature gates for Amazon EBS, when enabled, redirect all plugin operations from the existing in-tree plugin to the ebs.csi.aws.com CSI driver. Please note that Amazon EKS has not yet turned on the CSIMigration and CSIMigrationAWS features for Amazon EBS migration. Nevertheless, you can already use the Amazon EBS CSI driver in parallel to the in-tree plugin.
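For reference only: on self-managed Kubernetes clusters, these gates would be enabled through the --feature-gates flag of the control plane components and the kubelet, for example as shown below. Amazon EKS does not expose these flags, so this is purely illustrative.
--feature-gates=CSIMigration=true,CSIMigrationAWS=true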
For the sake of the demo, we will create a dynamically provisioned PersistentVolume (PV), which we are going to migrate later.
We use dynamic volume provisioning as described in the K8s documentation.
The in-tree storage driver-based default StorageClass (SC) gp2 will be used to create a PersistentVolumeClaim (PVC):
$ kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
gp2 (default) kubernetes.io/aws-ebs Delete WaitForFirstConsumer false 242d
$ cat ebs-gp2-claim.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-gp2-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: gp2
$ kubectl apply -f ebs-gp2-claim.yaml
persistentvolumeclaim/ebs-gp2-claim created
The PVC is created in status “pending” because the gp2 StorageClass has a Volume Binding Mode (attribute volumeBindingMode) of WaitForFirstConsumer and there is no pod yet consuming the PVC.
$ kubectl get pvc ebs-gp2-claim
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
ebs-gp2-claim Pending gp2 45s
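If you want to confirm why the PVC is pending, describe it; with WaitForFirstConsumer you should see an event along the lines of "waiting for first consumer to be created before binding" (the exact wording can vary between Kubernetes versions):
$ kubectl describe pvc ebs-gp2-claim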
So let’s create a pod (our “demo application”) that uses the PVC:
$ cat test-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-gp2-in-tree
spec:
  containers:
    - name: app
      image: centos
      command: ["/bin/sh"]
      args: ["-c", "while true; do echo $(date -u) >> /data/out.txt; sleep 5; done"]
      volumeMounts:
        - name: persistent-storage
          mountPath: /data
  volumes:
    - name: persistent-storage
      persistentVolumeClaim:
        claimName: ebs-gp2-claim
$ kubectl apply -f test-pod.yaml
pod/app-gp2-in-tree created
After a few seconds the pod is created:
$ kubectl get po app-gp2-in-tree
NAME READY STATUS RESTARTS AGE
app-gp2-in-tree 1/1 Running 0 16s
This dynamically provisions the underlying PV pvc-646fef81-c677-46f4-8f27-9d394618f236, which is now bound to the PVC "ebs-gp2-claim":
$ kubectl get pvc ebs-gp2-claim
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
ebs-gp2-claim Bound pvc-646fef81-c677-46f4-8f27-9d394618f236 1Gi RWO gp2 5m3s
Let’s quickly check if the volume contains some data:
$ kubectl exec app-gp2-in-tree -- sh -c "cat /data/out.txt"
…
Thu Sep 16 13:56:34 UTC 2021
Thu Sep 16 13:56:39 UTC 2021
Thu Sep 16 13:56:44 UTC 2021
Let’s have a look at the details of the PV:
$ kubectl get pv pvc-646fef81-c677-46f4-8f27-9d394618f236
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-646fef81-c677-46f4-8f27-9d394618f236 1Gi RWO Delete Bound default/ebs-gp2-claim gp2 2m54s
$ kubectl get pv pvc-646fef81-c677-46f4-8f27-9d394618f236 -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    kubernetes.io/createdby: aws-ebs-dynamic-provisioner
    pv.kubernetes.io/bound-by-controller: "yes"
    pv.kubernetes.io/provisioned-by: kubernetes.io/aws-ebs
  …
  labels:
    topology.kubernetes.io/region: eu-central-1
    topology.kubernetes.io/zone: eu-central-1c
  name: pvc-646fef81-c677-46f4-8f27-9d394618f236
  …
spec:
  accessModes:
    - ReadWriteOnce
  awsElasticBlockStore:
    fsType: ext4
    volumeID: aws://eu-central-1c/vol-03d3cd818a2c2def3
  …
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
                - eu-central-1c
            - key: topology.kubernetes.io/region
              operator: In
              values:
                - eu-central-1
  persistentVolumeReclaimPolicy: Delete
  storageClassName: gp2
  …
The PV was (as expected) created by the "kubernetes.io/aws-ebs" provisioner, as shown in the annotation. The "awsElasticBlockStore.volumeID" attribute within the spec section shows the actual Amazon EBS volume ID "vol-03d3cd818a2c2def3" together with the AWS Availability Zone (AZ) in which the EBS volume was created, eu-central-1c in this case. EBS volumes and EC2 instances are zonal (not Regional) resources. The nodeAffinity section advises the kube-scheduler to schedule the pod onto a node in the same AZ where the PV was created.
The following command is a short-form to retrieve the Amazon EBS details:
$ kubectl get pv pvc-646fef81-c677-46f4-8f27-9d394618f236 -o jsonpath='{.spec.awsElasticBlockStore.volumeID}'
aws://eu-central-1c/vol-03d3cd818a2c2def3
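If you want to cross-check the volume on the AWS side, a describe-volumes call returns its type, size, and Availability Zone; for the volume above it should report a 1 GiB gp2 volume in eu-central-1c:
$ aws ec2 describe-volumes --volume-ids vol-03d3cd818a2c2def3 --query 'Volumes[0].{Type:VolumeType,Size:Size,AZ:AvailabilityZone}'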
We want to use this PV for the storage migration scenario described later. To ensure that the PV is not deleted when the corresponding PVC is deleted, we patch its persistentVolumeReclaimPolicy to "Retain". Note: for an existing volume, this change has to be made on the PV itself, not on the StorageClass!
$ kubectl patch pv pvc-646fef81-c677-46f4-8f27-9d394618f236 -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
persistentvolume/pvc-646fef81-c677-46f4-8f27-9d394618f236 patched
$ kubectl get pv pvc-646fef81-c677-46f4-8f27-9d394618f236
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-646fef81-c677-46f4-8f27-9d394618f236 1Gi RWO Retain Bound default/ebs-gp2-claim gp2 9m4s
To install the Amazon EBS CSI driver, follow our GitHub documentation. The required high-level steps are:
- Grant the IAM permissions required for Amazon EBS operations, either by attaching them to the worker node instance profile or, following least privilege, by using IRSA (IAM roles for service accounts) to create a properly annotated ServiceAccount (SA)
- Install the external volume snapshot controller related K8s objects (CRDs, RBAC resources, deployment, and validating webhook) using YAML
- Install the Amazon EBS CSI driver using YAML or the corresponding Helm chart (use the existing service account if you use IRSA); an example command sequence follows this list
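The exact commands depend on your setup and may change over time, so treat the following sequence only as a sketch of the IRSA plus Helm path; the cluster name and the IAM policy are placeholders, and the GitHub documentation remains the source of truth:
$ eksctl create iamserviceaccount \
    --name ebs-csi-controller-sa \
    --namespace kube-system \
    --cluster my-cluster \
    --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
    --approve
$ helm repo add aws-ebs-csi-driver https://kubernetes-sigs.github.io/aws-ebs-csi-driver
$ helm repo update
$ helm upgrade --install aws-ebs-csi-driver aws-ebs-csi-driver/aws-ebs-csi-driver \
    --namespace kube-system \
    --set controller.serviceAccount.create=false \
    --set controller.serviceAccount.name=ebs-csi-controller-sa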
Double-check that the Amazon EBS CSI related Kubernetes components are registered to the K8s API server:
$ kubectl api-resources | grep "storage.k8s.io/v1"
volumesnapshotclasses snapshot.storage.k8s.io/v1 false VolumeSnapshotClass
volumesnapshotcontents snapshot.storage.k8s.io/v1 false VolumeSnapshotContent
volumesnapshots snapshot.storage.k8s.io/v1 true VolumeSnapshot
csidrivers storage.k8s.io/v1 false CSIDriver
csinodes storage.k8s.io/v1 false CSINode
csistoragecapacities storage.k8s.io/v1beta1 true CSIStorageCapacity
storageclasses sc storage.k8s.io/v1 false StorageClass
volumeattachments storage.k8s.io/v1 false VolumeAttachment
Confirm they are up and running:
$ kubectl get po -n kube-system -l 'app in (ebs-csi-controller,ebs-csi-node,snapshot-controller)'
NAME READY STATUS RESTARTS AGE
ebs-csi-controller-569b794b57-md99s 6/6 Running 0 6d15h
ebs-csi-controller-569b794b57-trkks 6/6 Running 0 6d15h
ebs-csi-node-4fkb8 3/3 Running 0 6d14h
ebs-csi-node-vc48t 3/3 Running 0 6d14h
snapshot-controller-6984fdc566-4c49f 1/1 Running 0 6d15h
snapshot-controller-6984fdc566-jlnbn 1/1 Running 0 6d15h
$ kubectl get csidrivers
NAME ATTACHREQUIRED PODINFOONMOUNT STORAGECAPACITY TOKENREQUESTS REQUIRESREPUBLISH MODES AGE
ebs.csi.aws.com true false false <unset> false Persistent 7d19h
Migration scenario
First we will discuss the migration scenario at a high level.
We are conducting a physical storage migration: we copy the data of the in-tree-based PV using Amazon EBS snapshots as an external snapshot mechanism (note: the K8s in-tree Amazon EBS plugin does not support volume snapshots!) and then import this data using the CSI Volume Snapshots feature of the Amazon EBS CSI driver.
Now we will guide you through the migration in detail.
We start by taking a snapshot of the in-tree plugin based PV pvc-646fef81-c677-46f4-8f27-9d394618f236 via the AWS API.
$ kubectl get pv pvc-646fef81-c677-46f4-8f27-9d394618f236 -o jsonpath='{.spec.awsElasticBlockStore.volumeID}'
aws://eu-central-1c/vol-03d3cd818a2c2def3
$ aws ec2 create-snapshot --volume-id vol-03d3cd818a2c2def3 --tag-specifications 'ResourceType=snapshot,Tags=[{Key="ec2:ResourceTag/ebs.csi.aws.com/cluster",Value="true"}]'
{
  …
  "SnapshotId": "snap-06fb1faafc1409cc5",
  …
  "State": "pending",
  "VolumeId": "vol-03d3cd818a2c2def3",
  "VolumeSize": 1,
  …
}
Wait until the snapshot is in the state "completed":
$ aws ec2 describe-snapshots --snapshot-ids snap-06fb1faafc1409cc5
{
  "Snapshots": [
    {
      …
      "Progress": "100%",
      "SnapshotId": "snap-06fb1faafc1409cc5",
      …
      "State": "completed",
      "VolumeId": "vol-03d3cd818a2c2def3",
      …
    }
  ]
}
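Instead of polling describe-snapshots, you can also use the corresponding AWS CLI waiter, which blocks until the snapshot reaches the completed state:
$ aws ec2 wait snapshot-completed --snapshot-ids snap-06fb1faafc1409cc5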
Now create a VolumeSnapshotClass object:
$ cat vsc-ebs-csi.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ebs-csi-aws
driver: ebs.csi.aws.com
deletionPolicy: Delete
$ kubectl apply -f vsc-ebs-csi.yaml
volumesnapshotclass.snapshot.storage.k8s.io/ebs-csi-aws created
$ kubectl get volumesnapshotclass
NAME DRIVER DELETIONPOLICY AGE
ebs-csi-aws ebs.csi.aws.com Delete 12s
Next, we have to create a VolumeSnapshotContent object that uses the AWS snapshot snap-06fb1faafc1409cc5 and already references a VolumeSnapshot we will create in the next step. This seems odd, but it is necessary for the bidirectional binding of VolumeSnapshotContent and VolumeSnapshot for preexisting snapshots!
$ cat vsc-csi.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  name: imported-aws-snapshot-content
spec:
  volumeSnapshotRef:
    kind: VolumeSnapshot
    name: imported-aws-snapshot
    namespace: default
  source:
    snapshotHandle: snap-06fb1faafc1409cc5 # <-- snapshot to import
  driver: ebs.csi.aws.com
  deletionPolicy: Delete
  volumeSnapshotClassName: ebs-csi-aws
$ kubectl apply -f vsc-csi.yaml
volumesnapshotcontent.snapshot.storage.k8s.io/imported-aws-snapshot-content created
$ kubectl get volumesnapshotcontent imported-aws-snapshot-content
NAME READYTOUSE RESTORESIZE DELETIONPOLICY DRIVER VOLUMESNAPSHOTCLASS VOLUMESNAPSHOT VOLUMESNAPSHOTNAMESPACE AGE
imported-aws-snapshot-content true 1073741824 Delete ebs.csi.aws.com ebs-csi-aws imported-aws-snapshot default 12s
Now we need to create the VolumeSnapshot that references the VolumeSnapshotContent object:
$ cat vs-csi.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: imported-aws-snapshot
  namespace: default
spec:
  volumeSnapshotClassName: ebs-csi-aws
  source:
    volumeSnapshotContentName: imported-aws-snapshot-content
$ kubectl apply -f vs-csi.yaml
volumesnapshot.snapshot.storage.k8s.io/imported-aws-snapshot created
$ kubectl get volumesnapshot imported-aws-snapshot
NAME READYTOUSE SOURCEPVC SOURCESNAPSHOTCONTENT RESTORESIZE SNAPSHOTCLASS SNAPSHOTCONTENT CREATIONTIME AGE
imported-aws-snapshot true imported-aws-snapshot-content 1Gi ebs-csi-aws imported-aws-snapshot-content 33m 60s
During this migration, we want to benefit from the new Amazon EBS gp3 volume type. In order to do so, we have to create a CSI-based gp3 StorageClass. Because we want this SC to be the default one, we first remove the default annotation from the Amazon EKS default SC gp2:
$ kubectl annotate sc gp2 storageclass.kubernetes.io/is-default-class-
storageclass.storage.k8s.io/gp2 annotated
$ kubectl get sc gp2
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
gp2 kubernetes.io/aws-ebs Delete WaitForFirstConsumer false 243d
$ cat gp3-def-sc.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
allowVolumeExpansion: true
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
$ kubectl apply -f gp3-def-sc.yaml
storageclass.storage.k8s.io/gp3 created
$ kubectl get sc gp3
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
gp3 (default) ebs.csi.aws.com Delete WaitForFirstConsumer true 6s
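With gp3 you can also provision IOPS and throughput independently of the volume size. The Amazon EBS CSI driver exposes these as optional StorageClass parameters; the following variant is only a sketch with the gp3 baseline values (3000 IOPS, 125 MiB/s) and assumes a recent driver version:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp3-tuned
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"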
The VolumeSnapshot we created earlier can now be used as the data source of a new PersistentVolumeClaim. As the storage class, we use the new CSI-based gp3 SC:
$ cat pvc-vs-csi.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: imported-aws-snapshot-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp3
  resources:
    requests:
      storage: 1Gi
  dataSource:
    name: imported-aws-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
$ kubectl apply -f pvc-vs-csi.yaml
persistentvolumeclaim/imported-aws-snapshot-pvc created
$ kubectl get pvc imported-aws-snapshot-pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
imported-aws-snapshot-pvc Pending gp3 54s
Note that the PVC is again in Pending status because the gp3 SC also uses a volumeBindingMode of WaitForFirstConsumer, so we have to create an application (pod) to trigger the provisioning of the underlying PV. For demo purposes, we just mount the PVC without writing new data and use "kubectl exec" to have a look at the data restored from the snapshot:
$ cat test-pod-snap.yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-imported-snapshot-csi
spec:
  containers:
    - name: app
      image: centos
      args:
        - sleep
        - "10000"
      volumeMounts:
        - name: persistent-storage
          mountPath: /data
  volumes:
    - name: persistent-storage
      persistentVolumeClaim:
        claimName: imported-aws-snapshot-pvc
$ kubectl apply -f test-pod-snap.yaml
pod/app-imported-snapshot-csi created
A PV pvc-25d2d19d-6ede-47d2-bd2e-32d45832ec20 was automatically created and the pod is running with access to the migrated data:
$ kubectl get pvc imported-aws-snapshot-pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
imported-aws-snapshot-pvc Bound pvc-25d2d19d-6ede-47d2-bd2e-32d45832ec20 1Gi RWO gp3 11m
$ kubectl get po app-imported-snapshot-csi
NAME READY STATUS RESTARTS AGE
app-imported-snapshot-csi 1/1 Running 0 85s
$ kubectl exec app-imported-snapshot-csi -- sh -c "cat /data/out.txt" | more
Thu Sep 16 13:56:04 UTC 2021
Thu Sep 16 13:56:09 UTC 2021
Thu Sep 16 13:56:14 UTC 2021
…
The PV, as expected, was provisioned by the CSI driver and uses the gp3 SC:
$ kubectl get pv pvc-25d2d19d-6ede-47d2-bd2e-32d45832ec20
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-25d2d19d-6ede-47d2-bd2e-32d45832ec20 1Gi RWO Delete Bound default/imported-aws-snapshot-pvc gp3 3m34s
$ kubectl get pv pvc-25d2d19d-6ede-47d2-bd2e-32d45832ec20 -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: ebs.csi.aws.com
  creationTimestamp: "2021-09-17T09:57:15Z"
  finalizers:
    - kubernetes.io/pv-protection
    - external-attacher/ebs-csi-aws-com
  name: pvc-25d2d19d-6ede-47d2-bd2e-32d45832ec20
  …
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 1Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: imported-aws-snapshot-pvc
    namespace: default
    …
  csi:
    driver: ebs.csi.aws.com
    fsType: ext4
    volumeAttributes:
      storage.kubernetes.io/csiProvisionerIdentity: 1630589410219-8081-ebs.csi.aws.com
    volumeHandle: vol-036ef87c533d529de
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.ebs.csi.aws.com/zone
              operator: In
              values:
                - eu-central-1c
  persistentVolumeReclaimPolicy: Delete
  storageClassName: gp3
  volumeMode: Filesystem
status:
  phase: Bound
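As a final check, you can confirm on the AWS side that the newly provisioned volume (the volume ID is taken from the volumeHandle above) is indeed of type gp3:
$ aws ec2 describe-volumes --volume-ids vol-036ef87c533d529de --query 'Volumes[0].VolumeType' --output text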
Cleanup
To avoid unnecessary costs, clean up your environment after performing the demo migration.
$ kubectl delete pod app-imported-snapshot-csi
$ kubectl delete pvc imported-aws-snapshot-pvc
$ kubectl delete volumesnapshotcontent imported-aws-snapshot-content
$ kubectl delete volumesnapshot imported-aws-snapshot
$ aws ec2 delete-snapshot --snapshot-id <snap-id>
$ kubectl delete pv <pvc-id>
For future use, you can leave the CSI driver in your EKS cluster.
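If you removed the default annotation from the gp2 SC only for this demo, you can make gp2 the default StorageClass again:
$ kubectl annotate sc gp2 storageclass.kubernetes.io/is-default-class=true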
Conclusion
Amazon EKS provides customers with a managed control plane, options for managing the data plane (managed node groups), and managed cluster add-ons for critical components like AWS VPC CNI, CoreDNS, and kube-proxy. Once all features related to Amazon EBS CSI migration are finalized, AWS will take care of the heavy lifting of implementing all the bits and pieces on the managed control plane and data plane for you!
This blog post described how you can start migrating your workloads today to PersistentVolumes that support all the new capabilities and features of the Amazon EBS CSI driver.
We hope this post helps with your Kubernetes projects. If you have questions or suggestions, please leave a comment.