
EKS Persistent Volumes for Instance Store

The Kubernetes project is made up of a number of special interest groups (SIGs), each focused on a particular part of the Kubernetes ecosystem. The Storage SIG covers different types of storage (block and file) and ensures that storage is available to containers when they are scheduled. One of the Storage SIG's subprojects is the Local Volume Static Provisioner, a Container Storage Interface (CSI) driver that creates Kubernetes PersistentVolumes for local disks that are mounted during instance startup.

This post discusses deploying the Local Volume Static Provisioner CSI driver using Amazon EKS managed node groups and pre-bootstrap commands to expose the NVMe EC2 instance store drives as Kubernetes PV objects. Customers may wish to leverage the local NVMe storage volumes to achieve higher performance than what’s possible from the general-purpose Amazon EBS boot volume.

Note that instance storage volumes are for temporary storage, and the data is lost when the Amazon Elastic Compute Cloud (Amazon EC2) instance is stopped or terminated. To persist data stored in instance store volumes across the lifecycle of an instance, you need to handle replication at the application layer.

Storage in Kubernetes

A Kubernetes PersistentVolume (PV) is a cluster resource that defines the capabilities and location of a single storage volume. A PersistentVolumeClaim (PVC) defines a request for a certain amount of storage resources. When a Kubernetes Pod needs storage, it references a PVC. Kubernetes then works to match the PVC to an available PV or automatically provision a new PV if the CSI driver supports dynamic provisioning.

Below is an example of a Kubernetes Pod, PV, and PVC. The manifest first defines a PV named my-pv, which offers 5 GiB of storage under the nfs storage class. Next, the manifest creates a PVC named nfs-claim, which requests 4 GiB of nfs storage. Finally, the Pod named app mounts the storage claimed by nfs-claim.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  storageClassName: nfs
  mountOptions:
    - hard
    - nfsvers=4.1
  nfs:
    path: /tmp
    server: 172.17.0.2
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: nfs
  resources:
    requests:
      storage: 4Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: centos
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo $(date -u) >> /data/out.txt; sleep 5; done"]
    volumeMounts:
    - name: persistent-storage
      mountPath: /data
  volumes:
  - name: persistent-storage
    persistentVolumeClaim:
      claimName: nfs-claim

Walkthrough

This post will walk you through the following steps to install and test the Local Volume Static Provisioner:

  1. Create a service account with cluster-level permissions
  2. Create a ConfigMap for the CSI driver
  3. Create a DaemonSet to deploy the CSI driver
  4. Create Amazon EKS managed node groups (two options) with boot scripts that expose the NVMe instance store to Kubernetes Pods
  5. Clean up

Prerequisites

We need a few prerequisites and tools to successfully run through these steps. Ensure you have the following in your working environment:

  1. An existing Amazon EKS cluster
  2. kubectl, configured to communicate with your cluster
  3. eksctl, used to create the managed node groups

Installing the Local Volume Static Provisioner

The Local Volume Static Provisioner CSI driver handles both the detection of local disks mounted in a predefined file system path and the creation of PVs for them. Installation requires deploying a DaemonSet and granting Kubernetes API and host-level permissions.

Kubernetes Service Accounts and Permissions

The CSI driver needs permission to issue API calls to the Kubernetes control plane to manage the lifecycle of the PVs. The manifest below defines a Kubernetes service account and attaches a Kubernetes cluster role that grants the necessary Kubernetes API permissions.

Copy and save the manifest below as service-account.yaml

# K8s service account for CSI Driver
apiVersion: v1
kind: ServiceAccount
metadata:
  name: local-volume-provisioner
  namespace: kube-system
---
# List of Permissions 
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: local-storage-provisioner-node-clusterrole
rules:
- apiGroups: [""]
  resources: ["persistentvolumes"]
  verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: ["storage.k8s.io"]
  resources: ["storageclasses"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["events"]
  verbs: ["watch"]
- apiGroups: ["", "events.k8s.io"]
  resources: ["events"]
  verbs: ["create", "update", "patch"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get"]
---
# Attach the K8s ClusterRole to our K8s ServiceAccount
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: local-storage-provisioner-node-binding
subjects:
- kind: ServiceAccount
  name: local-volume-provisioner
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: local-storage-provisioner-node-clusterrole
  apiGroup: rbac.authorization.k8s.io

Run the following command to create the ServiceAccount, ClusterRole, and ClusterRoleBinding:

kubectl apply -f service-account.yaml
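
As an optional check, the following kubectl commands confirm the three objects exist:

kubectl -n kube-system get serviceaccount local-volume-provisioner
kubectl get clusterrole local-storage-provisioner-node-clusterrole
kubectl get clusterrolebinding local-storage-provisioner-node-binding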

CSI Driver ConfigMap

The Local Volume Static Provisioner CSI driver reads its configuration from a Kubernetes ConfigMap, which tells it where to look for mounted EC2 NVMe instance store volumes and how to expose them as PVs. The ConfigMap below tells the Local Volume Static Provisioner to look for mounted NVMe instance store volumes in the /mnt/fast-disks directory.

A Kubernetes StorageClass specifies a type of storage available in the cluster. The manifest includes a new StorageClass named fast-disks to identify the PVs that are backed by NVMe instance store volumes.

Copy and save the manifest below as config-map.yaml

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-disks
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
# Supported policies: Delete, Retain
reclaimPolicy: Retain
---
# Configuration for our Local Persistent Volume CSI Driver
apiVersion: v1
kind: ConfigMap
metadata:
  name: local-volume-provisioner-config
  namespace: kube-system

data:
  # Adds the node's hostname as a label to each PV created
  nodeLabelsForPV: |
    - kubernetes.io/hostname

  storageClassMap: |
    fast-disks:
      # Path on the host where local volumes of this storage class
      # are mounted under.
      hostDir: /mnt/fast-disks

      # Optionally specify mount path of local volumes.
      # By default, we use same path as hostDir in container.
      mountDir: /mnt/fast-disks
 
      # The /scripts/shred.sh is contained in the CSI drivers container
      # https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner/blob/master/deployment/docker/scripts/shred.sh
      blockCleanerCommand:
        - "/scripts/shred.sh"
        - "2"

      # The volume mode of the PV defines whether a device volume is
      # intended to be used as a formatted filesystem volume or to remain in
      # raw block state. The value Filesystem is implied when omitted.
      volumeMode: Filesystem
      fsType: ext4
      
      # name pattern check
      # only discover local disk mounted to path matching pattern("*" by default).
      namePattern: "*"

Run the following command to create the StorageClass and ConfigMap.

kubectl apply -f config-map.yaml
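
As an optional check, confirm the StorageClass and ConfigMap were created:

kubectl get storageclass fast-disks
kubectl -n kube-system get configmap local-volume-provisioner-config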

CSI Driver DaemonSet

The Local Volume Static Provisioner CSI driver runs on each Amazon EKS node that needs its NVMe instance store volumes exposed as Kubernetes PVs. Kubernetes clusters often contain multiple instance types, and some nodes might not have NVMe instance store volumes. The DaemonSet in the following manifest specifies a nodeAffinity selector to only schedule the DaemonSet on Amazon EKS nodes with a label of fast-disk-node and a corresponding value of either pv-raid or pv-nvme.

Copy and save the following manifest as daemonset.yaml

# The Local Persistent Volume CSI DaemonSet
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: local-volume-provisioner
  namespace: kube-system
  labels:
    app.kubernetes.io/name: local-volume-provisioner
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: local-volume-provisioner 
  template:
    metadata:
      labels:
        app.kubernetes.io/name: local-volume-provisioner
    spec:
      serviceAccountName: local-volume-provisioner
      containers:
          # The latest version can be found in the changelog.
          # In production, one might want to use the container digest hash 
          # over version for improved security.
          # https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner/blob/master/CHANGELOG.md
        - image: "registry.k8s.io/sig-storage/local-volume-provisioner:v2.5.0"
          # In production you might want to set this to use a locally cached 
          # image by setting this to: IfNotPresent
          imagePullPolicy: "Always"
          name: provisioner 
          securityContext:
            privileged: true
          env:
          - name: MY_NODE_NAME
            valueFrom:
              fieldRef:
                fieldPath: spec.nodeName
          - name: MY_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          ports:
            # List of metrics at
            # https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner/blob/cee9e228dc28a4355f664b4fe2236b1857fe4eca/pkg/metrics/metrics.go
            - name: metrics
              containerPort: 8080
          volumeMounts:
            - name: provisioner-config
              mountPath: /etc/provisioner/config
              readOnly: true             
            - mountPath:  /mnt/fast-disks 
              name: fast-disks
              mountPropagation: "HostToContainer" 
      volumes:
        - name: provisioner-config
          configMap:
            name: local-volume-provisioner-config
        - name: fast-disks
          hostPath:
            path: /mnt/fast-disks
      # Only run CSI Driver on the `fast-disk` tagged nodegroup 
      affinity:
        nodeAffinity: 
          requiredDuringSchedulingIgnoredDuringExecution: 
            nodeSelectorTerms: 
            - matchExpressions:
              - key: fast-disk-node
                operator: In
                values:
                - "pv-raid"
                - "pv-nvme"

Run the following command to create the DaemonSet.

kubectl apply -f daemonset.yaml
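
At this point the DaemonSet schedules zero Pods, because no nodes carry the fast-disk-node label yet. You can confirm this with:

kubectl -n kube-system get daemonset local-volume-provisioner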

Amazon EKS Managed Node Group – Pre-bootstrap Commands

The ConfigMap we deployed has the Local Volume Static Provisioner CSI driver looking for disks mounted in the /mnt/fast-disks directory, and the DaemonSet restricts it to nodes with a label of fast-disk-node and a value of pv-raid or pv-nvme. Now we need to configure our Amazon EKS managed node group to spin up EC2 instances with the fast-disk-node label and, on startup, mount the NVMe instance store volumes under the /mnt/fast-disks directory.

This post goes over two approaches:

  1. Multiple persistent volumes, one for each NVMe instance store volume
  2. One single persistent volume RAID-0 array across all the NVMe instance store volumes

Both options deliver high random I/O performance and very low latency storage to your Kubernetes Pods; which one to choose depends on the use case.

In Option 1, a persistent volume is created for each NVMe instance store volume. The i3.8xlarge instance used in this post has four NVMe volumes, so Option 1 creates four persistent volumes, which suits the case where multiple Pods each need fast storage. Option 2 creates a single persistent volume using RAID-0 across all the NVMe volumes, which is useful when only a single Pod needs fast storage.

Option 1: Multiple Persistent Volumes, One for Each NVMe Instance Store

Using the eksctl utility, we create a new Amazon EKS managed node group. In this example, we’ve requested two i3.8xlarge EC2 instances. In the metadata, replace eksworkshop-eksctl and us-west-2 with your respective EKS cluster’s name and AWS Region.

Copy and save the manifest below as pv-nvme-nodegroup.yaml

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  # Replace with your EKS Cluster's name
  name: eksworkshop-eksctl
  # Replace with the AWS Region your cluster is deployed in
  region: us-west-2

managedNodeGroups:
  # Name to give the managed node-group
  - name: eks-pv-nvme-ng
    # Label the nodes that they contain fast-disks 
    labels: { fast-disk-node: "pv-nvme" }
    instanceType: i3.8xlarge
    desiredCapacity: 2
    volumeSize: 100 # EBS Boot Volume size
    privateNetworking: true
    preBootstrapCommands:
      - |
        # Install NVMe CLI
        yum install nvme-cli -y
        
        # Get a list of instance-store NVMe drives
        nvme_drives=$(nvme list | grep "Amazon EC2 NVMe Instance Storage" | cut -d " " -f 1 || true)
        readarray -t nvme_drives <<< "$nvme_drives"

        for disk in "${nvme_drives[@]}"
        do
          # Format the disk to ext4
          mkfs.ext4 -F $disk
          
          # Get disk UUID
          uuid=$(blkid -o value -s UUID $disk)
   
          # Create a filesystem path to mount the disk
          mount_location="/mnt/fast-disks/${uuid}"
          mkdir -p $mount_location
   
          # Mount the disk
          mount $disk $mount_location
          
          # Mount the disk during a reboot
          echo $disk $mount_location ext4 defaults,noatime 0 2 >> /etc/fstab 
        done

Run the following command to create the Amazon EKS managed node group.

eksctl create nodegroup -f pv-nvme-nodegroup.yaml
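
Once the new nodes have joined the cluster, you can list them by their label:

kubectl get nodes -l fast-disk-node=pv-nvme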

Option 2: Single Persistent Volume RAID-0 array across all the NVMe instance stores

Similar to the last example, we use the eksctl utility and create a new Amazon EKS managed node group. However, this time a software RAID-0 array across the NVMe instance store volumes is created.

In the metadata, replace eksworkshop-eksctl and us-west-2 with your respective EKS cluster’s name and AWS Region.

Copy and save the manifest below as pv-raid-nodegroup.yaml

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  # Replace with your EKS Cluster's name
  name: eksworkshop-eksctl
  # Replace with the AWS Region your cluster is deployed in
  region: us-west-2

managedNodeGroups:
  # Name to give the managed node-group
  - name: eks-pv-raid-ng
    # Label the nodes that they contain fast-disks 
    labels: { fast-disk-node: "pv-raid" }
    instanceType: i3.8xlarge
    desiredCapacity: 2
    volumeSize: 100 # EBS Boot Volume size
    privateNetworking: true
    preBootstrapCommands:
      - |
        # Install NVMe CLI
        yum install nvme-cli -y
        
        # Get list of NVMe Drives
        nvme_drives=$(nvme list | grep "Amazon EC2 NVMe Instance Storage" | cut -d " " -f 1 || true)
        readarray -t nvme_drives <<< "$nvme_drives"
        num_drives=${#nvme_drives[@]}
        
        # Install software RAID utility
        yum install mdadm -y
        
        # Create RAID-0 array across the instance store NVMe SSDs
        mdadm --create /dev/md0 --level=0 --name=md0 --raid-devices=$num_drives "${nvme_drives[@]}"

        # Format drive with Ext4
        mkfs.ext4 /dev/md0

        # Get RAID array's UUID
        uuid=$(blkid -o value -s UUID /dev/md0)
   
        # Create a filesystem path to mount the disk
        mount_location="/mnt/fast-disks/${uuid}"
        mkdir -p $mount_location
        
        # Mount RAID device
        mount /dev/md0 $mount_location
        
        # Have disk be mounted on reboot
        mdadm --detail --scan >> /etc/mdadm.conf 
        echo /dev/md0 $mount_location ext4 defaults,noatime 0 2 >> /etc/fstab

Run the following command to create the Amazon EKS managed node group.

eksctl create nodegroup --config-file=pv-raid-nodegroup.yaml
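
As with Option 1, you can verify the new nodes joined with the expected label:

kubectl get nodes -l fast-disk-node=pv-raid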

Viewing Persistent Volumes and DaemonSets

After the Amazon EKS managed node group is created, the Local Volume Static Provisioner will be scheduled as a DaemonSet Pod on each of the managed node group's EC2 instances. The DaemonSet will discover the NVMe instance store volumes mounted in /mnt/fast-disks and expose them as persistent volumes.

To view the DaemonSets running, run the following command:

kubectl get daemonset --namespace=kube-system
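
To inspect the provisioner's logs (for example, when troubleshooting), you can select its Pods by the DaemonSet's app.kubernetes.io/name label:

kubectl -n kube-system logs -l app.kubernetes.io/name=local-volume-provisioner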

To view the persistent volumes, run the following command:

kubectl get pv
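
To consume one of these local PVs, a Pod references a PVC with the fast-disks StorageClass. Because the StorageClass uses WaitForFirstConsumer, the PVC stays Pending until a Pod that uses it is scheduled, at which point the scheduler binds it to a PV on one of the fast-disk nodes. Below is a minimal sketch following the same pattern as the earlier NFS example; the names fast-disk-claim and fast-disk-app and the 100Gi request are illustrative values, not part of the provisioner's configuration.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fast-disk-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-disks
  resources:
    requests:
      storage: 100Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: fast-disk-app
spec:
  containers:
  - name: app
    image: centos
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo $(date -u) >> /data/out.txt; sleep 5; done"]
    volumeMounts:
    - name: fast-storage
      mountPath: /data
  volumes:
  - name: fast-storage
    persistentVolumeClaim:
      claimName: fast-disk-claim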

Clean up

First, we remove the instance store node groups. Note that each eksctl command deletes a CloudFormation stack, which can take a few minutes before the nodes and associated resources are terminated.

# Replace `eksworkshop-eksctl` with your EKS Cluster's name
eksctl delete nodegroup --cluster=eksworkshop-eksctl --name=eks-pv-nvme-ng
eksctl delete nodegroup --cluster=eksworkshop-eksctl --name=eks-pv-raid-ng

Then delete the persistent volumes.

kubectl get pv -o custom-columns=NAME:.metadata.name,STORAGECLASS:.spec.storageClassName --no-headers | awk '$2=="fast-disks" { print $1 }' | xargs kubectl delete pv

Finally, remove the associated Kubernetes objects.

kubectl delete -n=kube-system daemonset local-volume-provisioner 
kubectl delete -n=kube-system configmap local-volume-provisioner-config
kubectl delete storageclass fast-disks
kubectl delete clusterrolebinding local-storage-provisioner-node-binding
kubectl delete clusterrole local-storage-provisioner-node-clusterrole
kubectl delete -n=kube-system serviceaccount local-volume-provisioner 

Conclusion

In this post we described how to use the Local Volume Static Provisioner CSI driver developed by the Kubernetes Storage special interest group. By using the Amazon EKS managed node groups pre-bootstrap commands, you can customize the provisioning of NVMe instance store volumes to meet your unique PV needs.

Application developers can use this deployment pattern to provide each pod access to an isolated instance store or a shared storage layer for cross-pod access.

The Local Volume Static Provisioner is third-party open-source software released under the Apache 2.0 license. We encourage you to visit the AWS Containers Roadmap page on GitHub to stay in touch with the latest additions and upcoming features to AWS container services.