How do I troubleshoot issues with my Amazon EFS volume mounts in Amazon EKS?
Last updated: 2021-12-22
I receive the following errors in my pods when I mount Amazon Elastic File System (Amazon EFS) volumes in my Amazon Elastic Kubernetes Service (Amazon EKS) cluster:
- "Output: mount.nfs4: mounting fs-18xxxxxx.efs.us-east-1.amazonaws.com:/path-in-dir:/ failed, reason given by server: No such file or directory"
- "Output: Failed to resolve "fs-xxxxxx.efs.us-west-2.amazonaws.com" - check that your file system ID is correct"
- "mount.nfs4: access denied by server while mounting 127.0.0.1:/"
- "mount.nfs: Connection timed out"
How do I troubleshoot this?
Before you begin the following troubleshooting steps, verify that you have the following:
- An Amazon EFS file system created with a mount target in each of the worker node subnets.
- A valid EFS storage class (from the GitHub website) definition using the efs.csi.aws.com provisioner.
- A valid PersistentVolumeClaim (PVC) definition and PersistentVolume definition. This isn't needed if you're using dynamic provisioning (from the GitHub website).
- The Amazon EFS CSI driver installed in the cluster.
Verify that the mount targets are configured correctly
Make sure to create the EFS mount targets in each Availability Zone where the EKS nodes are running. For example, if your worker nodes are spread across us-east-1a and us-east-1b, make sure to create mount targets in both Availability Zones for the EFS file system that you're trying to mount. If you don't correctly create the mount targets, then the pods that are mounting the EFS file system return an error similar to the following:
Output: Failed to resolve "fs-xxxxxx.efs.us-west-2.amazonaws.com" - check that your file system ID is correct.
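One way to check this is to compare the Availability Zones of the worker nodes with the Availability Zones of the file system's mount targets. The following is a minimal sketch, not an official procedure. It assumes that kubectl and the AWS CLI are configured for the cluster's account and Region. The function names (node_zones, mount_target_zones, missing_zones) and the file system ID are placeholders introduced for illustration.

```shell
# List the distinct Availability Zones of the worker nodes, read from the
# standard topology.kubernetes.io/zone node label.
node_zones() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.labels.topology\.kubernetes\.io/zone}{"\n"}{end}' |
    sort -u
}

# List the Availability Zones that have a mount target for file system $1.
mount_target_zones() {
  aws efs describe-mount-targets --file-system-id "$1" \
    --query 'MountTargets[].AvailabilityZoneName' --output text |
    tr '\t' '\n' | sort -u
}

# Print every zone in the first list ($1) that is absent from the second ($2).
# Both lists are whitespace-separated (spaces or newlines).
missing_zones() {
  for zone in $1; do
    case " $(echo $2) " in
      *" $zone "*) ;;            # zone has a mount target
      *) echo "$zone" ;;         # zone has nodes but no mount target
    esac
  done
}

# Example (hypothetical file system ID):
#   missing_zones "$(node_zones)" "$(mount_target_zones fs-xxxxxx)"
```

If the last command prints any Availability Zone, then nodes in that zone have no local mount target for the file system.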
Verify that the security group associated with your EFS file system allows NFS traffic
The security group that's associated with your EFS file system must have an inbound rule that allows NFS traffic (port 2049) from the CIDR for your cluster's VPC. If the security group of the EFS mount targets doesn't allow NFS traffic, then the pods that mount the EFS file system return an error similar to the following:
"mount.nfs: Connection timed out"
Verify that the subdirectory is created in your EFS file system if you're mounting the pod to a subdirectory
When you add sub paths in persistent volumes, the EFS CSI driver doesn't create the subdirectory path in the EFS file system as part of the mount operation. The directories must be already present for the mount operation to succeed. If the sub path isn't present in the file system, then the pods fail with the following error:
Output: mount.nfs4: mounting fs-18xxxxxx.efs.us-east-1.amazonaws.com:/path-in-dir:/ failed, reason given by server: No such file or directory
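If the directory is missing, one way to create it is to temporarily mount the root of the file system on a worker node and create the path there. The following is a minimal sketch. It assumes that the EFS mount helper (amazon-efs-utils) is installed on the node and that the node can reach a mount target. The function name create_efs_subdir and the temporary mount point /mnt/efs-root are hypothetical.

```shell
# Mount the root of file system $1, create directory $2 inside it, then unmount.
# $3 optionally overrides the temporary mount point (default: /mnt/efs-root).
create_efs_subdir() {
  fs_id="$1"
  subdir="${2#/}"                        # drop a leading slash, if any
  tmp_mount="${3:-/mnt/efs-root}"

  sudo mkdir -p "$tmp_mount" || return 1
  sudo mount -t efs -o tls "$fs_id":/ "$tmp_mount" || return 1

  # Create the sub path, then always unmount before reporting the result.
  sudo mkdir -p "$tmp_mount/$subdir"
  status=$?
  sudo umount "$tmp_mount"
  return "$status"
}

# Example (hypothetical file system ID and path):
#   create_efs_subdir fs-18xxxxxx /path-in-dir
```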
Confirm that the cluster's VPC uses the Amazon DNS server
When you use the EFS CSI driver to mount the EFS file system, the EFS mount helper in the driver requires that the VPC uses the Amazon DNS server. Note: Because of an AWS architectural limitation, only the Amazon provided DNS can resolve the EFS service's file system DNS names.
Verify the DNS server by logging in to the worker node and running the following command:
nslookup fs-4fxxxxxx.efs.region.amazonaws.com <amazon provided DNS IP>
Note: The Amazon provided DNS IP is the base of the VPC network range plus two. For example, for the 10.0.0.0 network range, the IP is 10.0.0.2.
Note: Replace region with your AWS Region.
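The Amazon provided DNS address can be derived from the VPC's primary CIDR block by adding two to the network base address. The following is a minimal sketch. It assumes that you pass the network address of the primary CIDR block, and vpc_dns_ip is a hypothetical helper name.

```shell
# Print the Amazon provided DNS address for a VPC: the base address of the
# primary CIDR block plus two. Expects the network address, with or without
# the prefix length (for example 10.0.0.0/16 or 10.0.0.0).
vpc_dns_ip() (
  base="${1%/*}"                               # strip the prefix length
  echo "${base%.*}.$(( ${base##*.} + 2 ))"     # add two to the last octet
)

# Example (the CIDR and file system name are placeholders):
#   nslookup fs-4fxxxxxx.efs.region.amazonaws.com "$(vpc_dns_ip 10.0.0.0/16)"
```

For a VPC with the 10.0.0.0/16 CIDR block, the helper prints 10.0.0.2.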
If the cluster VPC is using a custom DNS server, then you must configure the custom DNS server to forward all *.amazonaws.com requests to the Amazon DNS server. If these requests aren't forwarded, then the pods fail with an error similar to the following:
Output: Failed to resolve "fs-4fxxxxxx.efs.us-west-2.amazonaws.com" - check that your file system ID is correct.
Verify that you have "iam" mount options in the persistent volume definition when using a restrictive file system policy
In some cases, the EFS file system policy is configured to restrict mount permissions to specific IAM roles. If this is the case, then the EFS mount helper in the EFS CSI driver requires that the -o iam mount option be passed during the mount operation. Include the spec.mountOptions property so that the CSI driver can add the iam mount option (from the GitHub website).
Example PersistentVolume specification:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv1
spec:
  mountOptions:
    - iam
. . . . . .
If you don't add the iam mount option when you use a restrictive file system policy, then the pods fail with an error similar to the following:
mount.nfs4: access denied by server while mounting 127.0.0.1:/
Verify that the Amazon EFS CSI driver controller service account is annotated with the correct IAM role and the IAM role has the required permissions
Run the following command to verify that the service account used by the efs-csi-controller pods has the correct annotation:
kubectl describe sa efs-csi-controller-sa -n kube-system
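Verify that the eks.amazonaws.com/role-arn annotation is present and that its value is the Amazon Resource Name (ARN) of the IAM role for the EFS CSI driver.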
Verify that the IAM OIDC provider for the cluster was created, and the IAM role has the required permissions (from the GitHub website) to perform EFS API calls. Also, verify that the IAM role's trust policy trusts the service account efs-csi-controller-sa.
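These checks can also be scripted. The following is a minimal sketch. It assumes that kubectl and the AWS CLI are configured for the cluster, and the function names and the cluster name in the example are placeholders introduced for illustration. It doesn't inspect the role's permissions or trust policy, which you still need to review separately.

```shell
# Print the IAM role ARN annotated on the controller service account.
# Dots inside the annotation key are escaped for kubectl's JSONPath.
controller_role_arn() {
  kubectl get sa efs-csi-controller-sa -n kube-system \
    -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'
}

# Print the OIDC issuer URL of cluster $1.
cluster_oidc_issuer() {
  aws eks describe-cluster --name "$1" \
    --query 'cluster.identity.oidc.issuer' --output text
}

# Succeed if an IAM OIDC provider exists for the issuer of cluster $1.
has_oidc_provider() {
  issuer_id="$(cluster_oidc_issuer "$1")" || return 1
  issuer_id="${issuer_id##*/}"          # keep only the ID at the end of the URL
  [ -n "$issuer_id" ] || return 1
  aws iam list-open-id-connect-providers --output text | grep -q "$issuer_id"
}

# Example (hypothetical cluster name):
#   controller_role_arn
#   has_oidc_provider my-cluster && echo "OIDC provider found"
```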
Verify that the EFS CSI driver pods are running
The EFS CSI driver is made up of controller pods that run as a deployment and node pods that run as a daemonset. Run the following command to verify that these pods are running in your cluster:
kubectl get all -l app.kubernetes.io/name=aws-efs-csi-driver -n kube-system
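To surface only the driver pods that need attention, you can filter the pod list by status. The following is a minimal sketch that uses the same label selector as the preceding command. The function name efs_csi_pods_not_running is hypothetical.

```shell
# Print the name and status of every EFS CSI driver pod whose STATUS column
# is anything other than Running. Prints nothing when all pods are Running.
efs_csi_pods_not_running() {
  kubectl get pods -n kube-system \
    -l app.kubernetes.io/name=aws-efs-csi-driver --no-headers |
    awk '$3 != "Running" { print $1, $3 }'
}
```

A pod in the Running status can still have containers that aren't ready, so also review the READY column of the full listing.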
Verify the EFS mount operation from the EC2 worker node where the pod is failing to mount the file system
Log in to the Amazon EKS worker node where the pod is scheduled. Then, use the EFS mount helper to try to manually mount the EFS file system to the worker node. You can run the following command to test:
sudo mount -t efs -o tls file-system-dns-name efs-mount-point/
If the worker node can mount the file system, then review the efs-plugin logs from the CSI controller and CSI node pods.
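If the manual mount times out instead, a quick reachability test of NFS port 2049 from the worker node can help separate a network or security group problem from a driver problem. The following is a minimal sketch. It assumes that bash and the coreutils timeout command are available on the node, and check_tcp is a hypothetical helper name.

```shell
# Succeed if a TCP connection to host $1 on port $2 (default: 2049, NFS)
# opens within 5 seconds. Uses bash's built-in /dev/tcp redirection, so no
# extra network tools are needed on the node.
check_tcp() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "${2:-2049}" 2>/dev/null
}

# Example (replace the placeholder with the IP address of a mount target):
#   check_tcp mount-target-ip && echo "port 2049 reachable" || echo "port 2049 blocked"
```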
Check the EFS CSI driver pod logs
Check the CSI driver pod logs to determine the cause of the mount failures. If the volume is failing to mount, then review the efs-plugin logs. Run the following commands to retrieve the efs-plugin container logs:
kubectl logs deployment/efs-csi-controller -n kube-system -c efs-plugin
kubectl logs daemonset/efs-csi-node -n kube-system -c efs-plugin
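When you pass a daemonset to kubectl logs, the command reads from only one of its pods, which might not be the pod on the node where the mount fails. The following is a minimal sketch for reading the efs-plugin logs from a specific node. It assumes that the node pods carry the app=efs-csi-node label, as in the driver's default manifests. The function names and the node name in the example are placeholders.

```shell
# Print the name of the efs-csi-node pod that runs on node $1.
efs_csi_node_pod_on() {
  kubectl get pods -n kube-system -l app=efs-csi-node \
    --field-selector "spec.nodeName=$1" \
    -o jsonpath='{.items[0].metadata.name}'
}

# Print the last $2 lines (default: 200) of the efs-plugin container logs
# from the efs-csi-node pod on node $1.
efs_csi_node_logs_on() {
  pod="$(efs_csi_node_pod_on "$1")" || return 1
  if [ -z "$pod" ]; then
    echo "no efs-csi-node pod found on $1" >&2
    return 1
  fi
  kubectl logs "$pod" -n kube-system -c efs-plugin --tail="${2:-200}"
}

# Example (hypothetical node name):
#   efs_csi_node_logs_on ip-10-0-1-23.ec2.internal
```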