How can I troubleshoot the error "nfs: server 127.0.0.1 not responding" when mounting my EFS file system?
Last updated: 2022-07-18
My Amazon Elastic File System (Amazon EFS) server isn't responding and hangs with the error message "nfs: server 127.0.0.1 not responding". How can I troubleshoot this?
The following are common reasons why you might see the server not responding error:
- The NFS client can't connect to the EFS server.
- A reboot or shutdown of the instance occurred, or the EC2 instance was disconnected in some other way. These events cause a network disconnection between the NFS client and the EFS server. If the NFS client then reuses the same TCP source port when it reconnects, that behavior isn't conformant with the TCP RFC. As a result, responses from Amazon EFS to an Amazon Elastic Compute Cloud (Amazon EC2) instance or an NFS client might be blocked for multiple minutes.
- The noresvport mount option wasn't used when mounting the file system using an NFS client.
- An issue with the kernel version might cause the EFS mount to fail. For example, there are a number of known NFS client issues with RHEL 6 that cause symptoms similar to unresponsive file systems. In earlier kernel versions of RHEL 6.x, the file system might become unavailable and fail to remount. NFS connection hangs might also occur in Amazon EFS if you're running:
  - RHEL or CentOS 7.6 or later (kernel version 3.10.0-957).
  - Any other Linux distribution with kernel version 4.16 through 4.19.
1. Use the noresvport mount option when mounting your file system. This option makes sure that the NFS client uses a new TCP source port when a network connection must be reestablished. Using noresvport makes sure that the EFS file system has uninterrupted availability after a network recovery event.
$ sudo mount -t nfs -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport mount-target-ip:/ mnt
If you're using the EFS mount helper, then the noresvport option is present by default. If you're using NFS to mount, then you must add this parameter explicitly. For more information, see Recommended NFS mount options.
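To check whether file systems that are already mounted use this option, you can inspect the option list that the kernel reports for each mount. The following is a minimal sketch, assuming a Linux client that exposes /proc/mounts and lists noresvport there when the option is set. The function names has_noresvport and list_mounts_missing_noresvport are hypothetical helpers, not part of any AWS tool.

```shell
# Succeed if a comma-separated mount option string contains noresvport.
has_noresvport() {
    case ",$1," in
        *,noresvport,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the mount point of every NFS mount that lacks noresvport.
# Each line of the mounts file holds: device, mount point, type, options, ...
# The optional first argument is the file to read (default: /proc/mounts).
list_mounts_missing_noresvport() {
    while read -r device mount_point fstype options rest; do
        case "$fstype" in
            nfs|nfs4)
                has_noresvport "$options" || echo "$mount_point"
                ;;
        esac
    done < "${1:-/proc/mounts}"
}
```

Calling list_mounts_missing_noresvport with no argument reads /proc/mounts on the instance. Any mount point that it prints would need to be remounted with noresvport added explicitly.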
2. Check the kernel version. Certain kernel versions, such as RHEL or CentOS 7.6 or later (kernel version 3.10.0-957), have issues that might cause the file system mount to fail. If you're running one of these kernel versions, reboot to recover access to the file system. To confirm that the kernel version is the issue, review the output of the following ps command while you're unable to run ls on the file system:
$ ps auxwwwm | grep <mount_point_IP>
If the kernel version is the cause, then upgrade the kernel. It's a best practice to use the current generation Linux NFSv4.1 client or later for better performance.
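The kernel ranges named above can be checked against the output of uname -r. The following is a rough sketch, not an authoritative test: it only pattern-matches the release string against 3.10.0-957 and 4.16 through 4.19, and it can't tell whether a distribution has backported a fix. The function name is_flagged_kernel is a hypothetical helper.

```shell
# Succeed if a kernel release string falls in one of the ranges that
# this article associates with NFS connection hangs.
is_flagged_kernel() {
    release=$1
    # RHEL or CentOS 7.6 kernels start with 3.10.0-957.
    case "$release" in
        3.10.0-957*) return 0 ;;
    esac
    # Split "4.18.0-80.el8" into major "4" and minor "18".
    major=${release%%.*}
    rest=${release#*.}
    minor=${rest%%[!0-9]*}
    [ "$major" = "4" ] && [ -n "$minor" ] &&
        [ "$minor" -ge 16 ] && [ "$minor" -le 19 ]
}

# Report on the running kernel.
if is_flagged_kernel "$(uname -r)"; then
    echo "Kernel $(uname -r) is in a range with known NFS hang reports"
else
    echo "Kernel $(uname -r) is not in the listed ranges"
fi
```

A match is only a hint to look closer. The ps output described in this step remains the way to confirm the issue.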
3. Verify that the client can connect to the server by running the following command:
$ telnet <ip-of-efs> 2049
Review the NFS client logs (EC2 instance OS logs) in /var/log/messages for errors. Depending on your OS, the logs might instead be in /var/log/syslog or /var/log/dmesg.
Also, if you mounted the file system using the EFS mount helper, review the EFS util logs under the /var/log/amazon/efs directory. The EFS mount helper has a built-in logging mechanism.
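To see how often the client has logged the timeout, you can count the matching kernel messages in the OS log. The following is a small sketch, assuming the log is a readable plain-text file and that the messages contain the text from the error in this article ("nfs: server ... not responding"). The function name count_nfs_not_responding is a hypothetical helper.

```shell
# Print how many lines in a log file report an unresponsive NFS server.
# The optional first argument is the log path (default: /var/log/messages).
count_nfs_not_responding() {
    log_file=${1:-/var/log/messages}
    if [ ! -r "$log_file" ]; then
        # Missing or unreadable log: report zero matches.
        echo 0
        return 0
    fi
    # grep -c prints the count but exits non-zero when the count is 0,
    # so the exit status is ignored here.
    grep -c 'nfs: server .* not responding' "$log_file" || true
}
```

For example, count_nfs_not_responding /var/log/syslog checks the syslog file instead of the default. Reading these logs usually requires root privileges.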
4. Verify that you can connect to your EC2 instance.
5. Verify whether the EC2 instance is overloaded because of resource over-utilization. The cause might be CPU, memory, network, or an application-level issue. You can check this by monitoring EC2 metrics in Amazon CloudWatch, such as CPUUtilization and the network-related metrics.
- Memory over-utilization: This occurs when the instance runs out of memory, for example, when an application starts consuming more RAM. Over-utilization causes Out Of Memory (OOM) errors. When these errors occur, the kernel terminates processes that have a high OOM score or are consuming the most memory. While OOM errors are occurring, the instance typically becomes inaccessible.
To temporarily resolve OOM errors, reboot the system to free up memory space.
For a longer term solution, monitor system resource usage using tools such as "atop" and "top". Then, move to a different instance type that better suits your workload. For more information, see Why is my EC2 Linux instance becoming unresponsive due to over-utilization of resources?
- Network performance: Review the network performance of the instance. Even if CloudWatch metrics show low network utilization, there might be micro-bursting. Micro-bursting occurs when a workload sends a large amount of traffic within a few seconds, and it typically lasts for less than a minute. These bursts are obscured in CloudWatch graphs and Amazon Elastic Block Store (Amazon EBS) stats because the smallest interval used within these tools is one minute. Monitor micro-bursting behavior using tools such as sar, nload, and iftop. For more information, see Why is my Amazon Elastic Compute Cloud (Amazon EC2) instance exceeding its network limits when average utilization is low?
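If none of the tools above are installed, per-second sampling of the kernel's interface byte counters gives a rough view of micro-bursts. The following is a sketch only, assuming a Linux instance that exposes counters under /sys/class/net. The function names bytes_to_mbps and sample_interface are hypothetical, and the interface name that you pass in (for example, eth0) depends on your instance.

```shell
# Convert a byte count observed over a number of seconds into whole
# megabits per second (integer division, so small rates round down to 0).
bytes_to_mbps() {
    echo $(( $1 * 8 / ($2 * 1000000) ))
}

# Print one line per second with receive and transmit rates.
# Arguments: interface name, number of samples (default: 10).
sample_interface() {
    iface=$1
    samples=${2:-10}
    stats=/sys/class/net/$iface/statistics
    prev_rx=$(cat "$stats/rx_bytes")
    prev_tx=$(cat "$stats/tx_bytes")
    i=0
    while [ "$i" -lt "$samples" ]; do
        sleep 1
        rx=$(cat "$stats/rx_bytes")
        tx=$(cat "$stats/tx_bytes")
        echo "rx=$(bytes_to_mbps $((rx - prev_rx)) 1)Mbit/s tx=$(bytes_to_mbps $((tx - prev_tx)) 1)Mbit/s"
        prev_rx=$rx
        prev_tx=$tx
        i=$((i + 1))
    done
}
```

Running sample_interface eth0 60 prints sixty one-second samples. A few samples far above the one-minute average suggest bursts that a one-minute graph would flatten.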
6. Review the EFS CloudWatch metrics and verify whether throttling occurs at the EFS level. Throttling means that the workload is driving more throughput than the file system is currently permitted to deliver. If you're using Bursting Throughput mode, then review the BurstCreditBalance CloudWatch metric to determine if the burst credit balance is depleted. Also, review the PermittedThroughput CloudWatch metric to determine if you're using higher throughput than the permitted or provisioned amount. For more information on burst credits, see How do Amazon EFS burst credits work?
If your applications need nearly continuous throughput, use Provisioned Throughput mode. Before switching to Provisioned Throughput mode from Bursting Throughput, consider how much throughput to provision. To determine the minimum amount of provisioned throughput needed, check the average throughput usage for your file system for the previous two weeks. Note the highest peak amount, rounded up to the next megabyte. For more information, see What throughput modes are available in EFS and what is the right throughput mode for my workload?
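The rounding step can be expressed as a short calculation. The following sketch assumes that you already have the highest peak throughput in bytes per second from your own metric review, and that "megabyte" here means 1,048,576 bytes (MiB). The function name peak_to_provisioned_mibps is a hypothetical helper.

```shell
# Round a peak throughput in bytes per second up to the next whole MiB/s.
peak_to_provisioned_mibps() {
    echo $(( ($1 + 1048575) / 1048576 ))
}

# Example: a peak of 52,428,801 bytes per second is just over 50 MiB/s.
peak_to_provisioned_mibps 52428801   # prints 51
```

The result is a starting point for the minimum value to provision. Workloads with growing demand might need headroom above it.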