Why is my EFS file system performance slow?
Last updated: 2022-06-08
My Amazon Elastic File System (Amazon EFS) performance is very slow. What are some common reasons for slow performance and how do I troubleshoot them?
The distributed, multi-Availability Zone architecture of Amazon EFS results in a small latency overhead for each file operation. The overall throughput generally increases as the average I/O size increases because the overhead is amortized over a larger amount of data.
Amazon EFS performance relies on multiple factors, including the following:
- Storage class of EFS.
- Performance and throughput modes.
- Type of operations performed on EFS (for example, metadata-intensive operations).
- Properties of data stored in EFS (such as size and number of files).
- Mount options.
- Client-side limitations.
Storage class of EFS
For more information, see Performance summary.
Performance and throughput modes
Amazon EFS offers two performance modes, General Purpose and Max I/O. Applications can scale their IOPS elastically up to the limit associated with the performance mode. To determine which performance mode to use, see What are the differences between General Purpose and Max I/O performance modes in Amazon EFS?
File-based workloads are typically spiky, driving high levels of throughput for short periods, but driving lower levels of throughput for longer periods. Amazon EFS is designed to burst to high throughput levels for periods of time.
The configured throughput and IOPS affect the performance of Amazon EFS. It's a best practice to benchmark your workload to help you select the appropriate throughput and performance modes. When you select provisioned throughput, choose values that properly accommodate your workload requirements. With bursting throughput mode, you can increase the size of the Amazon EFS file system using dummy files to raise the baseline throughput. To analyze the throughput and IOPS consumed by your file system, see Using metric math with Amazon EFS.
Amazon EFS also scales up to petabytes of storage and has two throughput modes: bursting and provisioned. In bursting mode, the larger the EFS file system, the higher the throughput scaling. In provisioned mode, you set the file system's throughput in MiB/s, independent of the amount of data stored. For more information on throughput modes, see How do Amazon EFS burst credits work?
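As a sketch of the dummy-file approach mentioned above, the following writes a zero-filled file with dd to add to the file system's metered size, which raises the bursting-mode baseline. EFS_DIR is a hypothetical mount point; for illustration it defaults to a local directory, but on a real file system you would point it at your EFS mount.

```shell
# Sketch: grow the file system with a dummy file to raise the bursting-mode
# baseline throughput. EFS_DIR is a hypothetical mount point; for this
# illustration it defaults to a local directory.
EFS_DIR="${EFS_DIR:-./efs-demo}"
mkdir -p "$EFS_DIR"

# Write 64 MiB of zeros; on a real EFS file system this counts toward the
# metered storage size that the bursting baseline scales with.
dd if=/dev/zero of="$EFS_DIR/dummy-64m" bs=1M count=64 2>/dev/null

ls -lh "$EFS_DIR/dummy-64m"
```

Larger dummy files raise the baseline further; remember that metered storage is billed, so remove the files when they're no longer needed.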
Types of operations performed on the EC2 instance
Metadata I/O operations
EFS performance suffers in the following situations:
- When file sizes are small. EFS's distributed architecture results in a small latency overhead for each file operation. Because of this per-operation latency, overall throughput generally increases as the average I/O size increases, because the overhead is amortized over a larger amount of data.
- Performance on shared file systems suffers if a workload or operation generates many small files serially, because the overhead of each operation adds up.
- Metadata I/O occurs if your application performs metadata-intensive operations, such as ls, rm, mkdir, rmdir, lookup, getattr, or setattr. Any operation that requires the system to fetch the address of a specific block is considered a metadata-intensive workload. For more information, see the following:
Metering: How Amazon EFS reports file system and object sizes.
Optimizing small-file performance.
- If you mount the file system using amazon-efs-utils, then the recommended mount options are applied by default.
- Using non-default mount options can degrade performance, for example, lowering rsize and wsize, or reducing or turning off attribute caching. You can check the output of the mount command to see the mount options currently in place:
fs-EXAMPLE3f75f.efs.us-east-1.amazonaws.com:/ on /home/ec2-user/efs type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,noresvport,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=<EXAMPLEIP>,local_lock=none,addr=<EXAMPLEIP>)
For more information, see Mount the file system on the EC2 instance and test.
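As a sketch of how those recommended options are applied, the following shows two equivalent ways to mount an EFS file system: with amazon-efs-utils, which applies the recommended options automatically, and with a plain NFSv4.1 mount that makes them explicit. The file system ID, DNS name, and mount point are placeholders matching the example output above, and the commands require a real EFS file system, so treat this as illustrative only.

```shell
# Placeholder IDs: fs-EXAMPLE3f75f and /mnt/efs are not real resources.

# Option 1: amazon-efs-utils applies the recommended mount options by default.
sudo mount -t efs fs-EXAMPLE3f75f:/ /mnt/efs

# Option 2: an equivalent plain NFSv4.1 mount with the recommended
# options (matching the example mount output) made explicit.
sudo mount -t nfs4 \
  -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport \
  fs-EXAMPLE3f75f.efs.us-east-1.amazonaws.com:/ /mnt/efs
```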
NFS client version
The Network File System version 4.1 (NFSv4.1) protocol provides better performance for parallel small-file read operations (greater than 10,000 files per second) compared to NFSv4.0 (less than 1,000 files per second).
For more information, see NFS client mount settings.
Bottleneck at the EC2 instance
If the application using the file system isn't driving the expected performance from EFS, optimize the application itself. Also, benchmark the host or service that your application runs on, such as Amazon EC2 or AWS Lambda. A resource crunch on the EC2 instance might affect your application's ability to use EFS effectively.
To check whether the EC2 instance is under-provisioned for your application's requirements, monitor Amazon EC2 CloudWatch metrics, such as CPU utilization and Amazon Elastic Block Store (Amazon EBS) throughput. Analyzing these metrics against your application architecture and resource requirements helps you determine whether to reconfigure your application or instance.
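As one way to check for an under-provisioned instance, the following AWS CLI sketch pulls the average CPU utilization for an instance over the last hour. It assumes the AWS CLI is configured with appropriate credentials, and i-0123456789abcdef0 is a placeholder instance ID.

```shell
# Placeholder instance ID; requires configured AWS CLI credentials.
# Pulls 5-minute average CPU utilization for the last hour, a quick signal
# of whether the instance itself is the bottleneck.
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --statistics Average \
  --period 300 \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
```

Sustained high averages suggest moving to a larger instance type before tuning EFS itself.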
Using the 4.0+ Linux kernel version
For optimal performance and to avoid a variety of known NFS client bugs, it's a best practice to use an AMI that has a Linux kernel version 4.0 or newer.
An exception to this rule is RHEL and CentOS 7.3 and newer. The kernels for these operating systems received backported versions of the fixes and enhancements applied to NFSv4.1. For more information, see NFS support.
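A quick way to confirm the kernel recommendation above is to check the running kernel's major version, as in this minimal sketch. Keep in mind the exception just noted: RHEL and CentOS 7.3+ report an older kernel (3.10) but still carry the backported NFS fixes.

```shell
# Print the running kernel version and flag whether it meets the 4.0+ guidance.
kernel="$(uname -r)"
major="${kernel%%.*}"
echo "kernel: $kernel"
if [ "$major" -ge 4 ]; then
  echo "kernel 4.0+ detected"
else
  echo "older kernel; check for backported NFS fixes (for example, RHEL/CentOS 7.3+)"
fi
```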
When copying files using the cp command, you might experience slowness. This is because cp is a serial operation: it copies files one at a time. If each file is small, the throughput achieved for that file is low.
You might also notice latency when sending files. Because EFS replicates data across multiple Availability Zones, there is overhead for each file operation. Therefore, some per-file latency is expected behavior.
Because cp and rsync work as serial (single-threaded) operations, it's a best practice to run parallel I/O operations instead, using tools such as fpart or GNU Parallel. fpart is a tool that helps you sort file trees and pack them into "partitions". fpart comes with a shell script called fpsync that wraps fpart and rsync to launch several rsync processes in parallel. fpsync provides its own embedded scheduler. Running copies in parallel completes these tasks faster than the more common serial method.
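The parallel-copy pattern described above can be sketched with xargs -P, a portable stand-in for what fpsync and GNU Parallel automate at larger scale. The directory names and file counts here are hypothetical, created only for the demonstration.

```shell
# Demo directories; on a real workload the destination would be the EFS mount.
mkdir -p parallel-src parallel-dest

# Create a few small demo files (illustration only).
for i in 1 2 3 4 5 6 7 8; do
  echo "demo $i" > "parallel-src/file$i.txt"
done

# Fan the copies out across up to 4 processes, 2 files per cp invocation,
# instead of copying one file at a time as a plain cp would.
find parallel-src -type f -print0 | xargs -0 -n 2 -P 4 cp -t parallel-dest

echo "copied: $(ls parallel-dest | wc -l) files"
```

The per-file EFS latency still applies, but with several operations in flight the overhead overlaps instead of accumulating serially, which is the same reason fpsync launches multiple rsync processes.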
For more information, see Amazon EFS performance tips.