Amazon EFS introduces 3x read-throughput increase at no additional charge
Companies are increasingly using Amazon EFS with containers and serverless compute to modernize their applications. Many of these applications are read-heavy, meaning they read data more often than they write it. To improve the overall performance of read-heavy applications, we recently launched a 3x increase in the read throughput of all Amazon EFS file systems, available at no additional charge. This increase helps machine learning training workloads load data faster, especially those with large datasets of videos and high-resolution images that do not fit into memory. For customers using AWS Lambda and Amazon EFS together to elastically scale machine learning inference applications, the increased throughput helps when concurrency rises and newly created Lambda containers must pull the model into memory from the file system. Other read-heavy workloads that benefit from the increased throughput include content management systems, Monte Carlo simulations, genomics research, and financial risk calculations.
How does it work?
With the increase in read throughput, Amazon EFS now meters read operations at one-third the rate of write and metadata operations. For example, a file system configured for 1 GB/s of provisioned throughput can now drive up to 3 GB/s for a read-heavy workload. For file systems in bursting mode, this change allows read-heavy workloads to burst to 3x higher levels of throughput and to sustain up to a 3x higher average throughput. For mixed read/write workloads, the increase in overall throughput depends on the mix of read and write operations. For example, consider a workload driving 750 MB/s of read operations and 250 MB/s of write and metadata operations, for a total throughput of 1000 MB/s. With this change, the workload is metered at 500 MB/s total: 750/3 = 250 MB/s from read operations plus 250 MB/s from write and metadata operations. The result is a 2x increase in the total throughput the workload can drive.
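The metering arithmetic above can be sketched in a few lines of Python. This is only an illustration of the rule described in this post (reads metered at one-third, writes and metadata at full rate); the function names are hypothetical and are not part of any AWS SDK.

```python
# Illustrative only: these helpers are hypothetical, not an AWS API.
# They encode the rule from this post: read operations are metered at
# one-third the rate of write and metadata operations.
READ_METERING_DIVISOR = 3


def metered_throughput(read_mbps: float, write_mbps: float) -> float:
    """Return the metered throughput (MB/s) for a given read/write mix."""
    return read_mbps / READ_METERING_DIVISOR + write_mbps


def max_total_throughput(limit_mbps: float, read_fraction: float) -> float:
    """Return the total throughput (MB/s) a workload can drive under a
    metered limit, given the fraction of its traffic that is reads."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    # Metered MB/s consumed per 1 MB/s of actual traffic.
    metered_per_unit = read_fraction / READ_METERING_DIVISOR + (1.0 - read_fraction)
    return limit_mbps / metered_per_unit
```

Calling metered_throughput(750, 250) reproduces the 500 MB/s figure from the example, and max_total_throughput(1000, 0.75) shows that a workload with 75% reads can drive about 2000 MB/s against a 1000 MB/s limit, which is the 2x increase described above.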
As part of the launch, we introduced a new CloudWatch metric, MeteredIOBytes. You can use this metric to monitor your metered file system throughput, comparing it against the PermittedThroughput CloudWatch metric to see how close you are to your throughput limit. We do this calculation for you in the Amazon EFS console as part of the Throughput utilization (%) graph. If you want to monitor MeteredIOBytes in a custom CloudWatch dashboard, you can use the metric math expression shared in our documentation to get started.
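As a rough sketch of the comparison behind a throughput utilization percentage, the following Python function divides metered bytes by the length of the measurement period and compares the result to the permitted throughput. It assumes that MeteredIOBytes is read with the Sum statistic over the period and that PermittedThroughput is expressed in bytes per second; the function name and parameters are hypothetical, and the metric math expression in the Amazon EFS documentation remains the authoritative version.

```python
# Illustrative only: a hypothetical helper, not the console's implementation.
# Assumptions: metered_io_bytes_sum is the Sum of MeteredIOBytes over the
# period, and permitted_bytes_per_second is PermittedThroughput in bytes/s.
def throughput_utilization_percent(
    metered_io_bytes_sum: float,
    period_seconds: float,
    permitted_bytes_per_second: float,
) -> float:
    """Return metered throughput as a percentage of permitted throughput."""
    if period_seconds <= 0 or permitted_bytes_per_second <= 0:
        raise ValueError("period and permitted throughput must be positive")
    # Convert the per-period byte count into an average bytes-per-second rate.
    metered_bytes_per_second = metered_io_bytes_sum / period_seconds
    return 100.0 * metered_bytes_per_second / permitted_bytes_per_second
```

For instance, 30 GB of metered I/O over a 60-second period against a permitted throughput of 1 GB/s corresponds to 50% utilization.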
To illustrate the read throughput increase, I wrote a simple Lambda application that first writes and then reads 10 GB files at a concurrency of 50. The application clears the operating system cache after writing the files so that the reads are served by the file system rather than from cache, which keeps the read numbers accurate. I ran this application against a file system with 1 GB/s of provisioned throughput. The following graph shows the TotalIOBytes and MeteredIOBytes CloudWatch metrics during a run of the application. As you can see, the write portion of the workload drives up to 1 GB/s, while the read portion drives up to 3 GB/s. You can also see that the metered throughput is 1 GB/s during both portions of the workload.
To learn more about Amazon EFS, including its regional availability and durability, lifecycle management capabilities, and use cases, visit the Amazon EFS homepage. You can also read more about Amazon EFS performance in the Amazon EFS documentation. If you have any comments or questions, please don’t hesitate to leave them in the comments section.