AWS Storage Blog

Scalable, cloud-native file storage at pennies per GB-month with Amazon EFS

Tens of thousands of customers, including T-Mobile, MicroStrategy, HERE, and LoanLogics, store up to petabytes of data in Amazon Elastic File System (Amazon EFS) to power use cases such as lift-and-shift of enterprise applications, large-scale analytics, and persistent file storage for containers, at a blended price point of just $0.096/GB-month. In doing so, these customers are saving hundreds of thousands of dollars per year on their cloud file storage. You might be asking yourself, how is this possible?

By using EFS Lifecycle Management to transparently tier files across EFS’s storage classes – Standard and Infrequent Access (EFS IA) – customers automatically save money as their access patterns change, and effectively lower the EFS storage price by about two thirds. From a storage perspective, think of it as an 80-20 rule: over time, 80% of your data is ‘cold’ or not accessed very often, and 20% of your data is ‘hot’ or actively used. This ratio is generally accepted by industry analysts such as IDC, and validated by our own analysis of customer usage patterns. Using the US East (N. Virginia) region as the pricing barometer, placing active workloads on EFS Standard and colder data on EFS IA leads to an effective blended storage price of $0.096/GB-month (20% * $0.30 + 80% * $0.045 = $0.06 + $0.036).
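The blended-price arithmetic above can be sketched as a short Python calculation. The function name and the `hot_fraction` parameter are illustrative assumptions; the two prices are the US East (N. Virginia) figures quoted in this post, and IA access charges are left out of this storage-only figure.

```python
def blended_price(hot_fraction, standard_price, ia_price):
    """Effective storage price in $/GB-month when `hot_fraction` of the data
    sits in EFS Standard and the rest sits in EFS IA.

    Storage only: per-GB IA access charges are not included here.
    """
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    return hot_fraction * standard_price + (1.0 - hot_fraction) * ia_price


# Prices quoted in this post for US East (N. Virginia), in $/GB-month.
STANDARD = 0.30
IA = 0.045

price = blended_price(0.20, STANDARD, IA)
savings = 1.0 - price / STANDARD
print(f"blended: ${price:.3f}/GB-month, savings vs. Standard: {savings:.0%}")
```

Changing `hot_fraction` shows how sensitive the blended price is to the 80-20 assumption: a workload with 50% hot data, for example, lands well above $0.096/GB-month.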

In July, we announced a few enhancements to EFS IA to make it even easier to cost-effectively store your files in the cloud. You can now choose one of four Lifecycle Management age-off policies to tune when data lands in EFS IA according to your workload. In addition to the 30-day age-off policy already available, we’ve added options for 14, 60, and 90 days (with more to come – see below for detail). Additionally, we now support Lifecycle Management for all EFS file systems, including those created before EFS IA launched in General Availability this past February. EFS IA is a great example of how we innovate at AWS simply by listening to our customers’ feedback.

EFS Infrequent Access – the birth of an idea

As a fully managed, elastic, and scalable file storage service, EFS provides regional multi-AZ availability (designed for 4 9’s, with an SLA for 3 9’s) and durability (designed for 11 9’s) right out of the box, without requiring you to set up and manage any infrastructure or provision storage capacity up-front. Customers like Kellogg’s, BBC, and Faculty tell us they love how simple EFS is to use. However, they have also told us that EFS can become quite expensive when storing large amounts of data for long periods of time.

Given that feedback, we looked at ways to extend the lowest possible price to customers while still providing all the benefits and features that they expect. Our data scientists analyzed EFS usage data using tools like Amazon Athena and Amazon SageMaker and identified a few noteworthy trends. Arguably the biggest was a strong inverse relationship between the age of a file and how likely it is to be read or written. In other words, the older a file is, the less likely it is to be used frequently.

So right there was an opportunity – what if we could provide a way for customers to automatically save money as their access patterns change, by moving older files to more cost-effective storage media that’s still readily accessible when it’s needed?

It’s about time

And that’s how we came up with the idea for the Amazon EFS Infrequent Access storage class. EFS IA provides price/performance that’s cost-optimized for files not accessed every day, by storing infrequently accessed files on less expensive storage media (think spinning disks rather than solid-state drives). Along with EFS IA, we introduced EFS Lifecycle Management, a capability that monitors the access patterns (i.e. reads and writes) for files in your file system, and automatically moves files that have not been accessed for the duration of a pre-defined policy into the lower-cost storage class, yielding automatic cost savings as data naturally ages off over time.

With EFS IA and Lifecycle Management, we’ve eliminated the need to manually manage your data to control cost. EFS transparently serves data from both storage classes in a common file system namespace, so you don’t need to modify your applications or worry about which of your files are actively used and which are infrequently accessed.

What’s the catch?

Naturally, the first thing customers wonder when using EFS IA is what they have to give up to save so much money (up to 85% relative to the EFS Standard storage class). While all EFS features are supported when using EFS IA, there are two items we recommend you keep in mind.

Earlier we said that EFS IA has price/performance that’s cost-optimized for files not accessed every day. Here’s what we mean by that: while there are many factors that contribute to an application’s performance – and latency is just one of them – you can expect single-digit millisecond operation latencies on average when using EFS Standard, and double-digit millisecond operation latencies when using EFS IA.

The second thing to keep in mind with EFS IA is its additional pricing dimension. Unlike EFS Standard’s single price point, which includes all network and access charges, with EFS IA, you pay for storage separately from data access. We chose to price EFS IA this way so that we could extend the lowest possible storage price to customers, as our data shows that most files aren’t accessed very frequently, with many never accessed at all in a given month. EFS IA is designed for “infrequently accessed” files, after all.

Getting started with EFS IA

With a product as new as EFS IA, customers haven’t yet developed and shared best practices for planning and deployment. In May, we published a whitepaper that gives guidance for migrating Linux-based file system workloads to the cloud, including using EFS IA, but we wanted to take the opportunity with the launch of EFS IA’s new capabilities to reinforce the expectations we have based on analyzing file system usage. Of course, your mileage may vary, but here’s what we recommend you plan for if you don’t have statistics readily available on your data and access patterns.

In addition to the 80-20 rule we mentioned earlier in this post, when using EFS IA, you’ll save money if you access your IA files once per day or less on average. In practice, what we see is actually far lower: based on observed usage patterns to date, we expect that once a file ages off, it’s accessed in full (for example, a sequential read of the entire file) about once a month.
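As a rough way to reason about the trade-off described above, the sketch below models the monthly cost of a gigabyte stored in EFS IA as storage plus access charges, and solves for the number of full reads per month at which IA and Standard cost the same. The storage prices are the ones quoted in this post; the per-GB access price is not stated here, so the `ACCESS` value is a placeholder assumption that should be replaced with the figure from the current EFS pricing page. Function names are illustrative.

```python
def ia_monthly_cost(ia_storage_price, access_price, full_reads_per_month):
    """$/GB-month for data in EFS IA: storage plus per-GB access charges."""
    return ia_storage_price + access_price * full_reads_per_month


def break_even_reads(standard_price, ia_storage_price, access_price):
    """Full reads per month at which EFS IA costs the same as EFS Standard."""
    if access_price <= 0:
        raise ValueError("access_price must be positive")
    return (standard_price - ia_storage_price) / access_price


STANDARD = 0.30  # $/GB-month, quoted in this post
IA = 0.045       # $/GB-month, quoted in this post
ACCESS = 0.01    # $/GB accessed: ASSUMED for illustration, not from this post

one_read = ia_monthly_cost(IA, ACCESS, 1)
print(f"IA cost at one full read per month: ${one_read:.3f}/GB-month")
print(f"break-even: {break_even_reads(STANDARD, IA, ACCESS):.1f} full reads per month")
```

The break-even point scales inversely with the access price, so the result is only as good as the access price you plug in. Below the break-even rate, IA is cheaper per gigabyte; above it, Standard is.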

It’s really easy to get started – simply enable Lifecycle Management using the EFS Console or API. You may ask, “I really want to test out EFS IA to see if it works for me – do I really need to wait 14 days for my data to age off so I can test?” No, we have ways to enable your tests. Reach out to us directly or through your account team for more info on how we can help.
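For the API route, a minimal sketch using the AWS SDK for Python (boto3) might look like the following. It assumes boto3 is installed and credentials are configured; the helper names, the default region, and the file system ID in the usage comment are hypothetical. The four age-off options map to the `TransitionToIA` values accepted by the EFS `PutLifecycleConfiguration` operation.

```python
# Age-off policies described in this post, mapped to EFS API enum values.
VALID_TRANSITIONS = {
    14: "AFTER_14_DAYS",
    30: "AFTER_30_DAYS",
    60: "AFTER_60_DAYS",
    90: "AFTER_90_DAYS",
}


def lifecycle_policies(days):
    """Build the LifecyclePolicies payload for a given age-off period."""
    if days not in VALID_TRANSITIONS:
        raise ValueError(f"days must be one of {sorted(VALID_TRANSITIONS)}")
    return [{"TransitionToIA": VALID_TRANSITIONS[days]}]


def enable_lifecycle_management(file_system_id, days, region_name="us-east-1"):
    """Apply the age-off policy to an existing EFS file system."""
    import boto3  # imported lazily so the payload helper works without the SDK

    efs = boto3.client("efs", region_name=region_name)
    return efs.put_lifecycle_configuration(
        FileSystemId=file_system_id,
        LifecyclePolicies=lifecycle_policies(days),
    )


# Example (hypothetical file system ID; requires AWS credentials):
#   enable_lifecycle_management("fs-12345678", 14)
print(lifecycle_policies(14))
```

Keeping the payload construction separate from the API call makes it easy to validate the chosen policy before anything is sent to AWS.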

Continuing to innovate, driven by YOUR feedback!

In addition to requests for testing EFS IA, we often get questions from customers like, “what if I want to always keep some of my files on EFS Standard, since they’re latency sensitive?” or “what happens if a rogue script accesses lots of my data accidentally – could my bill go up?” There are workarounds and monitors you can put in place to alleviate some of these concerns, but what we have available now is just the beginning, and we know we can deliver even more to you, our customers.

Beyond testing, we know customers want shorter Lifecycle Management policies so that data lands in EFS IA sooner and cost savings are realized more quickly. We also know customers want policy options that move files back and forth between storage classes to optimize both cost and performance. We truly value feedback like this, and we’re actively investigating ways to put even more power and flexibility around storage classes into our customers’ hands.

What else would you like to see from EFS next? Whether it’s one of the topics mentioned in this blog or something else entirely, we’d like to hear from you.

Joe Travaglini

For the past 4+ years, Joe Travaglini has been a product manager on the Amazon Elastic File System team, responsible for EFS’s security and compliance roadmap, and product lead for the launch of EFS Infrequent Access. Prior to EFS, Joe was Director of Products at Sqrrl, a cybersecurity analytics startup acquired by AWS in 2018.