Best practices for using Amazon EFS for container storage
Tens of thousands of companies are storing petabytes of data on Amazon Elastic File System (Amazon EFS), many of them using EFS to store data for containerized applications. Amazon EFS file systems can be attached to containers launched by both Amazon Elastic Container Service (ECS) and Amazon Elastic Kubernetes Service (EKS). Amazon EFS is a natural choice for container storage because it is a fully managed service that is simple to set up and scales elastically as you add or remove data, just like your container infrastructure. It also scales to petabytes of data, gigabytes per second of aggregate throughput, and thousands of IOPS. In this blog post, I’ll answer some frequently asked questions about best practices for using EFS with containerized applications.
Do I need shared storage for my container?
Generally, shared file storage is useful for long-running containers that need to be resilient to failure and for containers that need to share data with each other. Some examples I commonly see include:
- Content management applications like WordPress and Drupal benefit from scaling out to multiple instances for performance and redundancy, and need to share uploads, plugins, and templates across multiple instances.
- Developer tools like JIRA, Artifactory, and Git need to share data between instances for high availability, with code and artifacts persisted to multiple AWS Availability Zones for durability.
- Machine learning frameworks like MXNet and TensorFlow need to access data through a file system interface, and having durable shared storage allows several users and jobs to run in parallel on the same set of data.
- Shared notebook systems like Jupyter and JupyterHub need durable storage for notebooks and user workspaces, and having shared storage allows data scientists to collaborate easily.
Amazon Elastic Block Store (EBS) is another option for persisting container data, and is often a good choice when the data doesn’t need to be shared with other containers, such as with databases like MySQL or distributed systems like Kafka and Cassandra.
Should I create a new file system or use an existing one?
In most cases you’ll want to create a new file system for each new application that needs to share or persist data, since it allows you to secure and manage all data for your application in one location. There are two exceptions to this. First, if a file system already exists with the data that your application needs, you can simply connect that existing file system to your container instead of creating a new one. If your container only needs to read the shared data, both ECS and EKS allow you to mount file systems as read-only.
Another reason you may want to share file systems is scale. Each file system you create provisions at least one mount target, and you can create up to 400 mount targets per Amazon Virtual Private Cloud (VPC). If you have more than 400 applications in a VPC that each require their own shared file storage, you can partition a single file system into many using directories. Even if you have fewer than 400 applications, sharing a file system between multiple containers may be simpler to manage, and it can provide higher aggregate throughput for file systems in bursting mode, since the throughput of the file system is based on the total amount of data stored. To do this, once your file system is created, create a directory for each application, like /myapplication1 and /myapplication2. When you mount the file system to your container, you specify the source directory as /myapplication1, and the container will be rooted to that location and unable to see data for other applications. From a security point of view, this solution works best when you trust the administrators launching the applications, since they are responsible for scoping each application to only the directory that contains its data.
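As a rough illustration of scoping each application to its own directory, the Python sketch below builds the volume entry of an ECS task definition for one application. The field names (efsVolumeConfiguration, fileSystemId, rootDirectory, transitEncryption) follow the ECS task definition format as I understand it; the helper name efs_volume, the volume name, and the file system ID are hypothetical and only for illustration.

```python
from typing import Any, Dict


def efs_volume(name: str, file_system_id: str, app_directory: str,
               encrypt_in_transit: bool = True) -> Dict[str, Any]:
    """Build one entry for the `volumes` list of an ECS task definition.

    `app_directory` becomes the root the container sees, so a container
    mounted at /myapplication1 cannot see /myapplication2.
    """
    if not app_directory.startswith("/"):
        raise ValueError("app_directory must be an absolute path, e.g. /myapplication1")
    return {
        "name": name,
        "efsVolumeConfiguration": {
            "fileSystemId": file_system_id,
            "rootDirectory": app_directory,
            "transitEncryption": "ENABLED" if encrypt_in_transit else "DISABLED",
        },
    }


# Hypothetical file system ID shared by two applications.
app1 = efs_volume("app1-data", "fs-12345678", "/myapplication1")
app2 = efs_volume("app2-data", "fs-12345678", "/myapplication2")
```

Both entries point at the same file system but at different root directories, which is the partitioning described above. The directories themselves still have to exist on the file system before the containers start.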
How many mount targets do I need?
Both ECS and EKS launch containers in multiple Availability Zones. To ensure that your application has access to EFS no matter where it is launched, we recommend creating an EFS mount target in every Availability Zone of the Region you’re using. If you create your file system using the EFS console, this happens by default. There is no incremental fee for creating additional mount targets for your file system.
We recommend configuring your containers to connect to your file system using its DNS name, which has the format file-system-id.efs.aws-region.amazonaws.com. When this DNS name is used, lookups automatically resolve to the mount target in the same Availability Zone as the application, optimizing networking cost and performance. Services and frameworks that integrate natively with EFS, like ECS and EKS, do this automatically.
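If you need to assemble the DNS name yourself, for example in a deployment script, a minimal Python sketch of the format above looks like this. The function name and the sample file system ID are hypothetical.

```python
def efs_dns_name(file_system_id: str, region: str) -> str:
    """Return the file system DNS name: file-system-id.efs.aws-region.amazonaws.com."""
    if not file_system_id or not region:
        raise ValueError("file_system_id and region are both required")
    return f"{file_system_id}.efs.{region}.amazonaws.com"


# Hypothetical file system in N. Virginia.
dns_name = efs_dns_name("fs-12345678", "us-east-1")
```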
Can I encrypt my data?
You can encrypt EFS data both in transit and at rest. You can enable encryption of data at rest when you create your file system, using either an AWS-managed or customer-managed customer master key (CMK). Encryption of data in transit is configured on a per-connection basis. To encrypt data in transit, first ensure that the container host you’re using has the EFS mount helper installed, then mount the file system with the “-o tls” option.
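For hosts where you run the mount yourself, the sketch below assembles the mount helper command line with the tls option. It assumes the EFS mount helper is installed on the host (it provides the efs mount type); the function name, file system ID, and mount point are hypothetical, and the sketch only builds the command rather than running it.

```python
import shlex
from typing import List


def efs_mount_command(file_system_id: str, mount_point: str,
                      source_directory: str = "/", tls: bool = True) -> List[str]:
    """Build the argument list for mounting EFS through the mount helper."""
    command = ["mount", "-t", "efs"]
    if tls:
        # "-o tls" asks the mount helper to encrypt this connection in transit.
        command += ["-o", "tls"]
    command += [f"{file_system_id}:{source_directory}", mount_point]
    return command


# Hypothetical file system ID and mount point.
argv = efs_mount_command("fs-12345678", "/mnt/efs")
command_line = shlex.join(argv)  # a single string suitable for logging
```

Passing argv to subprocess.run (with root privileges) would perform the mount. Keeping the command as a list avoids shell quoting problems.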
What performance mode should I choose?
EFS file systems come in two performance flavors: General Purpose and Max I/O. General Purpose is usually the best choice for interactive applications that benefit from lower per-operation latency like content management systems, developer tools, and data science notebooks. Max I/O file systems are a good choice for analytics and machine learning training or any other workload that performs parallel operations from hundreds or thousands of containers, is looking for the highest possible aggregate throughput and IOPS, and isn’t sensitive to the latency of an individual operation.
Should I use provisioned throughput?
Most customers that use EFS for container storage enable Provisioned Throughput in order to provide a consistent experience to end users. Since the throughput of a Bursting Throughput file system is based on the amount of data stored, you may find that the throughput available to a newly-provisioned application is less than what you need. To adjust for this, you can configure Provisioned Throughput with the exact throughput that your end users need. For example, customers commonly configure Jenkins file systems with 50-150 MiB/s of Provisioned Throughput, and Nexus or Artifactory repository file systems with 512-1024 MiB/s.
The nice thing about Provisioned Throughput pricing is that you pay only for the throughput above the baseline rate you would receive in Bursting Throughput mode for the amount of data you are storing. If the amount of data you’re storing allows for a higher throughput than you’ve provisioned, we’ll give you the higher of the two. For example, if you provision a Jenkins container on a file system with 50 MiB/s of Provisioned Throughput, you will gradually pay less for throughput as your total storage approaches 1 TiB, after which you no longer pay for Provisioned Throughput and your allowed throughput is instead based on the sliding scale of 50 MiB/s per TiB stored. If your storage dips below 1 TiB, your throughput won’t go under 50 MiB/s, so your end users will have a consistent experience.
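The arithmetic behind that example can be sketched in a few lines of Python. The 50 MiB/s per TiB baseline comes from the text above; the function names are hypothetical, and the sketch models only the amount of throughput that is billable, not the price per MiB/s.

```python
BASELINE_MIBPS_PER_TIB = 50.0  # baseline rate in Bursting Throughput mode


def baseline_throughput(stored_tib: float) -> float:
    """Throughput (MiB/s) that the stored data alone entitles you to."""
    return stored_tib * BASELINE_MIBPS_PER_TIB


def billable_throughput(provisioned_mibps: float, stored_tib: float) -> float:
    """Provisioned throughput (MiB/s) above the baseline; this is what you pay for."""
    return max(0.0, provisioned_mibps - baseline_throughput(stored_tib))


def allowed_throughput(provisioned_mibps: float, stored_tib: float) -> float:
    """You receive the higher of the provisioned and the storage-based rate."""
    return max(provisioned_mibps, baseline_throughput(stored_tib))


# A Jenkins file system with 50 MiB/s provisioned, at three storage sizes.
for stored in (0.5, 1.0, 2.0):
    print(stored, billable_throughput(50.0, stored), allowed_throughput(50.0, stored))
```

At 0.5 TiB half of the provisioned throughput is billable, at 1 TiB none of it is, and at 2 TiB the file system is allowed 100 MiB/s even though only 50 MiB/s was provisioned.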
Can I use infrequent access to lower my cost?
Absolutely. The EFS Infrequent Access (IA) storage class can lower the price of storing data in EFS by up to 92%. To take advantage of this, you can configure Lifecycle Management on your file system and specify the time period after which files that have not been read or written should transition to IA, which can be as low as 14 days or as high as 90 days. For example, artifact repositories like Nexus and Artifactory can transition over 80% of their data to IA because typically only the latest build artifacts are frequently accessed, resulting in a blended cost of $0.08 per GB-month, based on pricing in the N. Virginia (us-east-1) region.
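To see where a blended figure like $0.08 comes from, here is a small Python sketch. The two per-GB-month prices are assumptions on my part (a Standard price of $0.30 and an IA price of $0.025, which reproduce the 92% and $0.08 figures above); check current pricing for your region before relying on them. The sketch covers storage cost only and ignores any charges for reading data from IA.

```python
STANDARD_PRICE = 0.30  # assumed USD per GB-month for Standard storage
IA_PRICE = 0.025       # assumed USD per GB-month for Infrequent Access storage


def blended_price(ia_fraction: float,
                  standard: float = STANDARD_PRICE,
                  ia: float = IA_PRICE) -> float:
    """Average storage price per GB-month when `ia_fraction` of the data is in IA."""
    if not 0.0 <= ia_fraction <= 1.0:
        raise ValueError("ia_fraction must be between 0 and 1")
    return (1.0 - ia_fraction) * standard + ia_fraction * ia


def ia_savings_percent(standard: float = STANDARD_PRICE, ia: float = IA_PRICE) -> float:
    """How much cheaper IA storage is than Standard storage, in percent."""
    return (1.0 - ia / standard) * 100.0


print(round(blended_price(0.8), 3))   # 80% of data in IA
print(round(ia_savings_percent()))    # savings for data stored in IA
```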
Can I back up my data?
With AWS Backup, it’s simple to back up data for all of your containerized application’s file systems. To get started, visit the AWS Backup console and configure a backup plan, then configure the backup plan to assign resources by tag, for example “K:Backup, V:DailyBackup”. Then, when creating file systems for your containers, you add a tag for “K:Backup, V:DailyBackup”. AWS Backup will automatically detect your file system, and start backing it up according to the backup policy that you configured. If you ever accidentally delete data and need to restore from a backup, you can instruct AWS Backup to restore your backup to either a directory in the existing file system or to a new file system.
What should I monitor?
EFS emits several CloudWatch metrics that are useful for monitoring your file system. We published a sample dashboard that includes metric math expressions and alarm thresholds that make it easy to get started. In particular, I recommend keeping an eye on:
- Throughput utilization: This shows how the throughput your application is using compares with the total throughput available, either calculated or provisioned. This can show you whether you need to enable Provisioned Throughput or adjust the amount you configured for your application.
- IOPS utilization: If you have a General Purpose file system you can drive up to 7,000 aggregate IOPS. This metric shows you how many IOPS you’re driving as a percentage of this limit. If you’re reaching 100% utilization, you should consider splitting your data across multiple file systems or using a Max I/O file system with no IOPS limit.
- Burst credit balance: When your file system is in Bursting Throughput mode, your burst credit balance determines the amount of throughput you can drive. When you have burst credits, you can drive up to 100 MiB/s per TiB of storage, or 100 MiB/s, whichever is higher. When you run out of burst credits, you can drive 50 MiB/s per TiB of storage. You should set an alarm on burst credit balance so you are notified if you’re running low, since your application performance may change as a result. If you see that you are about to run out of burst credits, you can enable Provisioned Throughput to ensure your throughput stays at a consistent value for your application.
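The limits quoted in the list above translate into a few simple calculations. The Python sketch below is only meant to make the numbers concrete: the constants come from the text, the function names are hypothetical, and in practice you would read the inputs from CloudWatch rather than pass them in by hand.

```python
GENERAL_PURPOSE_IOPS_LIMIT = 7000.0  # aggregate IOPS limit for General Purpose
BURST_MIBPS_PER_TIB = 100.0          # with burst credits
BURST_FLOOR_MIBPS = 100.0            # minimum burst rate, whatever the size
BASELINE_MIBPS_PER_TIB = 50.0        # once burst credits run out


def utilization_percent(used: float, available: float) -> float:
    """Generic 'used as a percentage of available' calculation."""
    if available <= 0:
        raise ValueError("available must be positive")
    return used / available * 100.0


def iops_utilization(iops: float) -> float:
    """IOPS driven as a percentage of the General Purpose limit."""
    return utilization_percent(iops, GENERAL_PURPOSE_IOPS_LIMIT)


def bursting_throughput(stored_tib: float, has_burst_credits: bool) -> float:
    """Throughput (MiB/s) available in Bursting Throughput mode."""
    if has_burst_credits:
        return max(stored_tib * BURST_MIBPS_PER_TIB, BURST_FLOOR_MIBPS)
    return stored_tib * BASELINE_MIBPS_PER_TIB


# A 2 TiB file system driving 120 MiB/s and 3,500 IOPS.
available = bursting_throughput(2.0, has_burst_credits=True)
print(utilization_percent(120.0, available), iops_utilization(3500.0))
```

Comparing the result with and without burst credits shows how much throughput an application stands to lose when the credit balance runs out, which is a reasonable basis for choosing an alarm threshold.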
How can I get started?
Any more questions?
I hope the guidance above is useful to all of you who are deploying containers that need shared storage. Feel free to comment below with any questions that I may not have answered.