Containers

Developers guide to using Amazon EFS with Amazon ECS and AWS Fargate – Part 2

Welcome to Part 2 of this blog post series on how to use Amazon EFS with Amazon ECS and AWS Fargate. For reference, these are the blog posts in this series:

  • Part 1: This blog post provides the background on the need for this integration and its scope, along with a high-level view of the use cases and scenarios this feature unlocks for our customers
  • Part 2: [this blog post] A deep dive into how EFS security works in container deployments based on ECS and Fargate, with some high-level considerations and best practices around regional ECS and EFS deployments
  • Part 3: A practical example, including reusable code and commands, of a containerized application deployed on ECS tasks that use EFS

The target audience for this blog post is developers using Amazon ECS and AWS Fargate who want to learn how to deploy regionally resilient and scalable stateful services using the integration with Amazon EFS.

Part 3 is where we will put all this theory to work!

Amazon ECS and AWS Fargate architecture for availability, cost, and scale

Amazon ECS is a regionally distributed container orchestrator fully managed by AWS. The ECS control plane exposes a regional endpoint, and that is the only interface you need to be aware of: you get resiliency and scale out of the box without having to think about the underlying infrastructure.

As alluded to in Part 1 of this series, ECS supports deploying tasks on both customer-managed Amazon EC2 instances and on AWS Fargate. Both EC2 and Fargate resources can be provisioned using Savings Plans and Spot capacity, creating a flexible mix of architectural and purchasing options. The integration between ECS and EFS works across all of these options.

Below is a visual representation of the regional, centralized ECS control plane (fully managed) that deploys ECS tasks on both EC2 instances and the Fargate fleet. The deployments occur across Availability Zones for resiliency and scaling purposes and can mix Spot and On-Demand resources for price optimization.

Amazon EFS architecture for availability, cost, and scale

Similarly to Amazon ECS, Amazon EFS is a regional service. It is a distributed, fully elastic file sharing solution that is fully managed by AWS. An EFS customer doesn’t need to worry about any infrastructure component.

When you create an EFS file system, it gets created on infrastructure that spans a number of Availability Zones (at least three) for durability. However, this is independent of how you access the file system. For availability, the file system can be concurrently mounted and accessed from all the Availability Zones in the Region. The file system is exposed to users through EFS mount targets. Mount targets are logical endpoints of your file system that you can create, if you so choose, in all subnets of your VPC. You choose the Availability Zones where you want to create these mount targets by picking the proper subnets at file system creation time. These mount targets have their own VPC IP addresses and internal DNS names.
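For reference, this is a minimal sketch (using the AWS SDK for Python, boto3) of how mount targets could be created programmatically, one per subnet. The file system ID, subnet IDs, and security group ID below are hypothetical placeholders.

import boto3

efs = boto3.client("efs")

# Hypothetical identifiers -- replace with your own values.
FILE_SYSTEM_ID = "fs-12345678"
SUBNET_IDS = ["subnet-aaaa1111", "subnet-bbbb2222", "subnet-cccc3333"]  # one subnet per Availability Zone
MOUNT_TARGET_SG = "sg-0123456789abcdef0"

# Create one mount target per subnet so that clients in every Availability Zone
# can reach the file system through an endpoint local to their own AZ.
for subnet_id in SUBNET_IDS:
    mount_target = efs.create_mount_target(
        FileSystemId=FILE_SYSTEM_ID,
        SubnetId=subnet_id,
        SecurityGroups=[MOUNT_TARGET_SG],
    )
    print(subnet_id, mount_target["IpAddress"])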

Cost and scale are where EFS really shines. By entirely abstracting infrastructure capacity, EFS allows customers to scale virtually without limits and only pay for what they actually use. There is neither a minimum nor a maximum file system size. It is still important to take into account the various EFS quotas and limits as you design your solution. If you want to read more about performance and general best practices when using containers and EFS together, this is a good read.

This is the complete picture of an ECS deployment leveraging EFS. Tasks deployed across various Availability Zones will connect to the closest mount target in the same Availability Zone:

A deep dive into the Amazon EFS security model

This is the area where this blog post goes deepest. Security is our highest priority, and the EFS security model is very rich and flexible.

The EFS security model has two macro dimensions:

  • Network security. This answers the question “can my task access the EFS mount target elastic network interface (ENI) from a network routing and network security perspective?”
  • Client security. This answers the question “does my task have permissions to read from and write to the EFS file system?”

In the following two sections, we will expand and dive into these two dimensions.

The EFS network security model

The EFS network security model is based on standard AWS network security constructs (namely, security groups and network access control lists) and is fairly straightforward to understand if you already know them.

As we outlined above, each EFS mount target is exposed as an elastic network interface (ENI) connected to a VPC subnet. Customers can opt in and opt out of subnets as they see fit; this is how customers choose the Availability Zones for which they want to enable EFS mount targets. Each mount target can be assigned up to five security groups. By default, if no other security group is specified, the VPC default security group is selected.

This is an example of what a default configuration looks like when I create a new EFS file system:

What this means, in simple terms, is that every ECS task in the same VPC that uses the same default security group will have access to these mount targets. Everything outside of the VPC, by default, has network access denied.

It is very likely that your network security posture will require more sophistication, but the way you’d secure the network flow between your EFS mount targets and your ECS tasks isn’t any different from how you would secure other constructs inside your VPC. The key point to remember is that the security group(s) you assign to the mount targets need to allow inbound traffic for the NFS protocol (TCP port 2049). Please refer to the EFS documentation for additional details on this aspect.
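As an illustration, this is a sketch (again using boto3) of one way such a rule could be added: it allows inbound TCP 2049 to the mount target security group only from the security group used by the ECS tasks. Both security group IDs are hypothetical placeholders.

import boto3

ec2 = boto3.client("ec2")

# Hypothetical security group IDs -- replace with your own values.
MOUNT_TARGET_SG = "sg-0123456789abcdef0"  # attached to the EFS mount targets
ECS_TASK_SG = "sg-0fedcba9876543210"      # attached to the ECS task ENIs (awsvpc mode)

# Allow inbound NFS (TCP 2049) to the mount targets, but only from the
# security group assigned to the ECS tasks.
ec2.authorize_security_group_ingress(
    GroupId=MOUNT_TARGET_SG,
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": 2049,
            "ToPort": 2049,
            "UserIdGroupPairs": [{"GroupId": ECS_TASK_SG}],
        }
    ],
)

Referencing the task security group (rather than CIDR ranges) keeps the rule valid as tasks scale in and out.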

Similarly to security groups, network access control lists (network ACLs) can be used to control traffic to and from the various subnets. It is important to note that the default network ACL in a given VPC allows all traffic.

The EFS client security model

While this section is largely focused on EFS security primitives, we will look at this in the context of consuming EFS from ECS tasks. At the end of this section you should have a better understanding of the options to configure EFS storage in your task definition:

We will start with some key core tenets that govern how EFS client security works. You may find yourself using these tenets as reference material as you go deeper into this section:

  • EFS client security is layered: it is the result of the intersection of AWS access policies (i.e. the merge of resource-based policies and IAM policies) and standard POSIX file system permissions.
  • Both layers can be used to deny or allow read, write, and root access to a given EFS object. Deny always wins (as it should).
  • EFS implements an access abstraction to the file system called EFS Access Points that enables an application-first pattern. EFS Access Points allow you to assign granular AWS policy permissions at the directory level within a file system (more on this later).
  • EFS can squash the root user if the AWS access policies don’t allow for that level of access (e.g. if you are trying to run sudo touch filename.txt).

Now that we laid out the main tenets, let’s see how they all come together.

AWS policies

Let’s start from the AWS policies. IAM and resource-based policies have three actions that control access to the EFS resource:

  • elasticfilesystem:ClientMount (read only access)
  • elasticfilesystem:ClientWrite (read/write access)
  • elasticfilesystem:ClientRootAccess (root access)

These actions are layered but do not imply one another (e.g. ClientWrite does not include ClientMount; each action needs to be specifically listed).

These actions can be defined both at the client level (through IAM policies attached to the IAM roles assigned to ECS tasks) and at the EFS file system level (through resource-based policies). If no resource-based policy exists, which is the default at file system creation, access is granted to all principals (*).
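For illustration purposes, this is a sketch of how a resource-based (file system) policy could be applied with boto3 to restrict mount and write access to a specific task role. The file system ID and the ARNs are hypothetical placeholders.

import json
import boto3

efs = boto3.client("efs")

# Hypothetical identifiers -- replace with your own values.
FILE_SYSTEM_ID = "fs-12345678"
FILE_SYSTEM_ARN = "arn:aws:elasticfilesystem:us-east-1:111122223333:file-system/fs-12345678"
TASK_ROLE_ARN = "arn:aws:iam::111122223333:role/my-ecs-task-role"

# Resource-based policy: only the specified task role can mount and write.
# Any other principal is implicitly denied once this policy is in place.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": TASK_ROLE_ARN},
            "Action": [
                "elasticfilesystem:ClientMount",
                "elasticfilesystem:ClientWrite",
            ],
            "Resource": FILE_SYSTEM_ARN,
        }
    ],
}

efs.put_file_system_policy(FileSystemId=FILE_SYSTEM_ID, Policy=json.dumps(policy))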

The client policy (associated to the task as an IAM policy via its task role) needs to be explicitly enabled using the “EFS IAM authorization” flag shown in the screenshot above (or via the equivalent API). If “EFS IAM authorization” is enabled, the policy associated to the task role and the EFS resource-based policy are merged, and standard policy-merging rules apply. Note that this flag does not pass the IAM policy to EFS alone; it also passes the identity of the task (i.e. its IAM role). This matters because you may have a statement in the resource-based policy that allows a specific IAM user or role access to a resource. Flagging “EFS IAM authorization” is how you pass the IAM user or role context to EFS, in addition to the IAM policy. If you do not enable this, the EFS resource-based policy will identify you as “anonymous.”
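To tie this back to ECS, below is a sketch of how a task definition could enable IAM authorization (and transit encryption, which IAM authorization requires) on an EFS volume using boto3. All names, ARNs, and IDs are hypothetical placeholders, and the container image is just an example. Note that, at run time, Fargate tasks need platform version 1.4.0 (or later) to mount EFS file systems.

import boto3

ecs = boto3.client("ecs")

# Hypothetical names, ARNs, and IDs -- replace with your own values.
ecs.register_task_definition(
    family="efs-demo",
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="256",
    memory="512",
    taskRoleArn="arn:aws:iam::111122223333:role/my-ecs-task-role",
    executionRoleArn="arn:aws:iam::111122223333:role/ecsTaskExecutionRole",
    volumes=[
        {
            "name": "efs-data",
            "efsVolumeConfiguration": {
                "fileSystemId": "fs-12345678",
                "transitEncryption": "ENABLED",  # required when IAM authorization is enabled
                "authorizationConfig": {
                    "iam": "ENABLED",  # pass the task role identity and policy to EFS
                    "accessPointId": "fsap-0123456789abcdef0",
                },
            },
        }
    ],
    containerDefinitions=[
        {
            "name": "app",
            "image": "nginx:latest",
            "essential": True,
            "mountPoints": [{"sourceVolume": "efs-data", "containerPath": "/data"}],
        }
    ],
)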

POSIX permissions

In addition to AWS policies, EFS client access security is governed by the standard POSIX permission model. Read, write, and execute permissions are defined for the user, for the group the user belongs to, and for everyone else. If you worked as a Unix sysadmin 20+ years ago, you may still vaguely remember them (ask me how I know).

The UID:GID that is used to check these permissions is defined on the client. In an EC2 scenario, this would be the Linux user (as defined in /etc/passwd) attempting to access the EFS file system. In the ECS task scenario, this would be the user defined in the Dockerfile of the container image(s) used as part of the ECS task. This could be root (the default) or an explicitly defined user.

One thing that is important to call out is that if you present yourself as root (or impersonate root by using sudo), and your AWS policies allow the ClientRootAccess action, you have full privileged access to the file system.

However, if the AWS policies do not allow the ClientRootAccess action, your user is going to be squashed to a pre-defined UID:GID, which is 65534:65534. From this point on, standard POSIX permissions apply: what this user can do is determined by the POSIX file system permissions. For example, a folder owned by any UID:GID other than 65534:65534 that has mode 666 (rw for owner, group, and everyone else) will allow this squashed user to create a file. However, a folder owned by any UID:GID other than 65534:65534 that has mode 644 (rw for owner, r for group and everyone else) will NOT allow this squashed user to create a file.
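As a small illustration of the POSIX side, this is a sketch that assumes the file system is already mounted at a hypothetical /mnt/efs path and that the caller is root on the client (and is allowed ClientRootAccess). It creates a directory, assigns it to an application UID:GID, and checks whether “everyone else” (which includes the squashed 65534:65534 user) is allowed to write to it.

import os
import stat

# Hypothetical mount path -- assumes the EFS file system is already mounted here.
data_dir = "/mnt/efs/data"

# Create a directory owned by an application UID:GID and open it up so that
# "everyone else" (including the squashed 65534:65534 user) can write to it.
os.makedirs(data_dir, exist_ok=True)
os.chown(data_dir, 1000, 1000)  # requires root on the client (i.e. ClientRootAccess)
os.chmod(data_dir, 0o777)       # rwx for owner, group, and others

mode = stat.S_IMODE(os.stat(data_dir).st_mode)
others_can_write = bool(mode & stat.S_IWOTH)
print(f"mode={oct(mode)}, others_can_write={others_can_write}")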

Considerations on working with AWS policies and POSIX permissions

As we alluded to, AWS policies and POSIX permissions need to align for the ECS task to be able to do what it needs to do. For example, imagine that an ECS task needs to write to the root directory (“/”) of an EFS file system. If POSIX permissions are in order and allow everyone to rwx but the AWS policies do not allow ClientWrite, the task won’t have permissions to write to the file system.

As you may have figured out at this point, managing client access at the AWS policy level is fairly easy and scalable. It’s also very “AWS native.” However, working with POSIX permissions in a cloud environment, and specifically in a container context, isn’t.

POSIX permissions assume tight control over how you have defined users on your own computer (in /etc/passwd) or in your local network (via tools like NIS). There is nothing like this in the cloud (or at least nothing that would allow this to scale properly in the cloud). Also, container images are created either with the root user baked in (a very bad practice) or with an arbitrary, less privileged user. The latter is definitely a better practice, but how do you set proper POSIX permissions end-to-end when using an EFS file system if these users are picked arbitrarily by container image authors? You would need to configure POSIX permissions based on the UID:GID that each specific container uses.

All in all, this approach may have worked fine 20 years ago in a small LAN network but it would be a nightmare to manage at cloud scale and with the current application deployment patterns in the container ecosystem.

EFS Access Points

POSIX permissions made sense in the past for small LAN networks in tightly controlled environments and still make sense for some EC2-based use cases. However, they have been largely outgrown by more recent application-centric deployment models, such as containers. How do you abstract away the no-longer-required NFS file permissions while still honoring how the NFS protocol works and behaves?

Enter EFS Access Points. Last year the EFS team introduced a new feature that simplifies overall EFS file system access patterns and increases security granularity with an application-first approach to consuming EFS. This feature is called EFS Access Points and it enables two major improvements:

  • granular IAM policy access at the EFS file system directory level
  • enforcing upfront a UID:GID as well as POSIX file system permissions

The advantages of a granular IAM policy are obvious: users now have an option to allow/deny AWS principals at a folder level within a single EFS file system. This is useful in those use cases where you want to use a single file system with different user profiles.
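As an example of this granularity, below is a sketch of a task role IAM policy that grants mount and write access only when the client connects through a specific access point, using the elasticfilesystem:AccessPointArn condition key. The ARNs are hypothetical placeholders.

import json

# Hypothetical ARNs -- replace with your own values.
FILE_SYSTEM_ARN = "arn:aws:elasticfilesystem:us-east-1:111122223333:file-system/fs-12345678"
ACCESS_POINT_ARN = "arn:aws:elasticfilesystem:us-east-1:111122223333:access-point/fsap-0123456789abcdef0"

# Task role policy: mount and write access is only granted when the request
# comes in through the specified access point.
task_role_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "elasticfilesystem:ClientMount",
                "elasticfilesystem:ClientWrite",
            ],
            "Resource": FILE_SYSTEM_ARN,
            "Condition": {
                "StringEquals": {"elasticfilesystem:AccessPointArn": ACCESS_POINT_ARN}
            },
        }
    ],
}

print(json.dumps(task_role_policy, indent=2))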

The pre-configuration of UID:GID and file system permissions allows you to completely abstract the POSIX permissions layer.

You could implement a simplified EFS access strategy that:

  • creates a directory owned by an arbitrary UID:GID
  • assigns rwx access on an as-needed basis (e.g. 755)
  • enables the mapping of the client POSIX user to the above arbitrary UID:GID
  • controls access to this specific Access Point via AWS policies assigned to ECS tasks

What happens in this case is that, regardless of the POSIX user that is configured inside the EFS client (the container), all the EFS calls will be made by impersonating the arbitrary UID:GID configured on the access point. In a way, the access point masks the actual container user with the user you configured at the access point, drastically simplifying the required configuration. You are effectively declaring that, regardless of which POSIX user the container uses, you are always impersonating the arbitrary UID:GID you configured.
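This is a sketch of how such an access point could be created with boto3: every client call is mapped to an arbitrary 1000:1000 user, and the /myapp root directory is created on first use with the specified owner and permissions. The file system ID and all values are hypothetical placeholders.

import boto3

efs = boto3.client("efs")

# Hypothetical file system ID and UID:GID -- replace with your own values.
access_point = efs.create_access_point(
    FileSystemId="fs-12345678",
    PosixUser={"Uid": 1000, "Gid": 1000},  # every client call is mapped to 1000:1000
    RootDirectory={
        "Path": "/myapp",  # jailed directory for this application
        "CreationInfo": {  # the directory is created on first use with these settings
            "OwnerUid": 1000,
            "OwnerGid": 1000,
            "Permissions": "755",
        },
    },
    Tags=[{"Key": "Name", "Value": "myapp-access-point"}],
)
print(access_point["AccessPointId"])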

This is only an extreme example: EFS Access Points support configurations that enable multiple scenarios.

EFS Access Points enable three high-level use cases for ECS:

  • sharing a file system between multiple applications (e.g. to share throughput). In this case, each access point gives you a separate jailed directory for each application so they can’t see each other.
  • sharing a file system between services that need to share data, as it allows you to create IAM policies at the ECS task level where one service has write-only access to a directory, another has read-only, and another has read/write.
  • securely provisioning limited access to ML datasets to training containers. You can create an access point that allows read-only access to a /noPII directory for instance.

Conclusions

This concludes Part 2 of this series. In this blog post, we laid out the background of how the EFS and ECS technologies work, with a focus on availability, cost, and scale, and a specific drill-down on security aspects. In Part 3, we are going to put the theory we learned in this post to good work. Part 3 is where the rubber hits the road. We will implement a simple example that touches on the use cases we introduced in Part 1 and covers as many of the technology aspects discussed in this blog post as possible. See you there.


Massimo Re Ferre

Massimo is a Senior Principal Technologist at AWS. He has been working on containers since 2014 and is now part of the DECS (Developers, Events, Containers, Serverless) organization at AWS. Massimo has a blog at https://it20.info and his Twitter handle is @mreferre.