Containers
Developers guide to using Amazon EFS with Amazon ECS and AWS Fargate – Part 3
Welcome to Part 3 of this blog post series on how to use Amazon EFS with Amazon ECS and AWS Fargate. For reference, these are the blog posts in this series:
- Part 1: This blog post provides the background and scope of this integration, along with a high-level view of the use cases and scenarios this feature unlocks for our customers
- Part 2: A deep dive on how EFS security works in container deployments based on ECS and Fargate, with some high-level considerations around best practices for regional ECS and EFS deployments
- Part 3: [this blog post] A practical example, including reusable code and commands, of a containerized application deployed on ECS tasks that use EFS
In this post, we are going to create code examples to try out what we learned in Parts 1 and 2. We are going to segment this blog post into two main blocks (with two separate examples). They are:
- Stateful standalone tasks to run applications that require file system persistency
- Multiple tasks that access a shared file system in parallel
If you want to read more about the theory behind these, please refer to Part 1. We are now going to dive deep into the code examples.
In these examples, the ECS tasks will run on Fargate, but the exact same workflow would apply if you were to run the same tasks on EC2 instances (using the EC2 launch type).
Prerequisites and considerations for running the examples
The examples below assume you have a VPC available with at least two public and two private subnets, one per Availability Zone. There are many ways to create such a VPC, from the AWS CLI via CloudFormation all the way to the CDK. If you don’t have a standard way to create a temporary VPC for this exercise, this would be a good option.
The examples and commands also assume you have a proper AWS CLI v2 environment at the latest version (previous versions may not support the new integration). In addition to the CLI, your user needs the ability to build container images (using Docker) and needs to have the jq and curl utilities installed. I used eksutils in an AWS Cloud9 environment, but you can use any setup that has the prerequisites mentioned.
While a higher degree of automation could be achieved in the code examples, we tried to create a fair level of interaction so that you understand, at every step, what’s being done. This is primarily a learning exercise. It is not how we would recommend building a production-ready CI/CD pipeline.
Each section may use variables populated in a previous section, so it’s important that you keep the same shell context. For convenience, the scripts and commands outlined below echo the content of those variables to the terminal and also “tee” them to a log file (ecs-efs-variables.log) if you need them to recreate the context at any point.
From your terminal, let’s start laying out the plumbing with the variables that represent your environment and the initialization of the log file. Failure to set these environment variables may lead to failures in the examples provided.
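A minimal sketch of this plumbing follows; every variable value is a placeholder you must replace with your own account, Region, VPC, and subnet ids:

```bash
# Replace every placeholder with the details of your own environment
export AWS_ACCOUNT_ID=123456789012
export AWS_REGION=us-east-1
export VPC_ID=vpc-0123456789abcdef0
export PUBLIC_SUBNET1=subnet-aaaaaaaaaaaaaaaaa
export PUBLIC_SUBNET2=subnet-bbbbbbbbbbbbbbbbb
export PRIVATE_SUBNET1=subnet-ccccccccccccccccc
export PRIVATE_SUBNET2=subnet-ddddddddddddddddd

# Initialize the log file that will collect the variables we generate
echo "ECS/EFS variables - $(date)" | tee ecs-efs-variables.log
```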
Stateful standalone tasks to run applications that require file system persistency
This example mimics an existing application that requires configurations to persist across restarts. The example is based on NGINX and is fairly basic, but it is intended to be representative of the more complex scenarios our customers have that require the features we are going to leverage.
This custom application can only run, by design, standalone. It is effectively a singleton. It stores important configuration information in a file called /server/config.json. The application can only store this information on the file system; no changes can be made to its code, and we need to work within the boundaries of the application’s architectural characteristics.
The information in the configuration file is generated when the application is installed and starts for the first time, but it needs to persist when a task restarts. At first startup, the application generates a RANDOM_ID and saves it into the critically important /server/config.json file. The unique id is then imported into the home page of the web server. When the application restarts, it checks if the file is there. If it doesn’t exist, it assumes it is the first time the application launches and it will create it. If it exists, it skips its recreation.
This is how this logic is implemented in the startup script (startup.sh) of this application.
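The original script is not reproduced here, but a minimal sketch of the logic just described could look like this (it assumes an NGINX base image and the jq utility, both of which the Dockerfile below provides):

```bash
#!/bin/bash
# If the config file does not exist, this is the first launch:
# generate a RANDOM_ID and persist it. If it exists, skip recreation.
mkdir -p /server
if [ ! -f /server/config.json ]; then
    echo "{\"RANDOM_ID\": \"$RANDOM\"}" > /server/config.json
fi

# Import the id stored in the config file into the web server home page
RANDOM_ID=$(jq -r .RANDOM_ID /server/config.json)
echo "Hello, my random id is: $RANDOM_ID" > /usr/share/nginx/html/index.html

# Start NGINX in the foreground so the container keeps running
exec nginx -g 'daemon off;'
```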
This application is only used during standard working hours in a given timezone, so we would like to create a workflow that starts the app at 7 AM and shuts it down at 7 PM. This will cut the bill for the application in half.
Before the EFS integration, if you were to launch this application on an ECS task, upon a restart the RANDOM_ID in the /server/config.json file would be lost. The script would regenerate the file with a new id, and this would break the application.
We decide to package this application in a container, and to do so we author the following Dockerfile in the same directory where we created the startup.sh file.
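A sketch of what that Dockerfile could look like (the nginx:latest base image and the jq install are assumptions tied to the startup script sketched above):

```dockerfile
FROM nginx:latest
# jq is used by startup.sh to read the id back from /server/config.json
RUN apt-get update && apt-get install -y jq && rm -rf /var/lib/apt/lists/*
COPY startup.sh /startup.sh
RUN chmod +x /startup.sh
CMD ["/startup.sh"]
```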
We are now ready to:
- create an ECR repo called standalone-app
- build the image
- log in to ECR
- push the container image to the ECR repo
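A sketch of these four steps (the repository URI is exported as $ECR_STANDALONE_APP_REPO_URI because the task definition below needs it):

```bash
# Create the ECR repository and capture its URI for later use
export ECR_STANDALONE_APP_REPO_URI=$(aws ecr create-repository \
    --repository-name standalone-app \
    --region $AWS_REGION | jq -r .repository.repositoryUri)
echo "ECR_STANDALONE_APP_REPO_URI = $ECR_STANDALONE_APP_REPO_URI" | tee -a ecs-efs-variables.log

# Build the image, log in to ECR, and push
docker build -t $ECR_STANDALONE_APP_REPO_URI:latest .
aws ecr get-login-password --region $AWS_REGION \
    | docker login --username AWS --password-stdin $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com
docker push $ECR_STANDALONE_APP_REPO_URI:latest
```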
At this point, we are ready to create a basic ECS task without EFS support to demonstrate the limitations of an ephemeral deployment. Before we do that, we need to create an IAM policy document that allows the tasks to assume an execution role. Create a policy document called ecs-tasks-trust-policy.json with the following content.
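This is the standard trust policy that allows the ECS tasks service principal to assume a role:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ecs-tasks.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```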
Now we can create a task definition called standalone-app.json with the following content. Make sure you edit the image with the content of the variable $ECR_STANDALONE_APP_REPO_URI, the account id, and the Region.
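A sketch of what this task definition could look like; the cpu/memory sizing, the /aws/ecs/standalone-app log group name, and all ids are assumptions to adapt to your environment:

```json
{
  "family": "standalone-app",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::123456789012:role/task-exec-role",
  "containerDefinitions": [
    {
      "name": "standalone-app",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/standalone-app:latest",
      "portMappings": [{ "containerPort": 80, "protocol": "tcp" }],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/aws/ecs/standalone-app",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "standalone-app"
        }
      }
    }
  ]
}
```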
In the next batch of commands, we:
- create an ECS cluster to hold our tasks
- create a task execution role (task-exec-role)
- assign the AWS managed AmazonECSTaskExecutionRolePolicy policy to the role
- register the task definition above
- create a log group
- create and configure a security group (standalone-app-SG) to allow access to port 80
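A sketch of this batch, assuming an arbitrary cluster name of app-cluster:

```bash
# Create the ECS cluster
aws ecs create-cluster --cluster-name app-cluster --region $AWS_REGION

# Create the task execution role and attach the AWS managed policy
aws iam create-role \
    --role-name task-exec-role \
    --assume-role-policy-document file://ecs-tasks-trust-policy.json
aws iam attach-role-policy \
    --role-name task-exec-role \
    --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy

# Register the task definition
aws ecs register-task-definition --cli-input-json file://standalone-app.json --region $AWS_REGION

# Create the log group
aws logs create-log-group --log-group-name /aws/ecs/standalone-app --region $AWS_REGION

# Create the security group and open port 80
export STANDALONE_APP_SG=$(aws ec2 create-security-group \
    --group-name standalone-app-SG \
    --description "SG for the standalone app" \
    --vpc-id $VPC_ID --region $AWS_REGION | jq -r .GroupId)
aws ec2 authorize-security-group-ingress \
    --group-id $STANDALONE_APP_SG \
    --protocol tcp --port 80 --cidr 0.0.0.0/0 --region $AWS_REGION
echo "STANDALONE_APP_SG = $STANDALONE_APP_SG" | tee -a ecs-efs-variables.log
```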
This is the setup we built with the ephemeral task representing our application being recycled:
We are now going to demonstrate what happens when you start and stop this task. To do so, we will create a script that cycles in a loop, starting and stopping the application every two minutes, five times in total. The script queries the application while the task is running.
This is the standalone-loop-check.sh script.
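The original script is not reproduced here; the following is a simplified sketch of the same cycle (it assumes the app-cluster name and the variables exported in the previous steps; the platform version flag anticipates the EFS support we add later):

```bash
#!/bin/bash
# Start and stop the standalone task five times. While the task is
# running, query the home page to print the RANDOM_ID it serves.
for i in 1 2 3 4 5; do
  TASK_ARN=$(aws ecs run-task \
    --cluster app-cluster \
    --task-definition standalone-app \
    --launch-type FARGATE \
    --platform-version 1.4.0 \
    --network-configuration "awsvpcConfiguration={subnets=[$PUBLIC_SUBNET1,$PUBLIC_SUBNET2],securityGroups=[$STANDALONE_APP_SG],assignPublicIp=ENABLED}" \
    --region $AWS_REGION | jq -r '.tasks[0].taskArn')
  aws ecs wait tasks-running --cluster app-cluster --tasks $TASK_ARN --region $AWS_REGION

  # Derive the task public IP from its elastic network interface
  ENI_ID=$(aws ecs describe-tasks --cluster app-cluster --tasks $TASK_ARN --region $AWS_REGION \
    | jq -r '.tasks[0].attachments[0].details[] | select(.name=="networkInterfaceId") | .value')
  TASK_IP=$(aws ec2 describe-network-interfaces --network-interface-ids $ENI_ID --region $AWS_REGION \
    | jq -r '.NetworkInterfaces[0].Association.PublicIp')

  sleep 30   # give NGINX a moment to come up
  echo "Run $i:" ; curl -s http://$TASK_IP/

  aws ecs stop-task --cluster app-cluster --task $TASK_ARN --region $AWS_REGION > /dev/null
  aws ecs wait tasks-stopped --cluster app-cluster --tasks $TASK_ARN --region $AWS_REGION
done
```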
Add the execute flag to the script (chmod +x standalone-loop-check.sh) and launch it, then observe the output.
As you can see from the output, the RANDOM_ID changes at every restart, and this will break the application. We need to find a way to persist the /server/config.json file across restarts. Enter EFS.
We will shortly configure an EFS file system and make it accessible to the ECS tasks. Before we dive into the AWS CLI commands that will make it happen, we need to create a policy document called efs-policy.json. This policy, which we will use with the CLI, contains a single rule, which denies any traffic that isn’t secure. The policy does not explicitly grant anyone the ability to mount the file system.
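The policy document looks like this:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Principal": { "AWS": "*" },
      "Action": "*",
      "Condition": {
        "Bool": { "aws:SecureTransport": "false" }
      }
    }
  ]
}
```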
We are now ready to configure the EFS service. In the next batch of commands we are going to:
- create an EFS file system
- set a default policy that enforces in-transit encryption for all clients
- create and configure a security group (efs-SG) that allows inbound access on port 2049 (the NFS protocol) from standalone-app-SG
- create two mount targets in the two private subnets
- create an EFS Access Point called standalone-app-EFS-AP that maps to the directory /server

We are now ready to launch the AWS CLI commands that will create the setup mentioned above.
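A sketch of these commands; the creation token, the POSIX CreationInfo (uid/gid 1000), and the sleeps that wait for resources to become available are assumptions you may need to adjust:

```bash
# Create the EFS file system and wait briefly for it to become available
export FILE_SYSTEM_ID=$(aws efs create-file-system \
    --creation-token ecs-efs-demo \
    --region $AWS_REGION | jq -r .FileSystemId)
echo "FILE_SYSTEM_ID = $FILE_SYSTEM_ID" | tee -a ecs-efs-variables.log
sleep 10

# Apply the default policy that enforces in-transit encryption
aws efs put-file-system-policy \
    --file-system-id $FILE_SYSTEM_ID \
    --policy file://efs-policy.json --region $AWS_REGION

# Create the EFS security group and allow NFS traffic from the app SG
export EFS_SG=$(aws ec2 create-security-group \
    --group-name efs-SG \
    --description "SG for the EFS mount targets" \
    --vpc-id $VPC_ID --region $AWS_REGION | jq -r .GroupId)
aws ec2 authorize-security-group-ingress \
    --group-id $EFS_SG \
    --protocol tcp --port 2049 \
    --source-group $STANDALONE_APP_SG --region $AWS_REGION
echo "EFS_SG = $EFS_SG" | tee -a ecs-efs-variables.log

# Create a mount target in each private subnet
aws efs create-mount-target --file-system-id $FILE_SYSTEM_ID \
    --subnet-id $PRIVATE_SUBNET1 --security-groups $EFS_SG --region $AWS_REGION
aws efs create-mount-target --file-system-id $FILE_SYSTEM_ID \
    --subnet-id $PRIVATE_SUBNET2 --security-groups $EFS_SG --region $AWS_REGION

# Create the access point that maps to the /server directory
export ACCESS_POINT_ID=$(aws efs create-access-point \
    --file-system-id $FILE_SYSTEM_ID \
    --root-directory "Path=/server,CreationInfo={OwnerUid=1000,OwnerGid=1000,Permissions=755}" \
    --tags Key=Name,Value=standalone-app-EFS-AP \
    --region $AWS_REGION | jq -r .AccessPointId)
echo "ACCESS_POINT_ID = $ACCESS_POINT_ID" | tee -a ecs-efs-variables.log
```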
If you want to know more about why we opted to create an EFS Access Point to mount the /server directory on the file system, please refer to Part 2, where we talk about the advantages of using access points.
Now that we have our EFS file system properly configured, we need to make our application aware of it. To do so, we are going to:
- create an IAM role (standalone-app-role) that grants permissions to map the EFS Access Point
- tweak the task definition (standalone-app.json) to:
  - add a task role that grants permissions to map the EFS Access Point
  - add the directives to connect to the EFS Access Point we created above
Create a policy called standalone-app-task-role-policy.json and add the following, making sure you properly configure your EFS file system ARN and your EFS Access Point ARN. This information should be on your screen from when we printed the variables above, or you can refer to the ecs-efs-variables.log file. This policy grants access to the specific access point we have created.
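A sketch of this policy, with placeholder ARNs to replace with your own:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticfilesystem:ClientMount",
        "elasticfilesystem:ClientWrite"
      ],
      "Resource": "arn:aws:elasticfilesystem:us-east-1:123456789012:file-system/fs-12345678",
      "Condition": {
        "StringEquals": {
          "elasticfilesystem:AccessPointArn": "arn:aws:elasticfilesystem:us-east-1:123456789012:access-point/fsap-0123456789abcdef0"
        }
      }
    }
  ]
}
```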
Open the standalone-app.json task definition and add the taskRoleArn, the mountPoints section, and the volumes section. You can either recreate the file from the skeleton below (to be re-customized) or add the new directives to the original standalone-app.json task definition you have already customized.
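A sketch of the resulting task definition; the volume name and every id and ARN are placeholders:

```json
{
  "family": "standalone-app",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::123456789012:role/task-exec-role",
  "taskRoleArn": "arn:aws:iam::123456789012:role/standalone-app-role",
  "containerDefinitions": [
    {
      "name": "standalone-app",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/standalone-app:latest",
      "portMappings": [{ "containerPort": 80, "protocol": "tcp" }],
      "mountPoints": [
        {
          "sourceVolume": "server-volume",
          "containerPath": "/server",
          "readOnly": false
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/aws/ecs/standalone-app",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "standalone-app"
        }
      }
    }
  ],
  "volumes": [
    {
      "name": "server-volume",
      "efsVolumeConfiguration": {
        "fileSystemId": "fs-12345678",
        "transitEncryption": "ENABLED",
        "authorizationConfig": {
          "accessPointId": "fsap-0123456789abcdef0",
          "iam": "ENABLED"
        }
      }
    }
  ]
}
```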
We are now ready to launch the batch of commands that will implement the integration between ECS and EFS.
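A sketch of this batch, reusing the trust policy and the file names from above:

```bash
# Create the task role with the same trust policy used earlier
aws iam create-role \
    --role-name standalone-app-role \
    --assume-role-policy-document file://ecs-tasks-trust-policy.json
aws iam put-role-policy \
    --role-name standalone-app-role \
    --policy-name standalone-app-task-role-policy \
    --policy-document file://standalone-app-task-role-policy.json

# Register the revised task definition (this creates a new revision)
aws ecs register-task-definition \
    --cli-input-json file://standalone-app.json --region $AWS_REGION
```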
With this, we have decoupled the lifecycle of the task from “the data.” In our case, the data is just a configuration file but it could be anything. This is a visual representation of what we have configured:
Let’s see what happens if we launch the very same script that we used before. As a reminder, this script cycles the task with five consecutive starts and stops, every two minutes.
Do you spot anything different from the previous run? Now the RANDOM_ID persists across application restarts because the configuration (in our example, the /server/config.json file) has been moved out onto the EFS share. Also note how the standalone tasks are started in different Availability Zones, yet they can reach the same file system from any of them. Mission accomplished!
Multiple tasks that access a shared file system in parallel
In this section, we will build on what we have seen so far and demonstrate how tasks working in parallel can access a common shared file system. We will keep using our application as a proxy for the infinite possibilities that this pattern allows customers to achieve (whether it’s deploying a scale-out WordPress workload or a parallel machine learning job).
Our (fictitious) application is now serving a broader, distributed community. We can no longer afford to turn it off during non-working hours because it is now serving users 24/7. Not only that, changes have been introduced to the architecture such that the application can now scale out. This is a welcome enhancement given the load it needs to support. There is, however, still the prerequisite of persisting the /server/config.json file, and we now need to solve for how multiple ECS tasks can access the same file in parallel. We will elect the task we defined in the previous section to be the “master“ of this application, with read/write permissions to the EFS /server folder. In this section, we are going to leverage the same EFS Access Point pointing to the /server directory, and we will provide read-only access to a set of four ECS tasks that serve the load behind a load balancer.
The approach above shows how you can bypass POSIX permissions and use AWS IAM policies to delegate various degrees of access to the EFS file system. Refer to Part 2 of this blog series if you want to read more about this.
We create a new policy document called scale-out-app-task-role-policy.json. Note that this policy grants read-only access to the access point. Make sure you properly configure your EFS file system ARN and your EFS Access Point ARN.
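A sketch of this policy; granting only elasticfilesystem:ClientMount (without ClientWrite) is what makes the access read-only. Replace the placeholder ARNs with your own:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "elasticfilesystem:ClientMount",
      "Resource": "arn:aws:elasticfilesystem:us-east-1:123456789012:file-system/fs-12345678",
      "Condition": {
        "StringEquals": {
          "elasticfilesystem:AccessPointArn": "arn:aws:elasticfilesystem:us-east-1:123456789012:access-point/fsap-0123456789abcdef0"
        }
      }
    }
  ]
}
```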
We can now create the new task role and attach the policy document we have just created.
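A sketch of these two commands, assuming a role name of scale-out-app-role:

```bash
aws iam create-role \
    --role-name scale-out-app-role \
    --assume-role-policy-document file://ecs-tasks-trust-policy.json
aws iam put-role-policy \
    --role-name scale-out-app-role \
    --policy-name scale-out-app-task-role-policy \
    --policy-document file://scale-out-app-task-role-policy.json
```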
Next, we are creating a new task definition called scale-out-app.json. This file is similar to the standalone-app.json task definition we used in the previous section, with some notable differences:
- the family
- the containerDefinitions/name
- the awslogs-group and awslogs-stream-prefix
- the taskRoleArn (the one we created in this section)
- the accessPointId
Now we can register this new task definition.
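Assuming the file name above, the registration is a single command:

```bash
aws ecs register-task-definition \
    --cli-input-json file://scale-out-app.json --region $AWS_REGION
```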
And we are now ready to launch the last batch of commands to create the deployment of the scale-out version of the application. These commands do the following:
- create and configure a security group for the scale-out application (scale-out-app-SG)
- add the scale-out-app-SG to the efs-SG to allow the scale-out app to talk to the EFS mount targets
- create and configure an ALB to balance traffic across the four tasks
- create a dedicated log group (/aws/ecs/scale-out-app) to collect the logs
- create an ECS service that starts the four Fargate tasks
The following diagram shows what we have created:
Let’s see how the application behaves in action now. To do this, we will run a loop where curl hits the load balancer’s public DNS name to query the home page. This is the scale-out-loop-check.sh script.
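A minimal sketch of what this script could look like, assuming the $ALB_DNS_NAME variable exported in the previous step:

```bash
#!/bin/bash
# Hit the ALB in a loop; each request may land on a different task,
# but all of them should return the same RANDOM_ID.
for i in $(seq 1 10); do
  curl -s http://$ALB_DNS_NAME/
  sleep 2
done
```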
Add the execute flag to the script (chmod +x scale-out-loop-check.sh) and launch it, then observe the output.
As you can see, all four tasks are being balanced by the ALB and all of them respond with the same RANDOM_ID coming from the now shared /server/config.json file. Again, these tasks are by design deployed across the Availability Zones we have configured, and yet they all have access to the same data. Mission accomplished!
Tearing down the environment
It is now time to tear down the environment you have created. This is the list of commands to delete the resources we created in this blog post.
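A sketch of the teardown, assuming the resource names used throughout this post (network interfaces created by the tasks and the ALB may take a few minutes to be released, so some deletions may need to be retried):

```bash
# ECS service and cluster
aws ecs delete-service --cluster app-cluster --service scale-out-app-service --force --region $AWS_REGION
aws ecs wait services-inactive --cluster app-cluster --services scale-out-app-service --region $AWS_REGION
aws ecs delete-cluster --cluster app-cluster --region $AWS_REGION

# Load balancer (deleting it removes the listener too) and target group
aws elbv2 delete-load-balancer --load-balancer-arn $ALB_ARN --region $AWS_REGION
aws elbv2 delete-target-group --target-group-arn $TG_ARN --region $AWS_REGION

# EFS access point, mount targets, and file system
aws efs delete-access-point --access-point-id $ACCESS_POINT_ID --region $AWS_REGION
for MT in $(aws efs describe-mount-targets --file-system-id $FILE_SYSTEM_ID \
    --region $AWS_REGION | jq -r '.MountTargets[].MountTargetId'); do
  aws efs delete-mount-target --mount-target-id $MT --region $AWS_REGION
done
sleep 30   # mount targets take a little while to delete
aws efs delete-file-system --file-system-id $FILE_SYSTEM_ID --region $AWS_REGION

# Log groups
aws logs delete-log-group --log-group-name /aws/ecs/standalone-app --region $AWS_REGION
aws logs delete-log-group --log-group-name /aws/ecs/scale-out-app --region $AWS_REGION

# Security groups (efs-SG first, since it references the other two)
aws ec2 delete-security-group --group-id $EFS_SG --region $AWS_REGION
aws ec2 delete-security-group --group-id $SCALE_OUT_APP_SG --region $AWS_REGION
aws ec2 delete-security-group --group-id $STANDALONE_APP_SG --region $AWS_REGION

# IAM roles and the ECR repository
aws iam detach-role-policy --role-name task-exec-role \
    --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
aws iam delete-role --role-name task-exec-role
aws iam delete-role-policy --role-name standalone-app-role --policy-name standalone-app-task-role-policy
aws iam delete-role --role-name standalone-app-role
aws iam delete-role-policy --role-name scale-out-app-role --policy-name scale-out-app-task-role-policy
aws iam delete-role --role-name scale-out-app-role
aws ecr delete-repository --repository-name standalone-app --force --region $AWS_REGION
```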
Remember to delete the VPC if you created it for the purpose of this exercise.
Conclusions
This concludes our blog post series. In Part 1, we explored the basics and the context of the ECS and EFS integration. In Part 2, we explored some of the technical details and architectural considerations, with a focus on how to secure access to EFS. In this last part, we tied it all together and showed examples of how you could implement what we discussed in the previous posts. By now you should have the basis to understand the applicability of this integration and the knowledge to build something specific to your needs with it.