Using Amazon EFS to Persist Data from Amazon ECS Containers

My colleagues Jeremy Cowan and Drew Dennis sent a nice guest post that shows how to use Amazon Elastic File System with Amazon ECS.

—

Docker containers are ideal for building microservices because they’re quick to provision, easily portable, and provide process isolation. While these services are generally ephemeral and stateless, there are times when you want to persist data to disk or share it among multiple containers; for example, when you run MySQL in a Docker container, capture application logs, or need temporary scratch space to process data.

In this post, I’ll discuss how to persist data from Docker containers to Amazon Elastic File System (Amazon EFS), a storage service for Amazon EC2 instances based on the NFSv4 protocol.

Note: AWS offers a number of options to store data on Amazon EC2 instances, including Amazon EBS General Purpose, Amazon EBS Provisioned IOPS, and Amazon EFS. Review the I/O characteristics of your workload to select the most appropriate storage.

Amazon EC2 Container Service (Amazon ECS) is a highly scalable, high-performance container management service that supports Docker containers and allows you to run applications easily on a managed cluster of EC2 instances. The ECS service scheduler places tasks (groups of containers used for your application) onto container instances in the cluster, monitors their performance and health, and restarts failed tasks as needed.

Using task definitions, you can define the properties of the containers you want to run together and configure containers to store data on the underlying ECS container instance that your task is running on. Because tasks can be restarted on any ECS container instance in the cluster, you need to consider whether the data is temporary or needs to persist. If your container needs access to the original data each time it starts, you require a file system that your containers can connect to regardless of which instance they’re running on. That’s where EFS comes in.

EFS allows you to persist data onto a durable shared file system that all of the ECS container instances in the ECS cluster can use. Moreover, by using EFS you won’t need to monitor available disk space on your ECS cluster instance because the EFS file system will grow automatically as the amount of data increases. With EFS you only pay for the amount of data that’s stored in the EFS file system. Lastly, data management becomes a lot simpler because all your data can be stored on a single EFS volume.

Provisioning an ECS cluster

For this post, I used an AWS CloudFormation template, which is available for download. I’ll walk through what the template does for you.

Note: This template requires you to be enrolled in the Amazon EFS preview.

Networking and security

The first thing the template does is create a VPC with subnets, an Internet gateway, and associated routes. After the network infrastructure is in place, it creates two security groups: one for the EFS file system mount targets and another for the ECS container instances. Two inbound rules and one outbound rule are then added to the ECS security group. The inbound rules allow SSH (22) and MySQL (3306) inbound from anywhere (0.0.0.0/0).

Note: This is for demonstration purposes only. We do not recommend creating rules that allow unfettered access to resources in your VPC.

These rules allow you to connect to the ECS container instances and the containers themselves using SSH and MySQL clients, which I’ll demonstrate later. The outbound rule allows all traffic outbound to anywhere, and is primarily there to allow the ECS container instances to connect to the EFS file system via mount targets in your VPC. Next, the template adds an inbound and an outbound rule to the EFS security group. The inbound rule allows NFS (2049) traffic inbound from the VPC CIDR range. The outbound rule allows all traffic to anywhere. Together, these rules allow your ECS container instances to connect to the EFS mount targets. If you’re unfamiliar with how to create security group rules, see Adding Rules to a Security Group.
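The EFS inbound rule described above could be created by hand with the AWS CLI. The following is a minimal sketch, not the template's own mechanism: the function name allow_nfs_from_vpc and the example security group ID and CIDR range are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical helper: allow NFS (TCP 2049) into the EFS security group
# from the VPC CIDR range. Both arguments are values you supply.
allow_nfs_from_vpc() {
    local efs_sg_id="$1"   # ID of the EFS security group
    local vpc_cidr="$2"    # CIDR range of the VPC, e.g. 10.0.0.0/16
    aws ec2 authorize-security-group-ingress \
        --group-id "$efs_sg_id" \
        --protocol tcp \
        --port 2049 \
        --cidr "$vpc_cidr"
}

# Usage (assumed values):
# allow_nfs_from_vpc sg-0123456789abcdef0 10.0.0.0/16
```

Restricting the source to the VPC CIDR range, rather than 0.0.0.0/0, keeps the file system reachable only from instances inside the VPC.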

IAM roles

In addition to creating security group rules, the template creates an IAM instance role with a managed policy that allows the EC2 container instances to register and deregister with the ECS cluster, create an ECS cluster, and perform a handful of other actions. The policy looks like the following:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecs:CreateCluster",
        "ecs:DeregisterContainerInstance",
        "ecs:DiscoverPollEndpoint",
        "ecs:Poll",
        "ecs:RegisterContainerInstance",
        "ecs:Submit*"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}

For more information about instance roles, see IAM Roles for Amazon EC2.
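If you build the role yourself instead of using the template, a sketch like the following could attach the policy shown above as an inline policy and add read-only EFS access. The function name, the inline policy name ecs-instance-policy, and the policy file path are assumptions; the role is assumed to exist already.

```shell
#!/bin/bash
# Hypothetical helper: attach the ECS policy document (saved locally as JSON)
# as an inline policy, then attach the AWS managed read-only EFS policy.
attach_instance_policies() {
    local role_name="$1"     # existing IAM role used by the instance profile
    local policy_file="$2"   # local path to the ECS policy JSON shown above
    aws iam put-role-policy \
        --role-name "$role_name" \
        --policy-name ecs-instance-policy \
        --policy-document "file://${policy_file}" || return 1
    aws iam attach-role-policy \
        --role-name "$role_name" \
        --policy-arn arn:aws:iam::aws:policy/AmazonElasticFileSystemReadOnlyAccess
}

# Usage (assumed values):
# attach_instance_policies my-ecs-instance-role ./ecs-policy.json
```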

EFS file systems

After the security infrastructure is in place, the template creates an EFS file system and associated mount targets in each of the VPC subnets. As part of the EFS provisioning process, the template adds the EFS security group created earlier to each mount target. This allows EC2 instances within the VPC to connect to the EFS file system.

Note: When using EFS, we recommend connecting to a mount target in the same Availability Zone as your EC2 instance.

Finally, the template assigns a key-value pair (a tag) to the file system. For more information about creating EFS file systems, see Getting Started with Amazon Elastic File System.
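For readers who are not using the template, the same three steps (create the file system, tag it, add a mount target per subnet) might look roughly like this with the AWS CLI. This is a sketch under assumptions: the function name, creation token, and IDs are placeholders, and in practice you may need to wait until the file system reports an available state before creating mount targets.

```shell
#!/bin/bash
# Hypothetical helper: create an EFS file system, tag it Name=efs-docker,
# and create one mount target in each subnet passed after the first two args.
create_tagged_efs() {
    local token="$1" sg_id="$2"
    shift 2
    local fs_id subnet
    fs_id="$(aws efs create-file-system --creation-token "$token" \
        --query 'FileSystemId' --output text)" || return 1
    # The Name tag is what the bootstrap script later searches for.
    aws efs create-tags --file-system-id "$fs_id" \
        --tags Key=Name,Value=efs-docker || return 1
    # Note: mount target creation can fail while the file system is still
    # being created; a wait or retry step is omitted from this sketch.
    for subnet in "$@"; do
        aws efs create-mount-target --file-system-id "$fs_id" \
            --subnet-id "$subnet" --security-groups "$sg_id" > /dev/null || return 1
    done
    echo "$fs_id"
}

# Usage (assumed values):
# create_tagged_efs efs-docker-token sg-0123456789abcdef0 subnet-aaaa1111 subnet-bbbb2222
```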

Load balancer (optional)

The template also provisions an ELB load balancer, which is subsequently added to the ELB security group created earlier. The load balancer is used to distribute traffic across ECS tasks that run on separate and distinct ECS container instances. An ECS service is a set of ECS tasks that run on the cluster. During the configuration of a service, you specify how many instances of a particular task definition to run on the cluster. I’ll discuss creating a service later in this post.

Auto Scaling group and launch configuration

The template creates an Auto Scaling group and launch configuration as well. The Auto Scaling group is used to set the minimum, maximum, and initial size of the ECS cluster and the launch configuration specifies which AMI, instance type, IAM role, user data, and other EC2 instance properties to use when bootstrapping a new instance. While not part of this CloudFormation template, you could create an Amazon CloudWatch alarm to trigger an Auto Scaling event that adds ECS container instances to the cluster automatically when the cluster’s capacity drops below a particular threshold.
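As a rough illustration of the CloudWatch alarm idea, the sketch below alarms when the cluster's average memory reservation stays high, which is one way to detect that spare capacity is running low. The metric choice, the 75 percent threshold, the alarm name, and the function name are all assumptions; the scaling policy ARN is assumed to come from a scale-out policy you have already created on the Auto Scaling group.

```shell
#!/bin/bash
# Hypothetical helper: alarm on high memory reservation for an ECS cluster
# and trigger an existing Auto Scaling scale-out policy.
create_scale_out_alarm() {
    local cluster="$1"      # ECS cluster name
    local policy_arn="$2"   # ARN of an existing scale-out policy (assumed)
    aws cloudwatch put-metric-alarm \
        --alarm-name "${cluster}-memory-reservation-high" \
        --namespace AWS/ECS \
        --metric-name MemoryReservation \
        --dimensions "Name=ClusterName,Value=${cluster}" \
        --statistic Average \
        --period 300 \
        --evaluation-periods 2 \
        --threshold 75 \
        --comparison-operator GreaterThanThreshold \
        --alarm-actions "$policy_arn"
}

# Usage (assumed values):
# create_scale_out_alarm default "<scale-out policy ARN>"
```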

Bootstrapping ECS cluster instances

The CloudFormation template creates an Auto Scaling launch configuration based on the user data script below, to bootstrap instances automatically and add them to your cluster; however, if you’ve chosen to create your own environment using an alternate method, you can use the script below as a reference.

Note: This script has multiple dependencies. Carefully review the list below before running the script in your environment:

A cluster named ‘default’
An ECS-optimized AMI from Amazon
An EC2 instance on a public subnet
An EC2 security group that allows all traffic outbound
An EFS security group that allows 2049 inbound and outbound
An instance role with an attached ECS inline policy and an attached AmazonElasticFileSystemReadOnlyAccess managed policy
Read and write access to the EFS file system
An EFS file system with the key-value pair Name:efs-docker. The script uses this tag to identify the EFS file system to connect to.

For a general overview of Auto Scaling groups and launch configurations, see What Is Auto Scaling?

Content-Type: multipart/mixed; boundary="===============BOUNDARY=="
MIME-Version: 1.0

--===============BOUNDARY==
MIME-Version: 1.0
Content-Type: text/x-shellscript; charset="us-ascii"

#! /bin/bash
#Put your standard user data here
echo "extra standard user data"

--===============BOUNDARY==
MIME-Version: 1.0
Content-Type: text/cloud-boothook; charset="us-ascii"

#cloud-boothook
#Join the default ECS cluster
echo ECS_CLUSTER=default >> /etc/ecs/ecs.config
PATH=$PATH:/usr/local/bin
#Instance should be added to a security group that allows HTTP outbound
#Use -y so the update does not wait for confirmation during boot
yum -y update
#Install jq, a JSON parser
yum -y install jq
#Install NFS client
if ! rpm -qa | grep -qw nfs-utils; then
    yum -y install nfs-utils
fi
if ! rpm -qa | grep -qw python27; then
	yum -y install python27
fi
#Install pip
yum -y install python27-pip
#Install awscli
pip install awscli
#Upgrade to the latest version of the awscli
#pip install --upgrade awscli
#Add support for EFS to the CLI configuration
aws configure set preview.efs true
#Get region of EC2 from instance metadata
EC2_AVAIL_ZONE=`curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone`
EC2_REGION="`echo \"$EC2_AVAIL_ZONE\" | sed -e 's:\([0-9][0-9]*\)[a-z]*\$:\\1:'`"
#Create mount point
mkdir -p /mnt/efs
#Get EFS FileSystemID attribute
#Instance needs to be added to an EC2 role that gives the instance at least read access to EFS
EFS_FILE_SYSTEM_ID=`/usr/local/bin/aws efs describe-file-systems --region $EC2_REGION | jq '.FileSystems[]' | jq 'select(.Name=="efs-docker")' | jq -r '.FileSystemId'`
#Check to see if the variable is set. If not, then exit.
if [ -z "$EFS_FILE_SYSTEM_ID" ]; then
	echo "ERROR: variable not set" 1> /etc/efssetup.log
	exit
fi
#Instance needs to be a member of security group that allows 2049 inbound/outbound
#The security group that the instance belongs to has to be added to EFS file system configuration
#Create variables for source and target
DIR_SRC=$EC2_AVAIL_ZONE.$EFS_FILE_SYSTEM_ID.efs.$EC2_REGION.amazonaws.com
DIR_TGT=/mnt/efs 
#Mount EFS file system
mount -t nfs4 $DIR_SRC:/ $DIR_TGT
#Backup fstab
cp -p /etc/fstab /etc/fstab.back-$(date +%F)
#Append line to fstab
echo -e "$DIR_SRC:/ \t\t $DIR_TGT \t\t nfs4 \t\t defaults \t\t 0 \t\t 0" | tee -a /etc/fstab
--===============BOUNDARY==--
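The two derived values in the script, the region and the mount target DNS name, can be factored into small functions that are easy to check in isolation. The sketch below mirrors the script's logic; the function names are mine, and the idempotent mount helper (which skips the mount when the target is already a mount point) is an addition that the original script does not include.

```shell
#!/bin/bash
# Derive the region by stripping the trailing zone letter(s):
# us-west-2a -> us-west-2
region_from_az() {
    echo "$1" | sed -e 's/[a-z]*$//'
}

# Build the per-AZ mount target DNS name in the same format the script uses.
efs_dns_name() {
    local az="$1" fs_id="$2"
    echo "${az}.${fs_id}.efs.$(region_from_az "$az").amazonaws.com"
}

# Mount the file system only if the target is not already a mount point.
mount_efs_once() {
    local src="$1" tgt="$2"
    mkdir -p "$tgt"
    if mountpoint -q "$tgt"; then
        echo "already mounted: $tgt"
        return 0
    fi
    mount -t nfs4 "$src:/" "$tgt"
}

# Usage (assumed file system ID):
# mount_efs_once "$(efs_dns_name us-west-2a fs-12345678)" /mnt/efs
```

Keeping the string handling separate from the mount call makes it possible to test the DNS name on a workstation before baking the script into a launch configuration.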

ECS cluster

After all the previous steps are completed, an ECS cluster is created. With the cluster in place, the template’s Auto Scaling group launches two instances, which automatically join the cluster as they’re bootstrapped.

If you prefer to create a cluster from scratch instead, you can follow the directions in the documentation Setting Up with Amazon ECS.

Creating an ECS task definition

Now that the infrastructure is ready, you can create a task that persists data on the EFS file system. For this example I use MySQL, as you generally want to persist data that’s stored in a database.

Open the ECS console at https://console.aws.amazon.com/ecs/.
Select the cluster created by the CloudFormation template.
Choose Create new task definition.
Give your task definition a name, e.g., MySQL.
Choose Add volume.
In the Name field, type efs. You use this value to reference the EFS mount point later.
In the Source path field, type /mnt/efs/mysql.
Note: This is the path to your EFS file system mounted on the EC2 container instance.
Choose Add container definition.
Enter a name for the container, e.g., MySQL.
In the Image field, type the name of the image stored in the Docker registry, e.g., mysql, which retrieves the official MySQL image from Docker Hub.
Assign the appropriate amount of memory and CPU units.
Assign port mappings, if necessary.
Note: The default port for MySQL is 3306.
In the Source volume field, type the name you gave to the volume, e.g., efs.
In the Container path field, type the name of the directory you want to persist to the EFS volume, e.g., /var/lib/mysql.
In the Environment variables field, type “MYSQL_ROOT_PASSWORD” for the key and “password” for the value.

Note: This is for demonstration purposes only. We do not recommend using plaintext environment variables for sensitive values.

Alternatively, you can copy and paste the following text into the JSON tab:

{
  "family": "MySQL",
  "containerDefinitions": [
    {
      "name": "MySQL",
      "image": "mysql",
      "cpu": 10,
      "memory": 500,
      "entryPoint": [],
      "environment": [
        {
          "name": "MYSQL_ROOT_PASSWORD",
          "value": "password"
        }
      ],
      "command": [],
      "portMappings": [
        {
          "hostPort": 3306,
          "containerPort": 3306,
          "protocol": "tcp"
        }
      ],
      "volumesFrom": [],
      "links": [],
      "mountPoints": [
        {
          "sourceVolume": "efs",
          "containerPath": "/var/lib/mysql",
          "readOnly": false
        }
      ],
      "essential": true
    }
  ],
  "volumes": [
    {
      "name": "efs",
      "host": {
        "sourcePath": "/mnt/efs/mysql"
      }
    }
  ]
}

When you’re finished entering all the parameters, choose Add. Upon returning to the task definitions page, select the task definition you created from the list and choose Run task from the Actions menu. This starts your newly created task on an EC2 container instance in the cluster.
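The console steps for registering the task definition and running it have AWS CLI equivalents. A minimal sketch, assuming the JSON shown earlier is saved to a local file and that the cluster is named default; the function name and file path are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: register a task definition from a local JSON file,
# then run one copy of it on the given cluster.
register_and_run() {
    local json_file="$1" cluster="$2" family="$3"
    [ -r "$json_file" ] || { echo "cannot read $json_file" >&2; return 1; }
    aws ecs register-task-definition \
        --cli-input-json "file://${json_file}" || return 1
    aws ecs run-task --cluster "$cluster" --task-definition "$family" --count 1
}

# Usage (assumed values):
# register_and_run ./mysql-task.json default MySQL
```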

If you SSH into the container instance where your task is running and run the docker exec command to connect to the container, you can see the files in the /var/lib/mysql directory. Now, exit the exec session and list the files in the /mnt/efs/mysql directory; you’ll see they’re the same, proving that the files are stored on the EFS file system.
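The comparison described above can also be scripted on the container instance. This is a sketch rather than a required step: the function names are hypothetical, and it assumes the paths used in this post and a running container started from the mysql image.

```shell
#!/bin/bash
# Return success when two newline-separated listings contain the same names,
# regardless of order.
same_listing() {
    [ "$(printf '%s\n' "$1" | sort)" = "$(printf '%s\n' "$2" | sort)" ]
}

# Compare the data directory inside the container with the EFS path on the host.
verify_efs_backing() {
    local container_id="$1"
    local in_container on_host
    in_container="$(docker exec "$container_id" ls -A /var/lib/mysql)" || return 1
    on_host="$(ls -A /mnt/efs/mysql)" || return 1
    if same_listing "$in_container" "$on_host"; then
        echo "match"
    else
        echo "differ"
        return 1
    fi
}

# Usage (picks the first running container started from the mysql image):
# verify_efs_backing "$(docker ps -q --filter ancestor=mysql | head -n 1)"
```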

You can also connect to the container using MySQLWorkbench.

Creating a service

While a task can have a finite lifespan, the ECS service scheduler ensures that the specified number of tasks is constantly running and reschedules tasks when a task fails (for example, if the underlying container instance fails for some reason). The service scheduler can optionally also make sure that tasks are registered against an Elastic Load Balancing load balancer. To create a service from the task definition you created earlier, follow these steps:

Open the Amazon ECS console at https://console.aws.amazon.com/ecs.
In the navigation pane, select Task Definitions.
On the Task Definitions page, choose the name of the task definition you created earlier, e.g., MySQL.
On the Task Definition name page, choose revision 1.
Review the task definition, and choose Create Service.
On the Create Service page, enter a unique name for your service in the Service name field, e.g., MySQL_Service.
In the Number of tasks field, enter 1.

Note: You should only run one instance of a stateful task in a cluster. Running multiple instances of a stateful task may cause instability, as each task writes to the same directory on the EFS file system.

For more information, see Creating a Service.
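The same service can be created from the command line. A minimal sketch, assuming the default cluster and revision 1 of the MySQL task definition; the function name is hypothetical, and the desired count is fixed at 1 to match the note about stateful tasks.

```shell
#!/bin/bash
# Hypothetical helper: create a single-task service from a task definition.
create_mysql_service() {
    local cluster="$1"    # e.g. default
    local task_def="$2"   # family:revision, e.g. MySQL:1
    aws ecs create-service \
        --cluster "$cluster" \
        --service-name MySQL_Service \
        --task-definition "$task_def" \
        --desired-count 1
}

# Usage (assumed values):
# create_mysql_service default MySQL:1
```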

Conclusion

In this post, we looked at how you can use EFS to persist data from ECS containers. This allows you to run stateful containers, like MySQL, on ECS without worrying about what happens when your container is restarted on another instance in the cluster. That’s because each instance maintains a connection to the shared EFS file system where your MySQL data is stored.

In fact, if you put the ECS service we created behind an ELB load balancer, your clients don’t have to be reconfigured when your container moves. You can see an example of this in action by creating an ECS cluster with more than two EC2 instances and terminating the instance that the MySQL task is running on. Not only does the MySQL task start on another instance in the cluster, but the terminated instance is replaced with another instance.
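To try the failover experiment, you first need to know which EC2 instance hosts the MySQL task. The sketch below walks from the service to its task, then to the container instance, then to the EC2 instance ID. The function name is hypothetical, and it assumes the service has exactly one running task.

```shell
#!/bin/bash
# Hypothetical helper: print the EC2 instance ID that hosts the first task
# of the given service.
instance_for_service() {
    local cluster="$1" service="$2"
    local task ci
    task="$(aws ecs list-tasks --cluster "$cluster" --service-name "$service" \
        --query 'taskArns[0]' --output text)" || return 1
    ci="$(aws ecs describe-tasks --cluster "$cluster" --tasks "$task" \
        --query 'tasks[0].containerInstanceArn' --output text)" || return 1
    aws ecs describe-container-instances --cluster "$cluster" \
        --container-instances "$ci" \
        --query 'containerInstances[0].ec2InstanceId' --output text
}

# Usage (terminates the instance, so run it only in a test environment):
# aws ec2 terminate-instances \
#     --instance-ids "$(instance_for_service default MySQL_Service)"
```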

The other advantages to using EFS are that you only pay for the amount of storage that’s being consumed, and the file system grows automatically as you add files, eliminating the need for you to monitor disk space and helping you avoid paying for unused capacity.

We hope you found this post useful and look forward to your comments about where you plan to implement this in your current and future projects.

AWS Compute Blog