AWS Compute Blog

Better Together: Amazon ECS and AWS Lambda

My colleague Constantin Gonzalez sent a nice guest post that shows how to create container workers using Amazon ECS.

Amazon EC2 Container Service (Amazon ECS) is a highly scalable, high performance container management service that supports Docker containers and allows you to easily run applications on a managed cluster of Amazon EC2 instances. ECS eliminates the need for you to install, operate, and scale your own cluster management infrastructure.

AWS Lambda is a compute service that runs your code in response to events and automatically manages the compute resources for you, making it easy to build applications that respond quickly to new information. Lambda starts running your code within milliseconds of an event such as an image upload, in-app activity, website click, or output from a connected device.

In this post, we show you how to combine the two services to mutually enhance their capabilities: see how you can get more out of Lambda by using it to start ECS tasks, and how you can turn your ECS cluster into a dynamic fleet of container workers that reacts to any event supported by Lambda.

Example Setup: Ray-tracing high-quality images in the cloud

To illustrate this pattern, you build a simple architecture that generates high-quality, ray-traced images out of input files written in the scene description language of POV-Ray, a popular, open-source ray tracer developed by Persistence of Vision Raytracer Pty. Ltd. (POV) and licensed under either POV's proprietary license (up to version 3.6) or AGPLv3 (version 3.7 onwards). Here's an overview of the architecture:

To use this architecture, put your POV-Ray scene description file (a POV-Ray .POV file) and its rendering parameters (a POV-Ray .INI file), as well as any other supporting files (e.g., texture images), into a single .ZIP file and upload it to an Amazon S3 bucket. In this architecture, the bucket is configured with an S3 event notification that triggers a Lambda function as soon as the .ZIP file is uploaded.

In similar setups (such as transcoding images), Lambda alone would be sufficient to perform its job on the uploaded object within its maximum execution time (60 seconds at the time of writing). In this example, however, we want to support complex rendering jobs that usually take significantly longer.

Therefore, the Lambda function simply takes the event data it received from S3 and sends it as a message into an Amazon Simple Queue Service (SQS) queue. SQS is a fast, reliable, scalable, fully-managed message queuing service. The Lambda function then starts an ECS task that can fetch and process the message from SQS.

Your ECS task contains a simple shell script that reads messages from SQS, extracts the S3 bucket and key information, downloads and unpacks the .ZIP file from S3 and proceeds to starting POV-Ray with the downloaded scene description and data.

After POV-Ray has performed its rendering magic, the script takes the resulting .PNG picture and uploads it back to the same S3 bucket where the original scene description was downloaded from. Then it deletes the message from the queue to avoid duplicate message processing.

The script continues pulling, processing, and deleting messages from the SQS queue until it is fully drained, then it exits, thereby terminating its own container.

Simple and efficient event-driven computing

This architecture can help you:

  • Extend the capabilities of Lambda to support any processing time, more programming languages, or other resource requirements by taking advantage of the flexibility of Docker containers.
  • Extend the capabilities of ECS to allow event-driven execution of ECS tasks: Use any event type supported by Lambda to start new ECS tasks for processing events, run long batch jobs triggered by new data in S3, or any other event-driven mechanism that you want to implement as a Docker container.
  • Get the best of both worlds by coupling the dynamic, event-driven Lambda model with the power of the Docker eco-system.

Step-by-Step

Sound interesting? Then get started!

This is an advanced architecture example covering a number of AWS services (Lambda, ECS, S3, and SQS) in depth, as well as multiple related IAM policies. The underlying resources are created in your account and billed at their standard pricing.

To make it easier to follow along, we have published all necessary code and scripts in the awslabs/lambda-ecs-worker-pattern repository on GitHub. If you are new to any of the services mentioned, we recommend working through their respective Getting Started documentation first.

Before you begin, make sure you have the AWS CLI and Docker installed and configured. This post walks through the steps required for setup, then gives you a simple Python script that can perform all the steps for you.

Step 1: Set up an S3 bucket
Start by setting up an S3 bucket to hold both the POV-Ray input files and the resulting .PNG output pictures. Choose a bucket name and create it:

$ aws s3 mb s3://<YOUR-BUCKET-NAME>

Step 2: Create an SQS queue
Use SQS to pass the S3 notification event data from Lambda to your ECS task. You can create a new SQS queue using the following command:

$ aws sqs create-queue --queue-name ECSPOVRayWorkerQueue
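The create-queue command returns the queue URL, which you will need in the Lambda function's configuration file below. Queue URLs follow a predictable pattern; as a rough sketch (the regional endpoint format shown here is an assumption for illustration and can vary, so prefer the QueueUrl value returned by the CLI):

```python
def sqs_queue_url(region: str, account_id: str, queue_name: str) -> str:
    """Construct the legacy regional SQS queue URL format used in this post.

    Note: this format is an assumption for illustration; the authoritative
    value is the QueueUrl field returned by `aws sqs create-queue`.
    """
    return f"https://{region}.queue.amazonaws.com/{account_id}/{queue_name}"

print(sqs_queue_url("eu-west-1", "123456789012", "ECSPOVRayWorkerQueue"))
# → https://eu-west-1.queue.amazonaws.com/123456789012/ECSPOVRayWorkerQueue
```

In practice, read the QueueUrl field from the create-queue output (or call `aws sqs get-queue-url`) rather than constructing the URL by hand.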

Step 3: Create the Lambda function
The following function reads a configuration file containing the URL of an SQS queue, an ECS task definition name, and a whitelist of accepted input file suffixes (.zip, in this example).

The config file uses JSON and looks like this (make sure to use your region):

$ cat ecs-worker-launcher/config.json
{
    "queue": "https://<YOUR-REGION>.queue.amazonaws.com/<YOUR-AWS-ACCOUNT-ID>/ECSPOVRayWorkerQueue",
    "task": "ECSPOVRayWorkerTask",
    "s3_key_suffix_whitelist": [".zip"]
}

The SQS queue URL (which looks like: https://eu-west-1.queue.amazonaws.com/<YOUR-AWS-ACCOUNT-ID>/ECSPOVRayWorkerQueue) is the output of the preceding command, in which you created your SQS queue. The "task" attribute references the name of an ECS task definition that you create in a later step.

The Lambda function checks the S3 object key given in the Lambda event against the file type whitelist; in case of a match, it sends a message to the configured SQS queue with the event data and starts the ECS task specified in the configuration file. Here’s the code:

// Copyright 2015 Amazon.com, Inc. or its affiliates. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License").
// You may not use this file except in compliance with the License.
// A copy of the License is located at
//
//    http://aws.amazon.com/apache2.0/
//
// or in the "license" file accompanying this file.
// This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and limitations under the License.

// This Lambda function forwards the given event data into an SQS queue, then starts an ECS task to
// process that event.

var fs = require('fs');
var async = require('async');
var aws = require('aws-sdk');
var sqs = new aws.SQS({apiVersion: '2012-11-05'});
var ecs = new aws.ECS({apiVersion: '2014-11-13'});

// Check if the given key suffix matches a suffix in the whitelist. Return true if it matches, false otherwise.
exports.checkS3SuffixWhitelist = function(key, whitelist) {
    if(!whitelist){ return true; }
    if(typeof whitelist == 'string'){ return key.match(whitelist + '$') }
    if(Object.prototype.toString.call(whitelist) === '[object Array]') {
        for(var i = 0; i < whitelist.length; i++) {
            if(key.match(whitelist[i] + '$')) { return true; }
        }
        return false;
    }
    console.log(
        'Unsupported whitelist type (' + Object.prototype.toString.call(whitelist) +
        ') for: ' + JSON.stringify(whitelist)
    );
    return false;
};

exports.handler = function(event, context) {
    console.log('Received event:');
    console.log(JSON.stringify(event, null, '  '));

    var config = JSON.parse(fs.readFileSync('config.json', 'utf8'));
    if(!config.hasOwnProperty('s3_key_suffix_whitelist')) {
        config.s3_key_suffix_whitelist = false;
    }
    console.log('Config: ' + JSON.stringify(config));

    var key = event.Records[0].s3.object.key;

    if(!exports.checkS3SuffixWhitelist(key, config.s3_key_suffix_whitelist)) {
        context.fail('Suffix for key: ' + key + ' is not in the whitelist');
        return;
    }

    // We can now go on. Put the S3 URL into SQS and start an ECS task to process it.
    async.waterfall([
            function (next) {
                var params = {
                    MessageBody: JSON.stringify(event),
                    QueueUrl: config.queue
                };
                sqs.sendMessage(params, function (err, data) {
                    if (err) { console.warn('Error while sending message: ' + err); }
                    else { console.info('Message sent, ID: ' + data.MessageId); }
                    next(err);
                });
            },
            function (next) {
                // Starts an ECS task to work through the feeds.
                var params = {
                    taskDefinition: config.task,
                    count: 1
                };
                ecs.runTask(params, function (err, data) {
                    if (err) { console.warn('error: ', "Error while starting task: " + err); }
                    else { console.info('Task ' + config.task + ' started: ' + JSON.stringify(data.tasks))}
                    next(err);
                });
            }
        ], function (err) {
            if (err) {
                context.fail('An error has occurred: ' + err);
            }
            else {
                context.succeed('Successfully processed Amazon S3 URL.');
            }
        }
    );
};

The Lambda function uses the Async.js library to make it easier to program the sequence of events to perform in an event-driven language like Node.js. You can install the library by typing npm install async from within the directory where the Lambda function and its configuration file are located.

To upload the function into Lambda, zip the Lambda function, its configuration file, and the node_modules directory with the Async.js library. In the Lambda console, upload the .ZIP file as described in the Node.js for S3 events tutorial.
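The packaging step can be scripted as well. Here is a minimal sketch in Python (the file names in the commented example are illustrative assumptions; use whatever your function and its dependencies are actually called):

```python
import os
import zipfile

def package_lambda(zip_path, paths):
    """Zip the Lambda function, its config file, and node_modules for upload."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            if os.path.isdir(path):
                # Walk directories (e.g., node_modules) and add every file.
                for root, _dirs, files in os.walk(path):
                    for name in files:
                        full = os.path.join(root, name)
                        zf.write(full, full)  # preserve the relative layout inside the zip
            else:
                zf.write(path, path)

# Hypothetical usage, run from inside the ecs-worker-launcher directory:
# package_lambda("ecs-worker-launcher.zip", ["index.js", "config.json", "node_modules"])
```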

For this function to perform its job, it needs an IAM role with a policy that allows access to SQS as well as the right to start tasks on ECS. It also should be able to publish log data to CloudWatch Logs. However, it does not need explicit access to S3, because only the ECS task needs to download the source file from and upload the resulting image to S3. Here is an example policy:

{
    "Statement": [
        {
            "Action": [
                "logs:*", 
                "lambda:invokeFunction",
                "sqs:SendMessage",
                "ecs:RunTask"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:logs:*:*:*",
                "arn:aws:lambda:*:*:*:*",
                "arn:aws:sqs:*:*:*",
                "arn:aws:ecs:*:*:*"
            ]
        }
    ],
    "Version": "2012-10-17"
}

For the sake of simplicity in this post, we used very broadly defined resource identifiers like “arn:aws:sqs:*:*:*”, which cover all the resources of the given types. In a real-world scenario, we recommend that you make resource definitions as specific as possible by adding account IDs, queue names, and other resource ARN parameters.
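For example, a more tightly scoped version of the same policy might look like the following sketch (the resource ARN formats are illustrative; substitute your own region, account ID, and resource names, and verify the ARN patterns against the IAM documentation for each service):

```json
{
    "Statement": [
        {
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:logs:<YOUR-REGION>:<YOUR-AWS-ACCOUNT-ID>:*"
        },
        {
            "Action": "sqs:SendMessage",
            "Effect": "Allow",
            "Resource": "arn:aws:sqs:<YOUR-REGION>:<YOUR-AWS-ACCOUNT-ID>:ECSPOVRayWorkerQueue"
        },
        {
            "Action": "ecs:RunTask",
            "Effect": "Allow",
            "Resource": "arn:aws:ecs:<YOUR-REGION>:<YOUR-AWS-ACCOUNT-ID>:task-definition/ECSPOVRayWorkerTask*"
        }
    ],
    "Version": "2012-10-17"
}
```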

Step 4: Configure S3 bucket notifications
Now you need to set up a bucket notification for your S3 bucket that triggers the Lambda function as soon as a new object is copied into the bucket.

This is a two-step process:

1. Add permission for S3 to be able to call your Lambda function:

$ aws lambda add-permission \
--function-name ecs-worker-launcher \
--region <YOUR-REGION> \
--statement-id ECSPOVRayWorker \
--action "lambda:InvokeFunction" \
--principal s3.amazonaws.com \
--source-arn arn:aws:s3:::<YOUR-BUCKET-NAME> \
--source-account <YOUR-AWS-ACCOUNT-ID> \
--profile <YOUR-PROFILE>

2. Set up an S3 bucket notification configuration (note: use the ARN for your Lambda function):

$ aws s3api put-bucket-notification-configuration \
--bucket <YOUR-BUCKET-NAME> \
--notification-configuration \
'{"LambdaFunctionConfigurations": [{"Events": ["s3:ObjectCreated:*"], "Id": "ECSPOVRayWorker", "LambdaFunctionArn": "arn:aws:lambda:<YOUR-REGION>:<YOUR-AWS-ACCOUNT-ID>:function:ecs-worker-launcher"}]}'

If you use these CLI commands, remember to substitute your own bucket name, region, account ID, and Lambda function ARN for the placeholders.

Step 5: Create a Docker image
Docker images contain all of the software needed to run your application and are built from Dockerfiles, which describe the steps needed to create the image.

In this step, you craft a Dockerfile that installs the POV-Ray ray-tracing application from POV (see the licensing information at the top of this post), along with a simple shell script that uses the AWS CLI to consume messages from SQS, download and unpack the input data from S3 into the local file system, run the ray tracer on the input file, upload the resulting image back to S3, and delete the message from the SQS queue.

Start with the shell script, called ecs-worker.sh:

#!/bin/bash

# Copyright 2015 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#
#     http://aws.amazon.com/apache2.0/
#
# or in the "license" file accompanying this file.
# This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.

#
# Simple POV-Ray worker shell script.
#
# Uses the AWS CLI utility to fetch a message from SQS, fetch a ZIP file from S3 that was specified in the message,
# render its contents with POV-Ray, then upload the resulting .png file to the same S3 bucket.
#

region=${AWS_REGION}
queue=${SQS_QUEUE_URL}

# Fetch messages and render them until the queue is drained.
while true; do
    # Fetch the next message and extract the S3 URL to fetch the POV-Ray source ZIP from.
    echo "Fetching messages from SQS queue: ${queue}..."
    result=$( \
        aws sqs receive-message \
            --queue-url ${queue} \
            --region ${region} \
            --wait-time-seconds 20 \
            --query Messages[0].[Body,ReceiptHandle] \
        | sed -e 's/^"\(.*\)"$/\1/'\
    )

    if [ -z "${result}" ]; then
        echo "No messages left in queue. Exiting."
        exit 0
    else
        echo "Message: ${result}."

        receipt_handle=$(echo ${result} | sed -e 's/^.*"\([^"]*\)"\s*\]$/\1/')
        echo "Receipt handle: ${receipt_handle}."

        bucket=$(echo ${result} | sed -e 's/^.*arn:aws:s3:::\([^\\]*\)\\".*$/\1/')
        echo "Bucket: ${bucket}."

        key=$(echo ${result} | sed -e 's/^.*\\"key\\":\s*\\"\([^\\]*\)\\".*$/\1/')
        echo "Key: ${key}."

        base=${key%.*}
        ext=${key##*.}

        if [ \
            -n "${result}" -a \
            -n "${receipt_handle}" -a \
            -n "${key}" -a \
            -n "${base}" -a \
            -n "${ext}" -a \
            "${ext}" = "zip" \
        ]; then
            mkdir -p work
            pushd work

            echo "Copying ${key} from S3 bucket ${bucket}..."
            aws s3 cp s3://${bucket}/${key} . --region ${region}

            echo "Unzipping ${key}..."
            unzip ${key}

            if [ -f ${base}.ini ]; then
                echo "Rendering POV-Ray scene ${base}..."
                if povray ${base}; then
                    if [ -f ${base}.png ]; then
                        echo "Copying result image ${base}.png to s3://${bucket}/${base}.png..."
                        aws s3 cp ${base}.png s3://${bucket}/${base}.png
                    else
                        echo "ERROR: POV-Ray source did not generate ${base}.png image."
                    fi
                else
                    echo "ERROR: POV-Ray source did not render successfully."
                fi
            else
                echo "ERROR: No ${base}.ini file found in POV-Ray source archive."
            fi

            echo "Cleaning up..."
            popd
            /bin/rm -rf work

            echo "Deleting message..."
            aws sqs delete-message \
                --queue-url ${queue} \
                --region ${region} \
                --receipt-handle "${receipt_handle}"

        else
            echo "ERROR: Could not extract S3 bucket and key from SQS message."
        fi
    fi
done
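The sed expressions above are brittle against formatting changes in the event JSON. If Python is available in your container, parsing the message body as JSON is a more robust alternative; a minimal sketch, assuming the message body is the unmodified S3 event notification forwarded by the Lambda function:

```python
import json

def bucket_and_key(message_body: str):
    """Extract the S3 bucket name and object key from an S3 event notification
    that was forwarded verbatim as the SQS message body."""
    event = json.loads(message_body)
    record = event["Records"][0]
    return record["s3"]["bucket"]["name"], record["s3"]["object"]["key"]
```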

Remember to run chmod +x ecs-worker.sh to make the shell script executable; this permission is carried over into the Docker image. Next, put together a Dockerfile that sets up the POV-Ray software and the AWS CLI in addition to your script:

# POV-Ray Amazon ECS Worker

FROM ubuntu:14.04

MAINTAINER FIRST_NAME LAST_NAME <EMAIL@DOMAIN.COM>

# Libraries and dependencies

RUN \
  apt-get update && apt-get -y install \
  autoconf \
  build-essential \
  git \
  libboost-thread-dev \
  libjpeg-dev \
  libopenexr-dev \
  libpng-dev \
  libtiff-dev \
  python \
  python-dev \
  python-distribute \
  python-pip \
  unzip \
  zlib1g-dev

# Compile and install POV-Ray

RUN \
  mkdir /src && \
  cd /src && \
  git clone https://github.com/POV-Ray/povray.git && \
  cd povray && \
  git checkout origin/3.7-stable && \
  cd unix && \
  sed 's/automake --w/automake --add-missing --w/g' -i prebuild.sh && \
  sed 's/dist-bzip2/dist-bzip2 subdir-objects/g' -i configure.ac && \
  ./prebuild.sh && \
  cd .. && \
  ./configure COMPILED_BY="FIRST_NAME LAST_NAME <EMAIL@DOMAIN.COM>" LIBS="-lboost_system -lboost_thread" && \
  make && \
  make install

# Install AWS CLI

RUN \
  pip install awscli

WORKDIR /

COPY ecs-worker.sh /

CMD [ "./ecs-worker.sh" ]

Substitute your own name and email address into the COMPILED_BY parameter when using this Dockerfile. Also, this file assumes that you have the ecs-worker.sh script in your current directory when you create the Docker image.

After making sure the shell script is in the local directory and setting up the Dockerfile (and your account/credentials with Docker Hub), you can create the Docker image using the following commands:

$ docker build -t <DOCKERHUB_USER>/<DOCKERHUB_REPOSITORY>:<TAG> .
$ docker login -u <DOCKERHUB_USER> -e <DOCKERHUB_EMAIL>
$ docker push <DOCKERHUB_USER>/<DOCKERHUB_REPOSITORY>:<TAG>

Step 6: Create an ECS task definition
Now that you have a Docker image ready to go, you can create an ECS task definition:

{
    "containerDefinitions": [
        {
            "name": "ECSPOVRayWorker",
            "image": "<DOCKERHUB_USER>/<DOCKERHUB_REPOSITORY>:<TAG>",
            "cpu": 512,
            "environment": [
                {
                    "name": "AWS_REGION",
                    "value": "<YOUR-CHOSEN-AWS-REGION>"
                },
                {
                    "name": "SQS_QUEUE_URL",
                    "value": "https://<YOUR_REGION>.queue.amazonaws.com/<YOUR_AWS_ACCOUNT_ID>/ECSPOVRayWorkerQueue"
                }
            ],
            "memory": 512,
            "essential": true
        }
    ],
    "family": "ECSPOVRayWorkerTask"
}

When using this example task definition file, remember to substitute your own values for the Docker Hub user, repository, and tag, as well as your chosen AWS region and SQS queue URL.

Now, you’re ready to register your task definition with ECS:

$ aws ecs register-task-definition --cli-input-json file://task-definition-file.json

Remember, the ECS task family name “ECSPOVRayWorkerTask” corresponds to the “task” attribute of your Lambda function’s configuration file. This is how Lambda knows which ECS task to start upon invocation; if you decide to name your ECS task definition differently, also remember to update the Lambda function’s configuration file accordingly.

Step 7: Add a policy to your ECS instance role that allows access to SQS and S3
Your SQS queue worker script running inside your Docker container on ECS needs some permissions to fetch messages from SQS, download and upload files to/from S3 and to delete messages from SQS when it’s done.

The following example policy shows the permissions needed for this application, in addition to the standard ECS-related permissions:

{
    "Statement": [
        {
            "Action": [
                "s3:ListAllMyBuckets"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::*"
        },
        {
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::<YOUR-BUCKET-NAME>"
        },
        {
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::<YOUR-BUCKET-NAME>/*"
        }
    ],
    "Version": "2012-10-17"
}

You can attach this policy to your ECS instance role or work the missing policy statements into your existing ECS instance role policy. The former option is preferable because it lets you manage the policies for different tasks in separate policy documents.

Step 8: Test your new ray-tracing service
You’re now ready to test the new Lambda/Docker worker pattern for rendering ray-traced images!

To help you test it, we have provided a ready-to-use .ZIP file containing a sample POV-Ray scene, which you can download from the awslabs/lambda-ecs-worker-pattern GitHub repository.

Upload the .ZIP file to your S3 bucket and, after a few minutes, you should see the final rendered image appear in the same bucket, looking like this:

If it doesn’t work out of the box, don’t panic! Here are some hints on how to debug this scenario:

  • Use the Amazon CloudWatch Logs console and look for errors generated by Lambda. Check out the Troubleshooting section of the AWS Lambda documentation.
  • Log into your ECS container node(s) and use the docker ps -a command to identify the Docker container that was running your application. Use the docker logs command to look for errors. Check out the Troubleshooting section of the ECS documentation.
  • If the Docker container is still running, you can log into it using the docker exec -it <CONTAINER-ID> /bin/bash command and see what’s going on as it happens.

All in one go

To make setup even easier, we have put together a Python Fabric script that handles all of the above tasks for you. Fabric is a Python module that makes it easy to run commands on remote nodes, transfer files over SSH, run commands locally, and structure your script in a manner similar to a makefile.

You can download the script and Python fabfile that can help set this up, along with instructions, from the awslabs/lambda-ecs-worker-pattern repository on GitHub.

Further considerations

This example is intentionally simple and generic so you can adapt it to a wide variety of situations. When implementing this pattern for your own projects, you may want to consider additional issues.

In this pattern, each Lambda function launches its own ECS container to process the event. When many events occur, multiple containers are launched and this may not be what you want; a single running ECS task can continue processing messages from SQS until the queue is empty. Consider scaling the number of running ECS tasks independently of the number of Lambda function invocations. For more information, see Scaling Amazon ECS Services Automatically Using Amazon CloudWatch and AWS Lambda.

This approach is not limited to launching ECS containers; you can use Lambda to launch any other AWS service or resource, including Amazon Elastic Transcoder jobs, Amazon Simple Workflow Service executions, or AWS Data Pipeline jobs.

Combining ECS tasks with SQS is a very simple, but powerful batch worker pattern. You can use it even without Lambda: whenever you want to get a piece of long-running batch work done, write its parameters into an SQS queue and launch an ECS task in the background, while your application continues normally.

This pattern uses SQS to buffer the full S3 bucket notification event for the ECS task to pick it up. In cases where the parameters to be forwarded to ECS are short and simple (a URL, file name, or simple data structure), you can wrap them into environment variables and specify them as overrides in the run-task operation. This means that for simple parameters, you can stop using SQS altogether.
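As a rough sketch of that variant using boto3 (the environment variable names and container name wiring are illustrative assumptions, not part of the original example):

```python
def build_run_task_params(task_family, bucket, key):
    """Build run_task parameters that pass the work item to the container
    via environment variable overrides instead of an SQS message."""
    return {
        "taskDefinition": task_family,
        "count": 1,
        "overrides": {
            "containerOverrides": [
                {
                    "name": "ECSPOVRayWorker",  # must match the container name in the task definition
                    "environment": [
                        {"name": "INPUT_BUCKET", "value": bucket},
                        {"name": "INPUT_KEY", "value": key},
                    ],
                }
            ]
        },
    }

# Hypothetical usage with boto3:
# import boto3
# ecs = boto3.client("ecs")
# ecs.run_task(**build_run_task_params("ECSPOVRayWorkerTask", "my-bucket", "scene.zip"))
```

The worker script would then read INPUT_BUCKET and INPUT_KEY from its environment instead of polling SQS.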

This pattern can also save on costs. Many traditional computing tasks need a specialized software installation (like the POV-Ray rendering software in this example) but run only intermittently (for example, once per day or per week) for less than an hour. Keeping dedicated machines or fleets for such specialized tasks can be wasteful because they may never reach significant utilization.

Using ECS, you can share the hardware infrastructure of your batch worker fleets among very different worker implementations (a ray-tracing application, an ETL process, a document converter, and so on). Because different container images can run on the same infrastructure, you can drive higher utilization of a generic ECS fleet while it performs very different tasks, and you need fewer hardware resources to accommodate a wide variety of workloads.

Conclusion

This pattern is widely applicable. Many applications that can be seen as batch-driven workers can be implemented as ECS tasks that can be started from a Lambda function, using SQS to buffer parameters.

Look at your infrastructure and try to identify underused EC2 worker instances that could be re-implemented as ECS tasks; run them on a smaller, more efficient footprint while keeping all of their functionality. Revisit event-driven cases where you may have dismissed Lambda before, and apply the techniques outlined in this post to expand the usefulness of Lambda into use cases with more complex execution requirements.

We hope you found this post useful and look forward to your comments about where you plan to implement this in your current and future projects.