Category: AWS Lambda


Dynamic Scaling with EC2 Spot Fleet

Tipu Qureshi, AWS Senior Cloud Support Engineer

The RequestSpotFleet API allows you to launch and manage an entire fleet of EC2 Spot Instances with one request. A fleet is a collection of Spot Instances that work together as part of a distributed application while providing cost savings. With the ModifySpotFleetRequest API, it’s possible to dynamically scale a Spot fleet’s target capacity as capacity requirements change over time. As an example, let’s look at a batch processing application that uses a Spot fleet and Amazon SQS. As discussed in our previous blog post on Additional CloudWatch Metrics for Amazon SQS and Amazon SNS, you can scale up when the ApproximateNumberOfMessagesVisible SQS metric grows too large for one of your SQS queues, and scale down once it returns to a more normal value.

There are multiple ways to accomplish this dynamic scaling. For example, a script can be scheduled (e.g., via cron) to fetch the value of the ApproximateNumberOfMessagesVisible SQS metric periodically and then scale the Spot fleet according to defined thresholds. The current size of the Spot fleet can be obtained using the DescribeSpotFleetRequests API, and the scaling can be carried out using the new ModifySpotFleetRequest API. A sample script written for Node.js is available here, and the following is a sample IAM policy for an IAM role that could be used on an EC2 instance to run the script:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1441252157702",
      "Action": [
        "ec2:DescribeSpotFleetRequests",
        "ec2:ModifySpotFleetRequest",
        "cloudwatch:GetMetricStatistics"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}

By leveraging the IAM role on an EC2 instance, the script uses the AWS API methods described above to scale the Spot fleet dynamically. You can configure variables such as the Spot fleet request, SQS queue name, SQS metric thresholds, and instance thresholds according to your application’s needs. In the example configuration below, the minimum instance threshold (minCount) is set to 2 so that the Spot fleet’s instance count never drops below 2, which ensures that a new job is still processed immediately after an extended period with no batch jobs.

// Sample script for Dynamically scaling Spot Fleet
// define configuration
var config = {
    spotFleetRequest:'sfr-c8205d41-254b-4fa9-9843-be06585e5cda', //Spot Fleet Request Id
    queueName:'demojobqueue', //SQS queuename
    maxCount:100, //maximum number of instances
    minCount:2, //minimum number of instances
    stepCount:5, //increment of instances
    scaleUpThreshold:20, //CW metric threshold at which to scale up
    scaleDownThreshold:10, //CW metric threshold at which to scale down
    period:900, //period in seconds for CW
    region:'us-east-1' //AWS region
};

// dependencies
var AWS = require('aws-sdk');
var ec2 = new AWS.EC2({region: config.region, maxRetries: 5});
var cloudwatch = new AWS.CloudWatch({region: config.region, maxRetries: 5});

console.log ('Loading function');
main();

//main function
function main() {
    var now = new Date();
    var startTime = new Date(now - (config.period * 1000));
    console.log ('Timestamp: '+now);
    var cloudWatchParams = {
        StartTime: startTime,
        EndTime: now,
        MetricName: 'ApproximateNumberOfMessagesVisible',
        Namespace: 'AWS/SQS',
        Period: config.period,
        Statistics: ['Average'],
        Dimensions: [
            {
                Name: 'QueueName',
                Value: config.queueName,
            },
        ],
        Unit: 'Count'
    };
    cloudwatch.getMetricStatistics(cloudWatchParams, function(err, data) {
        if (err) console.log(err, err.stack); // an error occurred
        else {
            //set Metric Variable
            var metricValue = data.Datapoints[0].Average;
            console.log ('Cloudwatch Metric Value is: '+ metricValue);
            var up = 1;
            var down = -1;
            // check if scaling is required
            if (metricValue <= config.scaleUpThreshold && metricValue >= config.scaleDownThreshold)
                console.log ("metric not breached for scaling action");
            else if (metricValue >= config.scaleUpThreshold)
                scale(up); //scaleup
            else
                scale(down); //scaledown
        }
    });
};

//defining scaling function
function scale (direction) {
    //adjust stepCount depending upon whether we are scaling up or down
    config.stepCount = Math.abs(config.stepCount) * direction;
    //describe Spot Fleet Request Capacity
    console.log ('attempting to adjust capacity by: '+ config.stepCount);
    var describeParams = {
        DryRun: false,
        SpotFleetRequestIds: [
            config.spotFleetRequest
        ]
    };
    //get current fleet capacity
    ec2.describeSpotFleetRequests(describeParams, function(err, data) {
        if (err) {
            console.log('Unable to describeSpotFleetRequests: ' + err); // an error occurred
            return 'Unable to describeSpotFleetRequests';
        }
        //set current capacity variable
        var currentCapacity = data.SpotFleetRequestConfigs[0].SpotFleetRequestConfig.TargetCapacity;
        console.log ('current capacity is: ' + currentCapacity);
        //set desired capacity variable
        var desiredCapacity = currentCapacity + config.stepCount;
        console.log ('desired capacity is: '+ desiredCapacity);
        //find out if the spot fleet is already modifying
        var fleetModifyState = data.SpotFleetRequestConfigs[0].SpotFleetRequestState;
        console.log ('current state of the spot fleet is: ' + fleetModifyState);
        //only proceed forward if  maxCount or minCount hasn't been reached
        //or spot fleet isn't being currently modified.
        if (fleetModifyState == 'modifying')
            console.log ('fleet is currently being modified');
        else if (desiredCapacity < config.minCount || desiredCapacity > config.maxCount)
            console.log ('capacity already at min or max count');
        else {
            console.log ('scaling');
            var modifyParams = {
                SpotFleetRequestId: config.spotFleetRequest,
                TargetCapacity: desiredCapacity
            };
            ec2.modifySpotFleetRequest(modifyParams, function(err, data) {
                if (err) {
                    console.log('unable to modify spot fleet due to: ' + err);
                }
                else {
                    console.log('successfully modified capacity to: ' + desiredCapacity);
                    return 'success';
                }
            });
        }
    });
}

You can modify this sample script to meet your application’s requirements.

You could also leverage AWS Lambda for dynamically scaling your Spot fleet. As depicted in the diagram below, an AWS Lambda function can be scheduled (e.g., using AWS Data Pipeline, cron, or any other form of scheduling) to get the ApproximateNumberOfMessagesVisible SQS metric for the SQS queue in a batch processing application. This Lambda function checks the current size of the Spot fleet using the DescribeSpotFleetRequests API, and then scales the Spot fleet using the ModifySpotFleetRequest API after checking constraints such as the state and size of the Spot fleet, similar to the script discussed above.

Dynamic Spot Fleet Scaling Architecture

You could also use the sample IAM policy provided above to create an IAM role for the AWS Lambda function. A sample Lambda deployment package for dynamically scaling a Spot fleet based on the value of the ApproximateNumberOfMessagesVisible SQS metric can be found here; you could modify it to use any CloudWatch metric, depending on your use case. The sample script and Lambda function are provided for reference only and should be tested before use in a production environment.
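
A Lambda version of the script needs only a handler entry point in place of the direct main() call. Here is a minimal sketch, assuming the config, main(), and scale() definitions from the script above are bundled in the deployment package and that main() is adjusted to accept the context object and call context.succeed() or context.fail() when it finishes:

// Hypothetical handler wrapper around the scaling logic shown earlier.
// config, main(), and scale() are assumed to be defined as in the sample script,
// with main(context) signaling completion via the context object.
exports.handler = function(event, context) {
    console.log('Scheduled Spot fleet scaling check at: ' + new Date());
    main(context);
};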

AWS Lambda sessions at re:Invent 2015

Ajay Nair, Sr. Product Manager, AWS Lambda

 
If you will be attending re:Invent 2015 in Las Vegas next week, you will have many opportunities to learn more about building applications using AWS Lambda. The Lambda team will be presenting multiple sessions covering new features, as well as deep dives on using Lambda for data processing and backend workloads.

 
We will also be joined by customers presenting their own experiences with using Lambda in production systems.

 
Want to go hands on? Sign up for a workshop!

 
Want to chat with the experts and get your questions answered? Stop by the Compute booth and the New Services booth on the display floor, or drop in on one of the developer lounge sessions.

 
Didn’t register before the conference sold out? You can still sign up to watch live streams of the keynotes and select breakout sessions. All sessions will be recorded and made available on YouTube after the conference, and all slide decks from the sessions will be made available on SlideShare.net.

 
See you in Vegas!

A Simple Serverless Test Harness using AWS Lambda

Tim Wagner, AWS Lambda General Manager

You can easily test a Lambda function locally by using either a nodejs runner or JUnit for Java. But once your function is live in Lambda, how do you test it?

One option is to create an API for it using Amazon API Gateway and then employ one of the many HTTP-based test harnesses out there. While that certainly works, in this article we’ll look at using Lambda itself as a simple (and serverless!) test platform. We’ll look at testing two categories – other Lambda functions and HTTPS endpoints – but you can apply these techniques to testing anything you want.

Unit Testing

In the Microservices without the Servers blog post we looked briefly at testing the image processing service we built using Lambda itself. The idea was simple: We create a “unit” test that calls another Lambda function (the functionality being tested) and records its output in DynamoDB. We can record the actual response of calling the function under test or just summarize whether it succeeded or failed. Other information, such as performance data (running time, memory consumed, etc.) can of course be added.

AWS Lambda includes a unit and load test harness blueprint, making it easy to create a custom harness. By default, the blueprint runs the function under test and records the response in a predetermined DynamoDB table. You can easily customize how results are recorded (e.g., sending them to Amazon SQS, Amazon Kinesis, or Amazon S3 instead of DynamoDB) and what is recorded (success, actual results, performance or other environmental attributes of running the function, etc.).
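
As one example of that customization, swapping the DynamoDB write for an S3 write only changes the recording step. A hedged sketch (the bucket name and key layout here are illustrative, not part of the blueprint):

var AWS = require('aws-sdk');
var s3 = new AWS.S3();

// Illustrative alternative to the blueprint's DynamoDB write:
// store each unit test result as one S3 object per test run and iteration.
function recordResult(testId, iteration, payload, callback) {
    s3.putObject({
        Bucket: 'my-test-results-bucket',        // assumed bucket name
        Key: testId + '/' + iteration + '.json',
        Body: JSON.stringify({ testId: testId, iteration: iteration, result: payload })
    }, callback);
}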

Using this blueprint is easy: It reads a simple JSON format to tell it what to call:

{
  "operation": "unit",
  "function": <Name of the Lambda function under test>,
  "resultsTable": "unit-test-results",
  "testId": "MyTestRun",
  "event": {
    /* The event you want the function under test to process goes here. */
  }
}

One of the nice things about this approach is that you don’t have to worry about infrastructure at any point in the testing process. The economics of Lambda functions work to your advantage: after the test runs, you’re not left owning a bunch of resources, and you don’t have to spend any energy spinning up or shutting down infrastructure. When you’re done with the results in DynamoDB, you can simply delete the corresponding rows or replace them with a single, summary row. Until then, you have the full power of a managed NoSQL database to access and manipulate your test run results.

The Lambda blueprint itself is intentionally very simple. You can add an existing test library if you want more structure, or just instantiate the blueprint and modify it for each function under test.

HTTPS Endpoint Testing

Lambda also includes a blueprint for invoking HTTPS endpoints, such as those created by Amazon API Gateway. You can combine these two blueprints to create a unit tester for any HTTPS service, including APIs created with API Gateway. Here’s an example of the unit test JSON from our image service test:

{
  "operation": "unit",
  "function": "HTTPSInvoker",
  "resultsTable": "unit-test-results",
  "testId": "LinuxConDemo",
  "event": {
    "options": {
      "host": "fuexvelc41.execute-api.us-east-1.amazonaws.com",
      "path": "/prod/ImageProcessingService",
      "method": "POST"
    },
    "data": {
      "operation": "getSample"
    }
  }
}

Load Testing

Lambda’s test harness blueprint can run in either of two modes: Unit and Load. Load testing enables you to use Lambda’s scalability to your advantage, running multiple unit tests asynchronously. As before, each unit test records its result in DynamoDB, so you can easily do some simple scale testing, all without standing up infrastructure. Simple queries in DynamoDB enable you to validate how many tests completed (and completed successfully). To do more elaborate post-hoc analysis, you can also hook up a Lambda function to DynamoDB as an event handler, allowing you to count results, perform additional validation, etc.
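
For example, a quick way to see how many of a load test’s iterations passed is to query the results table by testId. A minimal sketch, assuming the table uses the blueprint’s testId hash key and passed attribute:

var AWS = require('aws-sdk');
var docClient = new AWS.DynamoDB.DocumentClient();

// Count how many recorded iterations of a given test run passed.
docClient.query({
    TableName: 'unit-test-results',
    KeyConditionExpression: 'testId = :id',
    ExpressionAttributeValues: { ':id': 'LinuxConLoadTestDemo' }
}, function(err, data) {
    if (err) return console.log(err, err.stack);
    var passed = data.Items.filter(function(item) { return item.passed; }).length;
    console.log(passed + ' of ' + data.Items.length + ' iterations passed');
});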

Here’s an example recreated from the image processing blog article that uses the unit and load test harness again, this time in “load test” mode:

{
  "operation": "load",
  "iterations": 100,
  "function": "TestHarness",
  "event": {
    "operation": "unit",
    "function": "HTTPSInvoker",
    "resultsTable": "unit-test-results",
    "testId": "LinuxConLoadTestDemo",
    "event": {
      "options": {
        "host": "fuexvelc41.execute-api.us-east-1.amazonaws.com",
        "path": "/prod/ImageProcessingService",
        "method": "POST"
      },
      "data": {
        "operation": "getSample"
      }
    }
  }
}

From the outside in, this JSON file

  1. Instructs the load tester to run 100 iterations of…
  2. The unit test, which runs the HTTPS invoker, recording the result in a DynamoDB table named “unit-test-results”
  3. The HTTPS invoker POSTs the “getSample” operation request to the API Gateway endpoint
  4. …which ultimately calls the image processing microservice, implemented as a Lambda function.

Summary

In this article we took a deeper look at serverless testing and how Lambda itself can be used as a testing platform. We saw how the test harness blueprint can be used for both unit and load testing and how the approach can be customized to the particular needs of your functions and test regimen.

Until next time, happy Lambda coding (and serverless testing)!

-Tim
Follow Tim’s Lambda adventures on Twitter

Microservices without the Servers

Tim Wagner, AWS Lambda General Manager

At LinuxCon/ContainerCon 2015 I presented a demo-driven talk titled “Microservices without the Servers”. In it, I created an image processing microservice, deployed it to multiple regions, built a mobile app that used it as a backend, added an HTTPS-based API using Amazon API Gateway and a website, and then unit and load tested it, all without using any servers.

This blog recreates the talk in detail, stepping you through all the pieces necessary for each of these steps and going deeper into the architecture. For a high-level overview, check out the slides. For another example of this architecture, check out the executable gist repository, SquirrelBin.

Serverless Architecture

By “serverless”, we mean no explicit infrastructure required, as in: no servers, no deployments onto servers, no installed software of any kind. We’ll use only managed cloud services and a laptop. The diagram below illustrates the high-level components and their connections: a Lambda function as the compute (“backend”) and a mobile app that connects directly to it, plus Amazon API Gateway to provide an HTTP endpoint for a static Amazon S3-hosted website.

A Serverless Architecture for Mobile and Web Apps Using AWS Lambda

Now, let’s start building!

Step 1: Create the Image Processing Service

To make this a little easier to follow along, we’re going to use a library that comes built in with Lambda’s nodejs language: ImageMagick. However, that’s not required – if you prefer to use your own library instead, you can load JavaScript or native libraries, run Python, or even wrap a command line executable. The examples below are implemented in nodejs, but you can also build this service using Java, Clojure, Scala, or other jvm-based languages in AWS Lambda.

The code below is a sort of “hello world” program for ImageMagick – it gives us a basic command structure (aka a switch statement) and enables us to retrieve the built-in rose image and return it. Apart from encoding the result so it can live happily in JSON, there’s not much to this.

var im = require("imagemagick");
var fs = require("fs");
exports.handler = function(event, context) {
    if (event.operation) console.log("Operation " + event.operation + " requested");
    switch (event.operation) {
        case 'ping': context.succeed('pong'); return;
        case 'getSample':
            event.customArgs = ["rose:", "/tmp/rose.png"];
            im.convert(event.customArgs, function(err, output) {
                if (err) context.fail(err);
                else {
                    var resultImgBase64 = new Buffer(fs.readFileSync("/tmp/rose.png")).toString('base64');
                    try {fs.unlinkSync("/tmp/rose.png");} catch (e) {} // discard
                    context.succeed(resultImgBase64);
                }
            });
            break; // allow callback to complete
        default:
            var error = new Error('Unrecognized operation "' + event.operation + '"');
            context.fail(error);
            return;
    }
};

First, let’s make sure the service is running by sending it the following JSON in the AWS Lambda console’s test window:

{
  "operation": "ping"
}

You should get the requisite “pong” response. Next, we’ll actually invoke ImageMagick by sending JSON that looks like this:

{
  "operation": "getSample"
}

This request retrieves a base64-encoded string representing a PNG version of a picture of a rose: “iVBORw0KGg…Jggg==”. To make sure this isn’t just some random characters, copy-paste it (sans double quotes) into any convenient Base64-to-image decoder, such as codebeautify.org/base64-to-image-converter. You should see a nice picture of a rose:

Sample Rose Image

Sample Image (red rose)
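
If you’d rather not paste the string into a website, a couple of lines of nodejs will decode it to a file you can open locally. A small sketch, assuming you saved the Lambda console output to response.txt:

var fs = require('fs');

// Read the base64 string returned by the function (stripping any surrounding quotes)
var base64Image = fs.readFileSync('response.txt', 'utf8').replace(/"/g, '').trim();
fs.writeFileSync('rose.png', new Buffer(base64Image, 'base64'));
console.log('Wrote rose.png');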

Now, let’s complete the image processing service by exposing the rest of the nodejs wrapper around it. We’re going to offer a few different operations:

  • ping: Verify service is available.
  • getDimensions: Shorthand for calling identify operation to retrieve width and height of an image.
  • identify: Retrieve image metadata.
  • resize: A convenience routine for resizing (which calls convert under the covers).
  • thumbnail: A synonym for resize.
  • convert: The “do-everything” routine – can convert media formats, apply transforms, resize, etc.
  • getSample: Retrieve a sample image; the “hello world” operation.

Most of the code is extremely straightforward wrapping of the nodejs ImageMagick routines, some of which take JSON (in which case the event passed in to Lambda is cleaned up and forwarded along) and others of which take command line (aka “custom”) arguments, which are passed in as a string array. The one part of this that might be non-obvious if you haven’t used ImageMagick before is that it works as a wrapper over the command line, and the names of files have semantic meaning. We have two competing needs: We want the client to convey the semantics (e.g., the output format of an image, such as PNG versus JPEG) but we want the service author to determine where to place the temporary storage on disk so we don’t leak implementation details. To accomplish both at once, we define two arguments in the JSON schema: “inputExtension” and “outputExtension”, and then we build the actual file location by combining the client’s portion (file extension) with the server’s portion (directory and base name). You can see (and use!) the completed code in the image processing blueprint.
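
To make that split concrete, the file-naming logic amounts to something like the sketch below. The directory, base name, and defaults here are illustrative rather than the blueprint’s exact code:

// The client supplies only the extensions; the service decides where temp files live.
function buildTempPaths(event) {
    var base = '/tmp/img-' + Date.now();                          // server-chosen location
    var inputFile = base + '.' + (event.inputExtension || 'png');
    var outputFile = base + '-out.' + (event.outputExtension || 'png');
    return { inputFile: inputFile, outputFile: outputFile };
}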

There are lots of tests you can run here (and we’ll do more later), but as a quick sanity check, retrieve the sample rose image again and then pass it back in using a negation (color inversion) filter. You can use JSON like this in the Lambda console; just replace the base64Image field with the actual image characters (it’s a little long to include here in the blog page).

{
  "operation": "convert",
  "customArgs": [
    "-negate"
  ],
  "outputExtension": "png",
  "base64Image": "...fill this in with the rose sample image, base64-encoded..."
}

The output, decoded as an image, should be that elusive botanical rarity, a blue rose:

Blue Rose Image

Blue Rose (negative of red rose sample image)

So that’s all there is to the functional aspect of the service. Normally, this is where it would start to get ugly, going from “worked once” to “scalable and reliable service with 24x7x365 monitoring and production logging”. But that’s the beauty of Lambda: our image processing code is already a fully deployed, production strength microservice. Next, let’s add a mobile app that can call it…

Step 2: Create a Mobile Client

Our image processing microservice can be accessed in a number of ways, but to demonstrate a sample client, we’ll build a quick Android app. Below I’m showing the client-side code that we used in the ContainerCon talk to create a simple Android app that lets you pick an image and a filter and then displays the effect of applying the filter to the image by calling the “convert” operation in the image processing service that’s now running in AWS Lambda.

To get a sense of what the app does, here’s one of its sample images, the AWS Lambda Icon:

Lambda Icon Image

Android Emulator Displaying the AWS Lambda Icon Image

We’ll pick the “negate” filter to invert the colors in the icon:

Negate Selection

Selecting the ‘Negate’ Image Conversion Filter

..and here’s the result: A blue version of our (originally orange) Lambda moniker:

Negated Icon Result

Result of Applying the ‘Negate’ Filter to the AWS Lambda Icon

We could also give an old-world feel to the modern Seattle skyline by choosing the Seattle image and applying a sepia-tone filter:

Sepia-toned Seattle Skyline

A Sepia-toned Seattle Skyline

Now on to the code. I’m not trying to teach basic Android programming here, so I’ll just focus on the Lambda-specific elements of this app. (If you’re creating your own, you’ll also need to include the AWS Mobile SDK jar to run the sample code below.) Conceptually there are four parts:

  1. POJO Data Schema
  2. Remote Service (Operation) Definition
  3. Initialization
  4. Service Invocation

We’ll take a look at each one in turn.

The data schema defines any objects that need to be passed between client and server. There are no “Lambda-isms” here; these objects are just POJOs (Plain Old Java Objects) with no special libraries or frameworks. We define a base event and then extend it to reflect our operation structure – you can think of this as the “Javaification” of the JSON we used when defining and testing the image processing service above. If you were also writing the server in Java, you’d typically share these files as part of the common event structure definition; in our example, these POJOs turn into JSON on the server side.

LambdaEvent.java

package com.amazon.lambda.androidimageprocessor.lambda;
public class LambdaEvent {
    private String operation;
    public String getOperation() {return operation;}
    public void setOperation(String operation) {this.operation = operation;}
    public LambdaEvent(String operation) {setOperation(operation);}
}

ImageConvertRequest.java

package com.amazon.lambda.androidimageprocessor.lambda;
import java.util.List;
public class ImageConvertRequest extends LambdaEvent {
    private String base64Image;
    private String inputExtension;
    private String outputExtension;
    private List<String> customArgs;
    public ImageConvertRequest() {super("convert");}
    public String getBase64Image() {return base64Image;}
    public void setBase64Image(String base64Image) {this.base64Image = base64Image;}
    public String getInputExtension() {return inputExtension;}
    public void setInputExtension(String inputExtension) {this.inputExtension = inputExtension;}
    public String getOutputExtension() {return outputExtension;}
    public void setOutputExtension(String outputExtension) {this.outputExtension = outputExtension;}
    public List<String> getCustomArgs() {return customArgs;}
    public void setCustomArgs(List<String> customArgs) {this.customArgs = customArgs;}
}
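
For reference, a populated ImageConvertRequest reaches the Lambda function as JSON along these lines (the field values here are illustrative):

{
  "operation": "convert",
  "base64Image": "...base64-encoded image data...",
  "inputExtension": "png",
  "outputExtension": "png",
  "customArgs": ["-negate"]
}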

So far, not very complicated. Now that we have a data model, we’ll define the service endpoint using some Java annotations. We’re exposing two operations here, “ping” and “convert”; it would be easy to extend this to include the others as well, but we don’t need them for the sample app below.

ILambdaInvoker.java

package com.amazon.lambda.androidimageprocessor.lambda;
import com.amazonaws.mobileconnectors.lambdainvoker.LambdaFunction;
import java.util.Map;
public interface ILambdaInvoker {
    @LambdaFunction(functionName = "ImageProcessor")
    String ping(Map event);
    @LambdaFunction(functionName = "ImageProcessor")
    String convert(ImageConvertRequest request);
}

Now we’re ready to do the main part of the app. Much of this is boilerplate Android code or simple client-side resource management, but I’ll point out a couple of sections that are Lambda related:

This is the “init” section; it creates the authentication provider to call the Lambda APIs and creates a Lambda invoker capable of calling the endpoints defined above and transmitting the POJOs in our data model:


        // Create an instance of CognitoCachingCredentialsProvider
        CognitoCachingCredentialsProvider cognitoProvider = new CognitoCachingCredentialsProvider(
                this.getApplicationContext(), "us-east-1:<YOUR COGNITO IDENITY POOL GOES HERE>", Regions.US_EAST_1);

        // Create LambdaInvokerFactory, to be used to instantiate the Lambda proxy.
        LambdaInvokerFactory factory = new LambdaInvokerFactory(this.getApplicationContext(),
                Regions.US_EAST_1, cognitoProvider);

        // Create the Lambda proxy object with a default Json data binder.
        lambda = factory.build(ILambdaInvoker.class);

The other code section that’s interesting (well, sort of) is the actual remote procedure call itself:


                try {
                    return lambda.convert(params[0]);
                } catch (LambdaFunctionException e) {
                    Log.e("Tag", "Failed to convert image");
                    return null;
                }

It’s actually not that interesting because the magic (argument serialization and result deserialization) is happening behind the scenes, leaving just some error handling to be done here.

Here’s the complete source file:

MainActivity.java

package com.amazon.lambda.androidimageprocessor;

import android.app.Activity;
import android.app.ProgressDialog;
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.os.AsyncTask;
import android.os.Bundle;
import android.util.Base64;
import android.util.Log;
import android.view.View;
import android.widget.ImageView;
import android.widget.Spinner;
import android.widget.Toast;

import com.amazon.lambda.androidimageprocessor.lambda.ILambdaInvoker;
import com.amazon.lambda.androidimageprocessor.lambda.ImageConvertRequest;
import com.amazonaws.auth.CognitoCachingCredentialsProvider;
import com.amazonaws.mobileconnectors.lambdainvoker.LambdaFunctionException;
import com.amazonaws.mobileconnectors.lambdainvoker.LambdaInvokerFactory;
import com.amazonaws.regions.Regions;

import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class MainActivity extends Activity {

    private ILambdaInvoker lambda;
    private ImageView selectedImage;
    private String selectedImageBase64;
    private ProgressDialog progressDialog;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        // Create an instance of CognitoCachingCredentialsProvider
        CognitoCachingCredentialsProvider cognitoProvider = new CognitoCachingCredentialsProvider(
                this.getApplicationContext(), "us-east-1:2a40105a-b330-43cf-8d4e-b647d492e76e", Regions.US_EAST_1);

        // Create LambdaInvokerFactory, to be used to instantiate the Lambda proxy.
        LambdaInvokerFactory factory = new LambdaInvokerFactory(this.getApplicationContext(),
                Regions.US_EAST_1, cognitoProvider);

        // Create the Lambda proxy object with a default Json data binder.
        lambda = factory.build(ILambdaInvoker.class);

        // ping lambda function to make sure everything is working
        pingLambda();
    }

    // ping the lambda function
    @SuppressWarnings("unchecked")
    private void pingLambda() {
        Map event = new HashMap();
        event.put("operation", "ping");

        // The Lambda function invocation results in a network call.
        // Make sure it is not called from the main thread.
        new AsyncTask<Map, Void, String>() {
            @Override
            protected String doInBackground(Map... params) {
                // invoke "ping" method. In case it fails, it will throw a
                // LambdaFunctionException.
                try {
                    return lambda.ping(params[0]);
                } catch (LambdaFunctionException lfe) {
                    Log.e("Tag", "Failed to invoke ping", lfe);
                    return null;
                }
            }

            @Override
            protected void onPostExecute(String result) {
                if (result == null) {
                    return;
                }

                // Display a quick message
                Toast.makeText(MainActivity.this, "Made contact with AWS lambda", Toast.LENGTH_LONG).show();
            }
        }.execute(event);
    }

    // event handler for "process image" button
    public void processImage(View view) {
        // no image has been selected yet
        if (selectedImageBase64 == null) {
            Toast.makeText(this, "Please tap one of the images above", Toast.LENGTH_LONG).show();
            return;
        }

        // get selected filter
        String filter = ((Spinner) findViewById(R.id.filter_picker)).getSelectedItem().toString();
        // assemble new request
        ImageConvertRequest request = new ImageConvertRequest();
        request.setBase64Image(selectedImageBase64);
        request.setInputExtension("png");
        request.setOutputExtension("png");

        // custom arguments per filter
        List<String> customArgs = new ArrayList<String>();
        request.setCustomArgs(customArgs);
        switch (filter) {
            case "Sepia":
                customArgs.add("-sepia-tone");
                customArgs.add("65%");
                break;
            case "Black/White":
                customArgs.add("-colorspace");
                customArgs.add("Gray");
                break;
            case "Negate":
                customArgs.add("-negate");
                break;
            case "Darken":
                customArgs.add("-fill");
                customArgs.add("black");
                customArgs.add("-colorize");
                customArgs.add("50%");
                break;
            case "Lighten":
                customArgs.add("-fill");
                customArgs.add("white");
                customArgs.add("-colorize");
                customArgs.add("50%");
                break;
            default:
                return;
        }

        // async request to lambda function
        new AsyncTask<ImageConvertRequest, Void, String>() {
            @Override
            protected String doInBackground(ImageConvertRequest... params) {
                try {
                    return lambda.convert(params[0]);
                } catch (LambdaFunctionException e) {
                    Log.e("Tag", "Failed to convert image");
                    return null;
                }
            }

            @Override
            protected void onPostExecute(String result) {
                // if no data was returned, there was a failure
                if (result == null || Objects.equals(result, "")) {
                    hideLoadingDialog();
                    Toast.makeText(MainActivity.this, "Processing failed", Toast.LENGTH_LONG).show();
                    return;
                }
                // otherwise decode the base64 data and put it in the selected image view
                byte[] imageData = Base64.decode(result, Base64.DEFAULT);
                selectedImage.setImageBitmap(BitmapFactory.decodeByteArray(imageData, 0, imageData.length));
                hideLoadingDialog();
            }
        }.execute(request);

        showLoadingDialog();
    }

    /*
    Select methods for each image
     */

    public void selectLambdaImage(View view) {
        selectImage(R.drawable.lambda);
        selectedImage = (ImageView) findViewById(R.id.static_lambda);
        Toast.makeText(this, "Selected image 'lambda'", Toast.LENGTH_LONG).show();
    }

    public void selectSeattleImage(View view) {
        selectImage(R.drawable.seattle);
        selectedImage = (ImageView) findViewById(R.id.static_seattle);
        Toast.makeText(this, "Selected image 'seattle'", Toast.LENGTH_LONG).show();
    }

    public void selectSquirrelImage(View view) {
        selectImage(R.drawable.squirrel);
        selectedImage = (ImageView) findViewById(R.id.static_squirrel);
        Toast.makeText(this, "Selected image 'squirrel'", Toast.LENGTH_LONG).show();
    }

    public void selectLinuxImage(View view) {
        selectImage(R.drawable.linux);
        selectedImage = (ImageView) findViewById(R.id.static_linux);
        Toast.makeText(this, "Selected image 'linux'", Toast.LENGTH_LONG).show();
    }

    // extract the base64 encoded data of the drawable resource `id`
    private void selectImage(int id) {
        Bitmap bmp = BitmapFactory.decodeResource(getResources(), id);
        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        bmp.compress(Bitmap.CompressFormat.PNG, 100, stream);
        selectedImageBase64 = Base64.encodeToString(stream.toByteArray(), Base64.DEFAULT);
    }

    // reset images to their original state
    public void reset(View view) {
        ((ImageView) findViewById(R.id.static_lambda)).setImageDrawable(getResources().getDrawable(R.drawable.lambda, getTheme()));
        ((ImageView) findViewById(R.id.static_seattle)).setImageDrawable(getResources().getDrawable(R.drawable.seattle, getTheme()));
        ((ImageView) findViewById(R.id.static_squirrel)).setImageDrawable(getResources().getDrawable(R.drawable.squirrel, getTheme()));
        ((ImageView) findViewById(R.id.static_linux)).setImageDrawable(getResources().getDrawable(R.drawable.linux, getTheme()));

        Toast.makeText(this, "Please choose from one of these images", Toast.LENGTH_LONG).show();
    }

    private void showLoadingDialog() {
        progressDialog = ProgressDialog.show(this, "Please wait...", "Processing image", true, false);
    }

    private void hideLoadingDialog() {
        progressDialog.dismiss();
    }
}

That’s it for the mobile app: a data model (aka Java class), a control model (aka a couple of methods), three statements to initialize things, and then a remote call with a try/catch block around it…easy stuff.

Multi-region Deployments

So far we haven’t said much about where this code runs. Lambda takes care of deploying your code within a region, but you have to decide in which region(s) you’d like to run it. In my original demo, I built the function initially in the us-east-1 region, aka the Virginia data center. To make good on the claim in the abstract that we’d build a global service, let’s extend that to include eu-west-1 (Ireland) and ap-northeast-1 (Tokyo) so that mobile apps can connect from around the globe with low latency:

Cross-region Auto-deployment from Amazon S3 with an AWS Lambda function

A Serverless Mechanism to Deploy Lambda Functions in Two Additional Regions

This one we’ve already discussed in the blog: In the S3 Deployment post, I show how to use a Lambda function to deploy other Lambda functions stored as ZIP files in Amazon S3. In the ContainerCon talk we made this slightly fancier by also turning on S3 cross-region replication, so that we could upload the image processing service as a ZIP file to Ireland, have S3 automatically copy it to Tokyo, and then have both regions automatically deploy it to the associated Lambda services in those respective regions. Gotta love serverless solutions :).
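
For reference, the core of that deployment function is a single SDK call against the Lambda service in the target region. A minimal sketch (the function name, bucket, and key are illustrative; the S3 Deployment post linked above has the complete version):

var AWS = require('aws-sdk');

// Update a function in another region from the ZIP that S3 replication delivered there.
var lambda = new AWS.Lambda({ region: 'ap-northeast-1' });   // illustrative target region

lambda.updateFunctionCode({
    FunctionName: 'ImageProcessor',              // illustrative function name
    S3Bucket: 'my-deployment-bucket-tokyo',      // illustrative replicated bucket
    S3Key: 'ImageProcessor.zip'
}, function(err, data) {
    if (err) console.log(err, err.stack);
    else console.log('Deployed new code, SHA-256: ' + data.CodeSha256);
});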

A Serverless Web App, Part 1: API Endpoints

Now that we have a mobile app and a globally-deployed image processing service serving as its backend, let’s turn our attention to creating a serverless web app for those folks who prefer a browser to a device. We’ll do this in two parts: First, we’ll create an API endpoint for our image processing service. Then in the next section we’ll add the actual website using Amazon S3.

One of the ways in which AWS Lambda makes it easy to turn code into services is by providing a web service front end “built in”. However, this requires clients (like the mobile client we built in the last section) to sign requests with AWS-provided credentials. That’s handled by the Amazon Cognito auth client in our Android app, but what if we wanted to provide public access to the image processing service via a website?

To accomplish this, we’ll turn to another service, Amazon API Gateway. This service lets you define an API without requiring any infrastructure – the API is fully managed by AWS. We’ll use API Gateway to create a URL for the image processing service that provides access to a subset of its capabilities to anyone on the web. Amazon API Gateway offers a variety of ways to control access to APIs: API calls can be signed with AWS credentials, you can use OAuth tokens and simply forward the token headers for verification, you can use API keys (not recommended as a way to secure access), or you can make an API completely public, as we’ll show here.

In addition to a variety of access models, API Gateway has a lot of features that we won’t get to explore in this post. Some are built in (like anti-DDOS protection) and others, like caching, would enable us to further reduce latency and cost for repeated retrievals of a popular image. By inserting a layer of indirection between clients and (micro)services, API Gateway also makes it possible to evolve them independently through its versioning and staging features. For now, though, we’ll focus on the basic task of exposing our image processing service as an API.

Ok, let’s create our API. In the AWS Console, choose API Gateway, select “New API”, and provide a name for the API and an optional description. In my example, I named this “ImageAPI”.

API Gateway API Creation Button

Next, create a resource for your new API (I called this “ImageProcessingService”) and then create a POST method in it. Select “Lambda function” as the integration type, and type in the name of the Lambda function you’re using as your image processing service. In the “Method Request” configuration, set the authorization type to “none” (aka, this will be a publicly accessible endpoint). That’s pretty much it.

Method and Resource creation in Amazon API Gateway

To test the integration, click the “Test” button:
API Gateway API Test Button

then supply a test payload such as {"operation": "ping"}. You should get the expected “pong” result, indicating that you’ve successfully linked your API to your Lambda function.

Aside: We’ll get to more (and deeper) testing later, but one thing I sometimes find useful is to add a GET method at the top-level resource in my API, bound to something simple like the ping operation, so that I can quickly verify from any browser that my API is linked up to my Lambda function as expected. It’s not required for this demo (or in general), but you might find it useful as well.

For what comes next (S3 static content) we also need CORS enabled. It’s straightforward but there are several steps. The API Gateway team continues to make this easier, so instead of repeating the instructions here (and potentially having them get out of date quickly), I’ll point you to the documentation.
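
At a high level, that configuration adds an OPTIONS (preflight) method to the resource and a set of response headers roughly like the following, shown here for a fully public endpoint (check the current documentation for the exact steps and values):

Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: POST,OPTIONS
Access-Control-Allow-Headers: Content-Type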

Click on the “Deploy this API” button. With that, you should be all set for website creation!

A Serverless Web App, Part 2: Static Website Hosting in Amazon S3

This part is easy – upload the following JavaScript website code to your S3 bucket of choice:

var ENDPOINT = 'https://fuexvelc41.execute-api.us-east-1.amazonaws.com/prod/ImageProcessingService';

angular.module('app', ['ui.bootstrap'])

    .controller('MainController', ['$scope', '$http', function($scope, $http) {
        $scope.loading = false;
        $scope.image = {
            width: 100
        };

        $scope.ready = function() {
            $scope.loading = false;
        };

        $scope.submit = function() {
            var fileCtrl = document.getElementById('image-file');
            if (fileCtrl.files && fileCtrl.files[0]) {
                $scope.loading = true;
                var fr = new FileReader();
                fr.onload = function(e) {
                    $scope.image.base64Image = e.target.result.slice(e.target.result.indexOf(',') + 1);
                    $scope.$apply();
                    document.getElementById('original-image').src = e.target.result;
                    // Now resize!
                    $http.post(ENDPOINT, angular.extend($scope.image, { operation: 'resize', outputExtension: fileCtrl.value.split('.').pop() }))
                        .then(function(response) {
                            document.getElementById('processed-image').src = "data:image/png;base64," + response.data;
                        })
                        .catch(console.log)
                        .finally($scope.ready);
                };
                fr.readAsDataURL(fileCtrl.files[0]);
            }
        };
    }]);

And here’s the HTML source we used for the (very basic) website in the demo:

<!DOCTYPE html>
<html lang="en">
<head>
    <title>Image Processing Service</title>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <link rel="stylesheet" type="text/css" href="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.4/css/bootstrap.min.css">
    <link rel="stylesheet" type="text/css" href="http://fonts.googleapis.com/css?family=Open+Sans:400,700">
    <link rel="stylesheet" type="text/css" href="main.css">
</head>
<body ng-app="app" ng-controller="MainController">
    <div class="container">
        <h1>Image Processing Service</h1>
        <div class="row">
            <div class="col-md-4">
                <form ng-submit="submit()">
                    <div class="form-group">
                        <label for="image-file">Image</label>
                        <input id="image-file" type="file">
                    </div>
                    <div class="form-group">
                        <label for="image-width">Width</label>
                        <input id="image-width" class="form-control" type="number"
                               ng-model="image.width" min="1" max="4096">
                    </div>
                    <button type="submit" class="btn btn-primary">
                        <span class="glyphicon glyphicon-refresh" ng-if="loading"></span>
                        Submit
                    </button>
                </form>
            </div>
            <div class="col-md-8">
                <accordion close-others="false">
                    <accordion-group heading="Original Image" is-open="true">
                        <img id="original-image" class="img-responsive">
                    </accordion-group>
                    <accordion-group heading="Processed Image" is-open="true">
                        <img id="processed-image" class="img-responsive">
                    </accordion-group>
                </accordion>
            </div>
        </div>
    </div>
    <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/angular.js/1.3.15/angular.min.js"></script>
    <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/angular-ui-bootstrap/0.13.3/ui-bootstrap.min.js"></script>
    <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/angular-ui-bootstrap/0.13.3/ui-bootstrap-tpls.min.js"></script>
    <script type="text/javascript" src="main.js"></script>
</body>
</html>

Finally, here’s the CSS:

body {
    font-family: 'Open Sans', sans-serif;
    padding-bottom: 15px;
}

a {
    cursor: pointer;
}

/** LOADER **/

.glyphicon-refresh {
    animation: spin .7s infinite linear;
    -webkit-animation: spin .7s infinite linear;
}

@keyframes spin {
    from { transform: rotate(0deg); }
    to { transform: rotate(360deg); }
}

@-webkit-keyframes spin {
    from { -webkit-transform: rotate(0deg); }
    to { -webkit-transform: rotate(360deg); }
}

…then turn on static website content serving in S3:

Amazon S3 static website hosting configuration
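
If you prefer to script this step instead of using the console, the same configuration can be applied with the AWS SDK for JavaScript. A minimal sketch (the bucket name is illustrative):

var AWS = require('aws-sdk');
var s3 = new AWS.S3();

// Enable static website hosting on the bucket that holds the HTML, CSS, and JS above.
s3.putBucketWebsite({
    Bucket: 'image-processing-service',          // illustrative bucket name
    WebsiteConfiguration: {
        IndexDocument: { Suffix: 'index.html' },
        ErrorDocument: { Key: 'error.html' }
    }
}, function(err, data) {
    if (err) console.log(err, err.stack);
    else console.log('Static website hosting enabled');
});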

The URL will depend on your S3 region and object names, e.g. “http://image-processing-service.s3-website-us-east-1.amazonaws.com/”. Visit that URL in a browser and you should see your image website:

Sample website image

Unit and Load Testing

With API Gateway providing a classic URL-based interface to your Lambda microservice, you have a variety of options for testing. But let’s stick to our serverless approach and do it entirely without infrastructure or even a client!

First, we want to make calls through the API. That’s easy; we use Lambda’s HTTPS invocation blueprint to POST to the endpoint we got when we deployed with API Gateway:

{
  "options": {
    "host": "fuexvelc41.execute-api.us-east-1.amazonaws.com",
    "path": "/prod/ImageProcessingService",
    "method": "POST"
  },
  "data": {
    "operation": "getSample"
  }
}

Now that we have that, let’s wrap a unit test around it. Our unit test harness doesn’t do much; it just runs another Lambda function and pops the result into an Amazon DynamoDB table that we specify. We’ll use the unit and load test harness Lambda blueprint for this in its “unit test” mode:

{
  "operation": "unit",
  "function": "HTTPSInvoker",
  "resultsTable": "unit-test-results",
  "testId": "LinuxConDemo",
  "event": {
    "options": {
      "host": "fuexvelc41.execute-api.us-east-1.amazonaws.com",
      "path": "/prod/ImageProcessingService",
      "method": "POST"
    },
    "data": {
      "operation": "getSample"
    }
  }
}

Finally, we’ll do a simple load test by running the unit test multiple times. We’ll use the Lambda unit and load test harness again, this time in “load test” mode:

{
  "operation": "load",
  "iterations": 100,
  "function": "TestHarness",
  "event": {
    "operation": "unit",
    "function": "HTTPSInvoker",
    "resultsTable": "unit-test-results",
    "testId": "LinuxConLoadTestDemo",
    "event": {
      "options": {
        "host": "fuexvelc41.execute-api.us-east-1.amazonaws.com",
        "path": "/prod/ImageProcessingService",
        "method": "POST"
      },
      "data": {
        "operation": "getSample"
      }
    }
  }
}

Here’s a picture of our serverless testing architecture:

Serverless Unit and Load Testing Harness in AWS Lambda

A Serverless Unit and Load Test Harness

You can easily vary this approach to incorporate validation, run a variety of unit tests, etc. If you don’t need the web app infrastructure, you can skip the API Gateway and HTTP invocation and simply run the image processing service directly in your unit test. If you want to summarize or analyze the test output, you can easily attach a Lambda function as an event handler to the DynamoDB table that holds the test results.
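
As a sketch of that last idea, a summarizing function attached to the results table’s DynamoDB stream could look roughly like the following. It assumes the stream is configured to include new images and that items carry the blueprint’s passed attribute:

// Hypothetical stream handler: tally pass/fail for each batch of new test results.
exports.handler = function(event, context) {
    var passed = 0, failed = 0;
    event.Records.forEach(function(record) {
        if (record.eventName !== 'INSERT') return;       // only count new test results
        var image = record.dynamodb.NewImage;
        if (image.passed && image.passed.BOOL) passed++;
        else failed++;
    });
    console.log('Batch summary: ' + passed + ' passed, ' + failed + ' failed');
    context.succeed();
};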

Summary

This was a longish post, but it’s a complete package for building a real, scalable backend service and fronting it with both mobile clients and a website, all without the need for servers or other infrastructure in any part of the system: frontend, backend, API, deployment, or testing. Go serverless!

Until next time, happy Lambda (and serverless microservice) coding!

-Tim
Follow Tim’s Lambda adventures on Twitter

Everything Depends on Context or, The Fine Art of nodejs Coding in AWS Lambda

Tim Wagner, AWS Lambda General Manager

Quick, what’s wrong with the Lambda code sketch below?

exports.handler = function(event, context) {
    anyAsyncCall(args, function(err, result) {
        if (err) console.log('problem');
        else /* do something with result */;
    });
    context.succeed();
};

If you said the placement of context.succeed, you’re correct – it belongs inside the callback. In general, when you get this wrong your code exits prematurely, after the incorrectly placed context.succeed line, without allowing the callback to run. The same thing happens if you make calls in a loop, often leading to race conditions where some callbacks get “dropped”; the lack of a barrier synchronization forces a too-early exit.

If you test outside of Lambda, these patterns work fine in nodejs, because the default node runtime waits for all tasks to complete before exiting. However, context.succeed, context.done, and context.fail are more than just bookkeeping – they cause the request to return after the current task completes, even if other tasks remain in the queue. Generally that’s not what you want if those tasks represent incomplete callbacks.

Placement Patterns

Fixing the code in the single callback case is trivial; the code above becomes

exports.handler = function(event, context) {
    anyAsyncCall(args, function(err, result) {
        if (err) console.log('problem');
        else /* do something with result */;
        context.succeed();
    });
};

Dealing with a loop that has an unbounded number of async calls inside it takes more work; here’s one pattern (the asyncAll function) used in Lambda’s test harness blueprint to run a test a given number of iterations:

/**
 * Provides a simple framework for conducting various tests of your Lambda
 * functions. Make sure to include permissions for `lambda:InvokeFunction`
 * and `dynamodb:PutItem` in your execution role!
 */
var AWS = require('aws-sdk');
var doc = require('dynamodb-doc');

var lambda = new AWS.Lambda({ apiVersion: '2015-03-31' });
var dynamo = new doc.DynamoDB();


// Runs a given function X times
var asyncAll = function(opts) {
    var i = -1;
    var next = function() {
        i++;
        if (i === opts.times) {
            opts.done();
            return;
        }
        opts.fn(next, i);
    };
    next();
};


/**
 * Will invoke the given function and write its result to the DynamoDB table
 * `event.resultsTable`. This table must have a hash key string of "testId"
 * and range key number of "iteration". Specify a unique `event.testId` to
 * differentiate each unit test run.
 */
var unit = function(event, context) {
    var lambdaParams = {
        FunctionName: event.function,
        Payload: JSON.stringify(event.event)
    };
    lambda.invoke(lambdaParams, function(err, data) {
        if (err) {
            context.fail(err);
            return;
        }
        // Write result to Dynamo
        var dynamoParams = {
            TableName: event.resultsTable,
            Item: {
                testId: event.testId,
                iteration: event.iteration || 0,
                result: data.Payload,
                passed: !JSON.parse(data.Payload).hasOwnProperty('errorMessage')
            }
        };
        dynamo.putItem(dynamoParams, context.done);
    });
};

/**
 * Will invoke the given function asynchronously `event.iterations` times.
 */
var load = function(event, context) {
    var payload = event.event;
    asyncAll({
        times: event.iterations,
        fn: function(next, i) {
            payload.iteration = i;
            var lambdaParams = {
                FunctionName: event.function,
                InvocationType: 'Event',
                Payload: JSON.stringify(payload)
            };
            lambda.invoke(lambdaParams, function(err, data) {
                next();
            });
        },
        done: function() {
            context.succeed('Load test complete');
        }
    });
};


var ops = {
    unit: unit,
    load: load
};

/**
 * Pass the test type (currently either "unit" or "load") as `event.operation`,
 * the name of the Lambda function to test as `event.function`, and the event
 * to invoke this function with as `event.event`.
 *
 * See the individual test methods above for more information about each
 * test type.
 */
exports.handler = function(event, context) {
    if (ops.hasOwnProperty(event.operation)) {
        ops[event.operation](event, context);
    } else {
        context.fail('Unrecognized operation "' + event.operation + '"');
    }
};

The approach above serializes the loop; there are many other approaches, and you can use async or other libraries to help.
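
For example, if you want the invocations to run in parallel rather than serially, a simple counter can serve as the barrier that decides when to end the request. A sketch (not the blueprint’s code):

// Run fn(i, next) `times` times in parallel; call done() only after every callback fires.
function parallelAll(times, fn, done) {
    var remaining = times;
    if (remaining === 0) return done();
    for (var i = 0; i < times; i++) {
        fn(i, function() {
            if (--remaining === 0) done();   // barrier: the last callback completes the request
        });
    }
}

Here done would typically be a function that calls context.succeed once every invocation has reported back.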

Does this matter for Java or other jvm-based languages in AWS Lambda?

The specific issue discussed here – the “side effect” of the placement of a call like context.success on outstanding callbacks – is unique to nodejs. In other languages, such as Java, returning from the thread of control that represents the request ends the request, which is a little easier to reason about and generally matches developer expectations. Any other threads or processes running at the time that request returns get frozen until the next request (assuming the container gets reused; i.e., possibly never), so if you want them to wrap up, you would need to include explicit barrier synchronization before returning, just as you normally would for a server-side request implemented with multiple threads/processes.

In all languages, context also offers useful “environmental” information (like the request id) and methods (like the amount of time remaining).

Why not just let nodejs exit, if its default behavior is fine?

That would require every request to “boot” the runtime…potentially ok in a one-off functional test, but latency would suffer and it would keep high request rates from being cost effective. Check out the post on container reuse for more on this topic.

Can you make this easier?

You bet! We were trying hard to balance simplicity, control, and a self-imposed “don’t hack node” rule when we launched back in November. Fortunately, newer versions of nodejs offer more control over “exiting” behavior, and we’re looking hard at how to make future releases of nodejs within Lambda offer easier to understand semantics without losing the latency and cost benefits of container reuse. Stay tuned!

Until next time, happy Lambda (and contextually successful nodejs) coding!

-Tim
Follow Tim’s Lambda adventures on Twitter

Better Together: Amazon ECS and AWS Lambda

My colleague Constantin Gonzalez sent a nice guest post that shows how to create container workers using Amazon ECS.

Amazon EC2 Container Service (Amazon ECS) is a highly scalable, high performance container management service that supports Docker containers and allows you to easily run applications on a managed cluster of Amazon EC2 instances. ECS eliminates the need for you to install, operate, and scale your own cluster management infrastructure.

AWS Lambda is a compute service that runs your code in response to events and automatically manages the compute resources for you, making it easy to build applications that respond quickly to new information. Lambda starts running your code within milliseconds of an event such as an image upload, in-app activity, website click, or output from a connected device.

In this post, we show you how to combine the two services to mutually enhance their capabilities: See how you can get more out of Lambda by using it to start ECS tasks and how you can turn your ECS cluster into a dynamic fleet of container workers that react to any event supported by Lambda.

Example Setup: Ray-tracing high-quality images in the cloud

To illustrate this pattern, you build a simple architecture that generates high-quality, ray-traced images out of input files written in a popular, open-source ray-tracing language called POV-Ray, developed by POV and licensed under either POV’s proprietary license (up to version 3.6) or AGPLv3 (version 3.7 onwards). Here’s an overview of the architecture:

To use this architecture, put your POV-Ray scene description file (a POV-Ray .POV file) and its rendering parameters (a POV-Ray .INI file), as well as any other supporting files (e.g., texture images), into a single .ZIP file and upload it to an Amazon S3 bucket. In this architecture, the bucket is configured with an S3 event notification which triggers a Lambda function as soon as the .ZIP file is uploaded.

In similar setups (such as transcoding images), Lambda alone would be sufficient to perform its job on the uploaded object within its allocated time frame (currently 60 seconds). But in this example, we want to support complex rendering jobs that usually take significantly longer.

Therefore, the Lambda function simply takes the event data it received from S3 and sends it as a message into an Amazon Simple Queue Service (SQS) queue. SQS is a fast, reliable, scalable, fully-managed message queuing service. The Lambda function then starts an ECS task that can fetch and process the message from SQS.

Your ECS task contains a simple shell script that reads messages from SQS, extracts the S3 bucket and key information, downloads and unpacks the .ZIP file from S3, and then starts POV-Ray with the downloaded scene description and data.

After POV-Ray has performed its rendering magic, the script takes the resulting .PNG picture and uploads it back to the same S3 bucket where the original scene description was downloaded from. Then it deletes the message from the queue to avoid duplicate message processing.

The script continues pulling, processing, and deleting messages from the SQS queue until it is fully drained, then it exits, thereby terminating its own container.

Simple and efficient event-driven computing

This architecture can help you:

  • Extend the capabilities of Lambda to support any processing time, more programming languages, or other resource requirements, to take advantage of the flexibility of Docker containers.
  • Extend the capabilities of ECS to allow event-driven execution of ECS tasks: Use any event type supported by Lambda to start new ECS tasks for processing events, run long batch jobs triggered by new data in S3, or any other event-driven mechanism that you want to implement as a Docker container.
  • Get the best of both worlds by coupling the dynamic, event-driven Lambda model with the power of the Docker eco-system.

Step-by-Step

Sound interesting? Then get started!

This is an advanced architecture example that covers a number of AWS services (Lambda, ECS, S3, and SQS) in depth, as well as multiple related IAM policies. The underlying resources are created in your account and are subject to their respective pricing.

To make it easier for you to follow, we have published all the necessary code and scripts on GitHub in the awslabs/lambda-ecs-worker-pattern repository. You may also find it helpful to become familiar with the services mentioned above by working through their respective Getting Started documentation first.

Make sure you meet the prerequisites for the services involved before you begin.

This post walks through the steps required for setup, then gives you a simple Python script that can perform all the steps for you.

Step 1: Set up an S3 bucket
Start by setting up an S3 bucket to hold both the POV-Ray input files and the resulting .PNG output pictures. Choose a bucket name and create it:

$ aws s3 mb s3://<YOUR-BUCKET-NAME>

Step 2: Create an SQS queue
Use SQS to pass the S3 notification event data from Lambda to your ECS task. You can create a new SQS queue using the following command:

$ aws sqs create-queue --queue-name ECSPOVRayWorkerQueue

Step 3: Create the Lambda function
The following function reads in a configuration file with the URL of an SQS queue, an ECS task definition name, and a whitelist of accepted input file types (.ZIP, in this example).

The config file uses JSON and looks like this (make sure to use your region):

$ cat ecs-worker-launcher/config.json
{
    "queue": "https://<YOUR-REGION>.queue.amazonaws.com/<YOUR-AWS-ACCOUNT-ID>/ECSPOVRayWorkerQueue",
    "task": "ECSPOVRayWorkerTask",
    "s3_key_suffix_whitelist": [".zip"]
}

The SQS queue URL (which looks like https://eu-west-1.queue.amazonaws.com/<YOUR-AWS-ACCOUNT-ID>/ECSPOVRayWorkerQueue) is the output of the preceding command, in which you created your SQS queue. The “task” attribute references the name of an ECS task definition that you create in a later step.

The Lambda function checks the S3 object key given in the Lambda event against the file type whitelist; in case of a match, it sends a message to the configured SQS queue with the event data and starts the ECS task specified in the configuration file. Here’s the code:

// Copyright 2015 Amazon.com, Inc. or its affiliates. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License").
// You may not use this file except in compliance with the License.
// A copy of the License is located at
//
//    http://aws.amazon.com/apache2.0/
//
// or in the "license" file accompanying this file.
// This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and limitations under the License.

// This Lambda function forwards the given event data into an SQS queue, then starts an ECS task to
// process that event.

var fs = require('fs');
var async = require('async');
var aws = require('aws-sdk');
var sqs = new aws.SQS({apiVersion: '2012-11-05'});
var ecs = new aws.ECS({apiVersion: '2014-11-13'});

// Check if the given key suffix matches a suffix in the whitelist. Return true if it matches, false otherwise.
exports.checkS3SuffixWhitelist = function(key, whitelist) {
    if(!whitelist){ return true; }
    if(typeof whitelist == 'string'){ return key.match(whitelist + '$') }
    if(Object.prototype.toString.call(whitelist) === '[object Array]') {
        for(var i = 0; i < whitelist.length; i++) {
            if(key.match(whitelist[i] + '$')) { return true; }
        }
        return false;
    }
    console.log(
        'Unsupported whitelist type (' + Object.prototype.toString.call(whitelist) +
        ') for: ' + JSON.stringify(whitelist)
    );
    return false;
};

exports.handler = function(event, context) {
    console.log('Received event:');
    console.log(JSON.stringify(event, null, '  '));

    var config = JSON.parse(fs.readFileSync('config.json', 'utf8'));
    if(!config.hasOwnProperty('s3_key_suffix_whitelist')) {
        config.s3_key_suffix_whitelist = false;
    }
    console.log('Config: ' + JSON.stringify(config));

    var key = event.Records[0].s3.object.key;

    if(!exports.checkS3SuffixWhitelist(key, config.s3_key_suffix_whitelist)) {
        context.fail('Suffix for key: ' + key + ' is not in the whitelist');
        return; // stop here so the SQS/ECS steps don't run for non-matching objects
    }

    // We can now go on. Put the S3 URL into SQS and start an ECS task to process it.
    async.waterfall([
            function (next) {
                var params = {
                    MessageBody: JSON.stringify(event),
                    QueueUrl: config.queue
                };
                sqs.sendMessage(params, function (err, data) {
                    if (err) { console.warn('Error while sending message: ' + err); }
                    else { console.info('Message sent, ID: ' + data.MessageId); }
                    next(err);
                });
            },
            function (next) {
                // Starts an ECS task to work through the feeds.
                var params = {
                    taskDefinition: config.task,
                    count: 1
                };
                ecs.runTask(params, function (err, data) {
                    if (err) { console.warn('error: ', "Error while starting task: " + err); }
                    else { console.info('Task ' + config.task + ' started: ' + JSON.stringify(data.tasks))}
                    next(err);
                });
            }
        ], function (err) {
            if (err) {
                context.fail('An error has occurred: ' + err);
            }
            else {
                context.succeed('Successfully processed Amazon S3 URL.');
            }
        }
    );
};

The Lambda function uses the Async.js library to make it easier to program the sequence of events to perform in an event-driven language like Node.js. You can install the library by typing npm install async from within the directory where the Lambda function and its configuration file are located.

To upload the function into Lambda, zip the Lambda function, its configuration file, and the node_modules directory with the Async.js library. In the Lambda console, upload the .ZIP file as described in the Node.js for S3 events tutorial.

For this function to perform its job, it needs an IAM role with a policy that allows access to SQS as well as the right to start tasks on ECS. It should also be able to publish log data to Amazon CloudWatch Logs. However, it does not need explicit access to S3, because only the ECS task needs to download the source file from S3 and upload the resulting image back to it. Here is an example policy:

{
    "Statement": [
        {
            "Action": [
                "logs:*", 
                "lambda:invokeFunction",
                "sqs:SendMessage",
                "ecs:RunTask"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:logs:*:*:*",
                "arn:aws:lambda:*:*:*:*",
                "arn:aws:sqs:*:*:*",
                "arn:aws:ecs:*:*:*"
            ]
        }
    ],
    "Version": "2012-10-17"
}

For the sake of simplicity in this post, we used very broadly defined resource identifiers like “arn:aws:sqs:*:*:*”, which cover all the resources of the given types. In a real-world scenario, we recommend that you make resource definitions as specific as possible by adding account IDs, queue names, and other resource ARN parameters.
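
As a sketch of what tighter scoping could look like, the following Node.js snippet attaches an inline policy that limits the SQS permission to the single queue used in this example; the role name, policy name, and placeholder ARNs are illustrative only and not part of the repository code:

// Sketch: attach a narrowly-scoped inline policy to the Lambda function's execution role.
// Role name, policy name, and ARN placeholders below are illustrative; substitute your own.
var AWS = require('aws-sdk');
var iam = new AWS.IAM();

var policyDocument = {
    Version: '2012-10-17',
    Statement: [
        {
            Effect: 'Allow',
            Action: ['sqs:SendMessage'],
            Resource: 'arn:aws:sqs:<YOUR-REGION>:<YOUR-AWS-ACCOUNT-ID>:ECSPOVRayWorkerQueue'
        },
        {
            Effect: 'Allow',
            Action: ['logs:CreateLogGroup', 'logs:CreateLogStream', 'logs:PutLogEvents'],
            Resource: 'arn:aws:logs:<YOUR-REGION>:<YOUR-AWS-ACCOUNT-ID>:*'
        },
        {
            // ECS resource-level permissions are limited, so RunTask stays broad here.
            Effect: 'Allow',
            Action: ['ecs:RunTask'],
            Resource: '*'
        }
    ]
};

iam.putRolePolicy({
    RoleName: 'ecs-worker-launcher-role',            // placeholder role name
    PolicyName: 'ecs-worker-launcher-scoped-policy', // placeholder policy name
    PolicyDocument: JSON.stringify(policyDocument)
}, function(err) {
    if (err) console.log('Error attaching policy: ' + err);
    else console.log('Scoped policy attached.');
});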

Step 4: Configure S3 bucket notifications
Now you need to set up a bucket notification for your S3 bucket that triggers the Lambda function as soon as a new object is copied into the bucket.

This is a two-step process:

1. Add permission for S3 to be able to call your Lambda function:

$ aws lambda add-permission \
--function-name ecs-pov-ray-worker \
--region <YOUR-REGION> \
--statement-id <STATEMENT-ID> \
--action "lambda:InvokeFunction" \
--principal s3.amazonaws.com \
--source-arn arn:aws:s3:::<YOUR-BUCKET-NAME> \
--source-account <YOUR-AWS-ACCOUNT-ID> \
--profile <YOUR-PROFILE-NAME>

2. Set up an S3 bucket notification configuration (note: use the ARN for your Lambda function):

$ aws s3api put-bucket-notification-configuration \
--bucket <YOUR-BUCKET-NAME> \
--notification-configuration \
'{"LambdaFunctionConfigurations": [{"Events": ["s3:ObjectCreated:*"], "Id": "ECSPOVRayWorker", "LambdaFunctionArn": "arn:aws:lambda:eu-west-1:<YOUR-AWS-ACCOUNT-ID>:function:ecs-worker-launcher"}]}'

If you use these CLI commands, remember to substitute your own values for the placeholders, including the bucket name, account ID, and Lambda function ARN.

Step 5: Create a Docker image
Docker images contain all of the software needed to run your application on Docker. They are built from Dockerfiles, which describe the steps needed to create an image.

In this step, craft a Dockerfile that installs the POV-Ray ray-tracing application from POV (see important licensing information at the top of this post) as well as a simple shell script that uses the AWS CLI to consume messages from SQS, download and unpack the input data from S3 into the local file system, run the ray-tracer on the input file, then upload the resulting image back to S3, and delete the message from the SQS queue.

Start with the shell script, called ecs-worker.sh:

#!/bin/bash

# Copyright 2015 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#
#     http://aws.amazon.com/apache2.0/
#
# or in the "license" file accompanying this file.
# This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.

#
# Simple POV-Ray worker shell script.
#
# Uses the AWS CLI utility to fetch a message from SQS, fetch a ZIP file from S3 that was specified in the message,
# render its contents with POV-Ray, then upload the resulting .png file to the same S3 bucket.
#

region=${AWS_REGION}
queue=${SQS_QUEUE_URL}

# Fetch messages and render them until the queue is drained.
while [ /bin/true ]; do
    # Fetch the next message and extract the S3 URL to fetch the POV-Ray source ZIP from.
    echo "Fetching messages from SQS queue: ${queue}..."
    result=$( \
        aws sqs receive-message \
            --queue-url ${queue} \
            --region ${region} \
            --wait-time-seconds 20 \
            --query Messages[0].[Body,ReceiptHandle] \
        | sed -e 's/^"\(.*\)"$/\1/'\
    )

    if [ -z "${result}" ]; then
        echo "No messages left in queue. Exiting."
        exit 0
    else
        echo "Message: ${result}."

        receipt_handle=$(echo ${result} | sed -e 's/^.*"\([^"]*\)"\s*\]$/\1/')
        echo "Receipt handle: ${receipt_handle}."

        bucket=$(echo ${result} | sed -e 's/^.*arn:aws:s3:::\([^\\]*\)\\".*$/\1/')
        echo "Bucket: ${bucket}."

        key=$(echo ${result} | sed -e 's/^.*\\"key\\":\s*\\"\([^\\]*\)\\".*$/\1/')
        echo "Key: ${key}."

        base=${key%.*}
        ext=${key##*.}

        if [ \
            -n "${result}" -a \
            -n "${receipt_handle}" -a \
            -n "${key}" -a \
            -n "${base}" -a \
            -n "${ext}" -a \
            "${ext}" = "zip" \
        ]; then
            mkdir -p work
            pushd work

            echo "Copying ${key} from S3 bucket ${bucket}..."
            aws s3 cp s3://${bucket}/${key} . --region ${region}

            echo "Unzipping ${key}..."
            unzip ${key}

            if [ -f ${base}.ini ]; then
                echo "Rendering POV-Ray scene ${base}..."
                if povray ${base}; then
                    if [ -f ${base}.png ]; then
                        echo "Copying result image ${base}.png to s3://${bucket}/${base}.png..."
                        aws s3 cp ${base}.png s3://${bucket}/${base}.png
                    else
                        echo "ERROR: POV-Ray source did not generate ${base}.png image."
                    fi
                else
                    echo "ERROR: POV-Ray source did not render successfully."
                fi
            else
                echo "ERROR: No ${base}.ini file found in POV-Ray source archive."
            fi

            echo "Cleaning up..."
            popd
            /bin/rm -rf work

            echo "Deleting message..."
            aws sqs delete-message \
                --queue-url ${queue} \
                --region ${region} \
                --receipt-handle "${receipt_handle}"

        else
            echo "ERROR: Could not extract S3 bucket and key from SQS message."
        fi
    fi
done

Remember to run chmod +x ecs-worker.sh to make the shell script executable; this permission is carried over into the Docker image you create in the next step. That next step is to put together a Dockerfile that includes everything you need to set up the POV-Ray software and the AWS CLI in addition to your script:

# POV-Ray Amazon ECS Worker

FROM ubuntu:14.04

MAINTAINER FIRST_NAME LAST_NAME <EMAIL@DOMAIN.COM>

# Libraries and dependencies

RUN \
  apt-get update && apt-get -y install \
  autoconf \
  build-essential \
  git \
  libboost-thread-dev \
  libjpeg-dev \
  libopenexr-dev \
  libpng-dev \
  libtiff-dev \
  python \
  python-dev \
  python-distribute \
  python-pip \
  unzip \
  zlib1g-dev

# Compile and install POV-Ray

RUN \
  mkdir /src && \
  cd /src && \
  git clone https://github.com/POV-Ray/povray.git && \
  cd povray && \
  git checkout origin/3.7-stable && \
  cd unix && \
  sed 's/automake --w/automake --add-missing --w/g' -i prebuild.sh && \
  sed 's/dist-bzip2/dist-bzip2 subdir-objects/g' -i configure.ac && \
  ./prebuild.sh && \
  cd .. && \
  ./configure COMPILED_BY="FIRST_NAME LAST_NAME <EMAIL@DOMAIN.COM>" LIBS="-lboost_system -lboost_thread" && \
  make && \
  make install

# Install AWS CLI

RUN \
  pip install awscli

WORKDIR /

COPY ecs-worker.sh /

CMD [ "./ ecs-worker.sh" ]

Substitute your own name and email address into the COMPILED_BY parameter when using this Dockerfile. Also, this file assumes that you have the ecs-worker.sh script in your current directory when you create the Docker image.

After making sure the shell script is in the local directory and setting up the Dockerfile (and your account/credentials with Docker Hub), you can create the Docker image using the following commands:

$ docker build -t <DOCKERHUB_USER>/<DOCKERHUB_REPOSITORY>:<TAG> .
$ docker login -u <DOCKERHUB_USER> -e <DOCKERHUB_EMAIL>
$ docker push <DOCKERHUB_USER>/<DOCKERHUB_REPOSITORY>:<TAG>

Step 6: Create an ECS task definition
Now that you have a Docker image ready to go, you can create an ECS task definition:

{
    "containerDefinitions": [
        {
            "name": "ECSPOVRayWorker",
            "image": "<DOCKERHUB_USER>/<DOCKERHUB_REPOSITORY>:<TAG>",
            "cpu": 512,
            "environment": [
                {
                    "name": "AWS_REGION",
                    "value": "<YOUR-CHOSEN-AWS-REGION>"
                },
                {
                    "name": "SQS_QUEUE_URL",
                    "value": "https://<YOUR_REGION>.queue.amazonaws.com/<YOUR_AWS_ACCOUNT_ID>/ECSPOVRayWorkerQueue"
                }
            ],
            "memory": 512,
            "essential": true
        }
    ],
    "family": "ECSPOVRayWorkerTask"
}

When using this example task definition file, remember to substitute your own values for the Dockerhub user, repository, and tag as well as your chosen AWS region and SQS queue URL.

Now, you’re ready to register your task definition with ECS:

$ aws ecs register-task-definition --cli-input-json file://task-definition-file.json

Remember, the ECS task family name “ECSPOVRayWorkerTask” corresponds to the “task” attribute of your Lambda function’s configuration file. This is how Lambda knows which ECS task to start upon invocation; if you decide to name your ECS task definition differently, also remember to update the Lambda function’s configuration file accordingly.

Step 7: Add a policy to your ECS instance role that allows access to SQS and S3
Your SQS queue worker script running inside your Docker container on ECS needs some permissions to fetch messages from SQS, download and upload files to/from S3 and to delete messages from SQS when it’s done.

The following example policy shows the permissions needed for this application, in addition to the standard ECS-related permissions:

{
    "Statement": [
        {
            "Action": [
                "s3:ListAllMyBuckets"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::*"
        },
        {
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::"
        },
        {
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::/*"
        }
    ],
    "Version": "2012-10-17"
}

You can attach this policy to your ECS instance role as a separate policy document, or work the missing statements into an existing policy on that role. The former option is preferred because it lets you manage different policies for different tasks in separate policy documents.

Step 8: Test your new ray-tracing service
You’re now ready to test the new Lambda/Docker worker pattern for rendering ray-traced images!

To help you test it, we have provided you with a ready to use .ZIP file, containing a sample POV-Ray scene that you can download from the awslabs/lambda-ecs-worker-pattern GitHub repository.

Upload the .ZIP file to your S3 bucket and, after a few minutes, you should see the final rendered image appear in the same bucket, looking like this:

If it doesn’t work out of the box, don’t panic! Here are some hints on how to debug this scenario:

  • Use the Amazon CloudWatch Logs console and look for errors generated by Lambda. Check out the Troubleshooting section of the AWS Lambda documentation.
  • Log into your ECS container node(s) and use the docker ps -a command to identify the Docker container that was running your application. Use the docker logs command to look for errors. Check out the Troubleshooting section of the ECS documentation.
  • If the Docker container is still running, you can log into it using the docker exec -it <CONTAINER-ID> /bin/bash command and see what’s going on as it happens.

All in one go

To make setup even easier, we have put together a Python Fabric script that handles all of the above tasks for you. Fabric is a Python module that makes it easy to run commands on remote nodes, transfer files over SSH, run commands locally, and structure your script in a manner similar to a makefile.

You can download the script and Python fabfile that can help set this up, along with instructions, from the awslabs/lambda-ecs-worker-pattern repository on GitHub.

Further considerations

This example is intentionally simple and generic so you can adapt it to a wide variety of situations. When implementing this pattern for your own projects, you may want to consider additional issues.

In this pattern, each Lambda function launches its own ECS container to process the event. When many events occur, multiple containers are launched and this may not be what you want; a single running ECS task can continue processing messages from SQS until the queue is empty. Consider scaling the number of running ECS tasks independently of the number of Lambda function invocations. For more information, see Scaling Amazon ECS Services Automatically Using Amazon CloudWatch and AWS Lambda.

This approach is not limited to launching ECS containers; you can use Lambda to launch any other AWS service or resource, including Amazon Elastic Transcoder jobs, Amazon Simple Workflow Service executions, or AWS Data Pipeline jobs.

Combining ECS tasks with SQS is a very simple, but powerful batch worker pattern. You can use it even without Lambda: whenever you want to get a piece of long-running batch work done, write its parameters into an SQS queue and launch an ECS task in the background, while your application continues normally.

This pattern uses SQS to buffer the full S3 bucket notification event for the ECS task to pick it up. In cases where the parameters to be forwarded to ECS are short and simple (a URL, file name, or simple data structure), you can wrap them into environment variables and specify them as overrides in the run-task operation. This means that for simple parameters, you can stop using SQS altogether.
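
As a rough sketch of that variation (the container name matches the task definition above; the INPUT_BUCKET and INPUT_KEY variable names are just an illustrative convention):

// Sketch: pass simple parameters directly to the ECS task as environment overrides,
// skipping SQS entirely. Variable names INPUT_BUCKET/INPUT_KEY are illustrative.
var AWS = require('aws-sdk');
var ecs = new AWS.ECS();

function startWorkerTask(bucket, key, callback) {
    var params = {
        taskDefinition: 'ECSPOVRayWorkerTask',
        count: 1,
        overrides: {
            containerOverrides: [{
                name: 'ECSPOVRayWorker', // must match the container name in the task definition
                environment: [
                    { name: 'INPUT_BUCKET', value: bucket },
                    { name: 'INPUT_KEY', value: key }
                ]
            }]
        }
    };
    ecs.runTask(params, callback);
}

// Usage from within a Lambda handler, for example:
// startWorkerTask(event.Records[0].s3.bucket.name, event.Records[0].s3.object.key, next);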

This pattern can also save on costs. Many traditional computing tasks need a specialized software installation (like the POV-Ray rendering software in this example) but are only used intermittently (such as one time per day or per week) for less than an hour. Keeping machines or fleets for such specialized tasks can create waste because they may not reach a significant use level.

Using ECS, you can share the hardware infrastructure for your batch worker fleets among very different specific worker implementations (a ray-tracing application, ETL process, document converter, etc.). This allows you to drive higher use of your generic ECS fleet while it performs very different tasks, through the ability to run different container images on the same infrastructure. You need fewer hardware resources to accommodate a wide variety of tasks.

Conclusion

This pattern is widely applicable. Many applications that can be seen as batch-driven workers can be implemented as ECS tasks that can be started from a Lambda function, using SQS to buffer parameters.

Look at your infrastructure and try to identify underused EC2 worker instances that could be re-implemented as ECS tasks; run them on a smaller, more efficient footprint and keep all of their functionality. Revisit event-driven cases where you may have dismissed Lambda before, and try to apply the techniques outlined in this post to expand the usefulness of Lambda into more use cases with more complex execution requirements.

We hope you found this post useful and look forward to your comments about where you plan to implement this in your current and future projects.

Cost-effective Batch Processing with Amazon EC2 Spot

Tipu Qureshi Tipu Qureshi, AWS Senior Cloud Support Engineer

With Spot Instances, you can save up to 90% of costs by bidding on spare Amazon Elastic Compute Cloud (Amazon EC2) instances. This reference architecture is meant to help you realize cost savings for batch processing applications while maintaining high availability. We recommend tailoring and testing it for your application before implementing it in a production environment.

Below is a multi-part job processing architecture that can be used for the deployment of a heterogeneous, scalable “grid” of worker nodes that can quickly crunch through large batch processing tasks in parallel. There are numerous batch-oriented applications in place today that can leverage this style of on-demand processing, including claims processing, large-scale transformation, media processing, and multi-part data processing work.

Spot Batch Architecture Diagram

Raw job data is uploaded to Amazon Simple Storage Service (S3) which is a highly-available and persistent data store. An AWS Lambda function will be invoked by S3 every time a new object is uploaded to the input S3 bucket. AWS Lambda is a compute service that runs your code in response to events and automatically manages the compute resources for you, making it easy to build applications that respond quickly to new information. Information about Lambda functions is available here and a walkthrough on triggering Lambda functions on S3 object uploads is available here.

AWS Lambda automatically runs your code with an IAM role that you select, making it easy to access other AWS resources, such as Amazon S3, Amazon SQS, and Amazon DynamoDB. AWS Lambda can be used to place a job message into an Amazon SQS queue. Amazon Simple Queue Service (SQS) is a fast, reliable, scalable, fully managed message queuing service, which makes it simple and cost-effective to decouple the components of a cloud application. Depending on the application’s needs, multiple SQS queues might be required for different functions and priorities.

The AWS Lambda function will also store state information for each job task in Amazon DynamoDB. DynamoDB is a regional service, meaning that the data is automatically replicated across availability zones. AWS Lambda can be used to trigger other types of workflows as well, such as an Amazon Elastic Transcoder job. EC2 Spot can also be used with Amazon Elastic MapReduce (Amazon EMR).

Below is a sample IAM policy that can be attached to an IAM role for AWS Lambda. You will need to change the ARNs to match your resources.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1438283855455",
      "Action": [
        "dynamodb:PutItem"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:dynamodb:us-east-1::table/demojobtable"
    },
    {
      "Sid": "Stmt1438283929844",
      "Action": [
        "sqs:SendMessage"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:sqs:us-east-1::demojobqueue"
    }
  ]
}

Below is a sample Lambda function that will send an SQS message and put an item into a DynamoDB table in response to a S3 object upload. You will need to change the SQS queue URL and DynamoDB table name to match your resources.

// create an IAM Lambda role with access to Amazon SQS queue and DynamoDB table
// configure S3 to publish events as shown here: http://docs.aws.amazon.com/lambda/latest/dg/walkthrough-s3-events-adminuser-configure-s3.html

// dependencies
var AWS = require('aws-sdk');

// get reference to clients
var s3 = new AWS.S3();
var sqs = new AWS.SQS();
var dynamodb = new AWS.DynamoDB();

console.log ('Loading function');

exports.handler = function(event, context) {
  // Read options from the event.
  var srcBucket = event.Records[0].s3.bucket.name;
  // Object key may have spaces or unicode non-ASCII characters.
  var srcKey = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, " "));
  // prepare SQS message
  var params = {
    MessageBody: 'object '+ srcKey + ' ',
    QueueUrl: 'https://sqs.us-east-1.amazonaws.com/<YOUR-AWS-ACCOUNT-ID>/demojobqueue',
    DelaySeconds: 0
  };
  //send SQS message
  sqs.sendMessage(params, function (err, data) {
    if (err) {
      console.error('Unable to put object' + srcKey + ' into SQS queue due to an error: ' + err);
      context.fail(srcKey, 'Unable to send message to SQS');
    } // an error occurred
    else {
      //define DynamoDB table variables 
      var tableName = "demojobtable";
      var datetime = new Date().getTime().toString();
      //Put item into DynamoDB table where srcKey is the hash key and datetime is the range key
      dynamodb.putItem({
        "TableName": tableName,
        "Item": {
          "srcKey": {"S": srcKey },
          "datetime": {"S": datetime },
        }
      }, function(err, data) {
        if (err) {
          console.error('Unable to put object' + srcKey + ' into DynamoDB table due to an error: ' + err);
          context.fail(srcKey, 'Unable to put data to DynamoDB Table');
        }
        else {
          console.log('Successfully put object' + srcKey + ' into SQS queue and DynamoDB');
          context.succeed(srcKey, 'Data put into SQS and DynamoDB');
        }
      });
    }
  });
};

Worker nodes are Amazon EC2 Spot and On-demand instances deployed in Auto Scaling groups. These groups are containers that ensure the health and scalability of the worker nodes. Worker nodes pick up job parts from the input queue automatically and perform single tasks based on the job task state in DynamoDB. Worker nodes store the input objects in a file system such as Amazon Elastic File System (Amazon EFS) for processing. Amazon EFS is a file storage service for EC2 that provides elastic capacity to your applications, automatically adding storage as you add files. Depending on IO and application needs, the job data can also be stored on local instance store or Amazon Elastic Block Store (EBS). Each job can be further split into multiple sub-parts if there is a mechanism to stitch the outputs together (as in the case of some media processing where pre-segmenting the output may be possible). Once completed, the objects are uploaded back to S3 using multi-part upload.
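
As a rough Node.js sketch of a worker node’s polling loop (the queue URL and the processJob step are placeholders; your workers could just as well be written in another language):

// Sketch of a worker loop: receive a job message, process the referenced object,
// then delete the message. Queue URL and processJob are placeholders.
var AWS = require('aws-sdk');
var sqs = new AWS.SQS({region: 'us-east-1'});

var queueUrl = 'https://sqs.us-east-1.amazonaws.com/<YOUR-AWS-ACCOUNT-ID>/demojobqueue';

// Placeholder for the actual batch work (download from S3, process, upload results,
// update the job task state in DynamoDB).
function processJob(messageBody, callback) {
    console.log('Processing job: ' + messageBody);
    setImmediate(callback);
}

function poll() {
    sqs.receiveMessage({
        QueueUrl: queueUrl,
        MaxNumberOfMessages: 1,
        WaitTimeSeconds: 20 // long polling
    }, function(err, data) {
        if (err) { console.error(err); return setTimeout(poll, 5000); }
        if (!data.Messages || data.Messages.length === 0) { return poll(); }

        var message = data.Messages[0];
        processJob(message.Body, function(workErr) {
            if (workErr) { console.error(workErr); return poll(); } // leave the message for a retry
            sqs.deleteMessage({
                QueueUrl: queueUrl,
                ReceiptHandle: message.ReceiptHandle
            }, function() { poll(); });
        });
    });
}

poll();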

Similar to our blog on an EC2 Spot architecture for web applications, Auto Scaling groups running On-demand instances can be used together with groups running Spot instances. Spot Auto Scaling groups can use different Spot bid prices and even different instance types to give you more flexibility and to meet changing traffic demands.

The availability of Spot instances can vary depending on how many unused Amazon EC2 instances are available. Because real-time supply and demand dictates the available supply of Spot instances, you should architect your application to be resilient to instance termination. When the Spot price exceeds the price you named (i.e. the bid price), the instance will receive a warning that it will be terminated after two minutes. You can manage this event by creating IAM roles with the relevant SQS and DynamoDB permissions for the Spot instances to run the required shutdown scripts. Details about Creating an IAM Role Using the AWS CLI can be found here and Spot Instance Termination Notices documentation can be found here.

A script like the following can be run on startup (e.g., via systemd or rc.local) to poll for the Spot instance termination notice. It can then update the job task state in DynamoDB and re-insert the job task into the queue if required. We recommend that applications poll for the termination notice at five-second intervals.

#!/bin/bash
while true; do
    if curl -s http://169.254.169.254/latest/meta-data/spot/termination-time | grep -q '.*T.*Z'; then
        # Termination notice received: run the shutdown scripts.
        /env/bin/runterminationscripts.sh
    else
        # Spot instance not yet marked for termination.
        sleep 5
    fi
done
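
The contents of runterminationscripts.sh depend on your application. One hedged sketch, in Node.js with placeholder queue, table, and job-state names, is to push the in-progress job back onto the queue and flag its state in DynamoDB before the instance goes away:

// Sketch of a shutdown handler run when a Spot termination notice is detected.
// The currentJob shape, queue URL, table name, and 'jobstatus' attribute are placeholders.
var AWS = require('aws-sdk');
var sqs = new AWS.SQS({region: 'us-east-1'});
var dynamodb = new AWS.DynamoDB({region: 'us-east-1'});

function handleTermination(currentJob, callback) {
    // Re-insert the interrupted job so another worker can pick it up.
    sqs.sendMessage({
        QueueUrl: 'https://sqs.us-east-1.amazonaws.com/<YOUR-AWS-ACCOUNT-ID>/demojobqueue',
        MessageBody: JSON.stringify(currentJob)
    }, function(err) {
        if (err) { return callback(err); }
        // Record that this attempt was interrupted.
        dynamodb.updateItem({
            TableName: 'demojobtable',
            Key: {
                srcKey:   {S: currentJob.srcKey},
                datetime: {S: currentJob.datetime}
            },
            UpdateExpression: 'SET jobstatus = :s',
            ExpressionAttributeValues: {':s': {S: 'INTERRUPTED'}}
        }, callback);
    });
}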

An Auto Scaling group running on-demand instances in tandem with Auto Scaling group(s) running Spot instances will help to ensure your application’s availability in case of changes in Spot market price and Spot instance capacity. Additionally, using multiple Spot Auto Scaling groups with different bids and capacity pools (groups of instances that share the same attributes) can help with both availability and cost savings. By having the ability to run across multiple pools, you reduce your application’s sensitivity to price spikes that affect a pool or two (in general, there is very little correlation between prices in different capacity pools). For example, if you run in five different pools your price swings and interruptions can be cut by 80%.

For Auto Scaling to scale according to your application’s needs, you must define how you want to scale in response to changing conditions. You can assign more aggressive scaling policies to Auto Scaling groups that run Spot instances, and more conservative ones to Auto Scaling groups that run on-demand instances. The Auto Scaling instances will scale up based on the SQS queue depth CloudWatch metric (or more relevant metric depending on your application), but will scale down based on EC2 instance CPU utilization CloudWatch metric (or more relevant metric) to ensure that a job actually gets completed before the instance is terminated. For information about using Amazon CloudWatch metrics (such as SQS queue depth) to scale automatically, see Dynamic Scaling. As a safety buffer, a grace period (e.g. 300 seconds) should also be configured for the Auto Scaling group to prevent termination before a job completes.
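
A rough sketch of wiring up the scale-up side with the Node.js SDK (the group, policy, and alarm names, adjustment size, and thresholds are all placeholders to be tuned for your workload):

// Sketch: create a scale-up policy on a Spot Auto Scaling group and alarm it on
// SQS queue depth. All names, thresholds, and adjustments here are placeholders.
var AWS = require('aws-sdk');
var autoscaling = new AWS.AutoScaling({region: 'us-east-1'});
var cloudwatch = new AWS.CloudWatch({region: 'us-east-1'});

autoscaling.putScalingPolicy({
    AutoScalingGroupName: 'spot-worker-group',
    PolicyName: 'scale-up-on-queue-depth',
    AdjustmentType: 'ChangeInCapacity',
    ScalingAdjustment: 2
}, function(err, data) {
    if (err) { return console.error(err); }
    cloudwatch.putMetricAlarm({
        AlarmName: 'demojobqueue-depth-high',
        Namespace: 'AWS/SQS',
        MetricName: 'ApproximateNumberOfMessagesVisible',
        Dimensions: [{Name: 'QueueName', Value: 'demojobqueue'}],
        Statistic: 'Average',
        Period: 300,
        EvaluationPeriods: 1,
        Threshold: 20,
        ComparisonOperator: 'GreaterThanOrEqualToThreshold',
        AlarmActions: [data.PolicyARN] // invoke the scaling policy when the alarm fires
    }, function(alarmErr) {
        if (alarmErr) { console.error(alarmErr); }
        else { console.log('Scale-up policy and alarm created.'); }
    });
});

A scale-down policy keyed to average CPU utilization can be wired up in the same way.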

If the On-demand EC2 instances are consistently in use, you can purchase Reserved Instances for them to realize even more cost savings.

To automate further, you can optionally create a Lambda function to dynamically manage Auto Scaling groups based on the Spot market. The Lambda function could periodically invoke the EC2 Spot APIs to assess market prices and availability and respond by creating new Auto Scaling launch configurations and groups automatically. Information on the Describe Spot Price History API is available here. This function could also delete any Spot Auto Scaling groups and launch configurations that have no instances. AWS Data Pipeline can be used to invoke the Lambda function using the AWS CLI at regular intervals by scheduling pipelines. More information about scheduling pipelines is available here and information about invoking AWS Lambda functions using AWS CLI is here.
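
A hedged sketch of the price-check portion of such a Lambda function (the instance types, look-back window, and what you do with the results are all illustrative):

// Sketch: inspect recent Spot prices for a few instance types and log them.
// Instance types, the time window, and any follow-up actions are placeholders.
var AWS = require('aws-sdk');
var ec2 = new AWS.EC2({region: 'us-east-1'});

exports.handler = function(event, context) {
    var params = {
        InstanceTypes: ['c3.large', 'm3.large'],
        ProductDescriptions: ['Linux/UNIX'],
        StartTime: new Date(Date.now() - 60 * 60 * 1000) // last hour
    };
    ec2.describeSpotPriceHistory(params, function(err, data) {
        if (err) { return context.fail(err); }
        data.SpotPriceHistory.forEach(function(entry) {
            console.log(entry.AvailabilityZone + ' ' + entry.InstanceType + ': $' + entry.SpotPrice);
        });
        // Based on these prices you could create, update, or delete Auto Scaling
        // launch configurations and groups here.
        context.succeed('Checked ' + data.SpotPriceHistory.length + ' price points.');
    });
};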

Spot Architecture Automation Diagram

Building NoSQL Database Triggers with Amazon DynamoDB and AWS Lambda

Tim Wagner Tim Wagner, AWS Lambda General Manager

SQL databases have offered triggers for years, making it easy to validate and check data, maintain integrity constraints, create computed columns, and more. Why should SQL tables have all the fun…let’s do the equivalent for NoSQL data!

Amazon DynamoDB recently launched its streams feature (table update notifications) in production. When you combine this with AWS Lambda, it’s easy to create NoSQL database triggers that let you audit, aggregate, verify, and transform data. In this blog post we’ll do just that: first, we’ll create a data audit trigger, then we’ll extend it to also transform the data by adding a computed column to our table that the trigger maintains automatically. We’ll use social security numbers and customer names as our sample data, because they’re representative of something you might find in a production environment. Let’s get started…

Data Auditing

Our first goal is to identify and report invalid social security numbers. We’ll accept two formats: 9 digits (e.g., 123456789) and 11 characters with dashes (e.g., 123-45-6789). Anything else is an error and will generate an SNS message which, for the purposes of this demo, we’ll use to send an email report of the problem.
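
If you prefer, the same rule can be captured in a single regular expression; here’s a small sketch that is functionally equivalent to the validation function used later in this post:

// Accepts exactly nine digits (123456789) or the dashed form (123-45-6789).
function isValidSSNFormat(ssn) {
    return /^(\d{9}|\d{3}-\d{2}-\d{4})$/.test(ssn);
}

console.log(isValidSSNFormat('123456789'));   // true
console.log(isValidSSNFormat('123-45-6789')); // true
console.log(isValidSSNFormat('asdf'));        // false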

Setup Part 1: Defining a Table Schema

First, start by creating a new table; I’m calling it “TriggerDemo”:

Amazon DynamoDB Setup: Table Creation

For our example we’ll use just two fields: Name and SocialSecurityNumber (the primary hash key and primary range key, respectively, both represented as strings). In a more realistic setting you’d typically have additional customer-specific information keyed off these fields. You can accept the default capacity settings and you don’t need any secondary indices.

Amazon DynamoDB Setup: Primary Key Configuration

You do need to turn on streams in order to be able to send updates to your AWS Lambda function (we’ll get to that in a minute). You can read more about configuring and using DynamoDB streams in the DynamoDB developer guide.

Amazon DynamoDB Setup: Enabling Streams

Here’s the summary view of the table we’ve just configured:

Amazon DynamoDB Setup: Summary

Setup Part 2: SNS Topic and Email Subscription

To give us a way to report errors, we’ll create an SNS topic; I’m calling mine, “BadSSNNumbers”.

Amazon Simple Notification Service (SNS) Setup: Creating a Topic

(The other topic here is the DynamoDB alarm.)

Amazon Simple Notification Service (SNS) Setup: Listing Topics

…and then I’ll subscribe my email to it to receive error notifications:

Amazon Simple Notification Service (SNS) Setup: Configuring an Email Subscription to the Topic

(I haven’t shown it here, but you can also turn on SNS logging as a debugging aid.)

Ok, we have a database and a notification system…now we need a compute service!

Setup Part 3: A Lambda-based Trigger

Now we’ll create an AWS Lambda function that will respond to DynamoDB updates by verifying the integrity of each social security number, using the SNS topic we just created to notify us of any problematic entries.

First, create a new Lambda function by selecting the “dynamodb-process-stream” blueprint. Blueprints help you get started quickly with common tasks.

AWS Lambda Function Setup: Choosing the NoSQL Trigger Blueprint

For the event source, select your TriggerDemo table:

AWS Lambda Function Setup: Configuring the DynamoDB Stream Event Source

You’ll also need to provide your function with permissions to read from the stream by choosing the recommended role (DynamoDB event stream role):

AWS Lambda Function Setup: Choosing the Recommended Execution Role

The blueprint-provided permission policy only assumes you’re going to read from the update stream and create log entries, but we need an additional permission: publishing to the SNS topic. In the later part of this demo we’ll also want to write to the table, so let’s take care of both pieces at once: Hop over to the IAM console and add two managed policies to your role: SNS full access and DynamoDB full access. (Note: This is a quick approach for demo purposes, but if you want to use the techniques described here for a production table, I strongly recommend you create custom “minimal trust” policies that permit only the necessary operations and resources to be accessed from your Lambda function.)

AWS Lambda Function Setup: Adding Managed Policies for SNS and DynamoDB

The code for the Lambda function is straightforward: It receives batches of change notifications from DynamoDB and processes each one in turn by checking its social security number, reporting any malformed ones via the SNS topic we configured earlier. Replace the sample code provided by the blueprint with the following, being sure to replace the SNS ARN with the one from your own topic:

var AWS = require('aws-sdk');
var sns = new AWS.SNS();
exports.handler = function(event, context) {processRecord(context, 0, event.Records);}

// Process each DynamoDB record
function processRecord(context, index, records) {
    if (index == records.length) {
        context.succeed("Processed " + records.length + " records.");
        return;
    }
    var record = records[index];
    console.log("ID: " + record.eventID + "; Event: " + record.eventName);
    console.log('DynamoDB Record: %j', record.dynamodb);
    // Assumes the SSN# is set only on row creation
    if ((record.eventName != "INSERT") || valid(record)) processRecord(context, index+1, records);
    else {
        console.log('Invalid SSN # detected');
        var name = record.dynamodb.Keys.Name.S;
        console.log('name: ' + name);
        var ssn  = record.dynamodb.Keys.SocialSecurityNumber.S;
        console.log('ssn: ' + ssn);
        var message = 'Invalid SSN# Detected: Customer ' + name + ' had SSN field of ' + ssn + '.';
        console.log('Message to send: ' + message);
        var params = {
            Message:  message,
            TopicArn: 'YOUR BadSSNNumbers SNS ARN GOES HERE'
        };
        sns.publish(params, function(err, data) {
            if (err) console.log(err, err.stack);
            else console.log('malformed SSN message sent successfully');
            processRecord(context, index+1, records);
        });
    }
}

// Social security numbers must be in one of two forms: nnn-nn-nnnn or nnnnnnnnn.
function valid(record) {
    var SSN = record.dynamodb.Keys.SocialSecurityNumber.S;
    if (SSN.length != 9 && SSN.length != 11) return false;
    if (SSN.length == 9) {
        for (var indx in SSN) if (!isDigit(SSN[indx])) return false;
        return true;
    }
    else {
        return isDigit(SSN[0]) && isDigit(SSN[1]) && isDigit(SSN[2]) &&
               SSN[3] == '-' &&
               isDigit(SSN[4]) && isDigit(SSN[5]) &&
               SSN[6] == '-' &&
               isDigit(SSN[7]) && isDigit(SSN[8]) && isDigit(SSN[9]) && isDigit(SSN[10]);
    }
}

function isDigit(c) {return c >= '0' && c <= '9';}

Testing the Trigger

Ok, now it’s time to see things in action. First, use the “Test” button on the Lambda console to validate your code and make sure the SNS notifications are sending email. Next, if you created your Lambda function event source in a disabled state, enable it now. Then go to the DynamoDB console and enter some sample data. First, let’s try a valid entry:

Adding Sample Data to the DynamoDB Table

Since this one was valid, you should get a CloudWatch Logs entry but no email. Now for the fun part: Try an invalid entry, such as “Bob Smith” with a social security number of “asdf”. You should receive an email notification that looks something like this for the invalid SSN entry:

Sample Email Notification

You can also check the Amazon CloudWatch Logs to see the analysis and reporting in action and debug any problems:

Sample CloudWatch Logs Output

So in a few lines of Lambda function code we implemented a scalable, serverless NoSQL trigger capable of auditing every change to a DynamoDB table and reporting any errors it detects. You can use similar techniques to validate other data types, aggregate or mark suspected errors instead of reporting them via SNS, and so forth.

Data Transformation

In the previous section we audited the data. Now we’re going to take it a step further and have the trigger also maintain a computed column that describes the format of the social security number. The computed attribute can have one of three values: 9 (meaning, “The social security number in this row is valid and is a 9-digit format”), 11, or “INVALID”.

We don’t need to alter anything about the DynamoDB table or the SNS topic, but in addition to the extra code, the IAM permissions for the Lambda function must now allow us to write to the DynamoDB table in addition to reading from its update stream. If you added the DynamoDBFullAccess managed policy earlier when you did the SNS policy, you’re already good. If not, hop over to the IAM console and add that second managed policy now. (Also see the best practice note above on policy scoping if you’re putting this into production.)

The code changes only slightly to add the new DynamoDB writes:

var AWS = require('aws-sdk');
var sns = new AWS.SNS();
var dynamodb = new AWS.DynamoDB();
exports.handler = function(event, context) {processRecord(context, 0, event.Records);}

function processRecord(context, index, records) {
    if (index == records.length) {
        context.succeed("Processed " + records.length + " records.");
        return;
    }
    var record = records[index];
    console.log("ID: " + record.eventID + "; Event: " + record.eventName);
    console.log('DynamoDB Record: %j', record.dynamodb);
    if (record.eventName != "INSERT") processRecord(context, index+1, records);
    else if (valid(record)) {
        var name = record.dynamodb.Keys.Name.S;
        var ssn = record.dynamodb.Keys.SocialSecurityNumber.S;
        dynamodb.putItem({
            "TableName":"TriggerDemo",
            "Item": {
                "Name":                 {"S": name},
                "SocialSecurityNumber": {"S": ssn},
                "SSN Format":           {"S": ssn.length == 9 ? "9" : "11"}
            }
        }, function(err, data){
            if (err) console.log(err, err.stack);
            processRecord(context, index+1, records);
        });
    }
    else {
        console.log('Invalid SSN # detected');
        var name = record.dynamodb.Keys.Name.S;
        console.log('name: ' + name);
        var ssn  = record.dynamodb.Keys.SocialSecurityNumber.S;
        console.log('ssn: ' + ssn);
        var message = 'Invalid SSN# Detected: Customer ' + name + ' had SSN field of ' + ssn + '.';
        console.log('Message to send: ' + message);
        var params = {
            Message:  message,
            TopicArn: 'YOUR BadSSNNumbers SNS ARN GOES HERE'
        };
        sns.publish(params, function(err, data) {
            if (err) console.log(err, err.stack);
            else console.log('malformed SSN message sent successfully');
            dynamodb.putItem({
                "TableName":"TriggerDemo",
                "Item": {
                    "Name":                 {"S": name},
                    "SocialSecurityNumber": {"S": ssn},
                    "SSN Format":           {"S": "INVALID"}
                }
            }, function(err, data){
                if (err) console.log(err, err.stack);
                processRecord(context, index+1, records);
            });
        });
    }
}

// Social security numbers must be in one of two forms: nnn-nn-nnnn or nnnnnnnnn.
function valid(record) {
    var SSN = record.dynamodb.Keys.SocialSecurityNumber.S;
    if (SSN.length != 9 && SSN.length != 11) return false;
    if (SSN.length == 9) {
        for (var indx in SSN) if (!isDigit(SSN[indx])) return false;
        return true;
    }
    else {
        return isDigit(SSN[0]) && isDigit(SSN[1]) && isDigit(SSN[2]) &&
               SSN[3] == '-' &&
               isDigit(SSN[4]) && isDigit(SSN[5]) &&
               SSN[6] == '-' &&
               isDigit(SSN[7]) && isDigit(SSN[8]) && isDigit(SSN[9]) && isDigit(SSN[10]);
    }
}

function isDigit(c) {return c >= '0' && c <= '9';}

Now you can go to the DynamoDB console and add more rows to your table to watch your trigger both check your entries and maintain a computed format column for them. Don’t forget to refresh the DynamoDB table browse view to see the updates!

(Now that the code updates rows in the original table, testing from the Lambda console will generate double notifications – the first one from the original test, and the second when the item is created for real. You could add an “istest” field to the sample event in the console test experience and a condition in the code to prevent this if you want to keep testing “offline” from the actual table.)
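
A minimal sketch of such a guard (the istest field name is just a convention for this demo, not something Lambda defines):

// At the top of the handler: skip console test events so they don't write to the real table.
exports.handler = function(event, context) {
    if (event.istest) {
        return context.succeed('Test event received; skipping table writes.');
    }
    processRecord(context, 0, event.Records);
};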

I chose to leave the original data unchanged in this example, but you could also use the trigger to transform the original values instead – for example, choosing the 11-digit format as the canonical one and then converting any 9-digit values into their 11-digit equivalents.
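
A sketch of that conversion could be as small as this (assuming the value has already passed the validation above):

// Convert a valid 9-digit SSN into the canonical dashed nnn-nn-nnnn form.
function toCanonicalSSN(ssn) {
    if (ssn.length === 9) {
        return ssn.slice(0, 3) + '-' + ssn.slice(3, 5) + '-' + ssn.slice(5);
    }
    return ssn; // already in nnn-nn-nnnn form
}

console.log(toCanonicalSSN('123456789')); // "123-45-6789"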

Summary

In this post we explored combining DynamoDB stream notifications with AWS Lambda functions to recreate conventional database triggers in a serverless, NoSQL architecture. We used a simple nodejs function to first audit and later transform rows in the table in order to find invalid social security numbers and to compute the format of the number in each entry. While we worked in JavaScript for this example, you could also use Java, Clojure, Scala, or other jvm-based languages to write your trigger. Our notification method of choice for the demo was an SNS-provided email, but text messages, web hooks, and SQS entries just require a different subscription.

Until next time, happy Lambda (and database trigger) coding!

-Tim

 
Follow Tim’s Lambda adventures on Twitter

Amazon S3 Adds Prefix and Suffix Filters for Lambda Function Triggering

Tim Wagner Tim Wagner, AWS Lambda General Manager

Today Amazon S3 added some great new features for event handling:

  • Prefix filters – Send events only for objects in a given path
  • Suffix filters – Send events only for certain types of objects (.png, for example)
  • Deletion events

You can see some images of the S3 console’s experience on the AWS Blog; here’s what it looks like in Lambda’s console: Amazon S3 Adds Deletion Event Types and Prefix and Suffix Filtering.

Let’s take a look at how these new features apply to Lambda event processing for S3 objects.

Processing Deletions

Previously, you could get S3 bucket notification events (aka “S3 triggers”) when objects were created but not when they were deleted. With the new event type, you can now use Lambda to automatically apply cleanup code when an object goes away, or to help keep metadata or indices up to date as S3 objects come and go.

You can request notification when an object is deleted by using the s3:ObjectRemoved:Delete event type. You can request notification when a delete marker is created for a versioned object by using s3:ObjectRemoved:DeleteMarkerCreated. You can also use a wildcard expression like s3:ObjectRemoved:* to request notification any time an object is deleted, regardless of whether it’s been versioned.

S3 events aren’t guaranteed to be sent in order, but S3 does offer help for keeping the event sequence organized: The S3 event structure includes a new field, sequencer, which is a hexadecimal value that establishes a relative ordering among PUTs and DELETEs for the same object. (The values can’t be meaningfully compared across different objects.) If you’re trying to keep another data structure, like an index, in sync, this is critical information to save and compare, as a PUT followed by a DELETE is very different from a DELETE followed by a PUT. Check out the S3 event format documentation for a detailed description of the event format and options.
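
As a hedged sketch, a Lambda handler that reacts to removals and keeps the sequencer around for ordering might start out like this (the cleanUp step is a placeholder for your own index or metadata maintenance):

// Sketch: handle S3 ObjectRemoved events and keep the sequencer value so a
// downstream index can order PUTs and DELETEs correctly. cleanUp is a placeholder.
exports.handler = function(event, context) {
    event.Records.forEach(function(record) {
        var eventName = record.eventName;          // e.g. "ObjectRemoved:Delete"
        var key = record.s3.object.key;
        var sequencer = record.s3.object.sequencer;

        console.log(eventName + ' for ' + key + ' (sequencer ' + sequencer + ')');

        if (eventName.indexOf('ObjectRemoved:') === 0) {
            // Placeholder: remove the corresponding entry from your index or metadata
            // store, but only if this sequencer is newer than the one stored there.
            cleanUp(key, sequencer);
        }
    });
    context.succeed('Processed ' + event.Records.length + ' records.');
};

// Placeholder cleanup step -- replace with your own index or metadata maintenance.
function cleanUp(key, sequencer) {
    console.log('Cleaning up metadata for ' + key);
}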

Prefix Filters

Prefix filters can be used to pick the “directory” in which to send events. For example, let’s say you have two paths in use in your S3 bucket:

Incoming/
Thumbnails/

Your client app uploads images to the Incoming path and, for each image, you create a matching thumbnail in the Thumbnails path.

Previously, to use Lambda for this you’d need to send all events from the bucket to Lambda – including the thumbnail creation events. That has a couple of downsides: You have to send more events than you’d like, and you have to explicitly code your Lambda function to ignore the “extra” ones in order to avoid recursively creating thumbnails of thumbnails (of thumbnails of …)

Now, it’s easier: You set a prefix filter of “Incoming/”, and only the incoming images are sent to Lambda. The Lambda code also gets simpler because the recursion check is no longer needed, since you never write images to the path in your prefix filter. (Of course it doesn’t hurt to retain it for safety if you prefer.)

You can also use prefix filters on the object name (the “path” is just a string in reality). So if you name files like, “IMAGE001” and “DOC002” and you only want to send documents to Lambda, you can set a prefix of “DOC”.

Suffix Filters

Suffix filters work similarly, although they’re conventionally used to pick the “type” of file to send to Lambda by choosing a suffix, such as “.doc” or “.java”. You can combine prefix and suffix filters.

In general, you can’t define overlapping prefix or suffix filters, as it would make the delivery ambiguous; the partitioning of events must be unique. The exception to this is that you can have overlapping prefix filters if the suffix filters disambiguate. If you want to send one event to multiple recipients, check out the post on S3 event fanout to see some suggested architectures for doing just that.

You can read all the details on Amazon S3 event notification settings in the docs, which also cover using the command line and programmatic/REST access.

Until next time, happy Lambda (and S3 event) coding!

-Tim

 
Follow Tim’s Lambda adventures on Twitter