Category: AWS Lambda


Techniques and Tools for Better Serverless API Logging with Amazon API Gateway and AWS Lambda


Ryan Green @ryangtweets
Software Development Engineer, API Gateway

Developing, testing, and operating Serverless APIs using Amazon API Gateway and AWS Lambda can be made much easier with built-in support for Amazon CloudWatch Logs.

In Lambda functions, you can use log statements to send log events to CloudWatch log streams, and API Gateway automatically submits log events for requests to APIs with logging enabled.

However, it can be difficult to reconcile log events for a serverless API sent across multiple CloudWatch log groups and log streams. Tracking down logs for a specific request or tailing request logs for that operation can sometimes be a cumbersome experience.

Here are some simple logging techniques and tools that can help greatly while developing, testing, and operating your serverless API.

Technique: Capture the request ID for a particular API request

The request ID for an individual API request is always returned by API Gateway in the "x-amzn-RequestId" response header. When API Gateway logs an individual API request, the request ID is always included in the first event in the request log:

"Starting execution for request: [REQUEST_ID]"

Logs for an API Gateway API are always sent to a log group in the following format:

"API-Gateway-Execution-Logs_[API_ID]/[STAGE_NAME]"

To troubleshoot an individual API request, search for the request ID in the CloudWatch Logs console, or use the CloudWatch Logs API or an AWS SDK (more on tooling later).
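
If you prefer to script that search, here is a minimal sketch using the AWS SDK for JavaScript and the FilterLogEvents API; the API ID, stage name, region, and request ID shown are placeholders:

// Search an API Gateway execution log group for a specific request ID
var AWS = require('aws-sdk');
var logs = new AWS.CloudWatchLogs({ region: 'us-east-1' });

var requestId = '9439989f-7543-11e6-8dda-150c09a55dc2'; // value of x-amzn-RequestId
var params = {
    logGroupName: 'API-Gateway-Execution-Logs_xyz123/prod', // [API_ID]/[STAGE_NAME]
    filterPattern: '"' + requestId + '"',                   // quoted term = match events containing it
    startTime: Date.now() - 60 * 60 * 1000                  // last hour
};

logs.filterLogEvents(params, function(err, data) {
    if (err) console.log(err);
    else data.events.forEach(function(e) { console.log(e.message); });
});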

Technique: Correlate your API Gateway request IDs with Lambda request IDs

Individual API requests are tracked independently across AWS services. Thus, an individual request to your API actually generates at least two request identifiers ("request IDs") – one for the API Gateway request and one for the Lambda invocation request.

To reduce time-to-diagnosis when analyzing logs, it is helpful to correlate both request IDs together in the logs. Log the API Gateway request ID from your Lambda function and send the API Gateway request ID ($context.requestId) to your Lambda function via a mapping template:

{
   "context" : {
      "request-id" : "$context.requestId"
   }
   …
}

Then, in your Lambda function, log the API Gateway request ID along with the Lambda request ID. The Lambda request ID is automatically included in the log message.

exports.handler = function(event, context) {
    var apiRequestId = event.context['request-id'];
    var lambdaRequestId = context.awsRequestId;
    console.log("API Gateway Request ID: " + apiRequestId + " Lambda Request ID: " + context.awsRequestId);
    var logprefix = "APIG: " + apiRequestId + " -  ";
    console.log(logprefix + "hello world!");
    ...
}

Invoking this Lambda function produces these log messages in the "/aws/lambda/[FUNCTION_NAME]" log group:

/aws/lambda/echo                           2016/09/07/[$LATEST]6ccf17d298b64b5fac8c41b1a65e0831 2016-09-07T21:39:39.145Z        943ad105-7543-11e6-a9ac-65e093327849        API Gateway Request ID: 9439989f-7543-11e6-8dda-150c09a55dc2 Lambda Request ID: 943ad105-7543-11e6-a9ac-65e093327849

/aws/lambda/echo                           2016/09/07/[$LATEST]6ccf17d298b64b5fac8c41b1a65e0831 2016-09-07T21:39:39.145Z        943ad105-7543-11e6-a9ac-65e093327849        APIG: 9439989f-7543-11e6-8dda-150c09a55dc2 -   hello world!

Using this technique allows you to quickly locate the Lambda function logs for an individual API request. Search for the request ID returned in the "x-amzn-RequestId" header in the log group for the Lambda function (by default, named "/aws/lambda/[FUNCTION_NAME]").

Tooling

While the AWS SDK and CLI provide excellent building blocks for working with API Gateway, Lambda, and CloudWatch Logs, there is still room for specialized tooling for logs when developing, testing, and operating your serverless API.

To that end, I've created apilogs—a fork of the excellent awslogs project—to include native support for API Gateway/Lambda serverless APIs.

Given an API Gateway REST API ID and Stage name, this command-line tool produces an aggregated stream of time-ordered, ANSI-colored log events emitted by API Gateway and all Lambda functions attached to your API. It automatically aggregates events from all log streams for the API and Lambda functions.

For example:

Stream all log events emitted from API Gateway as well as from all Lambda functions attached to the API:

$ apilogs get --api-id xyz123 --stage prod --watch

Search APIG/Lambda logs for events from a specific request ID in the past hour:

$ apilogs get --api-id xyz123 --stage prod --start='1h ago' | grep "6605b081-6f04-11e6-97ac-c34deb0b3dd9"

The log events can then be further filtered and processed by standard command-line tools. Credentials are passed to apilogs via the same mechanism as the AWS CLI.

Conclusion

I hope apilogs can become part of your standard dev, test, or ops workflows. Check out apilogs on GitHub. Feedback and contributions are always welcome!

Migrating a Native JAVA REST API to a Serverless Architecture with the Lambada Framework for AWS

This is a guest post by Çağatay Gürtürk, the creator of the Lambada framework

Serverless computing has become a hot topic since AWS Lambda and Amazon API Gateway started to offer an elegant way to build and deploy REST APIs, with attractive pricing models, without needing to maintain 24/7 running servers and infrastructure.

As the first language offered by Lambda, Node.js seems to have the most online resources and tools, but it is also possible to write Lambda functions natively in Java and Python. Java is especially interesting as a language because of its maturity, large community, and available codebase. With Lambda and Java, it is even possible to apply enterprise patterns and frameworks such as Spring, as well as all the best practices we are used to applying in the Java world.

In order to make development for Lambda in Java easier, I started Lambada Framework as an open source project. It is a small but powerful open source project, currently in beta, that lets developers create a new serverless API on AWS infrastructure or migrate an existing one.

Lambada Framework accomplishes this target by implementing the most common JAX-RS annotations and providing a Maven plugin to deploy easily to the AWS cloud. Briefly, JAX-RS is a standard annotation set which can be used to map regular Java methods to HTTP paths and methods. For instance, you can look at the following method:

@GET
@Path("/helloworld/{id}")
public Response indexEndpoint(@PathParam("id") int id) {
    return Response.status(200).entity("Hello world: " + id).build();
}

This is a very lean method marked with @GET and @Path annotations, which mean that this method is called when a GET request comes to a URL in the "/helloworld/{id}" format, with the id parameter passed as an argument. The method returns a Response object with a 200 response code and text content. As you can see, these annotations offer a seamless way to define a REST API and map different resources to Java methods.

JAX-RS annotations on their own do not mean much and have no effect out of the box. To make these annotations work, a JAX-RS implementation framework should be added to the project. This framework scans all the JAX-RS annotations in the project and creates a server and routing table to respond to HTTP requests correctly. While Jersey is one such reference implementation, and the most popular one, there are also other implementations of JAX-RS, such as RESTEasy and Apache CXF. You are free to choose any of them and your controller methods always stay the same, thanks to standard annotations.

Lambada Framework is a JAX-RS implementation but different from the others: instead of running a web server, it scans the JAX-RS annotations at build time and populates Lambda functions and the API Gateway definitions using them.

This means that if you already marked your controller methods with JAX-RS annotations and used a framework like Jersey or RESTEasy, you can easily switch to a serverless architecture with very few modifications to your code. You would have to change only your build mechanism and replace your preferred JAX-RS implementation with Lambada Framework.

In the following example, you see how to deploy a very basic REST API to Lambda.

  1. First, clone the example project to your local directory:
    git clone https://github.com/lambadaframework/lambadaframework-boilerplate
  2. This project has a pom.xml file with some configuration options. You must change the deployment.bucket option; other changes are up to you. The Lambada Framework creates this bucket in your account if it does not exist and uses it during your project's lifetime. S3 bucket names are global and must be unique, so you must pick a name that is not taken by anyone else.

  3. Make sure that the default AWS profile installed on your system has administrator privileges, or at least the following IAM policy:

     {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Action": [
                        "cloudformation:*",
                        "s3:*",
                        "lambda:*",
                        "execute-api:*",
                        "apigateway:*",
                        "iam:*",
                        "ec2:DescribeSecurityGroups",
                        "ec2:DescribeVpcs",
                        "ec2:DescribeSubnets"
                    ],
                    "Resource": [
                        "*"
                    ]
                }
            ]
        }
  4. Now you are all set. In the root directory of your project, run the following command:

    mvn deploy

Your project compiles to a fat JAR with all its dependencies and is deployed to Lambda. After the JAR file is on the S3 bucket, Lambada scans it for supported JAX-RS annotations in your code and creates the necessary API Gateway endpoints. At the end of the process, the URL of your API is printed on the screen. You can navigate to this URL and explore your API using the AWS Management Console to see which resources and methods are created.

Lambada Framework is under development and support for missing JAX-RS annotations is being added. Follow the Lambada Framework GitHub page for the newest features and feel free to submit any issues or contributions.

Happy serverless computing!

Maintaining a Healthy Email Database with AWS Lambda, Amazon SNS, and Amazon DynamoDB

Carlos Sanchiz
Sr. Solutions Architect

Mike Deck
Partner Solutions Architect

Reputation in the email world is critical to achieve reasonable deliverability rates (the percentage of emails that arrive to inboxes); if you fall under certain levels, your emails end up in the spam folder or rejected by the email servers. To keep these numbers high, you have to constantly improve your email quality, but most importantly, you have to take action when a delivery fails or a recipient doesn't want to receive your email.

Back in 2012, we showed you how to automate the process of handling bounces and complaints with a scheduled task, using Amazon SNS, Amazon SQS, Amazon EC2, and some C# code. We have released many AWS services since then, so this post shows a different approach towards the same goal of a clean and healthy email database.

To set a little bit of context about bounces and complaints processing, I'm reusing some of the previous post:

Amazon SES assigns a unique message ID to each email that you successfully submit to send. When Amazon SES receives a bounce or complaint message from an ISP, we forward the feedback message to you. The format of bounce and complaint messages varies between ISPs, but Amazon SES interprets these messages and, if you choose to set up Amazon SNS topics for them, categorizes them into JSON objects.

Amazon SES will categorize your hard bounces into two types: permanent and transient. A permanent bounce indicates that you should never send to that recipient again. A transient bounce indicates that the recipient's ISP is not accepting messages for that particular recipient at that time and you can retry delivery in the future. The amount of time you should wait before resending to the address that generated the transient bounce depends on the transient bounce type. Certain transient bounces require manual intervention before the message can be delivered (e.g., message too large or content error). If the bounce type is undetermined, you should manually review the bounce and act accordingly.

A complaint indicates the recipient does not want the email that you sent them. When we receive a complaint, we want to remove the recipient addresses from our list.

In this post, we show you how to use AWS Lambda functions to receive SES notifications from the feedback loops of ISPs' email servers via Amazon SNS, and to update an Amazon DynamoDB table containing your email database.

Here is a high-level overview of the architecture:

Using the combination of Lambda, SNS, and DynamoDB frees you from the operational overhead of having to run and maintain servers. You focus on your application logic and AWS handles the undifferentiated heavy lifting behind the operations, scalability, and high availability.

Workflow

  1. Create the SNS topic to receive the SES bounces, deliveries and complaints.
  2. Create the DynamoDB table to use for our email database.
  3. Create the Lambda function to process the bounces, deliveries, and complaints, and subscribe it to the SNS topic.
  4. Test & start emailing!

Create an SNS topic

First, create an SNS topic named "ses-notifications". You subscribe your Lambda function to the topic later.
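
If you prefer to script this step, a quick sketch with the AWS SDK for JavaScript might look like the following (topic name as described above, region assumed):

var AWS = require('aws-sdk');
var sns = new AWS.SNS({ region: 'us-east-1' });

sns.createTopic({ Name: 'ses-notifications' }, function(err, data) {
    if (err) console.log(err);
    else console.log('Topic ARN: ' + data.TopicArn); // keep this ARN for the SES configuration
});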

Create a DynamoDB table

Create a simple DynamoDB table called "mailing" to store the email database. Use the UserId (email address) as the partition key.
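
As a sketch, the same table could be created with the AWS SDK for JavaScript; the throughput values here are illustrative only:

var AWS = require('aws-sdk');
var dynamodb = new AWS.DynamoDB({ region: 'us-east-1' });

var params = {
    TableName: 'mailing',
    AttributeDefinitions: [{ AttributeName: 'UserId', AttributeType: 'S' }],
    KeySchema: [{ AttributeName: 'UserId', KeyType: 'HASH' }],   // partition key only
    ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 }
};

dynamodb.createTable(params, function(err, data) {
    if (err) console.log(err);
    else console.log('Table status: ' + data.TableDescription.TableStatus);
});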

Create the Lambda function

Set up your Lambda function that will process all the notifications coming from SES through your SNS topic.

Note: This post uses Node.js 4.3 as the Lambda runtime but at the time of publication, you can also use Python 2.7, Java 8 or Node.js 0.10.

For the Lambda function code, I used the recently published blueprint (ses-notification-nodejs) and adapted it to work with the DynamoDB table. The following code includes those modifications:

'use strict';
console.log('Loading function');

let doc = require('dynamodb-doc');
let dynamo = new doc.DynamoDB();
let tableName = 'mailing';

exports.handler = (event, context, callback) => {
    //console.log('Received event:', JSON.stringify(event, null, 2));
    const message = JSON.parse(event.Records[0].Sns.Message);

    switch(message.notificationType) {
        case "Bounce":
            handleBounce(message);
            break;
        case "Complaint":
            handleComplaint(message);
            break;
        case "Delivery":
            handleDelivery(message);
            break;
        default:
            callback("Unknown notification type: " + message.notificationType);
    }
};

function handleBounce(message) {
    const messageId = message.mail.messageId;
    const addresses = message.bounce.bouncedRecipients.map(function(recipient){
        return recipient.emailAddress;
    });
    const bounceType = message.bounce.bounceType;

    console.log("Message " + messageId + " bounced when sending to " + addresses.join(", ") + ". Bounce type: " + bounceType);

    for (var i=0; i<addresses.length; i++){
        writeDDB(addresses[i], message, tableName, "disable");
    }
}

function handleComplaint(message) {
    const messageId = message.mail.messageId;
    const addresses = message.complaint.complainedRecipients.map(function(recipient){
        return recipient.emailAddress;
    });

    console.log("A complaint was reported by " + addresses.join(", ") + " for message " + messageId + ".");

    for (var i=0; i<addresses.length; i++){
        writeDDB(addresses[i], message, tableName, "disable");
    }
}

function handleDelivery(message) {
    const messageId = message.mail.messageId;
    const deliveryTimestamp = message.delivery.timestamp;
    const addresses = message.delivery.recipients;

    console.log("Message " + messageId + " was delivered successfully at " + deliveryTimestamp + ".");

    for (var i=0; i<addresses.length; i++){
        writeDDB(addresses[i], message, tableName, "enable");
    }
}

function writeDDB(id, payload, tableName, status) {
    const item = {
            UserId: id,
            notificationType: payload.notificationType,
            from: payload.mail.source,
            timestamp: payload.mail.timestamp,
            state: status
        };
    const params = {
            TableName:tableName,
            Item: item
        };
    dynamo.putItem(params,function(err,data){
            if (err) console.log(err);
            else console.log(data);
    });
}

Assign the function a role with Lambda execution and DynamoDB permissions so that it can run and update the DynamoDB table accordingly, and use index.handler as the function handler.

This is the lambda_dynamo IAM role policy to use for the function:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1428341300017",
            "Action": [
                "dynamodb:PutItem",
                "dynamodb:UpdateItem"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:dynamodb:us-east-1:ACCOUNT-ID:table/mailing"
        },
        {
            "Sid": "",
            "Resource": "*",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Effect": "Allow"
        }
    ]
}

You also set the corresponding SNS topic as an event source so that the Lambda function is triggered when notifications arrive.
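
A sketch of that wiring with the AWS SDK for JavaScript follows; the topic ARN, function ARN, and statement ID are placeholders, and the console performs the equivalent steps for you when you add the event source there:

var AWS = require('aws-sdk');
var sns = new AWS.SNS({ region: 'us-east-1' });
var lambda = new AWS.Lambda({ region: 'us-east-1' });

var topicArn = 'arn:aws:sns:us-east-1:ACCOUNT-ID:ses-notifications';
var functionArn = 'arn:aws:lambda:us-east-1:ACCOUNT-ID:function:sesNotifications';

// Allow SNS to invoke the function...
lambda.addPermission({
    FunctionName: functionArn,
    StatementId: 'ses-notifications-sns',
    Action: 'lambda:InvokeFunction',
    Principal: 'sns.amazonaws.com',
    SourceArn: topicArn
}, function(err) {
    if (err) return console.log(err);
    // ...then subscribe the function to the topic.
    sns.subscribe({ TopicArn: topicArn, Protocol: 'lambda', Endpoint: functionArn },
        function(err, data) {
            if (err) console.log(err);
            else console.log('Subscription ARN: ' + data.SubscriptionArn);
        });
});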

Test the Lambda function

After everything is in place, it's time to test. Publish notifications to your SNS topic using the Amazon SNS notification examples for Amazon SES, and see how your DynamoDB table is updated by your Lambda function.

Here's an example of publishing a complaint notification to the ses-notifications SNS topic using the CLI:

$ aws sns publish --topic-arn "arn:aws:sns:us-east-1:xxxxxxxxxxx:ses-notifications" --message file://message_complaints.txt --subject Test --region us-east-1

{
    "MessageId": "f7f5ad2d-a268-548d-a45c-e28e7624a64d"
}
$ cat message_complaints.txt

{
   "notificationType":"Complaint",
   "complaint":{
      "userAgent":"Comcast Feedback Loop (V0.01)",
      "complainedRecipients":[
         {
            "emailAddress":"recipient1@example.com"
         }
      ],
      "complaintFeedbackType":"abuse",
      "arrivalDate":"2009-12-03T04:24:21.000-05:00",
      "timestamp":"2012-05-25T14:59:38.623-07:00",
      "feedbackId":"000001378603177f-18c07c78-fa81-4a58-9dd1-fedc3cb8f49a-000000"
   },
   "mail":{
      "timestamp":"2012-05-25T14:59:38.623-07:00",
      "messageId":"000001378603177f-7a5433e7-8edb-42ae-af10-f0181f34d6ee-000000",
      "source":"email_1337983178623@amazon.com",
      "sourceArn": "arn:aws:sns:us-east-1:XXXXXXXXXXXX:ses-notifications",
      "sendingAccountId":"XXXXXXXXXXXX",
      "destination":[
         "recipient1@example.com",
         "recipient2@example.com",
         "recipient3@example.com",
         "recipient4@example.com"
      ]
   }
}

And here is what you'd start seeing coming in your DynamoDB items list:

After you are done with your tests, you can point your SES notifications to the SNS topic you created and start sending emails.
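
One way to do that programmatically is with the SES SetIdentityNotificationTopic API; the sketch below assumes a verified sending identity and reuses the topic ARN from earlier, both of which are placeholders:

var AWS = require('aws-sdk');
var ses = new AWS.SES({ region: 'us-east-1' });
var topicArn = 'arn:aws:sns:us-east-1:ACCOUNT-ID:ses-notifications';

['Bounce', 'Complaint', 'Delivery'].forEach(function(type) {
    ses.setIdentityNotificationTopic({
        Identity: 'sender@example.com',   // verified SES identity (placeholder)
        NotificationType: type,
        SnsTopic: topicArn
    }, function(err) {
        if (err) console.log(err);
        else console.log(type + ' notifications now flow to the SNS topic');
    });
});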

Conclusion

In this post, we showed how you can use AWS Lambda, Amazon SNS, and Amazon DynamoDB to keep a healthy email database, have a good email sending score, and of course, do all of it without servers to maintain or scale.

While you're at it, why not expose your DynamoDB table with your email database using Amazon API Gateway? For more information, see Using Amazon API Gateway as a proxy for DynamoDB.

If you have questions or suggestions, please comment below.

A Data Sharing Platform Based on AWS Lambda

Julien Lepine
Solutions Architect

As developers, one of our top priorities is to build reliable systems; this is a core pillar of the AWS Well-Architected Framework. A common pattern to fulfill this goal is to have an architecture built around loosely coupled components.

Amazon Kinesis Streams offers an excellent answer for this, as the events generated can be consumed independently by multiple consumers and remain available for 1 to 7 days. Building an Amazon Kinesis consumer application is done by leveraging the Amazon Kinesis Client Library (KCL) or native integration with AWS Lambda.

As I was speaking with other developers and customers about their use of Amazon Kinesis, a few patterns came up repeatedly. This post addresses those common patterns.

Protecting streams

Amazon Kinesis has made the implementation of event buses easy and inexpensive, so that applications can send meaningful information to their surrounding ecosystem. As your applications grow and get more usage within your company, more teams will want to consume the data generated, and possibly even external parties such as business partners or customers.

When the applications get more usage, some concerns may arise:

  • When a new consumer starts (or re-starts after some maintenance), it needs to read a lot of data from the stream (its backlog) in a short amount of time in order to get up to speed
  • A customer may start many consumers at the same time, reading a lot of events in parallel or having a high call rate to Amazon Kinesis
  • A consumer may have an issue (such as an infinite loop or a retry error) that causes it to call Amazon Kinesis at an extremely high rate

These cases may lead to a depletion of the resources available in your stream, and that could potentially impact all your consumers.

Managing the increased load can be done by leveraging the scale-out model of Amazon Kinesis through the addition of shards to an existing stream. Each shard adds both input (ingestion) and output (consumption) capacity to your stream:

  • 1,000 write records per second and up to 1 megabyte per second for ingesting events
  • 5 read transactions per second and up to 2 megabytes per second for consuming events

Avoiding these scenarios could be done by scaling out your streams and provisioning for peak, but that would create inefficiencies and may not even fully protect your consumers from the behavior of others.

What becomes apparent in these cases is the impact that a single failing consumer may have on all other consumers, a symptom described as the “noisy neighbor” problem, or managing the blast radius of your system. The key point is to limit the impact that a single consumer can have on others.

A solution is to compartmentalize your platform: this method consists of creating multiple streams and then creating groups of consumers that share the same stream. This gives you the possibility to limit the impact a single consumer can have on its neighbors, and potentially to propose a model where some customers have a dedicated stream.

You can build an Amazon Kinesis consumer application (via the KCL or Lambda) that reads a source stream and sends the messages to the “contained” streams that the actual consumers will use.
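
As a rough illustration, a minimal Lambda-based forwarder might look like the sketch below; the destination stream name is a placeholder and error handling is kept to a bare minimum:

var AWS = require('aws-sdk');
var kinesis = new AWS.Kinesis();

exports.handler = function(event, context) {
    // Re-publish each incoming record into a "contained" stream, preserving partitioning
    var records = event.Records.map(function(record) {
        return {
            Data: new Buffer(record.kinesis.data, 'base64'), // payload arrives base64-encoded
            PartitionKey: record.kinesis.partitionKey
        };
    });

    kinesis.putRecords({ StreamName: 'contained-stream-1', Records: records },
        function(err, data) {
            if (err) context.fail(err);
            else context.succeed('Forwarded ' + records.length + ' records, ' +
                                 data.FailedRecordCount + ' failures');
        });
};

Each putRecords call accepts at most 500 records, so larger batches would need to be split.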

Transforming streams

Another use case I see from customers is the need to transfer the data in their stream to other services:

  • Some applications may have limitations in their ability to receive or process the events
  • They may not have connectors to Amazon Kinesis, and only support Amazon SQS
  • They may only support a push model, where their APIs need to be called directly when a message arrives
  • Some analytics/caching/search may be needed on the events generated
  • Data may need to be archived or sent to a data warehouse engine

There are many other cases, but the core need is having the ability to get the data from Amazon Kinesis into other platforms.

The solution for these use cases is to build an Amazon Kinesis consumer application that reads a stream and prepares these messages for other services.

Sharing data with external parties

The final request I have seen is the possibility to process a stream from a different AWS account or region. While you can give access to your resources to an external AWS account through cross-account IAM roles, that feature requires development and is not supported natively by some services. For example, you cannot subscribe a Lambda function to a stream in a different AWS account or region.

The solution is to replicate the Amazon Kinesis stream or the events to another environment (AWS account, region, or service).

This, again, can be done through an Amazon Kinesis consumer application that reads a stream and forwards the events to the remote environment.
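
For illustration, one way to sketch such a replicator is a Lambda consumer that assumes a role in the remote account before writing; the role ARN, region, and stream name below are placeholders:

var AWS = require('aws-sdk');
var sts = new AWS.STS();

exports.handler = function(event, context) {
    sts.assumeRole({
        RoleArn: 'arn:aws:iam::REMOTE-ACCOUNT-ID:role/kinesis-writer',
        RoleSessionName: 'stream-replication'
    }, function(err, data) {
        if (err) return context.fail(err);

        // Kinesis client scoped to the remote account and region
        var remoteKinesis = new AWS.Kinesis({
            region: 'eu-west-1',
            accessKeyId: data.Credentials.AccessKeyId,
            secretAccessKey: data.Credentials.SecretAccessKey,
            sessionToken: data.Credentials.SessionToken
        });

        var records = event.Records.map(function(record) {
            return {
                Data: new Buffer(record.kinesis.data, 'base64'),
                PartitionKey: record.kinesis.partitionKey
            };
        });

        remoteKinesis.putRecords({ StreamName: 'remote-stream', Records: records },
            function(err) {
                if (err) context.fail(err);
                else context.succeed('Replicated ' + records.length + ' records');
            });
    });
};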

Solution: A Lambda-based fan-out function

These three major needs have a common solution: the deployment of an Amazon Kinesis consumer application that listens to a stream and is able to send messages to other instances of Amazon Kinesis, services, or environments (AWS accounts or regions).

In the aws-lambda-fanout GitHub repository, you’ll find a Lambda function that specifically supports this scenario. This function is made to forward incoming messages from Amazon Kinesis or DynamoDB Streams.

The architecture of the function is made to be simple and extensible, with one core file fanout.js that loads modules for the different providers. The currently supported providers are as follows:

  • Amazon SNS
  • Amazon SQS
  • Amazon Elasticsearch Service
  • Amazon Kinesis Streams
  • Amazon Kinesis Firehose
  • AWS IoT
  • AWS Lambda
  • Amazon ElastiCache for Memcached
  • Amazon ElastiCache for Redis

The function is built to support multiple inputs:

  • Amazon Kinesis streams
  • Amazon Kinesis streams containing Amazon Kinesis Producer Library (KPL) records
  • DynamoDB Streams records

It relies on Lambda for a fully-managed environment where scaling, logging, and monitoring are automated by the platform. It also supports Lambda functions in a VPC for Amazon ElastiCache.

The configuration is stored in a DynamoDB table, and associates the output configuration with each function. This table has a simple schema:

  • sourceArn (Partition Key): The Amazon Resource Name (ARN) of the input Amazon Kinesis stream
  • id [String]: The name of the mapping
  • type [String]: The destination type
  • destination [String]: The ARN or name of the destination
  • active [Boolean]: Whether that mapping is active

Depending on the target, some other properties are also stored.

The function can also group records together for services that don't natively support it, such as Amazon SQS, Amazon SNS, or AWS IoT. Amazon DynamoDB Streams records can also be transformed to plain JSON objects to simplify management in later stages. The function comes with a Bash-based command-line interface to make deployment and management easier.

As an example, the following lines deploy the function, register a mapping from one stream (inputStream) to another (outputStream), and activate the hook on the source stream.

./fanout deploy --function fanout

./fanout register kinesis --function fanout --source-type kinesis --source inputStream --id target1 --destination outputStream --active true

./fanout hook --function fanout --source-type kinesis --source inputStream

Summary

There are many options available for you to forward your events from one service or environment to another. For more information about this topic, see Using AWS Lambda with Amazon Kinesis. Happy eventing!

If you have questions or suggestions, please comment below.

Implementing a Serverless AWS IoT Backend with AWS Lambda and Amazon DynamoDB

Ed Lima
Cloud Support Engineer

Does your IoT device fleet scale to hundreds or thousands of devices? Do you find it somewhat challenging to retrieve the details for multiple devices? AWS IoT provides a platform to connect those devices and build a scalable solution for your Internet of Things workloads.

Out of the box, the AWS IoT console gives you your own searchable device registry with access to the device state and information about device shadows. You can enhance and customize the service using AWS Lambda and Amazon DynamoDB to build a serverless backend with a customizable device database, which can store useful information about the devices and help track which devices have been activated with an activation code, if required.

You can use DynamoDB to extend the AWS IoT internal device registry to help manage the device fleet, as well as storing specific additional data about each device. Lambda provides the link between AWS IoT and DynamoDB allowing you to add, update, and query your new device database backend.

In this post, you learn how to use AWS IoT rules to trigger specific device registration logic using Lambda in order to populate a DynamoDB table. You then use a second Lambda function to search the database for a specific device serial number and a randomly generated activation code to activate the device and register the email of the device owner in the same table. After you’re done, you’ll have a fully functional serverless IoT backend, allowing you to focus on your own IoT solution and logic instead of managing the infrastructure to do so.

Prerequisites

You must have the following before you can create and deploy this framework:

  • An AWS account
  • An IAM user with permissions to create AWS resources (AWS IoT things and rules, Lambda functions, DynamoDB tables, IAM policies and roles, etc.)
  • Node.js and the AWS SDK for JavaScript installed locally to test the deployment

Building a backend

In this post, I assume that you have some basic knowledge about the services involved. If not, you can review the AWS IoT, Lambda, and DynamoDB documentation.

For this use case, imagine that you have a fleet of devices called “myThing”. These devices can be anything: a smart lightbulb, smart hub, Internet-connected robot, music player, smart thermostat, or anything with specific sensors that can be managed using AWS IoT.

When you create a myThing device, there is some specific information that you want to be available in your database, namely:

  • Client ID
  • Serial number
  • Activation code
  • Activation status
  • Device name
  • Device type
  • Owner email
  • AWS IoT endpoint

The following is a sample payload with details of a single myThing device to be sent to a specific MQTT topic, which triggers an IoT rule. The data is in a format that AWS IoT can understand, good old JSON. For example:

{
  "clientId": "ID-91B2F06B3F05",
  "serialNumber": "SN-D7F3C8947867",
  "activationCode": "AC-9BE75CD0F1543D44C9AB",
  "activated": "false",
  "device": "myThing1",
  "type": "MySmartIoTDevice",
  "email": "not@registered.yet",
  "endpoint": "<endpoint prefix>.iot.<region>.amazonaws.com"
}

The rule then invokes the first Lambda function, which you create now. Open the Lambda console, choose Create a Lambda function, and follow the steps. Here's the code:

console.log('Loading function');
var AWS = require('aws-sdk');
var dynamo = new AWS.DynamoDB.DocumentClient();
var table = "iotCatalog";

exports.handler = function(event, context) {
    //console.log('Received event:', JSON.stringify(event, null, 2));
   var params = {
    TableName:table,
    Item:{
        "serialNumber": event.serialNumber,
        "clientId": event.clientId,
        "device": event.device,
        "endpoint": event.endpoint,
        "type": event.type,
        "certificateId": event.certificateId,
        "activationCode": event.activationCode,
        "activated": event.activated,
        "email": event.email
        }
    };

    console.log("Adding a new IoT device...");
    dynamo.put(params, function(err, data) {
        if (err) {
            console.error("Unable to add device. Error JSON:", JSON.stringify(err, null, 2));
            context.fail();
        } else {
            console.log("Added device:", JSON.stringify(data, null, 2));
            context.succeed();
        }
    });
}

The function adds an item to a DynamoDB table called iotCatalog based on events like the JSON data provided earlier. You now need to create the table, as well as make sure that the Lambda function has permissions to add items to it, by configuring it with the appropriate execution role.

Open the DynamoDB console, choose Create table and follow the steps. For this table, use the following details.

The serial number uniquely identifies your device; if, for instance, it is a smart hub that has different client devices connecting to it, use the client ID as the sort key.
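
For reference, here is a sketch of that table definition with the AWS SDK for JavaScript; the throughput values are illustrative only:

var AWS = require('aws-sdk');
var dynamodb = new AWS.DynamoDB({ region: 'ap-northeast-1' });

var params = {
    TableName: 'iotCatalog',
    AttributeDefinitions: [
        { AttributeName: 'serialNumber', AttributeType: 'S' },
        { AttributeName: 'clientId', AttributeType: 'S' }
    ],
    KeySchema: [
        { AttributeName: 'serialNumber', KeyType: 'HASH' },  // partition key
        { AttributeName: 'clientId', KeyType: 'RANGE' }      // sort key
    ],
    ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 }
};

dynamodb.createTable(params, function(err, data) {
    if (err) console.log(err);
    else console.log('Table status: ' + data.TableDescription.TableStatus);
});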

The backend is good to go! You just need to make the new resources work together; for that, you configure an IoT rule.

On the AWS IoT console, choose Create a resource and Create a rule, and use the following settings to point the rule to your newly-created Lambda function, also called iotCatalog.

After creating the rule, AWS IoT adds permissions in the background to allow it to trigger the Lambda function whenever a message is published to the MQTT topic called registration. You can use the following Node.js deployment code to test:

var AWS = require('aws-sdk');
AWS.config.region = 'ap-northeast-1';

var crypto = require('crypto');
var endpoint = "<endpoint prefix>.iot.<region>.amazonaws.com";
var iot = new AWS.Iot();
var iotdata = new AWS.IotData({endpoint: endpoint});
var topic = "registration";
var type = "MySmartIoTDevice";

//Create 50 AWS IoT Things
for(var i = 1; i < 51; i++) {
  var serialNumber = "SN-"+crypto.randomBytes(Math.ceil(12/2)).toString('hex').slice(0,15).toUpperCase();
  var clientId = "ID-"+crypto.randomBytes(Math.ceil(12/2)).toString('hex').slice(0,12).toUpperCase();
  var activationCode = "AC-"+crypto.randomBytes(Math.ceil(20/2)).toString('hex').slice(0,20).toUpperCase();
  var thing = "myThing"+i.toString();
  var thingParams = {
    thingName: thing
  };
  
  iot.createThing(thingParams).on('success', function(response) {
    //Thing Created!
  }).on('error', function(response) {
    console.log(response);
  }).send();

  //Publish JSON to Registration Topic

  var registrationData = '{\n \"serialNumber\": \"'+serialNumber+'\",\n \"clientId\": \"'+clientId+'\",\n \"device\": \"'+thing+'\",\n \"endpoint\": \"'+endpoint+'\",\n\"type\": \"'+type+'\",\n \"activationCode\": \"'+activationCode+'\",\n \"activated\": \"false\",\n \"email\": \"not@registered.yet\" \n}';

  var registrationParams = {
    topic: topic,
    payload: registrationData,
    qos: 0
  };

  iotdata.publish(registrationParams, function(err, data) {
    if (err) console.log(err, err.stack); // an error occurred
    // else Published Successfully!
  });
  setTimeout(function(){},50);
}

//Checking all devices were created

iot.listThings().on('success', function(response) {
  var things = response.data.things;
  var myThings = [];
  for(var i = 0; i < things.length; i++) {
    if (things[i].thingName.includes("myThing")){
      myThings[i]=things[i].thingName;
    }
  }

  if (myThings.length == 50){
    console.log("myThing1 to 50 created and registered!");
  }
}).on('error', function(response) {
  console.log(response);
}).send();

console.log("Registration data on the way to Lambda and DynamoDB");

The code above creates 50 IoT things in AWS IoT and generates random client IDs, serial numbers, and activation codes for each device. It then publishes the device data as a JSON payload to the registration topic, which in turn triggers the Lambda function:

And here it is! The function was triggered successfully by your IoT rule and created your database of IoT devices with all the custom information you need. You can query the database to find your things and any other details related to them.

In the AWS IoT console, the newly-created things are also available in the thing registry.

Now you can create certificates and policies, attach them to each “myThing” AWS IoT thing, and then install each certificate as you provision the physical devices.

Activation and registration logic

However, you’re not done yet. What if you want to activate a device in the field with the pre-generated activation code, and register the email details of whoever activated the device?

You need a second Lambda function for that, with the same execution role as the first function (Basic with DynamoDB). Here’s the code:

console.log('Loading function');

var AWS = require('aws-sdk');
var dynamo = new AWS.DynamoDB.DocumentClient();
var table = "iotCatalog";

exports.handler = function(event, context) {
    //console.log('Received event:', JSON.stringify(event, null, 2));

   var params = {
    TableName:table,
    Key:{
        "serialNumber": event.serialNumber,
        "clientId": event.clientId,
        }
    };

    console.log("Gettings IoT device details...");
    dynamo.get(params, function(err, data) {
    if (err) {
        console.error("Unable to get device details. Error JSON:", JSON.stringify(err, null, 2));
        context.fail();
    } else {
        console.log("Device data:", JSON.stringify(data, null, 2));
        console.log(data.Item.activationCode);
        if (data.Item.activationCode == event.activationCode){
            console.log("Valid Activation Code! Proceed to register owner e-mail and update activation status");
            var params = {
                TableName:table,
                Key:{
                    "serialNumber": event.serialNumber,
                    "clientId": event.clientId,
                },
                UpdateExpression: "set email = :val1, activated = :val2",
                ExpressionAttributeValues:{
                    ":val1": event.email,
                    ":val2": "true"
                },
                ReturnValues:"UPDATED\_NEW"
            };
            dynamo.update(params, function(err, data) {
                if (err) {
                    console.error("Unable to update item. Error JSON:", JSON.stringify(err, null, 2));
                    context.fail();
                } else {
                    console.log("Device now active!", JSON.stringify(data, null, 2));
                    context.succeed("Device now active! Your e-mail is now registered as device owner, thank you for activating your Smart IoT Device!");
                }
            });
        } else {
            context.fail("Activation Code Invalid");
        }
    }
});
}

The function needs just a small subset of the data used earlier:

{
  "clientId": "ID-91B2F06B3F05",
  "serialNumber": "SN-D7F3C8947867",
  "activationCode": "AC-9BE75CD0F1543D44C9AB",
  "email": "verified@registered.iot"
}

Lambda uses the hash and range keys (serialNumber and clientId) to query the database and compares the stored pre-generated activation code to the code supplied by the device owner along with their email address. If the activation code matches the one in the database, the activation status and email details are updated in DynamoDB accordingly. If not, the user gets an error message stating that the code is invalid.

You can turn it into an API with Amazon API Gateway. In order to do so, go to the Lambda function and add an API endpoint, as follows.

Now test access to the newly-created API endpoint, using a tool such as Postman.
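
If you prefer the command line to Postman, a small Node.js sketch such as the following can exercise the endpoint; the hostname and resource path are placeholders for the endpoint generated by API Gateway:

var https = require('https');

var payload = JSON.stringify({
    clientId: 'ID-91B2F06B3F05',
    serialNumber: 'SN-D7F3C8947867',
    activationCode: 'AC-9BE75CD0F1543D44C9AB',
    email: 'verified@registered.iot'
});

var req = https.request({
    hostname: 'abcdef1234.execute-api.ap-northeast-1.amazonaws.com', // placeholder endpoint
    path: '/prod/activate',                                          // placeholder resource
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'Content-Length': Buffer.byteLength(payload) }
}, function(res) {
    res.setEncoding('utf8');
    res.on('data', function(body) { console.log(res.statusCode, body); });
});

req.on('error', function(err) { console.log(err); });
req.write(payload);
req.end();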

If an invalid code is provided, the requester gets an error message accordingly.

Back in the database, you can confirm the record was updated as required.

Cleanup

After you finish the tutorial, delete all the newly created resources (IoT things, Lambda functions, and DynamoDB table). Alternatively, you can keep the Lambda function code for future reference, as you won’t incur charges unless the functions are invoked.

Conclusion

As you can see, by leveraging the power of the AWS IoT Rules Engine, you can take advantage of the seamless integration with AWS Lambda to create a flexible and scalable IoT backend powered by Amazon DynamoDB that can be used to manage your growing Internet of Things fleet.

You can also configure an activation API to make use of the newly-created backend and activate devices as well as register email contact details from the device owner; this information could be used to get in touch with your users regarding marketing campaigns or newsletters about new products or new versions of your IoT products.

If you have questions or suggestions, please comment below.

Redirection in a Serverless API with AWS Lambda and Amazon API Gateway

Ronald Widha @ronaldwidha
Partner Solutions Architect

Redirection is neither a success nor an error response. You return redirection when the requested resource resides either temporarily or permanently under a different URI. The client needs to issue subsequent calls to the new location in order to retrieve the requested resource. Even though you typically see 302 and 301 redirects when requesting text/html, these response types also apply to REST JSON endpoints.

Many of you are already familiar with how to return success and error responses. Amazon API Gateway and AWS Lambda help development teams to build scalable web endpoints very quickly. In a previous post (Error Handling Patterns in Amazon API Gateway and AWS Lambda), we discussed several error handling patterns to implement an explicit contract for the types of error responses that an API can produce and how they translate into HTTP status codes. However, so far this blog has not discussed how to handle redirection.

This post shows the recommended patterns for handling redirection in your serverless API built on API Gateway and Lambda.

Routing Lambda errors to API Gateway HTTP 30x responses

In HTTP, there are several types of redirection codes. You return these status codes as part of an HTTP response to the client.

  • 300 (Multiple types available): The requested resource is available in multiple representations, each with its own specific location (not commonly used).
  • 301 (Moved permanently): The requested resource has been moved to a new permanent URI.
  • 302 (Found): The requested resource is temporarily available under a different URI.

For more information about HTTP server status codes, see RFC2616 section 10.5 on the W3C website.

In this specific scenario, you would like your API Gateway method to return the following HTTP response. Note that there are two key parameters to return:

  1. The status code, i.e., 301 or 302
  2. The new URI of the resource

HTTP/1.1 302 Found
Content-Type: text/html
Content-Length: 503
Connection: keep-alive
Date: Fri, 08 Jul 2016 16:16:44 GMT
Location: http://www.amazon.com/

In API Gateway, AWS recommends that you model the various HTTP response types that your API method may produce, and define a mapping from the return value of your Lambda function to these HTTP responses. To do that, you need to do two things:

  1. API Gateway can only map responses to different method responses on error. So even though redirection isn’t strictly a failure, you still throw an exception from Lambda to communicate back to API Gateway when it needs to issue a 30x response.
  2. Unlike HTTP 200 Success or any of the HTTP error status codes, a redirection requires two values: the status code and the location. API Gateway does not support VTL for JSON serialization in the header mappings; thus, in this case, you need to take advantage of how the Lambda runtime handles exceptions to pass these values back to API Gateway in two separate fields.

Node.JS: Using the exception name to store the redirection URI

API Gateway > Lambda high level diagram

The mapping from a Lambda function error to an API Gateway method response is defined by an integration response, which defines a selection pattern used to match the Lambda function errorMessage and routes it to an associated method response. In this example, you use the prefix-based error handling pattern:

  • Integration Response, Lambda Error Regex: ^HandlerDemo.ResponseFound.*
  • Method Response, HTTP Status: 302

Because you don’t have access to VTL in the API Gateway header mappings, retrieve your redirection URI from errorType.

API Gateway settings:

  • Integration Response, Header Mappings: Location: integration.response.body.errorType

In the Lambda function Node.JS runtime, errorType can be assigned to any value. In this case, use it to store the redirection URI. In the handler.js, you have the following:

// Returns 302 or 301
var err = new Error("HandlerDemo.ResponseFound Redirection: Resource found elsewhere");
err.name = "http://a-different-uri";
context.done(err, {});

The same technique applies to 301 permanent redirects except you replace the API Gateway regex above to detect HandlerDemo.MovedPermanently.
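
For example, a sketch of the 301 case might look like the following, assuming a matching ^HandlerDemo.MovedPermanently.* regex and a 301 method response are configured:

// Returns 301
var err = new Error("HandlerDemo.MovedPermanently Redirection: Resource moved permanently");
err.name = "http://the-new-permanent-uri";
context.done(err, {});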

Node.JS: Handling other response types

In order to keep it consistent, you can leverage the same pattern for other error codes. You can leverage errorType for any end user–visible messages.

//404 error
var err = new Error("HandlerDemo.ResponseNotFound: Resource not found.");
err.name = "Sorry, we can't find what you're looking for. Are you sure the address is correct?";
context.done(err, {});

Just as an example, upon failure, you return a “Fail Whale” HTML webpage to the user.

API Gateway settings:

  • Integration Response, Lambda Error Regex: ^HandlerDemo.ResponseNotFound.*
  • Method Response, HTTP Status: 404
  • Integration Response, Content-Type (Body Mapping Templates): text/html
  • Integration Response, Template:

<html>
   <img src="fail-whale.gif" />
   $input.path('$.errorType')
</html>

HTTP 200 Success can be handled as you normally would in your handler.js:

//200 OK
context.done(null, response);

Java: Using the Java inner exception to store the redirection URI

API Gateway > Lambda high level diagram

Because Java is a strongly typed language, the errorType Lambda function return value contains the Java exception type which you cannot override (nor should you). Thus, you will be using a different property to retrieve the redirection URI:

  • Method Response, HTTP Status: 302
  • Integration Response, Header Mappings: Location: integration.response.body.cause.errorMessage

The cause.errorMessage parameter is accessible in the Lambda function Java runtime as an inner exception.

throw new Exception(new ResponseFound("http://www.amazon.com"));

ResponseFound is a class that extends Throwable.

package HandlerDemo;

public class ResponseFound extends Throwable {
  public ResponseFound(String uri) { super(uri); }
}

The full response from Lambda received by API Gateway is the following:

{
  "errorMessage": "HandlerDemo.ResponseFound: http://www.amazon.com",
  "errorType": "java.lang.Exception",
  "stackTrace": [ ... ],
  "cause": {
    "errorMessage": "http://www.amazon.com",
    "errorType": "HandlerDemo.ResponseFound",
    "stackTrace": [ ... ]
  }
}

Because Lambda serializes the inner exception type to the outer errorMessage, you can still use the same Lambda error regex in API Gateway to detect a redirect from errorMessage.

API Gateway settings:

  • Integration Response, Lambda Error Regex: ^HandlerDemo.ResponseFound.*

Conclusion

In this post, I showed how to emit different HTTP responses, including a redirect response, on Amazon API Gateway and AWS Lambda for Node.JS and Java runtimes. I encourage developers to wrap these patterns into a helper class where it handles the inner working of returning a success, error, or redirection response.
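
As an illustration only, such a helper might be shaped like the following sketch (the names and prefixes are hypothetical):

// done(context, "HandlerDemo.ResponseFound", "http://www.amazon.com")  -> 302
// done(context, "HandlerDemo.MovedPermanently", "http://new-uri")      -> 301
// done(context, null, null, response)                                  -> 200
function done(context, statusPrefix, locationOrMessage, body) {
    if (statusPrefix) {
        var err = new Error(statusPrefix + ": see errorType for details");
        err.name = locationOrMessage; // redirection URI or user-visible message
        context.done(err, {});
    } else {
        context.done(null, body);     // plain success
    }
}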

One thing you may notice missing is an example of handling redirection in Python. At the time of writing, the Chalice community is considering support for this use case as part of the framework.

If you have questions or suggestions, please comment below.

Powering Secondary DNS in a VPC using AWS Lambda and Amazon Route 53 Private Hosted Zones

Mark Statham, Senior Cloud Architect

When you implement hybrid connectivity between existing on-premises environments and AWS, there are a number of approaches to provide DNS resolution of both on-premises and VPC resources. In a hybrid scenario, you likely require resolution of on-premises resources, AWS services deployed in VPCs, AWS service endpoints, and your own resources created in your VPCs.

You can leverage Amazon Route 53 private hosted zones to provide private DNS zones for your VPC resources and dynamically register resources, as shown in a previous post, Building a Dynamic DNS for Route 53 using CloudWatch Events and Lambda.

Ultimately, this complex DNS resolution scenario requires that you deploy and manage additional DNS infrastructure, running on EC2 resources in your VPC, to handle DNS requests from either VPCs or on-premises. Whilst this is a familiar approach, it adds cost and operational complexity where a solution built on AWS managed services can be used instead.

In this post, we explore how you can leverage Route 53 private hosted zones with AWS Lambda and Amazon CloudWatch Events to mirror on-premises DNS zones, which can then be natively resolved from within your VPCs without the need for additional DNS forwarding resources.

Route 53 private hosted zones

Route 53 offers the convenience of domain name services without having to build a globally distributed highly reliable DNS infrastructure. It allows instances within your VPC to resolve the names of resources that run within your AWS environment. It also lets clients on the Internet resolve names of your public-facing resources. This is accomplished by querying resource record sets that reside within a Route 53 public or private hosted zone.

A private hosted zone is basically a container that holds information about how you want to route traffic for a domain and its subdomains within one or more VPCs and is only resolvable from the VPCs you specify; whereas a public hosted zone is a container that holds information about how you want to route traffic from the Internet.

Route 53 has a programmable API that can be used to automate the creation and removal of resource record sets, which we're going to leverage later in this post.
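
For reference, here is a minimal sketch of that API with the AWS SDK for JavaScript, upserting a single record into a hosted zone (the zone ID and record values are placeholders):

var AWS = require('aws-sdk');
var route53 = new AWS.Route53();

var params = {
    HostedZoneId: 'Z148QEXAMPLE8V',
    ChangeBatch: {
        Comment: 'Mirrored from the on-premises zone',
        Changes: [{
            Action: 'UPSERT',
            ResourceRecordSet: {
                Name: 'app01.corp.example.com.',
                Type: 'A',
                TTL: 300,
                ResourceRecords: [{ Value: '10.0.1.25' }]
            }
        }]
    }
};

route53.changeResourceRecordSets(params, function(err, data) {
    if (err) console.log(err);
    else console.log('Change status: ' + data.ChangeInfo.Status);
});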

Using Lambda with VPC support and scheduled events

AWS Lambda is a compute service where you can upload your code and the service runs the code on your behalf using AWS infrastructure. You can create a Lambda function and execute it on a regular schedule. You can specify a fixed rate (for example, execute a Lambda function every hour or 15 minutes), or you can specify a cron expression. This functionality is underpinned by CloudWatch Events.

Lambda runs your function code securely within a VPC by default. However, to enable your Lambda function to access resources inside your private VPC, you must provide additional VPC-specific configuration information that includes VPC subnet IDs and security group IDs. Lambda uses this information to set up elastic network interfaces (ENIs) that enable your function to connect securely to other resources within your private VPC or reach back into your own network via AWS Direct Connect or VPN.

Each ENI is assigned a private IP address from the IP address range within the subnets that you specify, but is not assigned any public IP addresses. You cannot use an Internet gateway attached to your VPC, as that requires the ENI to have public IP addresses. Therefore, if your Lambda function requires Internet access, for example to access AWS APIs, you can use the Amazon VPC NAT gateway. Alternatively, you can leverage a proxy server to handle HTTPS calls, such as those used by the AWS SDK or CLI.

Building an example system

When you combine the power of Route 53 private hosted zones and Lambda, you can create a system that closely mimics the behavior of a stealth DNS to provide resolution of on-premises domains via VPC DNS.

For example, it is possible to schedule a Lambda function that executes every 15 minutes to perform a zone transfer from an on-premises DNS server, using a full zone transfer query (AXFR). The function can check the retrieved zone for differences from a previous version. Changes can then be populated into a Route 53 private hosted zone, which is only resolvable from within your VPCs, effectively mirroring the on-premises master to Route 53.

This then allows resources deployed in your VPC to use just VPC DNS to resolve on-premises, VPC, and Internet resource records, without the need for any additional forwarding infrastructure to on-premises DNS.

The following example is based on Python code running as a Lambda function, invoked by CloudWatch Events with constant text that provides customizable parameters, to support mirroring multiple zones for both forward and reverse domains.
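
For reference, a sketch of the scheduling piece with the AWS SDK for JavaScript follows; the rule name, function ARN, and especially the parameter keys passed as constant input are illustrative, not the actual parameters the example function expects:

var AWS = require('aws-sdk');
var events = new AWS.CloudWatchEvents({ region: 'us-east-1' });

// A rule that fires every 15 minutes
events.putRule({
    Name: 'mirror-dns-example-com',
    ScheduleExpression: 'rate(15 minutes)'
}, function(err) {
    if (err) return console.log(err);

    // Target the Lambda function and pass constant JSON as its event
    events.putTargets({
        Rule: 'mirror-dns-example-com',
        Targets: [{
            Id: 'mirror-dns-lambda',
            Arn: 'arn:aws:lambda:us-east-1:ACCOUNT-ID:function:mirror-dns-lambda',
            Input: JSON.stringify({
                Domain: 'example.com',
                MasterDns: '10.0.0.10',
                HostedZoneId: 'Z148QEXAMPLE8V'
            })
        }]
    }, function(err) {
        if (err) console.log(err);
        else console.log('Schedule and target configured');
    });
});

Before the schedule takes effect, CloudWatch Events also needs permission to invoke the function, which you can grant with the Lambda AddPermission API.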

Prerequisites for the example

Before you get started, make sure you have all the prerequisites in place including installing the AWS CLI and creating a VPC.

  • Region

    Check that the region where your VPC is deployed has the Lambda and CloudWatch Events services available.

  • AWS Command Line Interface (AWS CLI)

    This example makes use of the AWS CLI; however, all actions can be performed via the AWS console as well. Make sure you have the latest version installed, which provides support for creating Lambda functions in a VPC and the required IAM permissions to create resources required. For more information, see Getting Set Up with the AWS Command Line Interface.

  • VPC

    For this example, create or use a VPC configured with at least two private subnets in different Availability Zones and connectivity to the source DNS server. If you are building a new VPC, see Scenario 4: VPC with a Private Subnet Only and Hardware VPN Access.

    Ensure that the VPC has the DNS resolution and DNS hostnames options set to yes, and that you have both connectivity to your source DNS server and the ability to access the AWS APIs. You can create an AWS managed NAT gateway to provide Internet access to AWS APIs or as an alternative leverage a proxy server.

    You may wish to consider creating subnets specifically for your Lambda function, allowing you to restrict the IP address ranges that need access to the source DNS server and configure network access controls accordingly.

    After the subnets are created, take note of them as you’ll need them later to set up the Lambda function: they are in the format subnet-ab12cd34. You also need a security group to assign to the Lambda function; this can be the default security group for the VPC or one you create with limited outbound access to your source DNS: the format is sg-ab12cd34.

  • DNS server

    You need to make sure that you modify DNS zone transfer settings so that your DNS server accepts AXFR queries from the Lambda function. Also, ensure that security groups or firewall policies allow connection via TCP port 53 from the VPC subnet IP ranges created above.

Setting up the example Lambda function

Before you get started, it’s important to understand how the Lambda function works and interacts with the other AWS services and your network resources:

  1. The Lambda function is invoked by CloudWatch Events and configured based on a JSON string passed to the function. This sets a number of parameters, including the DNS domain, source DNS server, and Route 53 zone ID. This allows a single Lambda function to be reused for multiple zones.
  2. A new ENI is created in your VPC subnets and attached to the Lambda function; this allows your function to access your internal network resources based on the security group that you defined.
  3. The Lambda function then transfers the source DNS zone from the IP address specified in the JSON parameters. You need to ensure that your DNS server is configured to allow full zone transfers and to accept AXFR queries from the function, which happen over TCP port 53.
  4. The Route 53 DNS zone is retrieved via API.
  5. The two zone files are compared; the resulting differences are returned as a set of actions to be performed against Route 53.
  6. Updates to the Route 53 zone are made via API and, finally, the SOA is updated to match the source version.

You’re now ready to set up the example using the following instructions.

Step 1 – Create a Route 53 hosted zone

Before you create the Lambda function, there needs to be a target Route 53 hosted zone to mirror the DNS zone records into. This can either be a public or private zone; however, for the purposes of this example, you will create a private hosted zone that only responds to queries from the VPC you specify.

To create a Route 53 private hosted zone associated with your VPC, provide the region and VPC ID as part of the following command:

aws route53 create-hosted-zone \
--name <domainname> \
--vpc VPCRegion=<region>,VPCId=<vpc-aa11bb22> \
--caller-reference mirror-dns-lambda \
--hosted-zone-config Comment="My DNS Domain"

Save the HostedZone Id returned, since you will need it for future steps.

Step 2 – Create an IAM role for the Lambda function

In this step, you use the AWS CLI to create the Identity and Access Management (IAM) role that the Lambda function assumes when the function is invoked. You need to create an IAM policy with the required permissions and then attach this policy to the role.

Download the mirror-dns-policy.json and mirror-dns-trust.json files from the aws-lambda-mirror-dns-function AWS Labs GitHub repo.

mirror-dns-policy.json

The policy includes EC2 permissions to create and manage ENIs required for the Lambda function to access your VPC, and Route 53 permissions to list and create resource records. The policy also allows the function to create log groups and log events as per standard Lambda functions.

{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "logs:CreateLogGroup",
            "logs:CreateLogStream",
            "logs:PutLogEvents"
        ],
        "Resource": "arn:aws:logs:*:*:*"
    }, {
        "Effect": "Allow",
        "Action": [
            "ec2:CreateNetworkInterface",
            "ec2:DescribeNetworkInterfaces",
            "ec2:DetachNetworkInterface",
            "ec2:DeleteNetworkInterface"
        ],
        "Resource": "*"
    }, {
        "Sid": "Manage Route 53 records",
        "Effect": "Allow",
        "Action": [
            "route53:ChangeResourceRecordSets",
            "route53:ListResourceRecordSets"
        ],
        "Resource": ["*"]
    }]
}

To restrict the Lambda function's access, you can limit the scope of Route 53 changes by specifying the hosted zones being managed as the Resource, in the format "arn:aws:route53:::hostedzone/Z148QEXAMPLE8V". This policy can be updated later if additional hosted zone IDs are added.

mirror-dns-trust.json

The mirror-dns-trust.json file contains the trust policy that grants the Lambda service permission to assume the role; this is standard for creating Lambda functions.

{
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "",
        "Effect": "Allow",
        "Principal": {
            "Service": "lambda.amazonaws.com"
        },
        "Action": "sts:AssumeRole"
    }]
}

Create IAM entities

The next step is to create the following IAM entities for the Lambda function:

  • IAM policy

    Create the IAM policy using the policy document in the mirror-dns-policy.json file, replacing <LOCAL PATH> with the local path to the file. The output of the create-policy command includes the policy's Amazon Resource Name (ARN). Save the ARN, as you need it for future steps.

    aws iam create-policy \
    --policy-name mirror-dns-lambda-policy \
    --policy-document file://<LOCAL PATH>/mirror-dns-policy.json
  • IAM role

    Create the IAM role using the trust policy in the mirror-dns-trust.json file, replacing <LOCAL PATH> with the local path to the file. The output of the create-role command includes the ARN associated with the role that you created. Save this ARN, as you need it when you create the Lambda function in the next section.

    aws iam create-role \
    --role-name mirror-dns-lambda-role \
    --assume-role-policy-document file://<LOCAL PATH>/mirror-dns-trust.json

    Attach the policy to the role. Use the ARN returned when you created the IAM policy for the --policy-arn input parameter.

    aws iam attach-role-policy \
    --role-name mirror-dns-lambda-role \
    --policy-arn <enter-your-policy-arn-here>

Step 3 – Create the Lambda function

The Lambda function uses modules included in the Python 2.7 Standard Library and the AWS SDK for Python (boto3), which is preinstalled as part of the Lambda service. Additionally, the function uses the dnspython module, which provides the DNS handling functions, and an externalized record-type lookup function.

The additional libraries and functions require that we create a deployment package for this example as follows:

  1. Create a new directory for the Lambda function and download the Python scripts lambda_function.py and lookup_rdtype.py from the aws-lambda-mirror-dns-function AWS Labs GitHub repo. Alternatively, clone the repo locally.
  2. Install the additional dnspython module locally using the pip command. This creates a copy of the required module local to the function.
    pip install dnspython -t .
  3. Update the lambda_function.py to specify proxy server configuration, if required.
  4. Create a Lambda deployment package using the following command:
    zip -rq mirror-dns-lambda.zip lambda_function.py \
    lookup_rdtype.py dns*

Next, use the AWS CLI to create the Lambda function and upload the deployment package by executing the following command. Note that you need to update the command to use the ARN of the IAM role that you created earlier, as well as the local path to the Lambda deployment file containing the Python code for the Lambda function.

aws lambda create-function --function-name mirror-dns-lambda \
--runtime python2.7 \
--role <enter-your-role-arn-here> \
--handler lambda_function.lambda_handler \
--timeout 60 \
--vpc-config SubnetIds=comma-separated-vpc-subnet-ids,SecurityGroupIds=comma-separated-security-group-ids \
--memory-size 128 \
--description "DNS Mirror Function" \
--zip-file fileb://<LOCAL PATH>/mirror-dns-lambda.zip

The output of the command returns the FunctionArn of the newly-created function. Save this ARN, as you need it in the next section.

Configure a test event to validate that your Lambda function works; it should be JSON similar to the following, saved locally (for example, as event.json, used in the invoke command below). All keys are required, and Domain, MasterDns, and ZoneId must have values.

{
    "Domain": "mydomain.com",
    "MasterDns": "10.0.0.1",
    "ZoneId": "AA11BB22CC33DD",
    "IgnoreTTL": "False",
    "ZoneSerial": ""
}

Invoke the Lambda function to test that everything is working. After the function has been invoked, the CLI returns a 200 status code and writes the function's result to the file named output, which you can check to confirm that the function worked. Alternatively, you can test in the AWS console, using the test event to see the log output.

aws lambda invoke \
--function-name mirror-dns-lambda \
--payload fileb://event.json output

Congratulations, you’ve now created a secondary mirrored DNS accessible to your VPC without the need for any servers!

Step 4 – Create the CloudWatch Events rule

After you’ve confirmed that the Lambda function is executing correctly, you can create the CloudWatch Events rule that triggers the Lambda function on a scheduled basis. First, create a new rule with a unique name and schedule expression. You can create rules that self-trigger on schedule in CloudWatch Events, using cron or rate expressions. The following example uses a rate expression to run every 15 minutes.

aws events put-rule \
--name mirror-dns-lambda-rule \
--schedule-expression 'rate(15 minutes)'

The output of the command returns the ARN to the newly-created CloudWatch Events rule. Save the ARN, as you need it to associate the rule with the Lambda function and to set the appropriate Lambda permissions.

Next, add the permissions required for the CloudWatch Events rule to execute the Lambda function. Note that you need to provide a unique value for the --statement-id input parameter. You also need to provide the ARN of the CloudWatch Events rule that you created.

aws lambda add-permission \
--function-name mirror-dns-lambda \
--statement-id Scheduled01 \
--action 'lambda:InvokeFunction' \
--principal events.amazonaws.com \
--source-arn <enter-your-cloudwatch-events-rule-arn-here>

Finally, set the target of the rule to be the Lambda function. Because you are going to pass parameters via a JSON string, the value for --targets also needs to be in JSON format. You need to construct a file containing a unique identifier for the target, the ARN of the Lambda function previously created, and the constant text that contains the function parameters. An example targets.json file would look similar to the following; note that every quote (") in the Input value must be escaped.

[{
        "Id": "RuleStatementId01",
        "Arn": "<arn-of-lambda-function>",
        "Input": "{\"Domain\": \"mydomain.com\",\"MasterDns\": \"10.0.0.1\",\"ZoneId\": \"AA11BB22CC33DD\",\"IgnoreTTL\": \"False\",\"ZoneSerial\": \"\"}"
}]

Activate the scheduled event by adding the following target:

aws events put-targets \
--rule mirror-dns-lambda-rule \
--targets file://targets.json

Because a single rule can have multiple targets, every domain that you want to mirror can be defined as another target with a different set of parameters; change the ID and Input values in the target JSON file.

Conclusion

Now that you've seen how to combine various AWS services to automate the mirroring of DNS to Amazon Route 53 hosted zones, we hope that you are inspired to create your own solutions using Lambda in a VPC to enable hybrid integration. Lambda lets you build highly scalable serverless infrastructures that reduce cost and operational complexity while providing high availability. Coupled with CloudWatch Events, you can respond to events in real time, such as when an instance changes its state or when a customer event is pushed to the CloudWatch Events service.

To learn more about Lambda and serverless infrastructures, see the AWS Lambda Developer Guide and the "Microservices without the Servers" blog post. To learn more about CloudWatch Events, see Using CloudWatch Events in the Amazon CloudWatch Developer Guide.

We’ve open-sourced the code used in this example in the aws-lambda-mirror-dns-function AWS Labs GitHub repo and can’t wait to see your feedback and your ideas about how to improve the solution.

AWS Serverless Chatbot Competition

Today, we are pleased to announce the official AWS Serverless Chatbot competition!

Bots on Slack can help your team be more productive and accomplish more tasks. They can help you increase visibility into your operations or help your customers easily get information through a natural, conversational interface. However, building and running bots can be a time-consuming and difficult task. Developers must provision, manage, and scale the compute resources that run the bot code.

With AWS Lambda, it’s easy to build and run bots. Upload your code and Lambda takes care of everything required to run and scale your code with high availability.

Enter your bot for a chance to win a ticket to re:Invent 2016 in Las Vegas!

Enter the competition by building a Slack bot that runs on AWS Lambda and Amazon API Gateway. You’re also encouraged to integrate Slack APIs or other APIs, SDKs, and datasets so long as you are authorized to use them.

We’ll be selecting 8 winners for a prize. Each prize includes one ticket to AWS re:Invent and access to discounted hotel room rates, along with recognition at the Serverless State of the Union address, some cool swag, $100 in AWS credits, and publicity opportunities for the winning bots.

To help you get started, we’ve created a few very basic bots that take advantage of Slack’s triggers and webhooks. These bots are built using AWS Lambda and Amazon API Gateway. To get started, check out the diagram below and view the code samples and instructions on GitHub at: aws-serverless-chatbot-sample.

Here’s what you need to do to enter:

  1. Read the Rules and Eligibility Guidelines.
  2. Register for the AWS Serverless Chatbot Competition.
  3. Create AWS and Slack developer accounts.
  4. Visit the Resources Page to learn more about the APIs and services.
  5. Build your chatbot. Our sample code (aws-serverless-chatbot-sample) is a good place to start.
  6. Create your demo video and other materials for the submission.
  7. Submit your materials before 5 PM ET on September 29, 2016.

Serverless Cross Account Stream Replication Using AWS Lambda, Amazon DynamoDB, and Amazon Kinesis Firehose

This is a guest post by Richard Freeman, Ph.D., a solutions architect and data scientist at JustGiving. JustGiving in their own words: "We are one of the world's largest social platforms for giving that's helped 27.7 million users in 196 countries raise $4.1 billion for over 27,000 good causes."

At JustGiving, we want our analysts and data scientists to have access to production Amazon Kinesis data in near real-time and have the flexibility to make transformations, but without compromising on security. Rather than build a custom service using the Kinesis Client Library (KCL) or maintain an Amazon EC2 instance running with a Java Kinesis Agent, we used an AWS Lambda function that does not require us to maintain a running server. The Lambda function is used to process Amazon Kinesis events, enrich them, and write them to Amazon DynamoDB in another AWS account.

After the data is stored in DynamoDB, further systems can process the data as a stream; we persist the data to S3 via Amazon Kinesis Firehose using another Lambda function. This gives us a truly serverless environment, where all the infrastructure, including the integration, connectors, security, and scalability, is managed by AWS, and allows us to focus only on the stream transformation logic rather than on code deployment, systems integration, or the platform itself.

This post shows you how to process a stream of Amazon Kinesis events in one AWS account, and persist it into a DynamoDB table and Amazon S3 bucket in another account, using only fully managed AWS services and without running any servers or clusters. For this walkthrough, I assume that you are familiar with DynamoDB, Amazon Kinesis, Lambda, IAM, and Python.

Overview

Amazon Kinesis Streams is a managed streaming data service that can continuously capture and store terabytes of data per hour from hundreds of thousands of sources, such as website clickstreams, financial transactions, social media feeds, web logs, sensors, and location-tracking events.

Working with a production environment and a proof of concept (POC) environment that are in different AWS accounts is useful if you have web analytics traffic written to Amazon Kinesis in production and data scientists need near real-time and historical access to the data in another non-operational, non-critical, and more open AWS environment for experiments. In addition, you may want to persist the Amazon Kinesis records in DynamoDB without duplicates.

The following diagram shows how the two environments are structured.

The production environment contains multiple web servers running data producers that write web events to Amazon Kinesis, which causes a Lambda function to be invoked with a batch of records. The Lambda function (1) assumes a role in the POC account, and continuously writes those events to a DynamoDB table without duplicates.

After the data is persisted in DynamoDB, further processing can be triggered via DynamoDB Streams and invoke a Lambda function (2) that could be used to perform streaming analytics on the events or to persist the data to S3 (as described in this post). Note that the Lambda function (1) can also be combined with what the function (2) does, but we wanted to show how to process both Amazon Kinesis stream and DynamoDB Streams data sources. Also, having the second function (2) allows data scientists to project, transform, and enrich the stream for their requirements while preserving the original raw data stream.

Setting up the POC environment

The POC environment is where we run data science experiments, and is the target environment for the production Amazon Kinesis data. For this example, give the POC the account number: 999999999999.

Create a target table in the POC account

This DynamoDB table is where the Amazon Kinesis events will be written. To create the table, this example uses the AWS console (an equivalent boto3 sketch follows these steps):

  1. Open the DynamoDB console.
  2. On the navigation pane, choose Create Table and configure the following:
    • For Table name, enter prod-raven-lambda-webanalytics-crossaccount.
    • For Primary key (hash/partition key), enter seqNumbershardIdHash and choose String.
    • Choose Add sort key.
    • For Sort key Name (range), enter sequenceNumber and choose String.
  3. Under Table settings , clear Use default settings.
  4. Choose Create.
  5. Under Tables , select the table prod-raven-lambda-webanalytics-crossaccount and configure the following:
    • Choose Capacity.
    • Under Provisioned Capacity :
    • For Read capacity units , enter 20.
    • For Write capacity units , enter 20.
    • Choose Save.
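
If you prefer to script this step, a minimal boto3 sketch equivalent to the console steps above could look like the following (the region shown is only an example):

import boto3

# Create the target table with seqNumbershardIdHash as the partition (hash) key
# and sequenceNumber as the sort (range) key, matching the console steps above.
dynamodb = boto3.client('dynamodb', region_name='eu-west-1')
dynamodb.create_table(
    TableName='prod-raven-lambda-webanalytics-crossaccount',
    AttributeDefinitions=[
        {'AttributeName': 'seqNumbershardIdHash', 'AttributeType': 'S'},
        {'AttributeName': 'sequenceNumber', 'AttributeType': 'S'}
    ],
    KeySchema=[
        {'AttributeName': 'seqNumbershardIdHash', 'KeyType': 'HASH'},
        {'AttributeName': 'sequenceNumber', 'KeyType': 'RANGE'}
    ],
    ProvisionedThroughput={'ReadCapacityUnits': 20, 'WriteCapacityUnits': 20}
)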

Notes:

  • Optionally, and depending on your records, you can also create a global secondary index if you want to search a subset of the records by user ID or email address – this is very useful for QA in development and staging environments.
  • Provisioned throughput capacity should be set based on the expected number of events read and written per second. If reads or writes exceed this capacity for a period of time, throttling occurs and writes must be retried. There are many strategies to deal with this, including setting the capacity dynamically based on the currently consumed read or write throughput, or using exponential backoff retry logic (a simple sketch follows these notes).
  • DynamoDB is a schemaless database, so no other fields or columns need to be specified up front. Each item in the table can have a different number of elements.
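
As an illustration of the exponential backoff approach mentioned above, a minimal wrapper around put_item might look like the following (the function name and retry values are illustrative; boto3 also performs a number of retries automatically):

import time
from botocore.exceptions import ClientError

def put_item_with_backoff(table, item, max_retries=5):
    # Retry a throttled write with exponential backoff: 0.1s, 0.2s, 0.4s, ...
    for attempt in range(max_retries):
        try:
            return table.put_item(Item=item)
        except ClientError as e:
            if e.response['Error']['Code'] != 'ProvisionedThroughputExceededException':
                raise
            time.sleep(0.1 * (2 ** attempt))
    raise Exception('put_item still throttled after {} retries'.format(max_retries))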

Create an IAM policy in the POC account

First, create an IAM role that can be used by the Lambda functions to write to the DynamoDB table.

  1. In the POC account, open the IAM console.
  2. On the navigation pane, choose Policies , Create Policy.
  3. Choose Create Your Own Policy and configure the following:
    • For Policy Name , enter prod-raven-lambda-webanalytics-crossaccount.
    • For Description , enter “Read/write access to the POC prod-raven-lambda-webanalytics-crossaccount table” or similar text.
    • For Policy Document , insert the following JSON:
{    
    "Version": "2012-10-17",    
    "Statement": [        
        {            
            "Sid": "Stmt345926348000",
            "Effect": "Allow",
            "Action": [
                "dynamodb:BatchGetItem",
                "dynamodb:BatchWriteItem",
                "dynamodb:DescribeStream",
                "dynamodb:DescribeTable",
                "dynamodb:GetItem",
                "dynamodb:GetRecords",
                "dynamodb:ListTables",
                "dynamodb:PutItem",
                "dynamodb:Query",
                "dynamodb:Scan",
                "dynamodb:UpdateItem",
                "dynamodb:UpdateTable"],
            "Resource": [ "arn:aws:dynamodb:<region>:<999999999999>:table/prod-raven-lambda-webanalytics-crossaccount" ]
        }    
    ]
}

The Action array specifies the allowable actions on the specified resource; here, it is the DynamoDB table created earlier. In the Resource ARN, replace <region> with your region (e.g., eu-west-1) and <999999999999> with your POC AWS account ID.

Create an IAM role

Next, create an IAM role which uses this policy, so that Lambda can assume an identity with the indicated privileges.

  1. Open the IAM console, choose Roles , Create New Role.
    • For Role Name , enter DynamoDB-ProdAnalyticsWrite-role.
    • Choose Role for Cross Account Access to provide access between AWS accounts you own.
    • For Establish Trust , enter the production AWS account ID, e.g., 111111111111.
    • Select the policy created earlier, prod-raven-lambda-webanalytics-crossaccount.
  2. Choose Review , Create Role.

This role is created with an Amazon Resource Name (ARN), e.g., arn:aws:iam::999999999999:role/DynamoDB-ProdAnalyticsWrite-role. You will now see that the policy is attached and the trusted account ID is 111111111111.

Setting up the production environment

In the production environment, you use a Lambda function to read, parse, and transform the Amazon Kinesis records and write them to DynamoDB. Access to Amazon Kinesis and DynamoDB is fully managed through IAM policies and roles. At this point, you need to sign out of the POC account and sign in as a user with IAM administration rights in the production account 111111111111.

Create a Lambda policy

Create a policy that allows the production Lambda function to assume the role in the POC account.

  1. In the production environment, open the IAM console and choose Policies.
  2. Choose Create Policy , Create Your Own Policy.
    1. For Policy Name , enter prod-raven-lambda-assumerole.
    2. For Description , enter “This policy allows the Lambda function to execute, write to CloudWatch and assume a role in POC” or similar text.
    3. For Policy Document , insert the following JSON:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "lambda:InvokeFunction"
            ],
            "Resource": [
                "*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "kinesis:GetRecords",
                "kinesis:GetShardIterator",
                "kinesis:DescribeStream",
                "kinesis:ListStreams",
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        },
        {
            "Sid": "",
            "Resource": "*",
            "Action": [
                "logs:*"
            ],
            "Effect": "Allow"
        },
        {
            "Sid": "Stmt1435680952001",
            "Effect": "Allow",
            "Action": [
                "iam:PassRole",
                "iam:GenerateCredentialReport",
                "iam:Get*",
                "iam:List*"
            ],
            "Resource": [
            "*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::<999999999999>:role/DynamoDB-ProdAnalyticsWrite-role"
        }
    ]
}

Create a Lambda execution IAM role

  1. In the IAM console, choose Roles , Create New Role.
  2. For Set Role Name , enter LambdaKinesisDynamo and choose Next step.
  3. For Role Type , choose AWS Service Roles , AWS Lambda.
  4. Select the policy created earlier, prod-raven-lambda-assumerole, and choose Next step.
  5. Choose Review , Create Role.

Create the processor Lambda function

  1. Open the Lambda console and choose Create a Lambda Function.
  2. Select the blueprint kinesis-process-record-python.
  3. Configure the event source:
    • For Event Source type , choose Kinesis.
    • For Kinesis Stream , select web-analytics (or your stream).
    • For Batch Size , enter 300 (this depends on the frequency that events are added to Amazon Kinesis).
    • For Starting position , choose Trim horizon.
  4. Configure the function:
    • For Name , enter ProdKinesisToPocDynamo.
    • For Runtime , choose Python 2.7.
    • For Edit code inline , add the Lambda function source code (supplied in the next section).
    • For Handler , choose lambda_function.lambda_handler.
    • For Role , choose the role that you just created, LambdaKinesisDynamo.
    • For Memory , choose 128 (this depends on the frequency that events are added to Amazon Kinesis).
    • For Timeout , choose 2 min 0 sec (this depends on the frequency that events are added to Amazon Kinesis).
  5. Review the configuration:
    • For Enable event source , choose Enable later (this can be enabled later via the Event sources tab).
  6. Choose Create function.

Create the Lambda function code

At the time of this post, Lambda functions support the Python, Node.js, and Java languages. For this task, you implement the function in Python with the following basic logic:

  • Assume a role that allows the Lambda function to write to the DynamoDB table. Manually configure the DynamoDB client to use the newly-assumed cross account role.
  • Convert Amazon Kinesis events to the DynamoDB format.
  • Write the record to the DynamoDB table.

Assume a cross account role for access to the POC account DynamoDB table

This code snippet uses the AWS Security Token Service (AWS STS) which enables the creation of temporary IAM credentials for an IAM role. The assume_role() function returns a set of temporary credentials (consisting of an access key ID, a secret access key, and a security token) that can be used to access the DynamoDB table via the IAM role arn:aws:iam::999999999999:role/DynamoDB-ProdAnalyticsWrite-role.

Assume the role

import base64, json, boto3

def lambda_handler(event, context):
    client = boto3.client('sts')
    sts_response = client.assume_role(RoleArn='arn:aws:iam::<999999999999>:role/DynamoDB-ProdAnalyticsWrite-role',
                                      RoleSessionName='AssumePocRole', DurationSeconds=900)

Configure the DynamoDB client using a cross account role

    dynamodb = boto3.resource(service_name='dynamodb', region_name='<region>',
                              aws_access_key_id=sts_response['Credentials']['AccessKeyId'],
                              aws_secret_access_key=sts_response['Credentials']['SecretAccessKey'],
                              aws_session_token=sts_response['Credentials']['SessionToken'])

You can now use any DynamoDB methods in the AWS SDK for Python (Boto 3) to write to DynamoDB tables in the POC environment, from the production environment.

In order to test the code, I recommend that you create a test event. The simplest way to do this is to print the event from Amazon Kinesis, copy it from the Lambda function's CloudWatch Logs, and convert it into valid JSON. You can then use it when you select the Lambda function and choose Actions, Configure Test Event. After you are happy with the records in DynamoDB, choose Event source, Disable to disable the mapping between the Lambda function and Amazon Kinesis.
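
For example, a temporary version of the handler that just dumps the incoming event is enough to capture a payload in CloudWatch Logs (a sketch, not the final function):

import json

def lambda_handler(event, context):
    # Temporarily print the raw Kinesis event so it can be copied from
    # CloudWatch Logs and reused as a test event in the Lambda console.
    print(json.dumps(event))
    return 'event logged'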

Here’s the Python code to write an Amazon Kinesis record to the DynamoDB table:

    tableName = 'prod-raven-lambda-webanalytics'
    for record in event['Records']:
        try:            
            payload_json = json.loads(base64.b64decode(record['kinesis']['data']))
            payload_json['sequenceNumber'] = record['kinesis']['sequenceNumber']
            payload_json['shardId'] = record['eventID'].split(':')[0]
            payload_json['seqNumbershardIdHash'] = payload_json['sequenceNumber'][-3:-1]+'_'+ payload_json['shardId']
            items={}
            for k, v in payload_json.items():
                if v is not None and v != '':
                    items[k] = v
            table = dynamodb.Table(tableName)
            response  = table.put_item(Item=items)
        except Exception as e:
            print(e.__doc__)
            print(e.message)
    return 'Successfully processed {} records.'.format(len(event['Records']))

The first part of the snippet iterates over the batch of events sent to the Lambda function by the Amazon Kinesis event source, and decodes each record from base64. The fields used for the DynamoDB hash and range keys are then extracted directly from the Amazon Kinesis record. This makes the processing idempotent and avoids storing duplicate records, and it allows for rapid retrieval without the need for a full table scan (more on this later). Each record is then converted into the DynamoDB format and written to the target DynamoDB table with the put_item() function.

As the data is now persisted in the POC environment, data scientists and analysts are free to experiment on the latest production data without having access to production or worrying about impacting the live website.

Records in the POC DynamoDB table

Now I'll give technical details on how you can ensure that the DynamoDB records are idempotent, and different options for querying the data. The pattern I describe can be used more widely, as it works with any Amazon Kinesis records. The idea is to use the Amazon Kinesis sequence number, which is unique per stream, as the range key, and a subset of it as a composite hash key. In our experiments, we found that taking the two digits just before the last digit (sequenceNumber[-3:-1] in the code above, e.g., "80") was better distributed than taking the last two digits (e.g., "02").
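
As a worked example of the key derivation (the sequence number below is made up purely for illustration):

# Derive the composite keys exactly as the processor function above does.
sequence_number = '49545115243490985018280067714973144582180062593244200961'
shard_id = 'shardId-000000000000'

range_key = sequence_number                          # unique per stream
hash_key = sequence_number[-3:-1] + '_' + shard_id   # '96_shardId-000000000000'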

This schema ensures that you do not store duplicate Amazon Kinesis records, without the need to maintain any state or position in the client or to change the records in any way, which is well suited to a Lambda function. For example, if you deleted the Amazon Kinesis event source of the Lambda function and added a new one an hour later with a trim horizon, the existing DynamoDB rows would simply be overwritten, because the records would have identical hash and range keys. Not having duplicate records is very useful for incremental loads into other systems, reduces the deduplication work needed downstream, and allows us to readily replay Amazon Kinesis records.

An example of a populated DynamoDB table could look like the following graphic.

Notice that the composite key of hash and range is unique, and that the other fields captured can be useful in understanding the user clickstream. For example, if you query on the attribute useremail_ and order by nanotimestamp_, you obtain the clickstream of an individual user. However, this means that DynamoDB has to analyze the whole table in what is known as a table scan, because you can only query on hash and range keys.

One solution is to use a global secondary index, as discussed earlier; however, writes to the main table can be throttled if the index writes are throttled (AWS is working on an asynchronous update process). This can happen if the index hash/range keys are not well distributed. If you need to use the index, one solution is to follow a scalable pattern with global secondary indexes.

Another way is to use DynamoDB Streams as an event source to trigger another Lambda function that writes to a table specifically used for querying by user email, e.g., the hash could be useremail_ and the range could be nanotimestamp_. This also means that you are no longer limited to five indexes per main table and the throttling issues mentioned earlier do not affect the main table.

Using DynamoDB Streams is also more analytics-centric as it gives you dynamic flexibility on the primary hash and range, allowing data scientists to experiment with different configurations. In addition, you might use a Lambda function to write the records to one table for the current week and another one for the following week; this allows you to keep the most recent hot data in a smaller table, and older or colder data in separate tables that could be removed or archived on a regular basis.

Persisting records to S3 via Amazon Kinesis Firehose

Now that you have the data in DynamoDB, you can enable DynamoDB Streams to obtain a stream of events which can be fully sourced and processed in the POC. For example, you could perform time series analysis on this data or persist the data to S3 or Amazon Redshift for batch processing.

To write to S3, use a Lambda function that reads the records from DynamoDB Streams and writes them to Amazon Kinesis Firehose. Other similar approaches exist but they rely on having a running server with custom deployed code. Firehose is a fully managed service for delivering real-time streaming data to destinations such as S3.

First, set up an IAM policy and associate it with an IAM role so that the Lambda function can write to Firehose.

Create an IAM policy

  1. Open the IAM console.
  2. On the navigation pane, choose Policies , Create Policy.
  3. Choose Create Your Own Policy and configure the following:
    • For Policy Name , enter poc-raven-lambda-firehose.
    • For Description , enter Read access to DynamoDB, Put records to Kinesis Firehose, and Lambda execution rights.
    • For Policy Document , insert the following JSON:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "lambda:InvokeFunction"
            ],
            "Resource": [
                "*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:GetRecords",
                "dynamodb:GetShardIterator",
                "dynamodb:DescribeStream",
                "dynamodb:ListStreams",
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
                "cloudwatch:PutMetricData"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "firehose:DescribeDeliveryStream",
                "firehose:ListDeliveryStreams",
                "firehose:PutRecord",
                "firehose:PutRecordBatch",
                "firehose:UpdateDestination"
            ],
            "Resource": [
                "*"
            ]
        }

    ]
}

Next, create an IAM role that uses this policy, so that Lambda can assume an identity with the required privileges to read from DynamoDB Streams and write to the Firehose delivery stream.

Create an IAM role

  1. Open the IAM console.
  2. On the navigation pane, choose Roles , Create New Role.
  3. For Set Role Name , enter lambdadynamostreams_firehose and choose Next step.
  4. For Role Type , choose AWS Service Roles , AWS Lambda.
  5. Select the policy created earlier, poc-raven-lambda-firehose, and choose Next step.
  6. Choose Review , Create Role.

Create the Lambda function

  1. Open the Lambda console.
  2. Choose Create a Lambda function and select the dynamo-process-stream-python blueprint.
  3. Configure the event sources:
    • For Event Source type , choose DynamoDB.
    • For DynamoDB Table , enter prod-raven-lambda-webanalytics.
    • For Batch Size , enter 300 (this depends on the frequency that events are added to DynamoDB).
    • For Starting position , choose Trim horizon.
  4. Configure the function:
    • For Name , enter PocLambdaFirehose.
    • For Runtime , choose Python 2.7.
    • For Edit code inline , add the Lambda function source code (see code below).
    • For Handler , choose lambda_function.lambda_handler.
    • For Role , select the role you created earlier, e.g., lambdadynamostreams_firehose. This grants the Lambda function access to DynamoDB Streams, Firehose, and CloudWatch metrics and logs.
    • For Memory (MB), choose 128.
    • For Timeout , enter 5 min 0 sec.
  5. Review the configuration:
    • For Enable event source , choose Enable later (this can be enabled later via the Event sources tab).
    • Choose Create function.
from __future__ import print_function
import json
import boto3

def lambda_handler(event, context):
    firehose_client = boto3.client(service_name='firehose', region_name='eu-west-1')
    stream_name='stg-kinesis'
    for record in event['Records']:
        output_records = {}
        for key, value in record['dynamodb']['NewImage'].iteritems():
            if value.keys()[0] in ['S', 'N', 'B']:
                output_records[key]=value.values()[0]
        kinesis_data="".join([json.dumps(output_records),"\n"])        
        put_record(stream_name, kinesis_data, firehose_client)
    return 'done'

def put_record(stream_name, data, client):    
    client.put_record(DeliveryStreamName=stream_name,
                      Record={'Data':data}) 

First, the Lambda function iterates over all records and values, and flattens the JSON data structure from the DynamoDB format into plain JSON records; for example, {'event': {'S': 'page view'}} becomes {'event': 'page view'}. Each record is then encoded as a JSON line and sent to Firehose.

For testing, you can log events from the DynamoDB stream by printing them as JSON to standard out; Lambda automatically delivers this information to CloudWatch Logs. That output can then be used as the test event in the Lambda console.

Configuring the Amazon Kinesis Firehose delivery stream

  1. Open the Amazon Kinesis Firehose console.
  2. Choose Create Delivery Stream.
    • For Destination , choose Amazon S3.
    • For Delivery stream name , enter: prod-s3-firehose.
    • For S3 bucket , choose Create new S3 bucket or Use an existing S3 bucket.
    • For S3 prefix , enter prod-kinesis-data.
  3. Choose Next to change the following Configuration settings.
    • For Buffer size , enter 5.
    • For Buffer interval , enter 300.
    • For Data compression , choose UNCOMPRESSED.
    • For Data Encryption , choose No Encryption.
    • For Error Logging , choose Enable.
    • For IAM role , choose Firehose Delivery IAM Role. This opens a new window: Amazon Kinesis Firehose is requesting permission to use resources in your account.
    • For IAM Role , choose Create a new IAM role.
    • For Role Name, choose firehosedeliveryrole.
  4. After the policy document is automatically generated, choose Allow.
  5. Choose Next.
  6. Review your settings and choose Create Delivery Stream.
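
If you prefer to create the delivery stream programmatically, a boto3 sketch matching the console settings above might look like the following (the bucket name and role ARN are placeholders to replace with your own):

import boto3

firehose = boto3.client('firehose', region_name='eu-west-1')
firehose.create_delivery_stream(
    DeliveryStreamName='prod-s3-firehose',
    S3DestinationConfiguration={
        'RoleARN': 'arn:aws:iam::999999999999:role/firehosedeliveryrole',  # placeholder role ARN
        'BucketARN': 'arn:aws:s3:::your-example-bucket',                   # placeholder bucket ARN
        'Prefix': 'prod-kinesis-data',
        'BufferingHints': {'SizeInMBs': 5, 'IntervalInSeconds': 300},
        'CompressionFormat': 'UNCOMPRESSED',
        'EncryptionConfiguration': {'NoEncryptionConfig': 'NoEncryption'}
    }
)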

After the Firehose delivery stream is created, select the stream and choose Monitoring to see activity. The following graphic shows some example metrics.

After the DynamoDB Streams event source for the Lambda function is enabled, new objects with production Amazon Kinesis records passing through the POC DynamoDB table are automatically created in the POC S3 bucket using Firehose. Firehose can also be used to import data to Amazon Redshift or Amazon Elasticsearch Service.

Benefits of serverless processing

Using a Lambda function for stream processing enables the following benefits:

  • Flexibility – Analytics code can be changed on the fly.
  • High availability – The function runs in multiple Availability Zones in a region.
  • Zero maintenance – There are no running instances to patch or upgrade; all of the services used are managed by AWS.
  • Security – No keys or passwords are used, and IAM roles can be used to integrate with Amazon Kinesis Streams, Amazon Kinesis Firehose, and DynamoDB.
  • Serverless compute – There are no EC2 instances or clusters to set up and maintain.
  • Automatic scaling – The number of Lambda functions invoked changes depending on the number of writes to Amazon Kinesis. Note that the upper concurrency limit is the number of Amazon Kinesis and DynamoDB Streams shards.
  • Low cost – You pay only for the execution time of the Lambda function.
  • Ease of use – You can easily select the language for your functions, and the functions themselves are very simple; in these examples, they just iterate over and parse JSON records.

Summary

In this post, I described how to persist events from an Amazon Kinesis stream into DynamoDB in another AWS account, without needing to build and host your own server infrastructure or using any keys. I also showed how to get the most out of DynamoDB, and proposed a schema that eliminates duplicate Amazon Kinesis records and reduces the need for further de-duplication downstream.

In a few lines of Python code, this pattern has allowed the data engineers at JustGiving the flexibility to project, transform, and enrich the Amazon Kinesis stream events as they arrive. Our data scientists and analysts now have full access to the production data in near-real time but in a different AWS account, allowing them to quickly undertake streaming analysis, project a subset of the data, and run experiments.

I also showed how to persist that data stream to S3 using Amazon Kinesis Firehose. This allows us to quickly access the historical clickstream dataset, without needing a permanent server or cluster running.

If you have questions or suggestions, please comment below. You may also be interested in seeing other material that digs deeper into the capabilities of Lambda and DynamoDB.

AWS Lambda Now Available in Singapore!

We are happy to announce that the next step in our regional launch plan has been completed: AWS Lambda is now available in Singapore. With yesterday's launch, we now have both Amazon API Gateway and AWS Lambda available in Singapore, so you can use the two services together in seven regions:

Stay tuned for additional regions, and check out some of our recent blog posts for useful tips and tricks: