Category: AWS Lambda


Building a Serverless Interface for Global Satellite Imagery

This is a guest post by Joe Flasher, Technical Business Development Manager.

In March 2015, we launched Landsat on AWS, a public dataset made up of imagery from the Landsat 8 satellite. Within the first year of launching Landsat on AWS, we logged over 1 billion requests for Landsat data and have been inspired by our customers' innovative uses of the data. The dataset is large, consisting of hundreds of terabytes spread over millions of objects, and it grows by roughly 1.4 TB per day.

It’s also a very beautiful dataset, as the Landsat 8 satellite regularly captures striking images of our planet. This is why we created landsatonaws.com, a website that makes it easy to browse the data in your browser.

With so many files, it can be challenging to browse through them in a human-friendly manner or to visually explore the full breadth of the imagery. We wanted to create an interface that would let users walk through the imagery, but would also act as a referential catalog, providing data about the imagery following a well-structured URL format. In addition, we wanted the interface to be:

  • Fast. Pages should load quickly.
  • Lightweight. Hosting 100,000,000 pages should cost less than $100 per month.
  • Indexable. All path/row pages of the site should be indexable by search engines.
  • Linkable. All unique pages of the site should have a well-structured URL.
  • Up to date. New imagery is added daily, and the site should reflect it.

To meet all these needs, we employed a serverless architecture that brought together AWS Lambda, Amazon API Gateway and Amazon S3 to power https://landsatonaws.com.

With API Gateway and Lambda, we have a powerful pairing that can handle incoming HTTP requests, run a Lambda function, and return the response to the requester. We also have an S3 bucket where we store a small amount of aggregated metadata that helps the Lambda functions respond quickly to requests.

There is a two-fold advantage to this sort of architecture. Firstly, we have no server to worry about and only pay when a user makes a request. Secondly, while the underlying imagery dataset is very large, the data needed to power the front-end interface is very small; pages are created when requested and then disappear.

Landsat 8 imagery has a well-defined naming structure, and we wanted to be able to reproduce that with our page structure. To accomplish that, we provide endpoints for every combination down to the unique scene, for example:

  • https://landsatonaws.com/L8
  • https://landsatonaws.com/L8/004
  • https://landsatonaws.com/L8/004/112
  • https://landsatonaws.com/L8/004/112/LC80041122015344LGN00

To be able to provide fast responses to these page requests via Lambda functions, we do a small amount of upfront processing on a nightly basis. This is done via another Lambda function that’s triggered with a scheduled CloudWatch event. This function creates a few metadata files, including a list of unique path/row combos for all imagery, a sitemap.txt file for search engine indexing, a list of the last four cloud-free images (for the homepage), and a number of files with scene info, aggregated at the path level. These aggregated files look like the following:

LC80010022016230LGN00,2016-08-17 14:07:40.521957,84.04
LC80010022016246LGN00,2016-09-02 14:07:46.993260,86.66
LC80010022016262LGN00,2016-09-18 14:07:49.546837,20.02
LC80010032015115LGN00,2015-04-25 14:07:18.078839,1.45
LC80010032015131LGN00,2015-05-11 14:07:02.141468,32.71
LC80010032015147LGN00,2015-05-27 14:07:03.269496,87.21
LC80010032015163LGN00,2015-06-12 14:07:14.976409,10.72
LC80010032015179LGN00,2015-06-28 14:07:20.976541,70.1
LC80010032015195LGN00,2015-07-14 14:07:31.606418,23.26
LC80010032015211LGN00,2015-07-30 14:07:36.060404,78.89

Each line includes the scene ID, acquisition time, and cloud cover percentage.

All of the metadata files are created by accessing the scene_list.gz file from the dataset’s S3 bucket and cutting it up in a few different ways.
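
A minimal sketch of that nightly aggregation step might look like the following. This is illustrative rather than the site's actual code; the scene_list.gz column names follow the public Landsat on AWS scene list, and the output bucket and key layout are hypothetical.

import csv
import gzip
import io
from collections import defaultdict

import boto3

s3 = boto3.client('s3')

def handler(event, context):
    # Pull the master scene list from the public Landsat bucket.
    obj = s3.get_object(Bucket='landsat-pds', Key='scene_list.gz')
    text = io.TextIOWrapper(gzip.GzipFile(fileobj=io.BytesIO(obj['Body'].read())))
    reader = csv.DictReader(text)

    # Group scene ID, acquisition time, and cloud cover by WRS path.
    by_path = defaultdict(list)
    for row in reader:
        line = '{},{},{}'.format(row['entityId'], row['acquisitionDate'], row['cloudCover'])
        by_path[row['path']].append(line)

    # Write one small aggregated file per path for the front-end functions to read.
    for path, lines in by_path.items():
        s3.put_object(Bucket='my-metadata-bucket',  # hypothetical metadata bucket
                      Key='paths/{}.csv'.format(path),
                      Body='\n'.join(lines).encode('utf-8'))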

For a unique scene request such as https://landsatonaws.com/L8/004/112/LC80041122015344LGN00, the Lambda function calls out to S3 for two different pieces of information. The first is a request to list all the keys that share the given scene ID as a prefix, so that we can display all related files; the second is a request for the scene's metadata, so that we can provide detailed information. Because our Lambda function runs in the same region as the Landsat data (though it could run anywhere), these requests are handled very quickly.
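
A simplified sketch of that per-scene lookup is shown below; the bucket name is the public Landsat on AWS bucket, while the handler shape and the metadata key are assumptions for illustration.

import boto3

s3 = boto3.client('s3')
BUCKET = 'landsat-pds'  # public Landsat on AWS bucket

def scene_page(path, row, scene_id):
    prefix = 'L8/{}/{}/{}/'.format(path, row, scene_id)

    # List every object belonging to this scene so the page can link to all related files.
    listing = s3.list_objects_v2(Bucket=BUCKET, Prefix=prefix)
    keys = [obj['Key'] for obj in listing.get('Contents', [])]

    # Fetch the scene's metadata file (key name assumed) for the detailed information.
    metadata = s3.get_object(Bucket=BUCKET, Key=prefix + scene_id + '_MTL.txt')
    return {'files': keys, 'metadata': metadata['Body'].read().decode('utf-8')}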

Summary

By using Lambda, API Gateway, and S3, we’re able to create a fast, lightweight, indexable, and always up to date front end for a very large dataset. You can find all the code used to power https://landsatonaws.com at landsat-on-aws on GitHub. While the code is written specifically for the Landsat on AWS data, this same pattern can be used for any well-defined dataset.

Introducing Simplified Serverless Application Deployment and Management

Orr Weinstein, Sr. Product Manager, AWS Lambda

Today, AWS Lambda launched the AWS Serverless Application Model (AWS SAM), a new specification that makes it easier than ever to manage and deploy serverless applications on AWS. With AWS SAM, customers can now express Lambda functions, Amazon API Gateway APIs, and Amazon DynamoDB tables using simplified syntax that is natively supported by AWS CloudFormation. After declaring their serverless app, customers can use new CloudFormation commands to easily create a CloudFormation stack and deploy the specified AWS resources.

AWS SAM

AWS SAM is a simplified form of the CloudFormation template syntax that allows you to easily define AWS resources that are common in serverless applications.

You inform CloudFormation that your template defines a serverless app by adding a line under the template format version, like the following:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Serverless resource types

Now, you can start declaring serverless resources.

AWS Lambda function (AWS::Serverless::Function)

Use this resource type to declare a Lambda function. When doing so, you need to specify the function’s handler, runtime, and a URI pointing to an Amazon S3 bucket that contains your Lambda deployment package.

You can optionally add managed policies to the function’s execution role, or add environment variables to your function. The really cool thing about AWS SAM is that you can also use it to create an event source to trigger your Lambda function with just a few lines of text. Take a look at the example below:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  MySimpleFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      Runtime: nodejs4.3
      CodeUri: s3://<bucket>/MyCode.zip
      Events:
        MyUploadEvent:
          Type: S3
          Properties:
            Id: !Ref Bucket
            Events: Create
  Bucket:
    Type: AWS::S3::Bucket

In this example, you declare a Lambda function and provide the required properties (Handler, Runtime and CodeUri). Then, declare the event source—an S3 bucket to trigger your Lambda function on ‘Object Created’ events. Note that you are using the CloudFormation intrinsic ref function to set the bucket you created in the template as the event source.

Amazon API Gateway API (AWS::Serverless::Api)

You can use this resource type to declare a collection of Amazon API Gateway resources and methods that can be invoked through HTTPS endpoints. With AWS SAM, there are two ways to declare an API:

1) Implicitly

An API is created implicitly from the union of API events defined on AWS::Serverless::Function. An example of how this is done is shown later in this post.

2) Explicitly

If you require the ability to configure the underlying API Gateway resources, you can declare an API by providing a Swagger file and the stage name:

MyAPI:
   Type: AWS::Serverless::Api
   Properties:
      StageName: prod
      DefinitionUri: swaggerFile.yml

Amazon DynamoDB table (AWS::Serverless::SimpleTable)

This resource creates a DynamoDB table with a single attribute primary key. You can specify the name and type of your primary key, and your provisioned throughput:

MyTable:
   Type: AWS::Serverless::SimpleTable
   Properties:
      PrimaryKey:
         Name: id
         Type: String
      ProvisionedThroughput:
         ReadCapacityUnits: 5
         WriteCapacityUnits: 5

In the event that you require more advanced functionality, you should declare the AWS::DynamoDB::Table resource instead.

App example

Now, let’s walk through an example that demonstrates how easy it is to build an app using AWS SAM, and then deploy it using a new CloudFormation command.

Building a serverless app

In this example, you build a simple CRUD web service. The “front door” to the web service is an API that exposes the “GET”, “PUT”, and “DELETE” methods. Each method triggers a corresponding Lambda function that performs an action on a DynamoDB table.

This is how your AWS SAM template should look:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Simple CRUD web service. State is stored in a DynamoDB table.
Resources:
  GetFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.get
      Runtime: nodejs4.3
      Policies: AmazonDynamoDBReadOnlyAccess
      Environment:
        Variables:
          TABLE_NAME: !Ref Table
      Events:
        GetResource:
          Type: Api
          Properties:
            Path: /resource/{resourceId}
            Method: get
  PutFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.put
      Runtime: nodejs4.3
      Policies: AmazonDynamoDBFullAccess
      Environment:
        Variables:
          TABLE_NAME: !Ref Table
      Events:
        PutResource:
          Type: Api
          Properties:
            Path: /resource/{resourceId}
            Method: put
  DeleteFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.delete
      Runtime: nodejs4.3
      Policies: AmazonDynamoDBFullAccess
      Environment:
        Variables:
          TABLE_NAME: !Ref Table
      Events:
        DeleteResource:
          Type: Api
          Properties:
            Path: /resource/{resourceId}
            Method: delete
  Table:
    Type: AWS::Serverless::SimpleTable

There are a few things to note here:

  • You start the template by specifying Transform: AWS::Serverless-2016-10-31. This informs CloudFormation that this template contains AWS SAM resources that need to be ‘transformed’ to full-blown CloudFormation resources when the stack is created.
  • You declare three different Lambda functions (GetFunction, PutFunction, and DeleteFunction), and a simple DynamoDB table. In each of the functions, you declare an environment variable (TABLE_NAME) that leverages the CloudFormation intrinsic ref function to set TABLE_NAME to the name of the DynamoDB table that you declare in your template.
  • You do not use the CodeUri attribute to specify the location of your Lambda deployment package for any of your functions (more on this later).
  • By declaring an API event (and not declaring the same API as a separate AWS::Serverless::Api resource), you are telling AWS SAM to generate that API for you. The API that is going to be generated from the three API events above looks like the following:
/resource/{resourceId}
      GET
      PUT
      DELETE

Next, take a look at the code:

'use strict';

console.log('Loading function');
let doc = require('dynamodb-doc');
let dynamo = new doc.DynamoDB();

const tableName = process.env.TABLE_NAME;
const createResponse = (statusCode, body) => {
    return {
        "statusCode": statusCode,
        "body": body || ""
    }
};

exports.get = (event, context, callback) => {
    var params = {
        "TableName": tableName,
        "Key": {
            id: event.pathParameters.resourceId
        }
    };
    
    dynamo.getItem(params, (err, data) => {
        var response;
        if (err)
            response = createResponse(500, err);
        else
            response = createResponse(200, data.Item ? data.Item.doc : null);
        callback(null, response);
    });
};

exports.put = (event, context, callback) => {
    var item = {
        "id": event.pathParameters.resourceId,
        "doc": event.body
    };

    var params = {
        "TableName": tableName,
        "Item": item
    };

    dynamo.putItem(params, (err, data) => {
        var response;
        if (err)
            response = createResponse(500, err);
        else
            response = createResponse(200, null);
        callback(null, response);
    });
};

exports.delete = (event, context, callback) => {

    var params = {
        "TableName": tableName,
        "Key": {
            "id": event.pathParameters.resourceId
        }
    };

    dynamo.deleteItem(params, (err, data) => {
        var response;
        if (err)
            response = createResponse(500, err);
        else
            response = createResponse(200, null);
        callback(null, response);
    });
};

Notice the following:

  • There are three separate functions, which have handlers that correspond to the handlers you defined in the AWS SAM file (‘get’, ‘put’, and ‘delete’).
  • You are using process.env.TABLE_NAME to pull the value of the environment variable that you declared in the AWS SAM file.

Deploying a serverless app

OK, now assume you’d like to deploy your Lambda functions, API, and DynamoDB table, and have your app up and ready to go. In addition, assume that your AWS SAM file and code files are in a local folder:

  • MyProject
    • app_spec.yml
    • index.js

To create a CloudFormation stack to deploy your AWS resources, you need to:

  1. Zip the index.js file.
  2. Upload it to an S3 bucket.
  3. Add a CodeUri property, specifying the location of the zip file in the bucket for each function in app_spec.yml.
  4. Call the CloudFormation CreateChangeSet operation with app_spec.yml.
  5. Call the CloudFormation ExecuteChangeSet operation with the name of the changeset you created in step 4.

This seems like a long process, especially if you have multiple Lambda functions (you would have to go through steps 1 to 3 for each function). Luckily, the new ‘Package’ and ‘Deploy’ commands from CloudFormation take care of all five steps for you!

First, call package to perform steps 1 to 3. You need to provide the command with the path to your AWS SAM file, an S3 bucket name, and a name for the new template that will be created (which will contain an updated CodeUri property):

aws cloudformation package --template-file app_spec.yml --output-template-file new_app_spec.yml --s3-bucket <your-bucket-name>

Next, call deploy with the name of the newly generated SAM file and the name of the CloudFormation stack. In addition, you need to provide CloudFormation with the capability to create IAM roles:

aws cloudformation deploy --template-file new_app_spec.yml --stack-name <your-stack-name> --capabilities CAPABILITY_IAM

And voila! Your CloudFormation stack has been created, and your Lambda functions, API Gateway API, and DynamoDB table have all been deployed.

Conclusion

Creating, deploying and managing a serverless app has never been easier. To get started, visit our docs page or check out the serverless-application-model repo on GitHub.

If you have questions or suggestions, please comment below.

Simplify Serverless Applications with Environment Variables in AWS Lambda

Gene Ting, Solutions Architect

Lambda developers often want to configure their functions without changing any code. In this post, we show you how to use environment variables to pass settings to your Lambda function code and libraries.

Creating and updating a Lambda function

First, create a Lambda function that uses some environment variables. Here’s a simple but realistic example that allows you to control the log level of a Lambda function by setting an environment variable called LOG_LEVEL. After you have created the code, pass a value into LOG_LEVEL so that your code can read it.


'use strict';

const logLevels = {error: 4, warn: 3, info: 2, verbose: 1, debug: 0};

// get the current log level from the current environment if set, else set to info
const currLogLevel = process.env.LOG_LEVEL != null ? process.env.LOG_LEVEL : 'info';

// print the log statement only if the requested log level is at or above the current log level
function log(logLevel, statement) {
    if(logLevels[logLevel] >= logLevels[currLogLevel] ) {
        console.log(statement);
    }
}

// return the monthly payment, rounded to the cent. FOR DEMO PURPOSES ONLY
function monthlyPayment(principal, rate, years) {
    var fv = Math.pow(1+rate/1200, years * 12);
    return Number(Math.round(principal * (rate/1200*fv) / (fv - 1) + 'e2') + 'e-2'); 
}

exports.handler = (event, context, callback) => {
    log('debug', "principal: " + event.principal + " - rate: " + event.rate + " - years: " + event.years);
    callback(null, monthlyPayment(event.principal, event.rate, event.years));
};

Now, here’s how to get those values into the code. You can do this through the console, or programmatically with full API and CLI support. In the console, a new section below the Lambda function allows you to specify environment variables.
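
For example, here is a minimal sketch of the programmatic route with the AWS SDK for Python, assuming the function is named mortgageCalc as in the execution role shown later:

import boto3

lambda_client = boto3.client('lambda')

# Set or update environment variables without touching the function code.
lambda_client.update_function_configuration(
    FunctionName='mortgageCalc',
    Environment={'Variables': {'LOG_LEVEL': 'info'}}
)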

By default, Lambda uses its default KMS service key to encrypt and decrypt your environment variables. Supplying a custom KMS key is supported, but not required.

If you use the default KMS service key for Lambda, then you don’t need additional IAM permissions in your execution role – your role works automatically without any changes.

If you supply your own custom KMS key, then you need to allow “kms:Decrypt”, as shown below in a basic execution role.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "logs:CreateLogGroup",
            "Resource": "arn:aws:logs:us-east-1:xxxxxxxxxxxx:*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:us-east-1:xxxxxxxxxxxx:log-group:/aws/lambda/mortgageCalc:*"
            ]
        }
    ]
}

Test your function and see environment variables in action.

Testing your Lambda function

First, in the Lambda console, configure a test event for this function, using the following JSON snippet:

{
  "principal": 100000,
  "rate": 6,
  "years": 30
}

Next, run the function by pressing “Test”:

You should verify the result of the calculator, but what’s really interesting is what you see in the logs. With the logging level set to ‘info’, no debug information should appear:

Change LOG_LEVEL to ‘debug’ and re-run the function:

Choose “Test” again, and examine the logs: you should see that the additional debugging logs appear. Console output can be found within the log stream for the function in Amazon CloudWatch Logs:
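
You can also drive the same test from code. Here is a small sketch using the AWS SDK for Python, again assuming the function is named mortgageCalc:

import json
import boto3

lambda_client = boto3.client('lambda')

event = {'principal': 100000, 'rate': 6, 'years': 30}
response = lambda_client.invoke(FunctionName='mortgageCalc', Payload=json.dumps(event))

# Prints the calculated monthly payment returned by the handler.
print(response['Payload'].read().decode('utf-8'))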

Targeting a table based on a given phase of a deployment lifecycle

In the example above, you used environment variables to modify the behavior of a Lambda function without changing its code. Here’s another typical use of environment variables: providing stage-specific settings for code as it moves through lifecycle stages from development to production. Environment variables can be used to provide settings for resources, such as a development database password versus the production database password.

The example code below shows how to point an Amazon DynamoDB client in Java to a table that varies across stages. It also demonstrates how environment variables can be read in a Java-based Lambda function, using System.getenv.

//Write to the appropriate table based on the current environment

AmazonDynamoDBClient client = new AmazonDynamoDBClient();
DynamoDB dynamoDB = new DynamoDB(client);
Table table = dynamoDB.getTable(System.getenv("TARGET_TABLE"));
Item item = new Item(); // populate the item, including the table's primary key
table.putItem(item);

Based on the value of “TARGET_TABLE”, the function connects to different tables. Because you change only the configuration, not your code, you know that the behavior is unchanged as you promote from one stage to another; only the environment variable settings change.

Conclusion

Environment variables bring a new level of simplicity to working with Lambda functions. In this post, we described how to use environment variables to:

  • Change Lambda function behavior, such as switching the logging level, without changing the code
  • Configure access to stage-specific resources, such as a DynamoDB table name or a SQL table password, as your code progresses from development to production.

For more information, see Environment Variables in the AWS Lambda Developer Guide. Happy coding everyone, and have fun creating awesome serverless applications!

 

Building a Backup System for Scaled Instances using AWS Lambda and Amazon EC2 Run Command

Diego Natali, AWS Cloud Support Engineer

When an Auto Scaling group needs to scale in, replace an unhealthy instance, or rebalance Availability Zones, the instance is terminated, data on the instance is lost, and any ongoing tasks are interrupted. This is normal behavior, but sometimes you might need to run some commands, wait for a task to complete, or execute some operations (for example, backing up logs) before the instance is terminated. For these cases, Auto Scaling introduced lifecycle hooks, which give you more control over timing after an instance is marked for termination.

In this post, I explore how you can leverage Auto Scaling lifecycle hooks, AWS Lambda, and Amazon EC2 Run Command to back up your data automatically before the instance is terminated. The solution illustrated allows you to back up your data to an S3 bucket; however, with minimal changes, it is possible to adapt this design to carry out any task that you prefer before the instance gets terminated, for example, waiting for a worker to complete a task before terminating the instance.

 

Using Auto Scaling lifecycle hooks, Lambda, and EC2 Run Command

You can configure your Auto Scaling group to add a lifecycle hook when an instance is selected for termination. The lifecycle hook enables you to perform custom actions as Auto Scaling launches or terminates instances. To perform these actions automatically, you can use Lambda and EC2 Run Command, which lets you avoid installing additional software and rely entirely on AWS resources.

For example, when an instance is marked for termination, Amazon CloudWatch Events can react to the event and execute an action, such as a Lambda function that runs a remote command on the instance and uploads your logs to your S3 bucket.

EC2 Run Command enables you to run remote scripts through the agent running within the instance. You use this feature to back up the instance logs and to complete the lifecycle hook so that the instance is terminated.

The example provided in this post works precisely this way. Lambda gathers the instance ID from CloudWatch Events and then triggers a remote command to back up the instance logs.
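
The complete function is linked in step 6 below; as a rough sketch (not the repository's exact code), its core looks something like the following, using the document and hook names created later in this walkthrough:

import boto3

ssm = boto3.client('ssm')
autoscaling = boto3.client('autoscaling')

def handler(event, context):
    # CloudWatch Events delivers the lifecycle hook details for the terminating instance.
    detail = event['detail']
    instance_id = detail['EC2InstanceId']

    try:
        # Run the backup document (created in step 5) on the instance being terminated.
        ssm.send_command(InstanceIds=[instance_id], DocumentName='ASGLogBackup')
    except Exception:
        # If the command cannot be sent, complete the hook so the instance is not left waiting.
        autoscaling.complete_lifecycle_action(
            LifecycleHookName=detail['LifecycleHookName'],
            AutoScalingGroupName=detail['AutoScalingGroupName'],
            LifecycleActionResult='ABANDON',
            InstanceId=instance_id
        )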

Architecture Graph

 

Set up the environment

Make sure that you have the latest version of the AWS CLI installed locally. For more information, see Getting Set Up with the AWS Command Line Interface.

Step 1 – Create an SNS topic to receive the result of the backup

In this step, you create an Amazon SNS topic in the region in which you run your Auto Scaling group. This topic allows EC2 Run Command to send you the outcome of the backup. The output of the aws sns create-topic command includes the topic ARN. Save the ARN, as you need it for future steps.

aws sns create-topic --name backupoutcome

Now subscribe your email address as the endpoint for SNS to receive messages.

aws sns subscribe --topic-arn <enter-your-sns-arn-here> --protocol email --notification-endpoint <your_email>

Step 2 – Create an IAM role for your instances and your Lambda function

In this step, you use the AWS console to create the AWS Identity and Access Management (IAM) role for your instances and Lambda to enable them to run the SSM agent, upload your files to your S3 bucket, and complete the lifecycle hook.

First, you need to create a custom policy to allow your instances and Lambda function to complete lifecycle hooks and publish to the SNS topic set up in Step 1.

  1. Log into the IAM console.
  2. Choose Policies, Create Policy
  3. For Create Your Own Policy, choose Select.
  4. For Policy Name, type “ASGBackupPolicy”.
  5. For Policy Document, paste the following policy, which allows the caller to complete lifecycle hooks and publish to SNS topics:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "autoscaling:CompleteLifecycleAction",
        "sns:Publish"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}

Create the role for EC2.

  1. In the left navigation pane, choose Roles, Create New Role.
  2. For Role Name, type “instance-role” and choose Next Step.
  3. Choose Amazon EC2 and choose Next Step.
  4. Add the policies AmazonEC2RoleforSSM and ASGBackupPolicy.
  5. Choose Next Step, Create Role.

Create the role for the Lambda function.

  1. In the left navigation pane, choose Roles, Create New Role.
  2. For Role Name, type “lambda-role” and choose Next Step.
  3. Choose AWS Lambda and choose Next Step.
  4. Add the policies AmazonSSMFullAccess, ASGBackupPolicy, and AWSLambdaBasicExecutionRole.
  5. Choose Next Step, Create Role.

Step 3 – Create an Auto Scaling group and configure the lifecycle hook

In this step, you create the Auto Scaling group and configure the lifecycle hook.

  1. Log into the EC2 console.
  2. Choose Launch Configurations, Create launch configuration.
  3. Select the latest Amazon Linux AMI and whatever instance type you prefer, and choose Next: Configuration details.
  4. For Name, type “ASGBackupLaunchConfiguration”.
  5. For IAM role, choose “instance-role” and expand Advanced Details.
  6. For User data, add the following lines to install and launch the SSM agent at instance boot:
    #!/bin/bash
    sudo yum install amazon-ssm-agent -y
    sudo /sbin/start amazon-ssm-agent
  7. Choose Skip to review, Create launch configuration, select your key pair, and then choose Create launch configuration.
  8. Choose Create an Auto Scaling group using this launch configuration.
  9. For Group name, type “ASGBackup”.
  10. Select your VPC and at least one subnet and then choose Next: Configuration scaling policies, Review, and Create Auto Scaling group.

Your Auto Scaling group is now created and you need to add the lifecycle hook named “ASGBackup” by using the AWS CLI:

aws autoscaling put-lifecycle-hook --lifecycle-hook-name ASGBackup --auto-scaling-group-name ASGBackup --lifecycle-transition autoscaling:EC2_INSTANCE_TERMINATING --heartbeat-timeout 3600

Step 4 – Create an S3 bucket for files

Create an S3 bucket where your data will be saved, or use an existing one. To create a new one, you can use this AWS CLI command:

aws s3api create-bucket --bucket <your_bucket_name>

Step 5 – Create the SSM document

The following JSON document archives the files in “BACKUPDIRECTORY” and then copies them to your S3 bucket “S3BUCKET”. Every time this command completes its execution, it sends an SNS message to the SNS topic specified by the “SNSTARGET” variable and completes the lifecycle hook.

In your JSON document, you need to make a few changes according to your environment:

  • Auto Scaling group name (line 12): "ASGNAME='ASGBackup'"
  • Lifecycle hook name (line 13): "LIFECYCLEHOOKNAME='ASGBackup'"
  • Directory to back up (line 14): "BACKUPDIRECTORY='/var/log'"
  • S3 bucket (line 15): "S3BUCKET='<your_bucket_name>'"
  • SNS target (line 16): "SNSTARGET='arn:aws:sns:'${REGION}':<your_account_id>:<your_sns_backupoutcome_topic>'"

Here is the document:

{
  "schemaVersion": "1.2",
  "description": "Backup logs to S3",
  "parameters": {},
  "runtimeConfig": {
    "aws:runShellScript": {
      "properties": [
        {
          "id": "0.aws:runShellScript",
          "runCommand": [
            "",
            "ASGNAME='ASGBackup'",
            "LIFECYCLEHOOKNAME='ASGBackup'",
            "BACKUPDIRECTORY='/var/log'",
            "S3BUCKET='<your_bucket_name>'",
            "SNSTARGET='arn:aws:sns:'${REGION}':<your_account_id>:<your_sns_ backupoutcome_topic>'",           
            "INSTANCEID=$(curl http://169.254.169.254/latest/meta-data/instance-id)",
            "REGION=$(curl http://169.254.169.254/latest/meta-data/placement/availability-zone)",
            "REGION=${REGION::-1}",
            "HOOKRESULT='CONTINUE'",
            "MESSAGE=''",
            "",
            "tar -cf /tmp/${INSTANCEID}.tar $BACKUPDIRECTORY &> /tmp/backup",
            "if [ $? -ne 0 ]",
            "then",
            "   MESSAGE=$(cat /tmp/backup)",
            "else",
            "   aws s3 cp /tmp/${INSTANCEID}.tar s3://${S3BUCKET}/${INSTANCEID}/ &> /tmp/backup",
            "       MESSAGE=$(cat /tmp/backup)",
            "fi",
            "",
            "aws sns publish --subject 'ASG Backup' --message \"$MESSAGE\"  --target-arn ${SNSTARGET} --region ${REGION}",
            "aws autoscaling complete-lifecycle-action --lifecycle-hook-name ${LIFECYCLEHOOKNAME} --auto-scaling-group-name ${ASGNAME} --lifecycle-action-result ${HOOKRESULT} --instance-id ${INSTANCEID}  --region ${REGION}"
          ]
        }
      ]
    }
  }
}

  1. Log into the EC2 console.
  2. Choose Command History, Documents, Create document.
  3. For Document name, enter “ASGLogBackup”.
  4. For Content, add the above JSON, modified for your environment.
  5. Choose Create document.

Step 6 – Create the Lambda function

The Lambda function uses modules included in the Python 2.7 Standard Library and the AWS SDK for Python module (boto3), which is preinstalled as part of Lambda. The function code performs the following:

  • Checks to see whether the SSM document exists. This document is the script that your instance runs.
  • Sends the command to the instance that is being terminated. It checks for the status of EC2 Run Command and if it fails, the Lambda function completes the lifecycle hook.
  1. Log in to the Lambda console.
  2. Choose Create Lambda function.
  3. For Select blueprint, choose Skip, Next.
  4. For Name, type “lambda_backup” and for Runtime, choose Python 2.7.
  5. For Lambda function code, paste the Lambda function from the GitHub repository.
  6. Choose Choose an existing role.
  7. For Role, choose lambda-role (previously created).
  8. In Advanced settings, configure Timeout for 5 minutes.
  9. Choose Next, Create function.

Your Lambda function is now created.

Step 7 – Configure CloudWatch Events to trigger the Lambda function

Create an event rule to trigger the Lambda function.

  1. Log in to the CloudWatch console.
  2. Choose Events, Create rule.
  3. For Select event source, choose Auto Scaling.
  4. For Specific instance event(s), choose EC2 Instance-terminate Lifecycle Action and for Specific group name(s), choose ASGBackup.
  5. For Targets, choose Lambda function and for Function, select the Lambda function that you previously created, “lambda_backup”.
  6. Choose Configure details.
  7. In Rule definition, type a name and choose Create rule.

Your event rule is now created; whenever your Auto Scaling group “ASGBackup” starts terminating an instance, your Lambda function will be triggered.

Step 8 – Test the environment

From the Auto Scaling console, you can change the desired capacity and the minimum for your Auto Scaling group to 0 so that the running instance starts terminating. After the termination starts, you can see from the Instances tab that the instance lifecycle status has changed to Termination:Wait. While the instance is in this state, the Lambda function and the command are executed.
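
If you prefer to trigger the scale-in from code rather than the console, a one-call sketch with the AWS SDK for Python (using the group name from this walkthrough) looks like this:

import boto3

autoscaling = boto3.client('autoscaling')

# Scale the group to zero so an instance termination (and the lifecycle hook) kicks in.
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName='ASGBackup',
    MinSize=0,
    DesiredCapacity=0
)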

You can review your CloudWatch logs to see the Lambda output. In the CloudWatch console, choose Logs and /aws/lambda/lambda_backup to see the execution output.

You can go to your S3 bucket and check that the files were uploaded. You can also check Command History in the EC2 console to see if the command was executed correctly.

Conclusion

Now that you’ve seen an example of how you can combine various AWS services to automate the backup of your files by relying only on AWS services, I hope you are inspired to create your own solutions.

Auto Scaling lifecycle hooks, Lambda, and EC2 Run Command are powerful tools because they allow you to respond to Auto Scaling events automatically, such as when an instance is terminated. However, you can also use the same idea for other solutions like exiting processes gracefully before an instance is terminated, deregistering your instance from service managers, and scaling stateful services by moving state to other running instances. The possible use cases are only limited by your imagination.

Learn more about Auto Scaling lifecycle hooks, AWS Lambda, and Amazon EC2 Run Command in their documentation.

I’ve open-sourced the code in this example in the awslabs GitHub repo; I can’t wait to see your feedback and your ideas about how to improve the solution.

AWS Lambda sessions at re:Invent 2016

Vyom Nagrani, Manager of Product Management, AWS Lambda

AWS Lambda was announced on November 13, 2014, which makes us two years old today! We have come a long way in these last two years, with many small and big customers using AWS Lambda in production, many new features, and a broader partner network.

Come talk to the AWS Lambda team at re:Invent 2016, where we plan to host a re:Source Mini Con at The Mirage focused on Serverless Computing. The AWS Lambda subject matter experts and existing AWS Lambda customers will be presenting on the following breakout sessions:

Breakout Sessions at the Serverless Computing Mini Con

  • SVR202 – What’s New with AWS Lambda
  • SVR301 – Real-time Data Processing Using AWS Lambda
  • SVR302 – Optimizing the Data Tier in Serverless Web Applications
  • SVR303 – Coca-Cola: Running Serverless Applications with Enterprise Requirements
  • SVR304 – bots + serverless = ❤
  • SVR305 – ↑↑↓↓←→←→ BA Lambda Start
  • SVR306 – Serverless Computing Patterns at Expedia
  • SVR307 – Application Lifecycle Management in a Serverless World
  • SVR308 – Content and Data Platforms at Vevo: Rebuilding and Scaling from Zero in One Year
  • SVR311 – The State of Serverless Computing
  • SVR401 – Using AWS Lambda to Build Control Systems for Your AWS Infrastructure
  • SVR402 – Operating Your Production API

There will be many other breakout sessions in other tracks that talk about AWS Lambda; a few I’d like to highlight are:

Additional Breakout Sessions related to AWS Lambda

  • ALX302 – Build a Serverless Back End for Your Alexa-Based Voice Interactions
  • ARC202 – Accenture Cloud Platform Serverless Journey
  • ARC312 – Compliance Architecture: How Capital One Automates the Guard Rails for 6,000 Developers
  • BDM303 – JustGiving: Serverless Data Pipelines, Event-Driven ETL, and Stream Processing
  • CMP211 – Getting Started with Serverless Architectures
  • DAT309 – How Fulfillment by Amazon (FBA) and Scopely Improved Results and Reduced Costs with a Serverless Architecture
  • DEV205 – Monitoring, Hold the Infrastructure: Getting the Most from AWS Lambda
  • DEV301 – Amazon CloudWatch Logs and AWS Lambda: A Match Made in Heaven
  • DEV308 – Chalice: A Serverless Microframework for Python

Feel you know enough about AWS Lambda, and would rather build something on top of it? We will also be hosting the following Workshops:

Workshops related to AWS Lambda

  • SPL02 – Spotlight Lab: Serverless Architectures Using Amazon CloudWatch Events and Scheduled Events with AWS Lambda
  • ALX203 – Workshop: Creating Voice Experiences with Alexa Skills: From Idea to Testing in Two Hours
  • DCS205 – Workshop: Building Serverless Bots on AWS – Botathon
  • SVR309 – Wild Rydes Takes Off – The Dawn of a New Unicorn
  • SVR310 – All Your Chats are Belong to Bots: Building a Serverless Customer Service Bot
  • MBL305 – Developing Mobile Apps and Serverless Microservices for Enterprises using AWS
  • MBL306 – Serverless Authentication and Authorization: Identity Management for Serverless Architectures

Still have questions? Join us for an open Q&A session at the Dev Lounge at the Venetian at 3:00 PM on Thursday, 12/1. And of course, we will be at the AWS booth in the re:Invent Central Expo hall; come talk to us at the Compute table.

Looking forward to meeting you at re:Invent 2016!

Build Serverless Applications in AWS Mobile Hub with New Cloud Logic and User Sign-in Features

Last month, we showed you how to power a mobile back end using a serverless stack, with your business logic in AWS Lambda and the resulting cloud APIs exposed to your app through Amazon API Gateway. This pattern enables you to create and test mobile cloud APIs backed by business logic functions you develop, all without managing servers or paying for unused capacity. Further, you can share your business logic across your iOS and Android apps.

Today, AWS Mobile Hub is announcing a new Cloud Logic feature that makes it much easier for mobile app developers to implement this pattern, integrate their mobile apps with the resulting cloud APIs, and connect the business logic functions to a range of AWS services or on-premises enterprise resources. The feature automatically applies access control to the cloud APIs in API Gateway, making it easy to limit access to app users who have authenticated with any of the user sign-in options in Mobile Hub, including two new options that are also launching today:

  • Fully managed email- and password-based app sign-in
  • SAML-based app sign-in

In this post, we show how you can build a secure mobile back end in just a few minutes using a serverless stack.

 

Get started with AWS Mobile Hub

We launched Mobile Hub last year to simplify the process of building, testing, and monitoring mobile applications that use one or more AWS services. Use the integrated Mobile Hub console to choose the features you want to include in your app.

With Mobile Hub, you don’t have to be an AWS expert to begin using its powerful back-end features in your app. Mobile Hub then provisions and configures the necessary AWS services on your behalf and creates a working quickstart app for you. This includes IAM access control policies created to save you the effort of provisioning security policies for resources such as Amazon DynamoDB tables and associating those resources with Amazon Cognito.

Get started with Mobile Hub by navigating to it in the AWS console and choosing your features.

[Screenshot: Mobile Hub console features]

 

New user sign-in options

We are happy to announce that we now support two new user sign-in options that help you authenticate your app users and provide secure access control to AWS resources.

The Email and Password option lets you easily provision a fully managed user directory for your app in Amazon Cognito, with sign-in parameters that you configure. The SAML Federation option enables you to authenticate app users using existing credentials in your SAML-enabled identity provider, such as Active Directory Federation Services (ADFS). Mobile Hub also provides ready-to-use app flows for sign-up, sign-in, and password recovery that you can add to your own app.

Navigate to the User Sign-in tile in Mobile Hub to get started and choose your sign-in providers.

[Screenshot: User Sign-in options in Mobile Hub]

Read more about the user sign-in feature in this blog and in the Mobile Hub documentation.

 

Enhanced Cloud Logic

We have enhanced the Cloud Logic feature (the right-hand tile in the top row of the above Mobile Hub screenshot), and you can now easily spin up a serverless stack. This enables you to create and test mobile cloud APIs connected to business logic functions that you develop. Previously, you could use Mobile Hub to integrate existing Lambda functions with your mobile app. With the enhanced Cloud Logic feature, you can now easily create Lambda functions, as well as API Gateway endpoints that you invoke from your mobile apps.

The feature automatically applies access control to the resulting REST APIs in API Gateway, making it easy to limit access to users who have authenticated with any of the user sign-in capabilities in Mobile Hub. Mobile Hub also allows you to test your APIs within your project and set up the permissions that your Lambda function needs for connecting to software resources behind a VPC (e.g., business applications or databases), within AWS or on-premises. Finally, you can integrate your mobile app with your cloud APIs using either the quickstart app (as an example) or the mobile app SDK; both are custom-generated to match your APIs. Here’s how it comes together:

[Diagram: Enhanced Cloud Logic in Mobile Hub]

Create an API

After you have chosen a sign-in provider, choose Configure more features. Navigate to Cloud Logic in your project and choose Create a new API. You can choose to limit access to your Cloud Logic API to only signed-in app users:

[Screenshot: Creating a Cloud Logic API]

Under the covers, this creates an IAM role for the API that limits access to authenticated, or signed-in, users.

[Screenshot: IAM role limiting API access to signed-in users]

Quickstart app

The resulting quickstart app generated by Mobile Hub allows you to test your APIs and learn how to develop a mobile UX that invokes your APIs:

[Screenshot: Quickstart app generated by Mobile Hub]

Multi-stage rollouts

To make it easy to deploy and test your Lambda function quickly, Mobile Hub provisions both your API and the Lambda function in a Development stage, for instance, https://<yoururl>/Development. This is mapped to a Lambda alias of the same name, Development. Lambda functions are versioned, and this alias always points to the latest version of the Lambda function. This way, changes you make to your Lambda function are immediately reflected when you invoke the corresponding API in API Gateway.

When you are ready to deploy to production, you can create more stages in API Gateway, such as Production. This gives you an endpoint such as https://<yoururl>/Production. Then, create an alias of the same name in Lambda but point this alias to a specific version of your Lambda function (instead of $LATEST). This way, your Production endpoint always points to a known version of your Lambda function.
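
A small sketch of that promotion step with the AWS SDK for Python is shown below; the function name is hypothetical, and the published version number comes back from the API:

import boto3

lambda_client = boto3.client('lambda')

# Freeze the current code as a numbered version...
version = lambda_client.publish_version(FunctionName='MyCloudLogicFunction')['Version']

# ...and point the Production alias (matching the API Gateway Production stage) at it.
lambda_client.create_alias(
    FunctionName='MyCloudLogicFunction',
    Name='Production',
    FunctionVersion=version
)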

 

Summary

In this post, we demonstrated how to use Mobile Hub to create a secure serverless back end for your mobile app in minutes using three new features: enhanced Cloud Logic, email and password-based app sign-in, and SAML-based app sign-in. While it was just a few steps for the developer, Mobile Hub performed several underlying steps automatically (provisioning back-end resources, generating a sample app, and configuring IAM roles and sign-in providers) so you can focus your time on the unique value in your app. Get started today with AWS Mobile Hub.

Ad Hoc Big Data Processing Made Simple with Serverless MapReduce


Sunil Mallya
Solutions Architect

Big data processing solutions have been using AWS Lambda more lately; customers have been creating solutions such as building metadata indexes for Amazon S3 using Lambda and Amazon DynamoDB and stream processing of data in S3. In this post, I discuss a serverless architecture to run MapReduce jobs using Lambda and S3.

Big data AWS services

Lambda became generally available in 2015, enabling customers to execute code on demand without any dedicated infrastructure. Since then, customers have successfully used Lambda for various use cases like event-driven processing for data ingested into S3 or Amazon Kinesis, web API backends, and producer/consumer architectures, among others. Lambda has emerged as a key component in building serverless architectures for these architecture paradigms.

S3 integrates directly into other AWS services, such as providing an easy export facility into Amazon Redshift and Amazon Elasticsearch Service, and providing an underlying file system (EMRFS) for Amazon EMR, making it an ideal data lake in the AWS ecosystem.

Amazon Redshift, EMR and the Hadoop ecosystem offer comprehensive solutions for big data processing. While EMR makes it easy, fast and cost-effective to run Hadoop clusters, it still requires provisioning servers, as well as knowledge of Hadoop and its components.

Serverless MapReduce overview

I wanted to extend some of these solutions and present a serverless architecture, along with a customizable framework, that enables customers to run ad hoc MapReduce jobs. Apart from the benefit of not having to manage any servers, this framework was significantly cheaper and, in some cases, faster than existing solutions when running the well-known Amplab big data processing benchmark.

This framework allows you to process big data with no operational effort and no prior Hadoop knowledge. While Hadoop is the most popular big data processing framework today, it does have a steep learning curve given the number of components and their inherent complexity. By minimizing the number of components in the framework and abstracting the infrastructure management, it simplifies data processing for developers or data scientists. They can focus on modeling the data processing problem and deriving value from the data stored in S3.

In addition to reduced complexity, this solution is much cheaper than existing solutions for ad hoc MapReduce workloads. Given that the solution is serverless, customers pay only when the MapReduce job is executed. The cost for the solution is the aggregate usage cost of Lambda and S3. Given that both the services are on-demand, you can compute the cost per query, a feature that's unique to the solution. This is extremely useful as you can now budget your data processing needs precisely to the query.

For customers who want enhanced security, they can process the data in a VPC by configuring the Lambda functions to access resources in a VPC and creating a VPC endpoint for S3. There are no major performance implications of running multiple variants of the same job or different jobs on the same dataset concurrently. The Lambda function resources are independent, thus allowing for fast iterations and development. This technology is really exciting for our customers, as it enables a truly pay-for-what-you-use cost effective and secure model for big data processing.

Reference architecture for serverless MapReduce

The goals are to:

  • Abstract infrastructure management
  • Get close to a "zero" setup time
  • Provide a pay-per-execution model for every job
  • Be cheaper than other data processing solutions
  • Enable multiple executions on the same dataset

The architecture is composed of three main Lambda functions:

  • Mapper
  • Reducer
  • Coordinator

The MapReduce computing paradigm requires maintaining state, so architecturally you require a coordinator or a scheduler to connect the map phase of the processing to the reduce phase. In this architecture, you use a Lambda function as a coordinator to make this completely serverless. The coordinator maintains all its state information by persisting the state data in S3.

Execution workflow:

  • The workflow starts with the invocation of the driver script that reads in the config file, which includes the mapper and reducer function file paths and the S3 bucket or folder path.
  • The driver function finds the relevant objects in S3 to process by listing the keys and matching by prefix in the S3 bucket. The keys are aggregated to create batches, which are then passed to the mapper (a rough sketch of this batching step follows the list). The batch size is determined by a simple heuristic that tries to achieve maximum parallelism while optimizing the data fit based on the mapper memory size.
  • Mapper, reducer, and coordinator Lambda functions are created and the code is uploaded.
  • An S3 folder, called the job folder, is created as a temporary workspace for the current job and is configured as an event source for the coordinator.
  • The mappers write their outputs to the job folder; after all the mappers finish, the coordinator uses the aforementioned logic to create batches and invoke the reducers.
  • The coordinator is notified through the S3 event source mapping when each reducer ends, and it continues to create subsequent stages of reducers until a single reduced output is created.
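
A rough sketch of the batching heuristic mentioned above is shown here; the memory fraction and helper names are illustrative, not the framework's actual code:

def create_batches(keys_with_sizes, lambda_memory_mb, memory_fraction=0.5):
    """Group S3 keys so each batch roughly fits in a mapper's usable memory."""
    budget = lambda_memory_mb * 1024 * 1024 * memory_fraction
    batches, current, current_size = [], [], 0
    for key, size in keys_with_sizes:
        if current and current_size + size > budget:
            batches.append(current)
            current, current_size = [], 0
        current.append(key)
        current_size += size
    if current:
        batches.append(current)
    return batches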

The following diagram shows the overall architecture:

Getting started with serverless MapReduce

In this post, I show you how to build and deploy your first serverless MapReduce job with this framework for data stored in S3. The code used in this post, along with detailed instructions about how to set up and run the application, is available in the awslabs lambda-refarch-mapreduce GitHub repo.

The example uses the dataset generated by Intel's Hadoop benchmark tools and data sampled from the Common Crawl document corpus, which is also used by the Amplab benchmark.

For the first job, you compute an aggregation (i.e., the sum of ad revenue for every source IP address) on a dataset of 25.4 GB with 155 million individual rows.

Dataset:

s3://big-data-benchmark/pavlo/text/1node/

Schema in CSV:

Each row of the Uservisits dataset is composed of the following:

sourceIP VARCHAR(116)
destURL VARCHAR(100)
visitDate DATE
adRevenue FLOAT
userAgent VARCHAR(256)
countryCode CHAR(3)
languageCode CHAR(6)
searchWord VARCHAR(32)
duration INT

SQL representation of the intended operation:

SELECT SUBSTR(sourceIP, 1, 8), SUM(adRevenue) FROM uservisits GROUP BY SUBSTR(sourceIP, 1, 8)

Prerequisites

You need the following prerequisites for working with this serverless MapReduce framework:

  • AWS account
  • S3 bucket
  • Lambda execution role
  • IAM credentials with access to create Lambda functions and list S3 buckets

Limits

In its current incarnation, this framework is best suited for ad hoc processing of data in S3. For sustained usage and time sensitive workloads, Amazon Redshift or EMR may be better suited. The framework has the following limits:

  • By default, each account has a concurrent Lambda execution limit of 100. This is a soft limit and can be increased by opening up a limit increase support ticket.
  • Lambda currently has a maximum container size limit of 1536 MB.

Components

The application has the following components:

  • Driver script and config file
  • Mapper Lambda function
  • Reducer Lambda function
  • Coordinator Lambda function

Driver script and config file

The driver script creates and configures the mapper, reducer, and coordinator Lambda functions in your account based on the config file. Here is an example configuration file that the driver script uses.

{
  sourceBucket: "big-data-benchmark",
  jobBucket: "my-job-bucket",
  prefix: "pavlo/text/1node/uservisits/",
  region: "us-east-1",
  lambdaMemory: 1536,
  concurrentLambdas: 100,
  mapper: {
    name: "mapper.py",
    handler: "mapper.handler",
    zip: "mapper.zip"
  },
  reducer: {
    name: "reducer.py",
    handler: "reducer.handler",
    zip: "reducer.zip"
  },
  reducerCoordinator: {
    name: "reducerCoordinator.py",
    handler: "reducerCoordinator.handler",
    zip: "reducerCor.zip"
  }
}

Mapper Lambda function

This is where you perform your map logic. The data is streamed line by line into your mapper function. In this function, you map individual records to a value; as an optimization, you can also perform the first reduce step here, which is especially useful when computing aggregations, given that you may be reading from multiple input sources in the mapper. In this example, the mapper maps sourceIP to adRevenue and stores a running total of the adRevenue for every sourceIP in a dictionary.

    # Download and process all keys

    for key in src_keys:
        response = s3_client.get_object(Bucket=src_bucket, Key=key)
        contents = response['Body'].read()
        for line in contents.split('\n')[:-1]:
            line_count +=1
            try:
                data = line.split(',')
                srcIp = data[0][:8]
                if srcIp not in totals:
                    totals[srcIp] = 0
                totals[srcIp] += float(data[3])
            except Exception, e:
                print e

Reducer Lambda function

The mapper and reducer functions look identical structurally, but perform different operations on the data. The reducer keeps a dictionary of aggregate sums of adRevenue of every sourceIP. The reducer is also responsible for the termination of the job. When the final reducer runs, it then reduces the intermediate results into a single file stored in the job bucket.

    # Download and process all mapper output

    for key in reducer_keys:
        response = s3_client.get_object(Bucket=job_bucket, Key=key)
        contents = response['Body'].read()
        try:
            for srcIp, val in json.loads(contents).iteritems():
                line_count +=1
                if srcIp not in results:
                    results[srcIp] = 0
                results[srcIp] += float(val)
        except Exception, e:
            print e

Coordinator Lambda function

The coordinator function keeps track of the job state with the job bucket as a Lambda event source. It is invoked every time a mapper or reducer finishes. After all the mappers finish, it then invokes the reducers to compute the final output.

The pseudo code for the coordinator looks like the following:

If all mappers are done:
    If currently in the reduce phase:
        number_of_reducers = compute_number_of_reducers(previous_reducer_step_outputs)
    Else:
        number_of_reducers = compute_number_of_reducers(mapper_outputs)
    If number_of_reducers == 1:
        Invoke single reducer and write the results to S3
        Job done
    Else:
        Create event source for reducer
Else:
    Return

The coordinator doesn't need to wait for all the mappers to finish before starting the reducers, but for simplicity, the first version of the framework chooses to wait for all mappers.

Running the job

The driver function is the interface to start the job. It reads the job details like the mapper and reducer code source to create the Lambda functions in your account for the serverless MapReduce job.

Execute the following command:

$ python driver.py

Intermediate outputs from the mappers and reducers are stored in the specified job bucket and the final result is stored as JobBucket/JobID/result. The contents of the job bucket look like the following:

smallya$ aws s3 ls s3://JobBucket/py-bl-1node-2 --recursive --human-readable --summarize

2016-09-26 15:01:17   69 Bytes py-bl-1node-2/jobdata
2016-09-26 15:02:04   74 Bytes py-bl-1node-2/reducerstate.1
2016-09-26 15:03:21   51.6 MiB py-bl-1node-2/result
2016-09-26 15:01:46   18.8 MiB py-bl-1node-2/task/
….
smallya$ head -n 3 result

67.23.87,5.874290244999999
30.94.22,96.25011190570001
25.77.91,14.262780186000002

Cost Analysis

The cost for this job, which processed over 25 GB of data and took less than 2 minutes, was 2.49 cents. The majority of the cost can be attributed to the Lambda components, and the mapper function in particular, given that the map phase takes the longest and is the most resource-intensive.
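
As a rough way to reason about that number, the per-job cost is simply the sum of the Lambda compute, Lambda request, and S3 request charges. Here is a back-of-the-envelope sketch using the published us-east-1 prices at the time of writing; the workload figures you pass in are whatever your own job produces:

# Published us-east-1 prices at the time of writing; adjust for your region.
LAMBDA_PER_GB_SECOND = 0.00001667
LAMBDA_PER_REQUEST = 0.20 / 1000000.0
S3_PER_GET = 0.004 / 10000.0
S3_PER_PUT = 0.005 / 1000.0

def job_cost(gb_seconds, invocations, s3_gets, s3_puts):
    """Aggregate Lambda plus S3 request cost for a single MapReduce job."""
    return (gb_seconds * LAMBDA_PER_GB_SECOND
            + invocations * LAMBDA_PER_REQUEST
            + s3_gets * S3_PER_GET
            + s3_puts * S3_PER_PUT)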

A component cost breakdown for the example above is plotted in the following chart.

The cost model is shown to scale almost linearly when the same job is run for a bigger dataset (126.8 GB, 775 million rows) for the Amplab benchmark (more details in the awslabs lambda-refarch-mapreduce GitHub repo), costing around 11 cents and executing in 3.5 minutes.

Summary

In this post, I showed you how to build a simple MapReduce task using a serverless framework for data stored in S3. If you have ad hoc data processing workloads, the cost effectiveness, speed, and price-per-query model makes it very suitable.

The code has been made available in the awslabs lambda-refarch-mapreduce GitHub repo. You can modify or extend this framework to build more complex applications for your data processing needs.

If you have questions or suggestions, please comment below.

Powering Mobile Backend Services with AWS Lambda and Amazon API Gateway

Daniel Austin
Solutions Architect

Asif Khan
Solutions Architect

Have you ever wanted to create a mobile REST API quickly and easily to make database calls and manipulate data sources? The Node.js and Amazon DynamoDB tutorial shows how to perform CRUD operations (Create, Read, Update, and Delete) easily on DynamoDB tables using Node.js.

In this post, I extend that concept by adding a REST API that calls an AWS Lambda function from Amazon API Gateway. The API allows you to perform the same operations on DynamoDB, from any HTTP-enabled device, such as a browser or mobile phone. The client device doesn't need to load any libraries, and with serverless architecture and API Gateway, you don't need to maintain any servers at all!

Walkthrough

In this post, I show you how to write a single Lambda function that handles all of the API calls, and then add a RESTful API on top of it. API Gateway tells the function which API resource was called.

The problem to solve: how can API Gateway, AWS Lambda, and DynamoDB be combined to simplify DynamoDB access? Our approach uses a single Lambda function to provide a CRUD façade on DynamoDB, which required solving two additional problems:

  1. Sending the data from API Gateway about which API method was called, along with POST information about the DynamoDB operations. This is solved by using a generic mapping template for each API call, sending all HTTP data to the Lambda function in a standard JSON format, including the path of the API call, i.e., '/movies/add-movie'.

  2. Providing a generic means in Node.js to use multiple function calls and properly use callbacks to send the function results back to the API, again in a standard JSON format. This required writing a generic callback mechanism (a very simple one) that is invoked by each function, and gathers the data for the response.

This is a very cool and easy way to expose basic DynamoDB operations over HTTP(S) calls from API Gateway. It works from any browser or mobile device that understands HTTP.
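To make the pattern concrete, here is a minimal sketch of a handler that dispatches on resourcePath and funnels every result through one generic callback. This is an illustration only, not the sample app's actual movies-dynamodb.js; the table name, key handling, and response shape are assumptions.

// Sketch only: dispatch on the resourcePath passed in by the mapping template
'use strict';
const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();
const TABLE = 'Movies'; // assumed table name

exports.handler = (event, context, callback) => {
  // One generic callback turns every DynamoDB result (or error) into the
  // standard JSON response that API Gateway returns to the client.
  const done = (err, data) => callback(null, { error: err ? err.message : null, result: data });

  switch (event.resourcePath) {
    case '/movies':
      docClient.scan({ TableName: TABLE }, done);
      break;
    case '/movies/add-movie':
      docClient.put({ TableName: TABLE, Item: event.body }, done);
      break;
    case '/movies/delete-movie':
      docClient.delete({ TableName: TABLE, Key: { year: Number(event.body.year), title: event.body.title } }, done);
      break;
    default:
      done(new Error('Unsupported resource path: ' + event.resourcePath));
  }
};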

Mobile developers can write backend code in Java, Node.js, or Python and deploy on Lambda.

In this post, I continue the demonstration with a sample mobile movies database backend, written in Node.js, using DynamoDB. The API is hosted on API Gateway.

Optionally, you can use AWS Mobile Hub to develop and test the mobile client app.

The steps to deploy a mobile backend in Lambda are:

  1. Set up IAM users and roles to allow access to Lambda and DynamoDB.
  2. Download the sample application and edit it to include your configuration.
  3. Create a table in DynamoDB using the console or the AWS CLI.
  4. Create a new Lambda function and upload the sample app.
  5. Create endpoints in API Gateway.
  6. Test the API and Lambda function.

Set up IAM roles to allow access to Lambda and DynamoDB

To set up your API, you need an IAM user and role that has permissions to access DynamoDB and Lambda.

In the IAM console, choose Roles, Create role. Choose AWS Lambda from the list of service roles, then attach the AmazonDynamoDBFullAccess and AWSLambdaFullAccess managed policies. You also need an IAM user whose credentials the sample application can use: create a new user with the same permissions, or use an existing one.
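If you prefer the AWS CLI, the equivalent steps look roughly like the following. The role name and trust-policy file name are placeholders; the trust policy simply allows lambda.amazonaws.com to assume the role.

aws iam create-role --role-name movies-db-lambda-role \
    --assume-role-policy-document file://lambda-trust-policy.json
aws iam attach-role-policy --role-name movies-db-lambda-role \
    --policy-arn arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess
aws iam attach-role-policy --role-name movies-db-lambda-role \
    --policy-arn arn:aws:iam::aws:policy/AWSLambdaFullAccess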

Download the sample application and edit to include your configuration

Now download the sample application and edit its configuration file.

The archive can be downloaded from GitHub: https://github.com/awslabs/aws-serverless-crud-sample

git clone https://github.com/awslabs/aws-serverless-crud-sample

After you download the archive, unzip it to an easily found location and look for the file app_config.json. This file contains setup information for your Lambda function. Edit the file to include your access key ID and secret access key; if you created a new user in step 1, use that user's credentials. You also need to add your AWS region to the file. This is the region where you will create the DynamoDB table.
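As a sketch only, the finished file contains something along these lines. The field names below are illustrative; check the app_config.json that ships with the sample for the authoritative keys.

{
  "accessKeyId": "YOUR_ACCESS_KEY_ID",
  "secretAccessKey": "YOUR_SECRET_ACCESS_KEY",
  "region": "us-west-2"
}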

Create a table in DynamoDB using the console or the AWS CLI

To create a table in DynamoDB, follow the instructions in the Node.js and DynamoDB tutorial, in the Amazon DynamoDB Getting Started Guide. Next, run the file createMoviesTable.js from the code you downloaded in the previous step. You could also use the AWS CLI with this input:

aws dynamodb create-table  --cli-input-json file://movies-table.json  --region us-west-2  --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5

The file movies-table.json is included in the code archive that you downloaded earlier. If you use the CLI, the calling user must have sufficient permissions to create the table.
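For reference, a CLI input file that matches the Movies table layout from the Getting Started guide looks roughly like this; the file in the archive may differ slightly, so treat this as a sketch.

{
  "TableName": "Movies",
  "KeySchema": [
    { "AttributeName": "year", "KeyType": "HASH" },
    { "AttributeName": "title", "KeyType": "RANGE" }
  ],
  "AttributeDefinitions": [
    { "AttributeName": "year", "AttributeType": "N" },
    { "AttributeName": "title", "AttributeType": "S" }
  ],
  "ProvisionedThroughput": { "ReadCapacityUnits": 5, "WriteCapacityUnits": 5 }
}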

IMPORTANT: The table must be created before completing the rest of the walkthrough.

Create a new Lambda function and upload the sample app

Creating the archive for the Lambda function can be a little tricky. Make sure you are zipping the contents of the folder, not the folder itself. The result is a file (called "Archive.zip" in this walkthrough), which is the file to upload to Lambda.
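For example, on macOS or Linux you might run something like this from inside the unzipped project folder; the directory name is illustrative, so use wherever you extracted the code:

cd aws-serverless-crud-sample
zip -r ../Archive.zip .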

In the Lambda console, choose Create a Lambda function and skip the Blueprints and Configure triggers sections. In the Configure function section, for Name, enter 'movies-db'. For Runtime, choose 'Node.js 4.3'. For Code entry type, choose 'Upload a zip file'. Choose Upload and select the archive file that you created in the previous step. For Handler, enter 'movies-dynamodb.handler', which is the name of the JavaScript function inside the archive that will be called via Lambda from API Gateway. For Role, choose Choose an existing role and select the role that you created in the first step.

You can leave the other options unchanged and then review and create your Lambda function. You can test the function using the following bit of JSON (this mimics the data that will be sent to the Lambda function from API Gateway):

{
  "method": "POST",
  "body": { "title": "Godzilla vs. Dynamo", "year": "2016", "info": "New from Acme Films, starring John Smith."},
  "headers": {},
  "queryParams": {},
  "pathParams": {},
  "resourcePath": "/add-movie"
}

Create endpoints in API Gateway

Now, you create five API methods in API Gateway. First, navigate to the API Gateway console and choose Create API, New API. Give the API a name, such as 'MoviesDP-API'.

In API Gateway, you first create resources and then the methods associated with those resources. The steps for each API call are basically the same:

  1. Create the resource (/movies or /movies/add-movie).
  2. Create a method for the resource: GET for /movies, POST for all others.
  3. Choose Integration request, Lambda and select the movies-db Lambda function that you created earlier. All the API calls use the same Lambda function.
  4. Under Integration request, choose Body mapping templates, create a new template with type application/json, and copy the template shown below into the form input.
  5. Choose Save.

Use these steps for the following API resources and methods:

  • /movies – lists all movies in the DynamoDB table
  • /movies/add-movie – adds an item to DynamoDB
  • /movies/delete-movie – deletes a movie from DynamoDB
  • /movies/findbytitleandyear – finds a movie with a specific title and year
  • /movies/update-movie – modifies an existing movie item in DynamoDB

This body mapping template is a standard JSON template for passing information from API Gateway to Lambda. It provides the calling Lambda function with all of the HTTP input data – including any path variables, query strings, and most importantly for this purpose, the resourcePath variable, which contains the information from API Gateway about which function was called.

Here's the template:

{
  "method": "$context.httpMethod",
  "body" : $input.json('$'),
  "headers": {
    #foreach($param in $input.params().header.keySet())
    "$param": "$util.escapeJavaScript($input.params().header.get($param))" #if($foreach.hasNext),#end
    #end
  },
  "queryParams": {
    #foreach($param in $input.params().querystring.keySet())
    "$param": "$util.escapeJavaScript($input.params().querystring.get($param))" #if($foreach.hasNext),#end
    #end
  },
  "pathParams": {
    #foreach($param in $input.params().path.keySet())
    "$param": "$util.escapeJavaScript($input.params().path.get($param))" #if($foreach.hasNext),#end
    #end
  },
  "resourcePath": "$context.resourcePath"
}

Notice the last line where the API Gateway variable $context.resourcePath is sent to the Lambda function as the JSON value of a field called (appropriately enough) resourcePath. This value is used by the Lambda function to perform the required action on the DynamoDB table.

(I originally found this template online, and modified it to add variables like resourcePath. Thanks to Kenn Brodhagen!)

As you create each API method, copy the requestTemplate.vel file from the original code archive and paste it into the Body mapping template field, using application/json as the type. Do this for each API call (using the same file). The file is the same one shown above, but it's easier to copy the file than cut and paste from a web page!

After the five API calls are created, you are almost done. You need to test your API, and then deploy it before it can be used from the public web.

Test your API and Lambda function

To test the API, add a movie to the database. The API Gateway methods expect a JSON payload sent via POST, which becomes the body field of the event passed to your Lambda function (just as in the Lambda console test earlier). Here's an example:

{ "title": "Love and Friendship", "year": "2016", "info": "New from Amazon Studios, starring Kate Beckinsale."}

To add this movie to your table, test the /movies/add-movie method in the API Gateway console, using the payload above as the request body.
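Alternatively, once the API has been deployed to a stage, an equivalent request from the command line looks roughly like this; the API ID, region, and stage are placeholders to replace with your own values:

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{ "title": "Love and Friendship", "year": "2016", "info": "New from Amazon Studios, starring Kate Beckinsale."}' \
  https://<api-id>.execute-api.<region>.amazonaws.com/<stage>/movies/add-movie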

Check the logs on the test page of your Lambda function and in API Gateway to confirm that the call succeeded.


Conclusion

In this post, I demonstrated a quick way to get started on AWS Lambda for your mobile backend. You uploaded a movie database app which performed CRUD operations on a movies DB table in DynamoDB. The API was hosted on API Gateway for scale and manageability. With the above deployment, you can focus on building your API and mobile clients with serverless architecture. You can reuse the mapping template in future API Gateway projects.

Interested in reading more? See Access Resources in a VPC from Your Lambda Functions and AWS Mobile Hub – Build, Test, and Monitor Mobile Applications.

If you have questions or suggestions, please comment below.

Going Serverless: Migrating an Express Application to Amazon API Gateway and AWS Lambda

Brett Andrews
Software Development Engineer

Amazon API Gateway recently released three new features that simplify the process of forwarding HTTP requests to your integration endpoint: greedy path variables, the ANY method, and proxy integration types. With this new functionality, it becomes incredibly easy to run HTTP applications in a serverless environment by leveraging the aws-serverless-express library.

In this post, I go through the process of porting an "existing" Node.js Express application onto API Gateway and AWS Lambda, and discuss some of the advantages, disadvantages, and current limitations. While I use Express in this post, the steps are similar for other Node.js frameworks, such as Koa and Hapi, or for vanilla Node.js applications.

Modifying an existing Express application

Express is commonly used for both web applications and REST APIs. While API Gateway's primary purpose is to deliver APIs, it can certainly be used for delivering web apps/sites (HTML) as well. To cover both use cases, the Express app below exposes a web app on the root / resource, and a REST API on the /pets resource.

The goal of this walkthrough is for it to be complex enough to cover many of the limitations of this approach today (as comments in the code below), yet simple enough to follow along. To this end, you implement just the entry point of the Express application (commonly named app.js) and assume standard implementations of views and controllers (which are more insulated and thus less affected). You also use MongoDB, due to it being a popular choice in the Node.js community as well as providing a time-out edge case. For a greater AWS serverless experience, consider adopting Amazon DynamoDB.

//app.js
'use strict'
const path = require('path')
const express = require('express')
const bodyParser = require('body-parser')
const cors = require('cors')
const mongoose = require('mongoose')
// const session = require('express-session')
// const compress = require('compression')
// const sass = require('node-sass-middleware')

// Lambda does not allow you to configure environment variables, but dotenv is an
// excellent and simple solution, with the added benefit of allowing you to easily
// manage different environment variables per stage, developer, environment, etc.
require('dotenv').config()

const app = express()
const homeController = require('./controllers/home')
const petsController = require('./controllers/pets')

// MongoDB has a default timeout of 30s, which is the same timeout as API Gateway.
// Because API Gateway was initiated first, it also times out first. Reduce the
// timeout and kill the process so that the next request attempts to connect.
mongoose.connect(process.env.MONGODB_URI, { server: { socketOptions: { connectTimeoutMS: 10000 } } })
mongoose.connection.on('error', () => {
   console.error('Error connecting to MongoDB.')
   process.exit(1)
})

app.set('views', path.join(__dirname, 'views'))
app.set('view engine', 'pug')
app.use(cors())
app.use(bodyParser.json())
app.use(bodyParser.urlencoded({ extended: true }))

/*
* GZIP support is currently not available to API Gateway.
app.use(compress())

* node-sass is a native binary/library (aka Addon in Node.js) and thus must be
* compiled in the same environment (operating system) in which it will be run.
* If you absolutely need to use a native library, you can set up an Amazon EC2 instance
* running Amazon Linux for packaging your Lambda function.
* In the case of SASS, I recommend building your CSS locally instead and
* deploy all static assets to Amazon S3 for improved performance.
const publicPath = path.join(__dirname, 'public')
app.use(sass({ src: publicPath, dest: publicPath, sourceMap: true}))
app.use(express.static(publicPath, { maxAge: 31557600000 }))

* Storing local state is unreliable due to automatic scaling. Consider going stateless (using REST),
* or use an external state store (for MongoDB, you can use the connect-mongo package)
app.use(session({ secret: process.env.SESSION_SECRET }))
*/

app.get('/', homeController.index)
app.get('/pets', petsController.listPets)
app.post('/pets', petsController.createPet)
app.get('/pets/:petId', petsController.getPet)
app.put('/pets/:petId', petsController.updatePet)
app.delete('/pets/:petId', petsController.deletePet)

/*
* aws-serverless-express communicates over a Unix domain socket. While it's not required
* to remove this line, I recommend doing so as it just sits idle.
app.listen(3000)
*/

// Export your Express configuration so that it can be consumed by the Lambda handler
module.exports = app

Assuming that you had the relevant code implemented in your views and controllers directories and a MongoDB server available, you could uncomment the listen line, run node app.js and have an Express application running at http://localhost:3000. The following "changes" made above were specific to API Gateway and Lambda:

  • Used dotenv to set environment variables.
  • Reduced the timeout for connecting to MongoDB so that API Gateway does not time out first.
  • Removed the compression middleware as API Gateway does not (currently) support GZIP.
  • Removed node-sass-middleware (I opted for serving static assets through S3, but if there is a particular native library your application absolutely needs, you can build/package your Lambda function on an EC2 instance).
  • Served static assets through S3/CloudFront. Not only is S3 a better option for static assets for performance reasons, API Gateway does not currently support binary data (e.g., images).
  • Removed session state for scalability (alternatively, you could have stored session state in MongoDB using connect-mongo).
  • Removed app.listen() as HTTP requests are not being sent over ports (not strictly required).
  • Exported the Express configuration so that you can consume it in your Lambda handler (more on this soon).

Going serverless with aws-serverless-express

In order for users to be able to hit the app (or for developers to consume the API), you first need to get it online. Because this app is going to be immensely popular, you obviously need to consider scalability, resiliency, and many other factors. Previously, you could provision some servers, launch them in multiple Availability Zones, configure Auto Scaling policies, ensure that the servers were healthy (and replace them if they weren't), keep up-to-date with the latest security updates, and so on…. As a developer, you just care that your users are able to use the product; everything else is a distraction.

Enter serverless. By leveraging AWS services such as API Gateway and Lambda, you have zero servers to manage, automatic scaling out-of-the-box, and true pay-as-you-go: the promise of the cloud, finally delivered.

The example included in the aws-serverless-express library provides a good starting point for deploying and managing your serverless resources.

  1. Clone the library into a local directory: git clone https://github.com/awslabs/aws-serverless-express.git
  2. From within the example directory, run npm run config <accountId> <bucketName> [region] (this modifies some of the files with your own settings).
  3. Edit the package-function command in package.json by removing "index.html" and adding "views" and "controllers" (the additional directories required for running your app).
  4. Copy the following files in the example directory into your existing project's directory:

    • simple-proxy-api.yaml – A Swagger file that describes your API.
    • cloudformation.json – A CloudFormation template for creating the Lambda function and API.
    • package.json – You may already have a version of this file, in which case just copy the scripts and config sections. This includes some helpful npm scripts for managing your AWS resources, and testing and updating your Lambda function.
    • api-gateway-event.json – Used for testing your Lambda function locally.
    • lambda.js – The Lambda function, a thin wrapper around your Express application.

    Take a quick look at lambda.js so that you understand exactly what's going on there. The aws-serverless-express library transforms the request from the client (via API Gateway) into a standard Node.js HTTP request object; sends this request to a special listener (a Unix domain socket); and then transforms it back for the response to API Gateway. It also starts your Express server listening on the Unix domain socket on the initial invocation of the Lambda function. Here it is in its entirety:

// lambda.js
'use strict'
const awsServerlessExpress = require('aws-serverless-express')
const app = require('./app')
const server = awsServerlessExpress.createServer(app)

exports.handler = (event, context) => awsServerlessExpress.proxy(server, event, context)

TIP: Everything outside of the handler function is executed only one time per container: that is, the first time your app receives a request (or the first request after several minutes of inactivity), and when it scales up additional containers.

Deploying

Now that you have more of an understanding of how API Gateway and Lambda communicate with your Express server, it's time to release your app to the world.

From the project's directory, run:

npm run setup

This command creates the Amazon S3 bucket specified earlier (if it does not yet exist); zips the necessary files and directories for your Lambda function and uploads it to S3; uploads simple-proxy-api.yaml to S3; creates the CloudFormation stack; and finally opens your browser to the AWS CloudFormation console where you can monitor the creation of your resources. To clean up the AWS resources created by this command, simply run npm run delete-stack. Additionally, if you specified a new S3 bucket, run npm run delete-bucket.

After the status changes to CREATE_COMPLETE (usually after a couple of minutes), you see three links in the Outputs section: one to the API Gateway console, another to the Lambda console, and most importantly one for your web app/REST API. Clicking the link to your API displays the web app; appending /pets in the browser address bar displays your list of pets. Your Express application is available online with automatic scalability and pay-per-request without having to manage a single server!

Additional features

Now that you have your REST API available to your users, take a quick look at some of the additional features made available by API Gateway and Lambda:

  • Usage plans for monetizing the API
  • Caching to improve performance
  • Authorizers for authentication and authorization microservices that determine access to your Express application
  • Stages, versioning, and aliases when you need additional stages or environments (dev, beta, prod, etc.)
  • SDK generation to provide SDKs to consumers of your API (available in JavaScript, iOS, Android Java, and Android Swift)
  • API monitoring for logs and insights into usage

After running your Express application in a serverless environment for a while and learning more about the best practices, you may start to want more: more performance, more control, more microservices!

So how do you take your existing serverless Express application (a single Lambda function) and refactor it into microservices? You strangle it. Take a route, move the logic to a new Lambda function, and add a new resource or method to your API Gateway API. However, you'll find that the tools provided to you by the aws-serverless-express example just don't cut it for managing these additional functions. For that, you should check out Claudia; it even has support for aws-serverless-express.

Conclusion

To sum it all up, you took an existing application of moderate complexity, applied some minimal changes, and deployed it in just a couple of commands. You now have no servers to manage, automatic scaling out-of-the-box, true pay-as-you-go, loads of features provided by API Gateway, and as a bonus, a great path forward for refactoring into microservices.

If that's not enough, or this server-side stuff doesn't interest you and the front end is where you live, consider using aws-serverless-express for server-side rendering of your single page application.

If you have questions or suggestions, please comment below.

Easier integration with AWS Lambda and Amazon API Gateway

This week, Amazon API Gateway announced three new features that make it easier for you to leverage API Gateway and AWS Lambda to build your serverless applications.

First, we now support catch-all path variables. You can define routes such as /store/{proxy+}, where the + symbol tells API Gateway to intercept all requests to the /store/* path. Second, we now support a new method type called ANY. You can use the catch-all ANY method to define the same integration behavior for all requests (GET, POST, etc.). Third, you can now use a new proxy integration type for Lambda functions and HTTP endpoints. Lambda function proxy integrations apply a default mapping template to send the entire request to your function and automatically map the Lambda output to HTTP responses. HTTP proxy integrations simply pass the entire request and response directly through to your HTTP endpoint.

One way to use these new features is to migrate Express applications to Lambda and API Gateway. Previously, in order to preserve your API routes in Express, you had to redefine each API method and its corresponding Express integration endpoint on API Gateway. Now, you can simply define one catch-all resource in API Gateway and configure it as a proxy integration with a Lambda function that wraps your Express application.
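For example, the catch-all resource in a Swagger/OpenAPI definition for API Gateway looks roughly like the fragment below; the region, account ID, and function name in the integration URI are placeholders, and the exact file shipped with aws-serverless-express (simple-proxy-api.yaml) may differ in detail.

paths:
  /{proxy+}:
    x-amazon-apigateway-any-method:
      x-amazon-apigateway-integration:
        type: aws_proxy
        httpMethod: POST
        uri: arn:aws:apigateway:us-east-1:lambda:path/2015-03-31/functions/arn:aws:lambda:us-east-1:123456789012:function:my-express-app/invocations
      responses: {}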

Head on over to Jeff Barr’s blog to read more about these new features!