AWS Database Blog

Use Amazon DynamoDB Accelerator (DAX) from AWS Lambda to increase performance while reducing costs

April 01, 2020 update: Changed the security setup to attach a least-privilege IAM policy to the role instead of a wide-open managed policy, switched to HttpApi in API Gateway for automatic deployment and lower cost, and added Node.js code that detects whether a requesting client base64 encodes the body of the request and decodes it if so.


Using Amazon DynamoDB Accelerator (DAX) from AWS Lambda has several benefits for serverless applications that also use Amazon DynamoDB. DAX can improve the response time of your application by dramatically reducing read latency compared to using DynamoDB alone. Using DAX can also lower the cost of DynamoDB by reducing the amount of provisioned read throughput needed for read-heavy applications. For serverless applications, DAX provides an additional benefit: lower latency results in shorter Lambda execution times, which means lower costs.

Connecting to a DAX cluster from Lambda functions requires some special configuration. In this post, I show an example URL-shortening application based on the AWS Serverless Application Model (AWS SAM). The application uses Amazon API Gateway, Lambda, DynamoDB, DAX, and AWS CloudFormation to demonstrate how to access DAX from Lambda.

A simple serverless URL shortener

The example application in this post is a simple URL shortener. I use AWS SAM templates to simplify the setup for API Gateway, Lambda, and DynamoDB. The entire configuration is presented in an AWS CloudFormation template for repeatable deployments. The sections that create the DAX cluster, roles, security groups, and subnet groups do not depend on the SAM templates, and you can use them with regular AWS CloudFormation templates.

Like all AWS services, DAX was designed with security as a primary consideration. As a result, it requires clients to connect to DAX clusters as part of a virtual private cloud (VPC), which means that you can’t access a DAX cluster directly over the internet. Therefore, you must attach any Lambda function that needs to access a DAX cluster to a VPC that can access the cluster. The AWS CloudFormation template in the following section contains all the necessary pieces and configuration to make DAX and Lambda work together. You can customize the template to fit the needs of your application.

The following diagram illustrates this solution.

As illustrated in the diagram:

  1. The client sends an HTTP request to API Gateway.
  2. API Gateway forwards the request to the appropriate Lambda functions.
  3. The Lambda functions are run inside your VPC, which allows them to access VPC resources such as your DAX cluster.
  4. The DAX cluster is also inside your VPC, which means it can be reached by the Lambda functions.

The AWS CloudFormation template

Let’s start with the AWS CloudFormation template (template.yaml). The first section of the code contains the AWS CloudFormation template, AWS SAM prologue, and AWS SAM function definition.

AWSTemplateFormatVersion: '2010-09-09'
Description: A sample application showing how to use Amazon DynamoDB Accelerator (DAX) with AWS Lambda and AWS CloudFormation.
Transform: AWS::Serverless-2016-10-31
Resources:
  siteFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: geturl.zip
      Description: Resolve/Store URLs
      Environment:
        Variables:
          DAX_ENDPOINT: !GetAtt getUrlCluster.ClusterDiscoveryEndpoint
          DDB_TABLE: !Ref getUrlTable
      Events:
        getUrl:
          Type: HttpApi
          Properties:
            Method: get
            Path: /{id+}
        postUrl:
          Type: HttpApi
          Properties:
            Method: post
            Path: /
      Handler: lambda/index.handler
      Runtime: nodejs12.x
      Timeout: 10
      VpcConfig:
        SecurityGroupIds: 
            - !GetAtt getUrlSecurityGroup.GroupId
        SubnetIds:
            - !Ref getUrlSubnet
      Role: !GetAtt getUrlRole.Arn

This section of the template specifies the following:

  • Location of the code package
  • Environment variables used by the function
  • URL formats
  • Security policies
  • Language runtime
  • VPC configuration (in the VpcConfig stanza), which allows the Lambda function to reach a DAX cluster
  • Role, which is defined later in the template and referenced here. It grants only the access this project needs to run

This example creates its own VPC and subnet, so the function definition points to them through references to resources defined in later sections of the file. If your VPC already exists, use its existing identifiers instead.

Using AWS::Serverless::Function takes care of creating the Lambda function definition with the appropriate permissions in addition to creating an API Gateway endpoint that calls the Lambda function on each HTTP request. Users access the URL shortener through this endpoint.

The next section of this code example creates a DynamoDB table.

  getUrlTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName: GetUrl-sample
      AttributeDefinitions:
        - 
          AttributeName: id
          AttributeType: S
      KeySchema:
        - 
          AttributeName: id
          KeyType: HASH
      BillingMode: PAY_PER_REQUEST

This table has only a single hash key (KeySchema contains only the id attribute). It uses on-demand capacity (BillingMode: PAY_PER_REQUEST), so you pay only for the reads and writes that actually reach DynamoDB, and DAX serves most of the read traffic. DynamoDB is called only if DAX has not cached the item.
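As a rough mental model of why DynamoDB sees fewer reads, the sketch below implements a generic read-through cache in front of a slower store. It is only an illustration: ReadThroughCache and the Map-based table are hypothetical stand-ins, and the sketch says nothing about how DAX is implemented internally (for example, it ignores TTLs and eviction).

```javascript
// Conceptual model only: a read-through cache in front of a slower store.
// ReadThroughCache and the Map-based "table" are hypothetical stand-ins,
// not part of the DAX client or the AWS SDK.
class ReadThroughCache {
  constructor(backingStore) {
    this.backingStore = backingStore; // plays the role of DynamoDB
    this.cache = new Map();           // plays the role of the item cache
    this.backingReads = 0;            // counts reads that reach the store
  }

  get(id) {
    if (this.cache.has(id)) {
      return this.cache.get(id);      // cache hit: the store is not touched
    }
    this.backingReads += 1;           // cache miss: read from the store
    const item = this.backingStore.get(id);
    if (item !== undefined) {
      this.cache.set(id, item);       // remember the item for later reads
    }
    return item;
  }
}

const table = new Map([['abc123', 'https://www.amazon.com']]);
const cached = new ReadThroughCache(table);

cached.get('abc123'); // miss, reads the backing store
cached.get('abc123'); // hit, served from the cache
console.log(cached.backingReads); // 1
```

In a URL shortener, the same few IDs tend to be read many times, which is the access pattern where this kind of cache pays off most.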

Now the template specifies the DAX cluster.

  getUrlCluster:
    Type: AWS::DAX::Cluster
    Properties:
      ClusterName: getUrl-sample
      Description: Cluster for GetUrl Sample
      IAMRoleARN: !GetAtt getUrlRole.Arn
      NodeType: dax.t2.small
      ReplicationFactor: 1
      SecurityGroupIds:
        - !GetAtt getUrlSecurityGroup.GroupId
      SubnetGroupName: !Ref getUrlSubnetGroup

  getUrlRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Action:
            - sts:AssumeRole
            Effect: Allow
            Principal:
              Service:
              - dax.amazonaws.com
              - lambda.amazonaws.com
        Version: '2012-10-17'
      RoleName: getUrl-sample-Role
      Policies:
        -
         PolicyName: DAXAccess
         PolicyDocument:
           Version: '2012-10-17'
           Statement:
             - Effect: Allow
               Resource: '*'
               Action:
                - 'dax:PutItem'
                - 'dax:GetItem'
                - 'dynamodb:DescribeTable'
                - 'dynamodb:GetItem'
                - 'dynamodb:PutItem'
      ManagedPolicyArns:
         - arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole

The cluster is created using a single dax.t2.small node for demonstration purposes. Production workloads should use a cluster size (ReplicationFactor) of at least 3 for redundancy and should consider an appropriately sized dax.r4.* instance (NodeType). The getUrlRole stanza defines an AWS Identity and Access Management (IAM) role and policy that grant the DAX cluster permission to access your DynamoDB data and that the Lambda function can also use. (Don’t edit or remove this role after creating it, or the cluster won’t be able to access DynamoDB.)

Next, the template sets up a security group with a rule to allow Lambda to send traffic to DAX on TCP port 8111. If you look earlier in this post at the serverless function definition, the VpcConfig stanza refers to this security group. Security groups control how network traffic is allowed to flow in a VPC.

  getUrlSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security Group for GetUrl
      GroupName: getUrl-sample
      VpcId: !Ref getUrlVpc
  
  getUrlSecurityGroupIngress:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: getUrlSecurityGroup
    Properties:
      GroupId: !GetAtt getUrlSecurityGroup.GroupId
      IpProtocol: tcp
      FromPort: 8111
      ToPort: 8111
      SourceSecurityGroupId: !GetAtt getUrlSecurityGroup.GroupId

Finally, the template creates the networking configuration for the example, including the VPC, subnet, and a subnet group.

  getUrlVpc:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsHostnames: true
      EnableDnsSupport: true
      InstanceTenancy: default
      Tags:
        - Key: Name
          Value: getUrl-sample
  
  getUrlSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      AvailabilityZone:
        Fn::Select:
          - 0
          - Fn::GetAZs: ''
      CidrBlock: 10.0.0.0/20
      Tags:
        - Key: Name
          Value: getUrl-sample
      VpcId: !Ref getUrlVpc
  
  getUrlSubnetGroup:
    Type: AWS::DAX::SubnetGroup
    Properties:
      Description: Subnet group for GetUrl Sample
      SubnetGroupName: getUrl-sample
      SubnetIds: 
        - !Ref getUrlSubnet

This part of the template creates a new VPC and adds a subnet to it in the first available Availability Zone of the current AWS Region, and then it creates a DAX subnet group for that subnet. DAX uses the subnets in a subnet group to determine how to distribute the cluster nodes. For production use, it is highly recommended that you use multiple nodes in multiple Availability Zones for redundancy. Each Availability Zone requires its own subnet to be created and added to the subnet group.

The code

I present the URL-shortening code in a single file (lambda/index.js) for simplicity. How the code works: A POST request takes the URL, creates a hash of it, stores the hash in DynamoDB, and returns the hash. A GET request to that hash looks up the URL in DynamoDB and redirects to the actual URL. The full code example is available on GitHub.

This function imports the DAX Client SDK for JavaScript to perform operations on the DAX cluster. This example uses only getItem and putItem.

const AWS = require('aws-sdk');
const AmazonDaxClient = require('amazon-dax-client');
const crypto = require('crypto');

// Store this at file level so that it is preserved between Lambda executions
var dynamodb;

exports.handler = function(event, context, callback) {
  event.headers = event.headers || {};
  main(event, context, callback);
};

function main(event, context, callback) {
  // Initialize the 'dynamodb' variable if it has not already been done. This 
  // allows the initialization to be shared between Lambda runs to reduce 
  // execution time. This will be rerun if Lambda has to recycle the container 
  // or use a new instance.
  if(!dynamodb) {
    if(process.env.DAX_ENDPOINT) {
      console.log('Using DAX endpoint', process.env.DAX_ENDPOINT);
      dynamodb = new AmazonDaxClient({endpoints: [process.env.DAX_ENDPOINT]});
    } else {
      // DDB_LOCAL can be set if using lambda-local with dynamodb-local or another local
      // testing environment
      if(process.env.DDB_LOCAL) {
        console.log('Using DynamoDB local');
        dynamodb = new AWS.DynamoDB({endpoint: 'http://localhost:8000', region: 'ddblocal'});
      } else {
        console.log('Using DynamoDB');
        dynamodb = new AWS.DynamoDB();
      }
    }
  }

  let body = event.body;

  // Depending on the HTTP Method, save or return the URL
  if (event.requestContext.http.method == 'GET') {
    return getUrl(event.pathParameters.id, callback);
  } else if (event.requestContext.http.method == 'POST' && event.body) {

    // if base64 encoded event.body is sent in, decode it
    if (event.isBase64Encoded) {
      let buff = Buffer.from(body, 'base64');
      body = buff.toString('utf-8');
    }

    return setUrl(body, callback);
  } else {
    console.log ('HTTP method ', event.requestContext.http.method, ' is invalid.');
    return done(400, JSON.stringify({error: 'Missing or invalid HTTP Method'}), 'application/json', callback);
  }
}

// Get URLs from the database and return
function getUrl(id, callback) {
  const params = {
    TableName: process.env.DDB_TABLE,
    Key: { id: { S: id } }
  };
  
  console.log('Fetching URL for', id);
  dynamodb.getItem(params, (err, data) => {
    if(err) {
      console.error('getItem error:', err);
      return done(500, JSON.stringify({error: 'Internal Server Error: ' + err}), 'application/json', callback);
    }

    if(data && data.Item && data.Item.target) {
      let url = data.Item.target.S;
      return done(301, url, 'text/plain', callback, {Location: url});
    } else {
      return done(404, '404 Not Found', 'text/plain', callback);
    }
  });
}

/**
 * Compute a unique ID for each URL.
 *
 * To do this, take the MD5 hash of the URL, extract the first 40 bits, and 
 * then return that in Base32 representation.
 *
 * If the salt is provided, prepend that to the URL first. 
 * This resolves hash collisions.
 * 
 */
function computeId(url, salt) {
  if(salt) {
    url = salt + '$' + url
  }

  // For demonstration purposes MD5 is fine
  let md5 = crypto.createHash('md5');

  // Compute the MD5, and then use only the first 40 bits
  let h = md5.update(url).digest('hex').slice(0, 10);

  // Return results in Base32 (hence 40 bits, 8*5)
  return parseInt(h, 16).toString(32);
}

// Save the URLs to the database
function setUrl(url, callback, salt) {
  let id = computeId(url, salt);

  const params = {
    TableName: process.env.DDB_TABLE,
    Item: {
      id: { S: id },
      target: { S: url }
    },
    // Ensure that puts are idempotent
    ConditionExpression: "attribute_not_exists(id) OR target = :url",
    ExpressionAttributeValues: {
      ":url": {S: url}
    }
  };

  dynamodb.putItem(params, (err, data) => {
    if (err) {
      if(err.code === 'ConditionalCheckFailedException') {
        console.warn('Collision on ' + id + ' for ' + url + '; retrying...');
        // Retry with the attempted ID as the salt.
        // Eventually, there will not be a collision.
        return setUrl(url, callback, id);
      } else {
        console.error('Dynamo error on save: ', err);
        return done(500, JSON.stringify({error: 'Internal Server Error: ' + err}), 'application/json', callback);
      }
    } else {
      return done(200, id, 'text/plain', callback);
    }
  });
}

// We're done with this. Lambda, return to the client with given parameters
function done(statusCode, body, contentType, callback, headers) {
  // Declare the variable locally so that it does not leak into global scope
  let full_headers = {
      'Content-Type': contentType
  };

  if(headers) {
    full_headers = Object.assign(full_headers, headers);
  }

  callback(null, {
    statusCode: statusCode,
    body: body,
    headers: full_headers,
    isBase64Encoded: false,
  });
}
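
To see the ID scheme and the collision handling in isolation, the following sketch reimplements computeId and replaces the DynamoDB conditional put with a plain in-memory Map. The Map and the shorten function are assumptions made for illustration; they are not part of the sample, and a real table enforces the condition on the server side.

```javascript
const crypto = require('crypto');

// Same ID scheme as computeId in the handler: the first 40 bits of the MD5
// hash (10 hex digits), rendered in base 32.
function computeId(url, salt) {
  const input = salt ? salt + '$' + url : url;
  const hex = crypto.createHash('md5').update(input).digest('hex').slice(0, 10);
  return parseInt(hex, 16).toString(32);
}

// Hypothetical in-memory stand-in for the DynamoDB table (id -> target).
const table = new Map();

// Mirrors the conditional put: succeed if the ID is unused or already maps
// to the same URL; otherwise retry with the attempted ID as the salt.
function shorten(url, salt) {
  const id = computeId(url, salt);
  if (!table.has(id) || table.get(id) === url) {
    table.set(id, url);
    return id;
  }
  return shorten(url, id);
}

const first = shorten('https://www.amazon.com');
const again = shorten('https://www.amazon.com');
console.log(first === again);   // true: storing the same URL twice is idempotent
console.log(first.length <= 8); // true: 40 bits need at most 8 base-32 digits

// Force a collision by occupying the ID that another URL would get.
const other = 'https://aws.amazon.com';
table.set(computeId(other), 'https://example.com/occupied');
const salted = shorten(other);

// The retry derives a new ID from the attempted ID plus the URL.
console.log(salted === computeId(other, computeId(other)));
```

Because JavaScript's toString(32) drops leading zeros, IDs can be shorter than eight characters.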

The Lambda handler uses environment variables for configuration: DDB_TABLE is the name of the table containing the URL information, and DAX_ENDPOINT is the cluster endpoint. In this example, these variables are configured automatically in the AWS CloudFormation template.
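
The order in which the handler consults these variables can be summarized in a small helper. The sketch below is hypothetical: chooseBackend is not part of the sample, the check for a missing DDB_TABLE is an added assumption, and the endpoint value in the usage lines is a made-up placeholder.

```javascript
// Decide which backend the handler would use, based only on environment
// variables. chooseBackend is a hypothetical helper; it mirrors the order of
// the checks in main() but does not create any client.
function chooseBackend(env) {
  if (!env.DDB_TABLE) {
    // Assumption: fail fast here; the sample itself does not check this.
    throw new Error('DDB_TABLE must be set');
  }
  if (env.DAX_ENDPOINT) {
    return { kind: 'dax', endpoint: env.DAX_ENDPOINT, table: env.DDB_TABLE };
  }
  if (env.DDB_LOCAL) {
    return { kind: 'dynamodb-local', endpoint: 'http://localhost:8000', table: env.DDB_TABLE };
  }
  return { kind: 'dynamodb', table: env.DDB_TABLE };
}

// In Lambda you would pass process.env; plain objects are used here so the
// sketch runs anywhere.
const direct = chooseBackend({ DDB_TABLE: 'GetUrl-sample' });
const viaDax = chooseBackend({ DDB_TABLE: 'GetUrl-sample', DAX_ENDPOINT: 'example-cluster:8111' });
console.log(direct.kind); // dynamodb
console.log(viaDax.kind); // dax
```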

The dynamodb instance is at global scope so that it persists between function executions. It is initialized on the first run and continues to exist as long as the underlying Lambda instance exists. As a result, you don’t have to reconnect on every execution, which can be an expensive operation when using DAX. By reusing the dynamodb instance for both direct DynamoDB access and DAX access, the code also shows that the DynamoDB and DAX clients are source-compatible, except for the initialization code.
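
The reuse pattern can be demonstrated without any AWS dependency. In this minimal sketch, createClient is a hypothetical stand-in for constructing the DAX client, and calling handler several times in one process models several invocations that land on the same warm Lambda instance.

```javascript
// Module-level variable: survives between invocations that reuse the same
// execution environment, like 'dynamodb' in the handler.
let client;
let initCount = 0;

// Hypothetical stand-in for constructing an AmazonDaxClient; it only counts
// how many times initialization runs.
function createClient() {
  initCount += 1;
  return { name: 'fake-client' };
}

function handler(event) {
  if (!client) {
    client = createClient(); // runs on the first (cold) invocation only
  }
  return { statusCode: 200, body: 'handled ' + event.id };
}

// Three invocations in the same process model three warm Lambda runs.
handler({ id: 1 });
handler({ id: 2 });
handler({ id: 3 });
console.log(initCount); // 1
```

If Lambda starts a new instance, the module is loaded again and the initialization runs once more for that instance.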

For requests from some clients, such as curl, the body can arrive at the function base64 encoded, in which case the event's isBase64Encoded flag is set. The handler checks this flag and decodes the body to plain text before writing it to DynamoDB.
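
The decoding step is easy to try on its own. The sketch below wraps it in a helper named decodeBody, which is a hypothetical name (the sample performs the same check inline in main), and the two event objects are hand-built examples rather than real API Gateway payloads.

```javascript
// Return the request body as plain text, decoding it when the event says it
// is base64 encoded. decodeBody is a hypothetical helper name.
function decodeBody(event) {
  if (!event.body) {
    return '';
  }
  return event.isBase64Encoded
    ? Buffer.from(event.body, 'base64').toString('utf-8')
    : event.body;
}

const url = 'https://www.amazon.com';
const encodedEvent = {
  body: Buffer.from(url, 'utf-8').toString('base64'),
  isBase64Encoded: true
};
const plainEvent = { body: url, isBase64Encoded: false };

console.log(decodeBody(encodedEvent)); // https://www.amazon.com
console.log(decodeBody(plainEvent));   // https://www.amazon.com
```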

Package info

The last piece that is needed is a package.json for npm (the most common JavaScript package manager) so that it can find and download the proper versions of the example’s dependencies: the AWS JavaScript SDK v2 and the DAX client SDK.

{
  "name": "geturljs",
  "version": "1.0.1",
  "repository": "https://github.com/aws-samples/amazon-dax-lambda-nodejs-sample",
  "description": "Amazon DynamoDB Accelerator (DAX) Lambda Node.js Sample",
  "main": "index.js",
  "scripts": {
    "test": "test"
  },
  "author": "author@example.com",
  "license": "MIT",
  "dependencies": {
    "amazon-dax-client": "^1.1.0",
    "aws-sdk": "^2.202.0"
  }
}

Deployment

You package Lambda functions as .zip files for deployment. For this example, the .zip archive must contain the lambda directory (for the example code) and the node_modules directory (for the dependencies) so that Lambda has everything it needs to run the function. Run all the following commands from a Bash shell.

Install both npm and the AWS CLI if you haven’t already.

# Install dependencies with npm
npm install

# Make a zip file with necessary folders
zip -qur geturl node_modules lambda

These commands create geturl.zip, which is the Lambda package. Now you need an Amazon S3 bucket to put the package in so that AWS CloudFormation can find it.

aws s3 mb s3://<your-bucket-name>

Then, create an AWS CloudFormation package of the code in that bucket.

aws cloudformation package --template-file template.yaml --output-template-file packaged-template.yaml --s3-bucket <your-bucket-name>

Finally, deploy the AWS CloudFormation stack to create all the resources.

aws cloudformation deploy --template-file packaged-template.yaml --capabilities CAPABILITY_NAMED_IAM --stack-name geturl

Using the URL shortener

You can now access the URL shortener by using the API Gateway endpoint that was created by the AWS CloudFormation template. The URLs created by API Gateway contain an API ID that is specific to each endpoint. You can find the endpoint URL for the example using the AWS CLI.

# Use the AWS CLI to find the API Gateway HTTP endpoint.
# The Name filter must match the name of your API; adjust it if yours differs.
endpointUrl=$(aws apigatewayv2 get-apis --query "Items[?Name == 'amazon-dax-lambda-nodejs-sample'].ApiEndpoint" --output text)

To shorten a URL, use the following command.

curl -d 'https://www.amazon.com' "$endpointUrl"

This command returns a “slug” that you can use to go to the URL. Assign the returned value to a shell variable named slug, and then run the following command.

curl -v "$endpointUrl/$slug"

You also can create a custom URL by using Amazon Route 53.

Conclusion

In this post, we showed how to use AWS CloudFormation to create a Lambda function that uses DAX and DynamoDB to implement a simple URL shortener. The AWS CloudFormation template includes all the configuration necessary to ensure that the Lambda function can reach the DAX cluster and use it to access the data in DynamoDB.

By combining the high performance of DAX with your serverless Lambda applications, you can increase performance while reducing costs, which is a win for you and your customers.


About the Authors

Kirk Kirkconnell is a Senior Technologist on Amazon DynamoDB and Amazon Managed Apache Cassandra Service with Amazon Web Services.

Jeff Hardy was a Software Development Engineer at Amazon Web Services.