AWS Database Blog

Use Amazon DynamoDB Accelerator (DAX) from AWS Lambda to increase performance while reducing costs

Using Amazon DynamoDB Accelerator (DAX) from AWS Lambda has several benefits for serverless applications that also use Amazon DynamoDB. DAX can improve the response time of your application by dramatically reducing read latency compared to using DynamoDB directly. Using DAX can also lower the cost of DynamoDB by reducing the amount of provisioned read throughput needed for read-heavy applications. For serverless applications, DAX provides an additional benefit: lower latency results in shorter Lambda execution times, which means lower costs.

Connecting to a DAX cluster from Lambda functions requires some special configuration. In this post, I show an example URL-shortening application based on the AWS Serverless Application Model (AWS SAM). The application uses Amazon API Gateway, Lambda, DynamoDB, DAX, and AWS CloudFormation to demonstrate how to access DAX from Lambda.

A simple serverless URL shortener

The example application in this post is a simple URL shortener. I use AWS SAM templates to simplify the setup for API Gateway, Lambda, and DynamoDB. The entire configuration is presented in an AWS CloudFormation template for repeatable deployments. The sections that create the DAX cluster, roles, security groups, and subnet groups do not depend on the SAM templates, and you can use them with regular AWS CloudFormation templates.

Like all AWS services, DAX was designed with security as a primary consideration. As a result, it requires clients to connect to DAX clusters from within a virtual private cloud (VPC), which means that you can’t access a DAX cluster directly over the internet. Therefore, you must attach any Lambda function that needs to access a DAX cluster to a VPC that can access the cluster. The AWS CloudFormation template in the following section contains all the necessary pieces and configuration to make DAX and Lambda work together. You can customize the template to fit the needs of your application.

The following diagram illustrates this solution.

As illustrated in the diagram:

  1. The client sends an HTTP request to API Gateway.
  2. API Gateway forwards the request to the appropriate Lambda functions.
  3. The Lambda functions are run inside your VPC, which allows them to access VPC resources such as your DAX cluster.
  4. The DAX cluster is also inside your VPC, which means it can be reached by the Lambda functions.

The AWS CloudFormation template

Let’s start with the AWS CloudFormation template (template.yaml). The first section of the template contains the AWS CloudFormation and AWS SAM prologue, followed by the AWS SAM function definition.

AWSTemplateFormatVersion: '2010-09-09'
Description: A sample application showing how to use Amazon DynamoDB Accelerator (DAX) with AWS Lambda and AWS CloudFormation.
Transform: AWS::Serverless-2016-10-31
Resources:
  siteFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: geturl.zip
      Description: Resolve/Store URLs
      Environment:
        Variables:
          DAX_ENDPOINT: !GetAtt getUrlCluster.ClusterDiscoveryEndpoint
          DDB_TABLE: !Ref getUrlTable
      Events:
        getUrl:
          Type: Api
          Properties:
            Method: get
            Path: /{id+}
        postUrl:
          Type: Api
          Properties:
            Method: post
            Path: /
      Handler: lambda/index.handler
      Policies:
          - AmazonDynamoDBFullAccess
          - AWSLambdaVPCAccessExecutionRole
      Runtime: nodejs6.10
      Timeout: 10
      VpcConfig:
        SecurityGroupIds: 
            - !GetAtt getUrlSecurityGroup.GroupId
        SubnetIds:
            - !Ref getUrlSubnet

This section of the template specifies the following:

  • Location of the code package
  • Environment variables used by the function
  • URL formats
  • Security policies
  • Language runtime
  • VPC configuration (in the VpcConfig stanza), which allows the Lambda function to reach a DAX cluster

This example creates its own VPC and subnet, so the VpcConfig stanza refers to resources that are defined in later sections of the file. If the VPC already exists, use the existing identifiers instead.

Using AWS::Serverless::Function takes care of creating the Lambda function definition with the appropriate permissions in addition to creating an API Gateway endpoint that calls the Lambda function on each HTTP request. Users access the URL shortener through this endpoint.

The next section of this code example creates a DynamoDB table.

  getUrlTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName: GetUrl-sample
      AttributeDefinitions:
        - 
          AttributeName: id
          AttributeType: S
      KeySchema:
        - 
          AttributeName: id
          KeyType: HASH
      ProvisionedThroughput:
        ReadCapacityUnits: 100
        WriteCapacityUnits: 10

This table has only a single hash key (KeySchema has only the id column). The ProvisionedThroughput ReadCapacityUnits are kept low because DAX serves most of the read traffic. DynamoDB is called only if DAX has not cached the item.
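To illustrate why the table's read capacity can stay low, here is a minimal in-memory sketch of read-through caching. It is not how DAX is implemented, and it ignores expiry and eviction. The ReadThroughCache class and the fakeTable map are hypothetical names used only to show that repeated reads of the same key reach the backing store once.

```javascript
// Hypothetical in-memory model of a read-through cache; not the DAX implementation.
class ReadThroughCache {
  constructor(loadFromStore) {
    this.loadFromStore = loadFromStore; // function(key) -> item or undefined
    this.items = new Map();
    this.storeReads = 0; // how many reads reached the backing store
  }

  get(key) {
    if (this.items.has(key)) {
      return this.items.get(key); // cache hit: the store is not touched
    }
    this.storeReads += 1; // cache miss: read through to the store
    const item = this.loadFromStore(key);
    if (item !== undefined) {
      this.items.set(key, item);
    }
    return item;
  }
}

// A Map plays the role of the DynamoDB table in this sketch.
const fakeTable = new Map([['abc123', 'https://www.amazon.com']]);
const cache = new ReadThroughCache((key) => fakeTable.get(key));

for (let i = 0; i < 1000; i++) {
  cache.get('abc123');
}
console.log('reads that reached the table:', cache.storeReads);
```

In this model, only the first of the 1,000 reads consumes capacity on the backing store, which is the effect the low ReadCapacityUnits setting relies on.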

Now the template specifies the DAX cluster.

  getUrlCluster:
    Type: AWS::DAX::Cluster
    Properties:
      ClusterName: getUrl-sample
      Description: Cluster for GetUrl Sample
      IAMRoleARN: !GetAtt getUrlRole.Arn
      NodeType: dax.t2.small
      ReplicationFactor: 1
      SecurityGroupIds:
        - !GetAtt getUrlSecurityGroup.GroupId
      SubnetGroupName: !Ref getUrlSubnetGroup

  getUrlRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Action:
            - sts:AssumeRole
            Effect: Allow
            Principal:
              Service:
              - dax.amazonaws.com
        Version: '2012-10-17'
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess
      RoleName: getUrl-sample-Role

The cluster is created using a single dax.t2.small node for demonstration purposes. Production workloads should use a cluster size (ReplicationFactor) of at least 3 for redundancy and should consider an appropriately sized dax.r4.* instance (NodeType). The getUrlRole stanza defines an AWS Identity and Access Management (IAM) role that grants the DAX cluster permission to access your DynamoDB data. (Don’t edit or remove this role after creating it, or the cluster won’t be able to access DynamoDB.)

Next, the template sets up a security group with a rule to allow Lambda to send traffic to DAX on TCP port 8111. If you look earlier in this post at the serverless function definition, the VpcConfig stanza refers to this security group. Security groups control how network traffic is allowed to flow in a VPC.

  getUrlSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security Group for GetUrl
      GroupName: getUrl-sample
      VpcId: !Ref getUrlVpc
  
  getUrlSecurityGroupIngress:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: getUrlSecurityGroup
    Properties:
      GroupId: !GetAtt getUrlSecurityGroup.GroupId
      IpProtocol: tcp
      FromPort: 8111
      ToPort: 8111
      SourceSecurityGroupId: !GetAtt getUrlSecurityGroup.GroupId

Finally, the template creates the networking configuration for the example, including the VPC, subnet, and a subnet group.

  getUrlVpc:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsHostnames: true
      EnableDnsSupport: true
      InstanceTenancy: default
      Tags:
        - Key: Name
          Value: getUrl-sample
  
  getUrlSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      AvailabilityZone:
        Fn::Select:
          - 0
          - Fn::GetAZs: ''
      CidrBlock: 10.0.0.0/20
      Tags:
        - Key: Name
          Value: getUrl-sample
      VpcId: !Ref getUrlVpc
  
  getUrlSubnetGroup:
    Type: AWS::DAX::SubnetGroup
    Properties:
      Description: Subnet group for GetUrl Sample
      SubnetGroupName: getUrl-sample
      SubnetIds: 
        - !Ref getUrlSubnet

This part of the template creates a new VPC and adds a subnet to it in the first available Availability Zone of the current AWS Region, and then it creates a DAX subnet group for that subnet. DAX uses the subnets in a subnet group to determine how to distribute the cluster nodes. For production use, it is highly recommended that you use multiple nodes in multiple Availability Zones for redundancy. Each Availability Zone requires its own subnet to be created and added to the subnet group.

The code

I present the URL-shortening code in a single file (lambda/index.js) for simplicity. How the code works: A POST request takes the URL, creates a hash of it, stores the hash in DynamoDB, and returns the hash. A GET request to that hash looks up the URL in DynamoDB and redirects to the actual URL. The full code example is available on GitHub.

const AWS = require('aws-sdk');
const AmazonDaxClient = require('amazon-dax-client');
const crypto = require('crypto');

// Store this at file level so that it is preserved between Lambda executions
var dynamodb;

exports.handler = function(event, context, callback) {
  // Headers are a name-to-value map, so default to an empty object
  event.headers = event.headers || {};
  main(event, context, callback);
};

function main(event, context, callback) {
  // Initialize the 'dynamodb' variable if it has not already been done. This
  // allows the initialization to be shared between Lambda runs to reduce
  // execution time. This will be rerun if Lambda has to recycle the container
  // or use a new instance.
  if(!dynamodb) {
    if(process.env.DAX_ENDPOINT) {
      console.log('Using DAX endpoint', process.env.DAX_ENDPOINT);
      dynamodb = new AmazonDaxClient({endpoints: [process.env.DAX_ENDPOINT]});
    } else {
      // DDB_LOCAL can be set if using lambda-local with dynamodb-local or another local
      // testing environment
      if(process.env.DDB_LOCAL) {
        console.log('Using DynamoDB local');
        dynamodb = new AWS.DynamoDB({endpoint: 'http://localhost:8000', region: 'ddblocal'});
      } else {
        console.log('Using DynamoDB');
        dynamodb = new AWS.DynamoDB();
      }
    }
  }

  // Depending on the HTTP method, save or return the URL
  if (event.httpMethod === 'GET') {
    return getUrl(event.pathParameters.id, callback);
  } else if (event.httpMethod === 'POST' && event.body) {
    return setUrl(event.body, callback);
  } else {
    return done(400, JSON.stringify({error: 'Missing or invalid HTTP Method'}), 'application/json', callback);
  }
}

// Get URLs from the database and return
function getUrl(id, callback) {
  const params = {
    TableName: process.env.DDB_TABLE,
    Key: { id: { S: id } }
  };
  
  console.log('Fetching URL for', id);
  dynamodb.getItem(params, (err, data) => {
    if(err) {
      console.error('getItem error:', err);
      return done(500, JSON.stringify({error: 'Internal Server Error: ' + err}), 'application/json', callback);
    }

    if(data && data.Item && data.Item.target) {
      let url = data.Item.target.S;
      return done(301, url, 'text/plain', callback, {Location: url});
    } else {
      return done(404, '404 Not Found', 'text/plain', callback);
    }
  });
}

/**
 * Compute a unique ID for each URL.
 *
 * To do this, take the MD5 hash of the URL, extract the first 40 bits, and 
 * then return that in Base32 representation.
 *
 * If the salt is provided, prepend that to the URL first. 
 * This resolves hash collisions.
 * 
 */
function computeId(url, salt) {
  if(salt) {
    url = salt + '$' + url;
  }

  // For demonstration purposes MD5 is fine
  let md5 = crypto.createHash('md5');

  // Compute the MD5, and then use only the first 40 bits
  let h = md5.update(url).digest('hex').slice(0, 10);

  // Return results in Base32 (hence 40 bits, 8*5)
  return parseInt(h, 16).toString(32);
}

// Save the URLs to the database
function setUrl(url, callback, salt) {
  let id = computeId(url, salt);

  const params = {
    TableName: process.env.DDB_TABLE,
    Item: {
      id: { S: id },
      target: { S: url }
    },
    // Ensure that puts are idempotent
    ConditionExpression: "attribute_not_exists(id) OR target = :url",
    ExpressionAttributeValues: {
      ":url": {S: url}
    }
  };

  dynamodb.putItem(params, (err, data) => {
    if (err) {
      if(err.code === 'ConditionalCheckFailedException') {
        console.warn('Collision on ' + id + ' for ' + url + '; retrying...');
        // Retry with the attempted ID as the salt.
        // Eventually, there will not be a collision.
        return setUrl(url, callback, id);
      } else {
        console.error('Dynamo error on save: ', err);
        return done(500, JSON.stringify({error: 'Internal Server Error: ' + err}), 'application/json', callback);
      }
    } else {
      return done(200, id, 'text/plain', callback);
    }
  });
}

// Finish the request: return a response with the given parameters to the client
function done(statusCode, body, contentType, callback, headers) {
  // Declare the variable locally so that it does not leak into global scope
  let full_headers = {
      'Content-Type': contentType
  };

  if(headers) {
    full_headers = Object.assign(full_headers, headers);
  }

  callback(null, {
    statusCode: statusCode,
    body: body,
    headers: full_headers,
    isBase64Encoded: false,
  });
}
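
The ID scheme and the collision handling in the listing above can be tried without DynamoDB. The following is a minimal sketch in which a Map stands in for the table. The names saveUrl and table are hypothetical, and the synchronous recursion only approximates the callback-based retry in setUrl.

```javascript
const crypto = require('crypto');

// Same ID scheme as computeId in the listing above: first 40 bits of the
// MD5 hash (10 hex digits), rendered in base 32.
function computeId(url, salt) {
  const input = salt ? salt + '$' + url : url;
  const hex = crypto.createHash('md5').update(input).digest('hex').slice(0, 10);
  return parseInt(hex, 16).toString(32);
}

// Hypothetical stand-in for the conditional putItem: a Map plays the table.
function saveUrl(table, url, salt) {
  const id = computeId(url, salt);
  const existing = table.get(id);
  if (existing === undefined || existing === url) {
    // Same outcome as "attribute_not_exists(id) OR target = :url"
    table.set(id, url);
    return id;
  }
  // The slot holds a different URL: retry with the attempted ID as the salt.
  return saveUrl(table, url, id);
}

const table = new Map();
const first = saveUrl(table, 'https://www.amazon.com');
const again = saveUrl(table, 'https://www.amazon.com');
console.log(first, first === again); // saving the same URL twice is idempotent

// Force a collision by pre-filling the slot that another URL hashes to.
const other = 'https://aws.amazon.com';
table.set(computeId(other), 'https://example.com/occupied');
const salted = saveUrl(table, other);
console.log(salted, salted !== computeId(other)); // the retry used a salted ID
```

Because the value is limited to 40 bits, an ID is never longer than eight base-32 characters, and leading zeros are dropped, so some IDs are shorter.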

The Lambda handler uses environment variables for configuration: DDB_TABLE is the name of the table containing the URL information, and DAX_ENDPOINT is the cluster endpoint. In this example, these variables are configured automatically in the AWS CloudFormation template.
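
The order in which the handler consults these variables can be summarized in a small pure function. This is only a sketch: chooseBackend is a hypothetical helper that is not part of the sample, the endpoint string in the usage lines is a placeholder, and the check for a missing DDB_TABLE is an added assumption rather than behavior of the original code.

```javascript
// Hypothetical helper mirroring the client selection in main(): DAX first,
// then DynamoDB local, then the regular DynamoDB endpoint.
function chooseBackend(env) {
  if (!env.DDB_TABLE) {
    // Assumption: fail fast when the table name is missing.
    throw new Error('DDB_TABLE must be set');
  }
  if (env.DAX_ENDPOINT) {
    return { backend: 'dax', table: env.DDB_TABLE, endpoint: env.DAX_ENDPOINT };
  }
  if (env.DDB_LOCAL) {
    return { backend: 'dynamodb-local', table: env.DDB_TABLE, endpoint: 'http://localhost:8000' };
  }
  return { backend: 'dynamodb', table: env.DDB_TABLE, endpoint: null };
}

// Placeholder values for illustration only.
console.log(chooseBackend({ DDB_TABLE: 'GetUrl-sample', DAX_ENDPOINT: 'example-cluster-endpoint:8111' }).backend);
console.log(chooseBackend({ DDB_TABLE: 'GetUrl-sample' }).backend);
```

In a function, the same helper could be called once with process.env during initialization.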

The dynamodb instance is at global scope so that it persists between function executions. It is initialized on the first run and continues to exist as long as the underlying Lambda instance exists. As a result, you don’t have to reconnect on every execution, which can be an expensive operation when using DAX. By reusing the dynamodb instance for both direct DynamoDB access and DAX access, the code also shows that the DynamoDB and DAX clients are source-compatible, except for the initialization code.
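
The reuse pattern described above can be demonstrated in isolation. In this minimal sketch, createClient is a hypothetical stand-in for an expensive constructor such as a DAX connection, and the repeated handler calls simulate several invocations that land on the same Lambda instance.

```javascript
// Hypothetical stand-in for an expensive client constructor.
let connectionsOpened = 0;
function createClient() {
  connectionsOpened += 1;
  return { id: connectionsOpened };
}

// Module-level variable: it survives between invocations that run in the
// same execution environment.
let client;

function handler(event) {
  if (!client) {
    client = createClient(); // runs only on the first invocation
  }
  return { statusCode: 200, body: 'served by client ' + client.id };
}

// Simulate three invocations on the same instance.
handler({});
handler({});
handler({});
console.log('clients created:', connectionsOpened);
```

Only the first call pays the initialization cost. A new Lambda instance starts with the variable unset and initializes its own client.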

Package info

The last piece that is needed is a package.json for npm (the most common JavaScript package manager) so that it can find and download the proper versions of the example’s dependencies.

{
  "name": "geturljs",
  "version": "1.0.0",
  "repository": "https://github.com/aws-samples/amazon-dax-lambda-nodejs-sample",
  "description": "Amazon DynamoDB Accelerator (DAX) Lambda Node.js Sample",
  "main": "index.js",
  "scripts": {
    "test": "test"
  },
  "author": "author@example.com",
  "license": "MIT",
  "dependencies": {
    "amazon-dax-client": "^1.1.0",
    "aws-sdk": "^2.202.0"
  }
}

Deployment

You package Lambda functions as .zip files for deployment. For this example, the .zip archive must contain the lambda directory (for the example code) and the node_modules directory (for the dependencies) so that Lambda has everything it needs to run the function. Run all the following commands from a Bash shell.

Install both npm and the AWS CLI if you haven’t already.

# Install dependencies with npm
npm install

# Make a zip file with necessary folders
zip -qur geturl node_modules lambda

This command creates geturl.zip, which is the Lambda package. Now you need an Amazon S3 bucket to put the package in so that AWS CloudFormation can find it.

aws s3 mb s3://bucket-name

Then, create an AWS CloudFormation package of the code in that bucket.

aws cloudformation package --template-file template.yaml --output-template-file packaged-template.yaml --s3-bucket bucket-name

Finally, deploy the AWS CloudFormation stack to create all the resources.

aws cloudformation deploy --template-file packaged-template.yaml --capabilities CAPABILITY_NAMED_IAM --stack-name geturl

Using the URL shortener

You can now access the URL shortener by using the API Gateway endpoint that was created by the AWS CloudFormation template. The URLs created by API Gateway contain a REST ID that is specific to each endpoint. You can find the ID for the example endpoint using the AWS CLI.

# Use the AWS CLI to find the API Gateway REST ID
gwId=$(aws apigateway get-rest-apis --query "items[?name == 'geturl'].id | [0]" --output text)
# Construct the endpoint URL using the REST ID (replace "region" with the AWS Region of your stack)
endpointUrl="https://$gwId.execute-api.region.amazonaws.com/Prod"

To shorten a URL, use the following command.

curl -d 'https://www.amazon.com' "$endpointUrl"

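This command returns a “slug” that you can use to go to the URL. Store the returned value in a shell variable named slug, or substitute it directly into the following command.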

curl -v "$endpointUrl/$slug"
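
The same two requests can be made from Node.js. The following is a rough sketch that assumes Node.js 18 or later, where fetch is available globally. The function names buildEndpointUrl, shorten, and resolveSlug are hypothetical, and the REST ID and Region in the usage comment are placeholders.

```javascript
// Build the invoke URL for a stage of an API Gateway REST API.
function buildEndpointUrl(restApiId, region, stage) {
  return 'https://' + restApiId + '.execute-api.' + region + '.amazonaws.com/' + stage;
}

// POST the long URL as the request body; the response body is the slug.
async function shorten(endpointUrl, longUrl) {
  const res = await fetch(endpointUrl, { method: 'POST', body: longUrl });
  if (!res.ok) {
    throw new Error('Shorten failed with status ' + res.status);
  }
  return res.text();
}

// GET the slug without following the redirect, and return the Location header.
async function resolveSlug(endpointUrl, slug) {
  const res = await fetch(endpointUrl + '/' + encodeURIComponent(slug), { redirect: 'manual' });
  return res.headers.get('location');
}

// Example usage (placeholder ID and Region; requires a deployed stack):
// const endpointUrl = buildEndpointUrl('abc123', 'us-east-1', 'Prod');
// shorten(endpointUrl, 'https://www.amazon.com')
//   .then((slug) => resolveSlug(endpointUrl, slug))
//   .then(console.log);
```

Setting redirect to 'manual' keeps the client from following the 301 response, so the stored target can be inspected instead of fetched.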

You also can create a custom URL by using Amazon Route 53.

Conclusion

In this post, I showed how to use AWS CloudFormation to create a Lambda function that uses DAX and DynamoDB to implement a simple URL shortener. The AWS CloudFormation template includes all the configuration necessary to ensure that the Lambda function can reach the DAX cluster and use it to access the data in DynamoDB.

By combining the high performance of DAX with your serverless Lambda applications, you can increase performance while reducing costs, which is a win for you and your customers.


About the Author

Jeff Hardy is a software development engineer at Amazon Web Services.