How to use the new Amazon DynamoDB key diagnostics library to visualize and understand your application’s traffic patterns

Update (December 14, 2020): The DynamoDB team recommends that you use Amazon CloudWatch Contributor Insights for Amazon DynamoDB, which is a diagnostic tool for understanding your traffic patterns and identifying frequently accessed keys. You can enable CloudWatch Contributor Insights for DynamoDB in the AWS Management Console, or in the AWS API or AWS CLI.

August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more.

Today, we released the Amazon DynamoDB key diagnostics library, which enables you to view graphs and dashboards of your most accessed database items. A DynamoDB table can be used in two different capacity modes: provisioned and on-demand. In either mode, DynamoDB automatically supports your access patterns as long as the traffic against a given item does not exceed 3,000 read capacity units per second or 1,000 write capacity units per second. The key diagnostics library gives you a tool to analyze the traffic to your tables and display a graph showing the number of reads and writes to each partition key, so that you can monitor the traffic in your application in near real time and adjust to unpredictable and uneven workloads.

For example, if you want to know the most popular products in your inventory table that are driving the most page views, the library can help identify the products receiving the most interest. If you manage a mobile game, you can quickly identify and rate-limit potential bad actors who are driving excess traffic that negatively affects other gamers.
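To make the per-item limits mentioned above concrete, here is a minimal Java sketch that flags keys whose one-second consumption exceeds them. It is not part of the key diagnostics library: the class HotKeyThresholds, the KeySample type, and the method names are hypothetical, and the sketch assumes you already have per-key, per-second read and write totals.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper; hypothetical, not part of the key diagnostics library. */
final class HotKeyThresholds {
    // Per-item limits as quoted in this post.
    static final double MAX_READ_UNITS_PER_SECOND = 3_000.0;
    static final double MAX_WRITE_UNITS_PER_SECOND = 1_000.0;

    /** Hypothetical one-second usage sample for a single key. */
    static final class KeySample {
        final String key;
        final double readUnits;
        final double writeUnits;

        KeySample(String key, double readUnits, double writeUnits) {
            this.key = key;
            this.readUnits = readUnits;
            this.writeUnits = writeUnits;
        }
    }

    /** Returns true when the sample exceeds either per-second limit. */
    static boolean exceedsLimit(KeySample sample) {
        return sample.readUnits > MAX_READ_UNITS_PER_SECOND
            || sample.writeUnits > MAX_WRITE_UNITS_PER_SECOND;
    }

    /** Returns the keys, in input order, whose samples exceed a limit. */
    static List<String> hotKeys(List<KeySample> samples) {
        List<String> hot = new ArrayList<>();
        for (KeySample sample : samples) {
            if (exceedsLimit(sample)) {
                hot.add(sample.key);
            }
        }
        return hot;
    }
}
```

A check like this only tells you that a key crossed a limit in a given second; the dashboards described later in this post are what show how the traffic is distributed over time.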

In this blog post, I show you how to set up the key diagnostics library. Then I walk through how to use the library’s visualizations to identify keys with uneven access distribution in an example movie database.

Overview of the DynamoDB key diagnostics library

The DynamoDB key diagnostics library is a lightweight wrapper around the DynamoDB SDK that is easy to integrate with your existing application. The library includes an AWS CloudFormation template that creates the required Amazon Kinesis Data Firehose, Amazon Kinesis Data Streams, Amazon Kinesis Data Analytics, Amazon CloudWatch Logs, CloudWatch alarms, Amazon Athena, and Amazon QuickSight resources in your AWS account. You can then log DynamoDB access by partition key, analyze the frequency of access (or the “heat”) of each key, and display the results in a dashboard. The following diagram shows the resources that are deployed by the template when you follow the setup guide in this post.

DynamoDB key diagnostics library architectural diagram

For instructions about how to integrate the SDK, deploy the CloudFormation template, and view the dashboard displaying uneven access distribution and associated metrics, see the next section.

Getting started step-by-step guide

To set up the recommended AWS resources and run the example application yourself, clone this GitHub repository and take the steps that follow.

Note: At the time of this post’s publication, the library aggregates the metrics for keys at minute and second granularity. Depending on your business requirements, you might want to modify the client to aggregate data at a different granularity. In addition, you can set up the included CloudFormation template only in the following AWS Regions, because they are the only Regions in which all dependent services are available: US East (N. Virginia), US West (Oregon), EU (Ireland), and EU (Frankfurt). This availability might change over time, so see the AWS Regions table for up-to-date information.
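As a rough illustration of what a change of granularity involves, the following Java sketch rolls per-second totals up into coarser buckets. It is not the library's aggregation code: the class GranularityRollup and the method rollUp are hypothetical, and the input (a map from timestamp to total I/O for one key) is an assumed shape.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative roll-up; hypothetical, not the library's aggregation code. */
final class GranularityRollup {
    /**
     * Sums per-second I/O totals into buckets of the given unit
     * (for example ChronoUnit.MINUTES or ChronoUnit.HOURS).
     * Buckets are keyed by their start time and sorted chronologically.
     */
    static Map<Instant, Double> rollUp(Map<Instant, Double> perSecond, ChronoUnit unit) {
        Map<Instant, Double> buckets = new TreeMap<>();
        for (Map.Entry<Instant, Double> entry : perSecond.entrySet()) {
            // Truncating the timestamp gives the start of its bucket.
            Instant bucketStart = entry.getKey().truncatedTo(unit);
            buckets.merge(bucketStart, entry.getValue(), Double::sum);
        }
        return buckets;
    }
}
```

Coarser buckets smooth out short spikes, so a key that briefly exceeds a per-second limit can look unremarkable in a per-minute view. That trade-off is worth keeping in mind when you choose a granularity.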

Setup Guide: How to install the Key Diagnostics Library, set up AWS resources, run the demo application, and integrate it with your code

Prerequisites

To use the key diagnostics library or run the demo, you must have the following installed on your machine:

Java 1.8
Apache Maven 3.0 or later
AWS Command Line Interface (AWS CLI)

Step 1: Install the key diagnostics library

To install the key diagnostics library, run the following command.

$ mvn install

Step 2: Configure your AWS credentials

Configure your AWS CLI credentials, if you haven’t already. The AWS resources in the following steps are created under the configured account.

$ aws configure

Make sure you have Amazon S3, AWS Lambda, Amazon Kinesis, Amazon CloudWatch and CloudFormation permissions with the configured credentials.

Step 3: Create and deploy the required AWS resources by using the CloudFormation template

You now will deploy a Lambda function for reporting and monitoring metrics. To do this, first upload the provided Lambda function to Amazon S3. If you don’t have an Amazon S3 bucket already, create one (throughout this post, replace the placeholder names with your own names).

$ export BUCKET_NAME=my-cool-new-bucket
$ aws s3 mb s3://$BUCKET_NAME

Then, package the provided Hot Key Lambda function and upload it to the Amazon S3 bucket.

$ aws cloudformation package \
   --template-file resources/DynamoDB_Key_Diagnostics_Library.yaml \
   --s3-bucket $BUCKET_NAME \
   --output-template-file packaged.yaml

You can then create the rest of the necessary AWS resources (such as the Kinesis Data Streams stream, Kinesis Data Analytics application, and CloudWatch alarm) as follows. Also, provide a CloudFormation stack name.

$ STACK_NAME=KeyDiagnosticsStack 
$ aws cloudformation deploy \
   --template-file packaged.yaml \
   --stack-name $STACK_NAME \
   --capabilities CAPABILITY_IAM

CloudFormation does not automatically start the Kinesis Data Analytics application, so to start the application, navigate to the Amazon Kinesis console or run the following commands.

# Find the Kinesis Data Analytics application name by going to the Kinesis console
# or run `aws kinesisanalytics list-applications`
$ KINESIS_ANALYTICS_APP_NAME="Put your application name here"

# Then, find the InputID (--output text returns the value without JSON quotes)
$ INPUT_ID=$(aws kinesisanalytics describe-application \
   --application-name "$KINESIS_ANALYTICS_APP_NAME" \
   --query 'ApplicationDetail.InputDescriptions[0].InputId' \
   --output text)

# Start the Kinesis Data Analytics application
$ aws kinesisanalytics start-application \
   --application-name "$KINESIS_ANALYTICS_APP_NAME" \
   --input-configurations \
   "Id=$INPUT_ID,InputStartingPositionConfiguration={InputStartingPosition=NOW}"

You are now ready to run the demo Movies example application in the repository (Step 3.1) or change your code to use the key diagnostics library (Step 3.2).

Step 3.1: Run the example Movies application

This example application uses the IMDb dataset to simulate a hot-key scenario in which some movies are rated more frequently than others. To run the application, first install the library in your local Maven repository (Step 1). Then, navigate to the samples/movies directory, and execute the demo by running the following commands.

$ KINESIS_STREAM_NAME="Put your Kinesis Data Stream name here"
$ REGION="Put the region where your Kinesis Stream and DynamoDB table are set up"

$ mvn package exec:java@movies -Dexec.args="traffic $KINESIS_STREAM_NAME $REGION"

Step 3.2: Change your code to use the key diagnostics library

To use the key diagnostics library, provide a Kinesis client and the name of the Kinesis data stream used for logging DynamoDB usage information, in addition to the original DynamoDB client.

DynamoDBKeyDiagnosticsClient client =
DynamoDBKeyDiagnosticsClient.monitorAllPartitionKeys(
    dynamoDBClient,
    kinesisClient,
    kinesisStreamName
);

By default, the setup in this post monitors all key attributes for all the tables and global secondary indexes accessed through this client. If you need to specify attributes to monitor, you can pass in a map of table name to key attribute name.

DynamoDBKeyDiagnosticsClient client = new DynamoDBKeyDiagnosticsClient(
   dynamoDBClient,
   kinesisClient,
   kinesisStreamName,
   ImmutableMap.of("MyTable", ImmutableList.of("MyAttribute"))
);

After you create the DynamoDBKeyDiagnosticsClient instance, you can use it everywhere you would have used the AmazonDynamoDB client (the instance implements the AmazonDynamoDB interface).

To simplify the construction of the DynamoDB and Kinesis clients, you also can use the provided DynamoDBKeyDiagnosticsClientBuilder. The following example code uses a try-with-resources statement, so the thread pool created to log the usage information asynchronously to Kinesis is closed when the block exits.

try (final DynamoDBKeyDiagnosticsClient instrumented = 
       DynamoDBKeyDiagnosticsClientBuilder.defaultClient(kinesisStreamName)) {
   // Your application
}
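The try-with-resources pattern above can be sketched with a small stand-in class. This is not the library's implementation: AsyncUsageLogger and its methods are hypothetical, and the sketch only counts records instead of sending them to Kinesis. It shows why closing matters when work is queued on a background thread pool.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a client that logs usage asynchronously. */
final class AsyncUsageLogger implements AutoCloseable {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final AtomicInteger delivered = new AtomicInteger();

    /** Queues one usage record for background delivery. */
    void log(String record) {
        // A real client would send the record to a stream here;
        // this sketch only counts it.
        pool.execute(delivered::incrementAndGet);
    }

    int deliveredCount() {
        return delivered.get();
    }

    boolean isClosed() {
        return pool.isShutdown();
    }

    /** Stops accepting work and waits briefly for queued records to drain. */
    @Override
    public void close() {
        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
```

Without the close call, the non-daemon worker threads in a pool like this would keep the JVM alive after the application finishes, and records still in the queue could be lost if the process is killed.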

Step 4: Set up Amazon Athena and Amazon QuickSight for visualizations

If you are interested in creating dashboards or querying key usage information, or if you want to understand the access patterns of specific attributes, we highly recommend setting up Athena and Amazon QuickSight.

Navigate to the Athena console, paste the following under New query 1, and then choose Run query. This command creates an Athena database for the key usage information stored in Amazon S3.

CREATE DATABASE IF NOT EXISTS dynamodbkeydiagnosticslibrary
COMMENT 'Athena database for DynamoDBKeyDiagnosticsLibrary';

Create the Athena table. Following the example application, the table name is movies. If you used the provided CloudFormation template in Step 3, the Amazon S3 location should be similar to the following (the generated suffix in your bucket name will differ): s3://keydiagnosticsstack-aggregatedresultbucket-ejkhrnvyw8ku/keydiagnostics/

CREATE EXTERNAL TABLE `movies`(
`second` timestamp COMMENT 'Second aggregated results',
`tablename` string COMMENT 'DynamoDB table name',
`hashkey` string COMMENT 'The partition key attribute name',
`hashkeyvalue` string COMMENT 'The partition key attribute value',
`operation` string COMMENT 'DynamoDB operation',
`totalio` float COMMENT 'Total IO consumed')
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://keydiagnosticsstack-aggregatedresultbucket-ejkhrnvyw8ku/keydiagnostics/'
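Once the table exists, you can query it for the keys with the most I/O. The following Java sketch builds such a query as a string that you could paste into the Athena console. It assumes the database and table names created above and the column names from the table definition; the class HotKeyQuery, the method topKeys, and the DynamoDB table name you pass in are hypothetical.

```java
/** Illustrative query builder; hypothetical, not part of the library. */
final class HotKeyQuery {
    /**
     * Builds an Athena query that lists the keys of one DynamoDB table
     * with the highest total I/O, grouped by operation.
     */
    static String topKeys(String dynamoTableName, int limit) {
        // DynamoDB table names use only these characters, so rejecting
        // anything else also keeps quotes out of the SQL string.
        if (!dynamoTableName.matches("[A-Za-z0-9_.-]{3,255}")) {
            throw new IllegalArgumentException("Invalid table name: " + dynamoTableName);
        }
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        return "SELECT hashkeyvalue, operation, SUM(totalio) AS total_io\n"
             + "FROM dynamodbkeydiagnosticslibrary.movies\n"
             + "WHERE tablename = '" + dynamoTableName + "'\n"
             + "GROUP BY hashkeyvalue, operation\n"
             + "ORDER BY total_io DESC\n"
             + "LIMIT " + limit;
    }
}
```

For example, topKeys("Movies", 10) produces a query for the ten hottest keys, where "Movies" is a placeholder for whatever your DynamoDB table is called.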

After you create the Athena table, you can use Amazon QuickSight to visualize the key usage pattern of your application:

Navigate to the Amazon QuickSight console and choose Manage data.
Choose New data set, choose Athena, and specify a data source name. You then should be able to select the Athena database and table you created in the previous section.
Choose Import to SPICE for quicker analytics, and then choose Visualize!
Now you should be able to create graphs by filtering on table names, time ranges, and partition keys to visualize movie popularity over a time range.

Note: With the preceding setup, the key diagnostics library will log the values of your partition key, sort key, or any attributes you want to monitor for the selected DynamoDB table. The key usage information is stored in Amazon S3, and specific hot keys are logged and displayed through CloudWatch and Amazon QuickSight. Be careful about exposing sensitive data stored in your DynamoDB table after integrating with the library.

The following visualizations show example output from the library when integrated with the sample movies application. The visualizations show the relative heat of each movie record as the library reads records from the database and how much traffic each record has had over time. Visualizing each record’s traffic allows the application owner to make decisions based on traffic patterns.
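One simple way to think about relative heat is each key's share of the total I/O in a time window. The following Java sketch computes that share. The definition is an assumption made for illustration and is not necessarily how the dashboards compute heat; the class RelativeHeat and the method shares are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative calculation; hypothetical, not part of the library. */
final class RelativeHeat {
    /**
     * Returns each key's share of the total I/O as a fraction between
     * 0 and 1, preserving the iteration order of the input map.
     */
    static Map<String, Double> shares(Map<String, Double> totalIoByKey) {
        double total = 0.0;
        for (double io : totalIoByKey.values()) {
            total += io;
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : totalIoByKey.entrySet()) {
            // Avoid dividing by zero when there was no traffic at all.
            double share = total == 0.0 ? 0.0 : entry.getValue() / total;
            result.put(entry.getKey(), share);
        }
        return result;
    }
}
```

With evenly distributed access, every key's share is close to 1 divided by the number of keys; a key whose share is far above that is a candidate hot key.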

Summary

In this post, I showed how you can identify your most accessed database items by using the new DynamoDB key diagnostics library. Using this library, you can visualize your application traffic through graphs and visual dashboards. If your database has hot keys and hot spots, the library gives you one more tool to help you adjust to unpredictable and uneven workloads.

About the authors

Ryan Chan is a software development engineer at Amazon Web Services. He works on the DynamoDB team, developing features such as encryption at rest and on-demand.

Mostafa Elhemali is a senior software development engineer at Amazon Web Services.

Padma Malligarjunan is a senior product manager at AWS with industry background in finance, gaming, and retail. She is passionate about training and evangelizing development best practices for building distributed and highly scalable applications using Amazon DynamoDB.

AWS Database Blog