AWS Database Blog
Implement event-driven architectures with Amazon DynamoDB – Part 2
In the previous post (Part 1) of this series, we discussed how you can use Amazon EventBridge Scheduler for precise data eviction management in Amazon DynamoDB. In this post (Part 2), we explore another method that uses global secondary indexes (GSIs) to handle fine-grained Time to Live (TTL) requirements.
When dealing with high-throughput applications, precise control over data expiration is important. The inherent delay in the native DynamoDB TTL feature may not always meet the needs of applications that require immediate data removal. By using GSIs, we can create a more responsive system that aligns with your application's real-time requirements.
Solution overview
Using a GSI for this use case allows you to efficiently query data that is eligible for eviction at intervals tailored to your specific needs. This solution can provide near real-time granularity, similar to the previous EventBridge Scheduler method. By setting up a sharded GSI with a TTL attribute as the sort key, we can periodically query and remove expired items with a precision close to 1 minute.
The following diagram illustrates the solution architecture.
- EventBridge Scheduler invokes a Lambda function on a defined schedule.
- The Lambda function queries a GSI to retrieve items marked for deletion.
- The Lambda function deletes the retrieved items from the DynamoDB table.
The need for sharding
To efficiently manage high write and read throughput, sharding the GSI is essential. Sharding distributes the load across multiple partitions, preventing any single partition from becoming a bottleneck and causing throttling. This can be achieved by adding a random or calculated suffix to the partition key values. By randomizing the partition key, writes are spread more evenly across the partition key space, improving parallelism and overall throughput.
However, querying for expired data requires issuing a query for each shard in the range. This approach ensures that all possible partitions are checked for expired items, but it does necessitate multiple queries to cover all shards.
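As a brief sketch of the write path, the following Python example assigns a random shard suffix when storing an item. It assumes the sample data model described later in this post (the SessionTable table, four shards, and a TTL stored as an ISO 8601 string to match the String sort key):

```python
import random
from datetime import datetime, timedelta, timezone

import boto3

SHARDS = 4  # shard count from the sizing formula that follows

table = boto3.resource("dynamodb").Table("SessionTable")


def put_session(session_id: str, time_to_live: timedelta) -> None:
    """Write a session item, assigning it to a random GSI shard."""
    expires_at = datetime.now(timezone.utc) + time_to_live
    table.put_item(
        Item={
            "PK": f"SESSION#{session_id}",
            "SK": "METADATA",
            # A random shard value spreads GSI writes across partitions.
            "GSI_PK": str(random.randint(0, SHARDS - 1)),
            # ISO 8601 strings sort lexicographically, matching the
            # String sort key defined on the GSI.
            "TTL": expires_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
    )
```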
The number of shards is calculated based on your expected peak write throughput on your base table using the following formula:
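One reasonable formulation, based on the DynamoDB limit of 1,000 WCU per partition, is:

Number of shards = ceil(expected peak WCU / 1,000)

For example, the sample data model later in this post assumes a peak of 3,000 WCU, which yields a minimum of three shards before any buffer is added.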
Including a buffer in your calculation is recommended to ensure you have sufficient shards to avoid throttling on your DynamoDB table. The buffer also provides additional distribution to handle unexpected spikes in write throughput, maintaining the performance and reliability of your application.
If your expected peak WCU increases over the longer term, you can first modify the Lambda function to query additional shards, and then update the application to write data across the increased shard count. The Lambda function can query shards that don't yet exist in the GSI without issues; those queries simply return no items.
GSI primary key selection
When selecting keys for your GSI, the partition key (GSI_PK) holds a random shard identifier, with its range defined by the preceding calculation to ensure even load distribution. The sort key is your TTL attribute, which can be represented as either an epoch number or a string time format. This combination allows efficient querying of items based on their expiration time, while the sharded partition key distributes the write and read operations across multiple partitions, optimizing the overall performance and scalability of your DynamoDB table.
Cost-optimized indexing
Maintaining an efficient GSI involves using the KEYS_ONLY projection and ensuring the index is sparse. By using KEYS_ONLY, the GSI only stores the primary key attributes of the base table and the index key attributes, significantly reducing storage and throughput costs. Additionally, making the GSI sparse means it only includes items that have a TTL attribute. This selective indexing ensures that only relevant items, those with an expiration time, are included in the GSI, further optimizing performance and resource utilization. This not only enhances query efficiency but also helps manage costs by keeping the index lean and focused on items that require TTL management.
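For example, given the following two items, only the first carries the GSI key attributes, so only the first appears in the sparse index (the attribute values shown are illustrative):

```python
# Indexed: has both GSI key attributes, so it appears in the GSI.
session = {"PK": "SESSION#1", "SK": "METADATA",
           "GSI_PK": "2", "TTL": "2025-01-01T00:00:00Z"}

# Not indexed: lacks a TTL attribute, so the sparse GSI skips it.
profile = {"PK": "USER#1", "SK": "PROFILE", "email": "user@example.com"}
```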
Sample data model
The following example data model is designed for handling session state management in a high-throughput environment, where sessions require fine-grained TTL management. By sharding our GSI, we distribute the write load across four shards to efficiently handle a peak throughput of 3,000 WCU, with the shard count chosen to include additional buffer capacity beyond the expected peak. Each session entry includes a TTL attribute to ensure timely expiration, and the GSI's partition key (GSI_PK) is the random shard identifier to enable precise querying and deletion of expired sessions, maintaining optimal performance and data integrity.
The following example shows our SessionTable, which we created using NoSQL Workbench:
The following example shows our TTL GSI.
With this data model, we can now query each one of our shards using the GSI, applying a condition on the TTL timestamp attribute to identify records older than the current time. This enables us to efficiently find and evict expired sessions.
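As a brief sketch of this query, assuming the sample data model, a TTL stored as an ISO 8601 string, and a hypothetical index name of TTL-index, a single shard's query looks like the following:

```python
from datetime import datetime, timezone

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("SessionTable")
now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

# Items in shard "0" whose TTL sorts before the current time have expired.
response = table.query(
    IndexName="TTL-index",
    KeyConditionExpression=Key("GSI_PK").eq("0") & Key("TTL").lt(now),
)
expired_items = response["Items"]
```

Running the same query once per shard value covers the full index.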
Prerequisites
Before diving into implementing the custom TTL solution, you should have the following prerequisites in place:
- AWS account – Access to an active AWS account
- DynamoDB basics – A foundational understanding of DynamoDB concepts, including tables, items, attributes, and basic CRUD operations, is necessary to effectively configure and manage the database
- Sharding – A basic understanding of sharding techniques for DynamoDB
- Lambda functions – You should have familiarity with AWS Lambda, because you’ll be creating and deploying Lambda functions to query DynamoDB and run the custom TTL logic
- EventBridge basics – Basic knowledge of Amazon EventBridge is necessary for setting up EventBridge Scheduler rules to invoke the Lambda function at specific intervals
- AWS CLI or console proficiency – For configuring services and monitoring logs. We use the AWS Management Console throughout this post.
Create a DynamoDB table with a GSI
Our first step is to create a DynamoDB table with a global secondary index. Complete the following steps:
- On the DynamoDB console, choose Tables in the navigation pane.
- Choose Create table.
- For Table name, enter a name for your new table.
- For Partition key, enter PK as the name and choose String as the type.
- For Sort key, enter SK as the name and choose String as the type.
- For Table settings, select Customize settings.
The Customize settings option also allows you to define secondary indexes on the table, toggle deletion protection, switch the encryption type, and add resource tags on table creation.
- For Read/write capacity settings, make sure On-demand is selected.
- Scroll down to the Secondary indexes section.
- Choose Create global index.
- For Partition key, enter GSI_PK as the name and choose String as the type.
- For Sort key, enter TTL as the name and choose String as the type.
- For Index name, enter a name for your new index.
- For Attribute projections, select Only keys.
- Choose Create index.
- Leave all other configurations as default and choose Create table.
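If you prefer to script these steps, the following AWS CLI sketch creates an equivalent table and index; the index name TTL-index is an assumption used throughout this post's examples, so substitute your own:

```bash
aws dynamodb create-table \
  --table-name SessionTable \
  --attribute-definitions \
      AttributeName=PK,AttributeType=S \
      AttributeName=SK,AttributeType=S \
      AttributeName=GSI_PK,AttributeType=S \
      AttributeName=TTL,AttributeType=S \
  --key-schema \
      AttributeName=PK,KeyType=HASH \
      AttributeName=SK,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST \
  --global-secondary-indexes '[{
      "IndexName": "TTL-index",
      "KeySchema": [
        {"AttributeName": "GSI_PK", "KeyType": "HASH"},
        {"AttributeName": "TTL", "KeyType": "RANGE"}
      ],
      "Projection": {"ProjectionType": "KEYS_ONLY"}
    }]'
```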
Create a Lambda function
Next, we configure a Lambda function that will be invoked by EventBridge Scheduler. This Lambda function is responsible for obtaining the items that are eligible for deletion and deleting those items. Complete the following steps:
- On the Lambda console, choose Functions in the navigation pane.
- Choose Create function.
- Select Author from scratch.
- For Function name, enter a name (for example, StrictDataManagement).
- For Runtime, choose Python 3.12.
- Add a policy that allows the function to read from the GSI and write to the table:
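The following example policy is a sketch; replace the Region, account ID, table name, and index name with your own values:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "dynamodb:Query",
      "Resource": "arn:aws:dynamodb:us-east-1:111122223333:table/SessionTable/index/TTL-index"
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:BatchWriteItem",
        "dynamodb:DeleteItem"
      ],
      "Resource": "arn:aws:dynamodb:us-east-1:111122223333:table/SessionTable"
    }
  ]
}
```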
- Choose Create function.
- On the Code tab of the Lambda function, replace the Lambda function code with code that queries each GSI shard for expired items and deletes them. Alter the constants SHARDS, TABLE_NAME, and INDEX_NAME to suit your specific requirements. The following Python function is a minimal sketch, assuming the sample data model from this post (shard values 0 through 3, ISO 8601 TTL strings, and the hypothetical index name TTL-index):
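```python
import boto3
from boto3.dynamodb.conditions import Key
from datetime import datetime, timezone

# Adjust these constants to suit your requirements.
SHARDS = 4                   # number of GSI shards to query
TABLE_NAME = "SessionTable"  # assumed table name
INDEX_NAME = "TTL-index"     # assumed GSI name

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(TABLE_NAME)


def get_expired_items(shard: int, now: str) -> list:
    """Query a single GSI shard for items whose TTL has passed."""
    items = []
    query_kwargs = {
        "IndexName": INDEX_NAME,
        "KeyConditionExpression": (
            Key("GSI_PK").eq(str(shard)) & Key("TTL").lt(now)
        ),
    }
    while True:  # paginate in case a shard holds many expired items
        response = table.query(**query_kwargs)
        items.extend(response["Items"])
        if "LastEvaluatedKey" not in response:
            return items
        query_kwargs["ExclusiveStartKey"] = response["LastEvaluatedKey"]


def delete_expired_items(items: list) -> None:
    """Batch-delete expired items from the base table."""
    with table.batch_writer() as batch:
        for item in items:
            # The KEYS_ONLY projection returns the table keys with each item.
            batch.delete_item(Key={"PK": item["PK"], "SK": item["SK"]})


def lambda_handler(event, context):
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    deleted = 0
    for shard in range(SHARDS):
        expired = get_expired_items(shard, now)
        delete_expired_items(expired)
        deleted += len(expired)
    return {"deletedItems": deleted}
```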
- Choose Deploy to deploy the latest function code.
The delete_expired_items function uses the Boto3 batch_writer to perform batch deletions for efficiency. However, batch_writer does not support ConditionExpression, meaning there's no way to check whether an item is still eligible for deletion at the time of the write. This can be risky in use cases where the TTL value may have changed between the initial read and the deletion attempt. To avoid accidentally deleting items that are no longer expired, it's recommended to use the DeleteItem operation with a ConditionExpression that verifies the TTL value is still within the expected range.
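A minimal sketch of that safer pattern, reusing the table handle and ISO 8601 timestamps from the function above, follows; note that TTL is a DynamoDB reserved word, so the attribute name is aliased in the expression:

```python
from botocore.exceptions import ClientError


def delete_if_still_expired(item: dict, now: str) -> bool:
    """Delete an item only if its TTL is still in the past."""
    try:
        table.delete_item(
            Key={"PK": item["PK"], "SK": item["SK"]},
            # The delete fails if the TTL was extended after the read.
            ConditionExpression="#ttl < :now",
            ExpressionAttributeNames={"#ttl": "TTL"},
            ExpressionAttributeValues={":now": now},
        )
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # No longer expired; leave the item in place.
        raise
```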
Create an EventBridge schedule
Now we create an EventBridge schedule that invokes the Lambda function every 5 minutes. You can adjust this interval to align with your data eviction requirements.
- On the EventBridge console, choose Schedules in the navigation pane.
- Choose Create schedule.
- For Schedule name, enter a name for your new schedule.
- For Schedule pattern, select Recurring schedule.
- For Schedule type, select Rate-based schedule.
- For Rate expression, set Value to 5 and Unit to minutes.
- For Flexible time window, choose Off.
- Leave all other configurations as default and choose Next.
- For Templated targets, select AWS Lambda Invoke.
- For Lambda function, choose your Lambda function StrictDataManagement.
- Leave all other configurations as default and choose Next.
- For Action after schedule completion, choose NONE.
- Leave all other configurations as default and choose Next.
- Review your configuration and choose Create schedule.
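If you script your deployments, a sketch of the equivalent AWS CLI call follows; the schedule name, function ARN, and execution role (which EventBridge Scheduler assumes and which must allow lambda:InvokeFunction) are placeholder assumptions:

```bash
aws scheduler create-schedule \
  --name StrictDataManagementSchedule \
  --schedule-expression "rate(5 minutes)" \
  --flexible-time-window Mode=OFF \
  --target '{
    "Arn": "arn:aws:lambda:us-east-1:111122223333:function:StrictDataManagement",
    "RoleArn": "arn:aws:iam::111122223333:role/SchedulerInvokeLambdaRole"
  }'
```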
Generate sample items to see the solution in action
You can test the solution by adding items to your DynamoDB table with TTL values. The following AWS CLI sketch creates 10 sample items that expire 2 minutes in the future; it assumes the sample data model, four shards, and the GNU coreutils date command:
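```bash
for i in $(seq 1 10); do
  # ISO 8601 expiry, 2 minutes out (on macOS, use: date -u -v+2M ...).
  TTL=$(date -u -d '+2 minutes' +'%Y-%m-%dT%H:%M:%SZ')
  aws dynamodb put-item \
    --table-name SessionTable \
    --item "{
      \"PK\":     {\"S\": \"SESSION#$i\"},
      \"SK\":     {\"S\": \"METADATA\"},
      \"GSI_PK\": {\"S\": \"$((RANDOM % 4))\"},
      \"TTL\":    {\"S\": \"$TTL\"}
    }"
done
```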
To monitor delete operations on your DynamoDB table, navigate to the Monitoring tab on the DynamoDB console. In this setup, the EventBridge schedule invokes a Lambda function every 5 minutes to fetch expired items and delete them using the BatchWriteItem operation. You can track the delete operations by viewing the SuccessfulRequestLatency metric for the BatchWriteItem operation, using the Sample Count statistic to see the number of delete invocations. For more details on DynamoDB metrics, refer to DynamoDB Metrics and dimensions.
The following graph shows that BatchWriteItem was called four times, one invocation for each of the shards defined for our use case.
Cost considerations
The following table estimates the cost of using this approach for 1,000,000 TTL items, compared with the native DynamoDB TTL functionality. Each DynamoDB item is less than 1 KB in size and is stored in a table using on-demand mode in the us-east-1 Region. The free tier is not considered in this analysis.
| | Near-realtime TTL | Native DynamoDB TTL |
| --- | --- | --- |
| DynamoDB global secondary index | Write: $0.63, Read: $0.11 | – |
| Lambda | | |
| EventBridge Scheduler | $1 | – |
| DynamoDB Delete | $1.26 | – |
| Total Cost | $2.43 | $0 |
Clean up
If you created a test environment to follow along with this post, make sure to do the following:
- Delete the DynamoDB table
- Delete the Lambda function
- Delete the EventBridge schedule
- Delete any remaining IAM roles created during this process
- Delete any other resources you created for testing the solution
Summary
In this post, we discussed how DynamoDB GSIs combined with EventBridge Scheduler and Lambda offer a robust solution for managing data expiration in near real time, providing optimal application performance and efficient data handling.
In Part 3, we explore how EventBridge Scheduler can enable fine-grained scheduling of downstream events. This method provides precise future data management for your application, enhancing your event-driven architecture.