
Archive to cold storage with Amazon DynamoDB

Amazon DynamoDB is a fully managed NoSQL database that provides fast and predictable performance with seamless scalability. Customers across multiple industries utilize DynamoDB as their database of choice.

Numerous education technology (EdTech) companies use DynamoDB as a persistent data store to track students' exam scores and course progress. As students advance through various grades, their interactions with specific course materials and their exam scores change. After completing a class or graduating, students access past educational assets far less frequently. Due to compliance or contractual obligations, EdTech companies must keep this data readily available to students for an extended period of time, often exceeding 5 years. This pattern extends beyond the education sector: customers across different industries face similar data access patterns. Consequently, there is a growing demand for cost-effective storage solutions in DynamoDB that maintain data accessibility.

DynamoDB organizes data into tables and offers two distinct table classes: Standard and Standard-Infrequent Access (Standard-IA). The Standard class is the default configuration and is generally suitable for most workloads. The Standard-IA class is designed for tables that store infrequently accessed data and provides a lower price per gigabyte (GB) of data stored. As data ages and becomes less frequently accessed, migrating it from a Standard table to a Standard-IA table becomes a cost-effective strategy: organizations save on storage costs while keeping the same performance and integrations as the Standard table class. This approach allows businesses to strike a balance between cost optimization and data availability, making efficient use of DynamoDB's storage capabilities. Customers that have DynamoDB tables where storage accounts for approximately 50% or more of their costs should consider moving their data to a Standard-IA table.

For a detailed pricing comparison between the DynamoDB Standard and Standard-IA table classes, see the example using different table classes on the DynamoDB pricing page.
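As a minimal sketch of the destination side of this pattern, the following Python (boto3) snippet creates an archive table directly in the Standard-IA table class. The table and key names (StudentScoresArchive, studentId, examId) are hypothetical placeholders, not part of the original solution.

import boto3

dynamodb = boto3.client("dynamodb")

# Create the archive table in the Standard-IA table class.
# Table and key names here are hypothetical.
dynamodb.create_table(
    TableName="StudentScoresArchive",
    TableClass="STANDARD_INFREQUENT_ACCESS",
    AttributeDefinitions=[
        {"AttributeName": "studentId", "AttributeType": "S"},
        {"AttributeName": "examId", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "studentId", "KeyType": "HASH"},
        {"AttributeName": "examId", "KeyType": "RANGE"},
    ],
    BillingMode="PAY_PER_REQUEST",
)

If a table already exists in the Standard class, its table class can also be changed in place with the UpdateTable API, although the solution in this post archives expired data to a separate Standard-IA table.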

In this post, we explore the process of creating a customized solution that uses Amazon DynamoDB Streams, DynamoDB Time to Live (TTL), and AWS Lambda to archive data from a Standard DynamoDB table to a Standard-IA table.

Solution overview

By combining the power of DynamoDB Streams and Lambda, we can capture changes made to the Standard table and trigger specific actions based on those changes. With the help of TTL, we can automatically mark data as expired in the Standard table when it reaches a certain age and generate a record in DynamoDB Streams containing the expired data. Then, with Lambda event filtering, we can selectively process only the expired-data events from the stream. This filtering mechanism allows us to efficiently handle and migrate the expired data to the Standard-IA table while avoiding unnecessary processing and costs.

The following diagram illustrates the solution architecture.

The workflow contains the following steps:

  1. DynamoDB TTL deletes expired items from DynamoDB Standard tables based on an item attribute.
  2. DynamoDB Streams generates stream records containing the expired items.
  3. Lambda processes the deletion event from DynamoDB Streams. With Lambda event filtering, Lambda is only invoked by deletion events from DynamoDB TTL.
  4. The Lambda function writes the expired items to the DynamoDB Standard-IA table.

Delete data with DynamoDB TTL

DynamoDB TTL offers a convenient way to manage the lifecycle of your data in DynamoDB. With TTL, you can assign a timestamp to each item in your table, indicating when it is considered expired or no longer needed. After the specified timestamp, DynamoDB automatically removes the item from the table, eliminating the need for you to manually delete it. The primary benefit of TTL is that it allows you to reduce stored data volumes by eliminating outdated or irrelevant items with no operational overhead. This can be particularly useful in scenarios like the one outlined earlier, where you have large amounts of data that become outdated over time. You can keep your table lean and ensure that you’re only retaining the most relevant and current data for your workload by automatically removing expired items.

Importantly, DynamoDB deletes expired items without consuming any write throughput, meaning you won't incur additional costs or impact performance when removing outdated data. When you use DynamoDB global tables with TTL, DynamoDB replicates TTL deletes to all replicas. These replicated TTL deletes do consume write capacity, billed according to each replica table's configured capacity mode and table class in its Region.
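As a minimal sketch of the TTL setup, the following Python (boto3) snippet enables TTL on a source table and writes an item with an expiration timestamp. The table name (StudentScores), TTL attribute name (expireAt), and item attributes are hypothetical.

import time

import boto3

dynamodb = boto3.client("dynamodb")

# Enable TTL on the Standard table, using the expireAt attribute
# as the expiration timestamp (names are hypothetical).
dynamodb.update_time_to_live(
    TableName="StudentScores",
    TimeToLiveSpecification={
        "Enabled": True,
        "AttributeName": "expireAt",
    },
)

# Write an item whose TTL attribute is a Unix epoch timestamp
# roughly five years in the future.
FIVE_YEARS_IN_SECONDS = 5 * 365 * 24 * 60 * 60
dynamodb.put_item(
    TableName="StudentScores",
    Item={
        "studentId": {"S": "student-1234"},
        "examId": {"S": "algebra-final"},
        "score": {"N": "92"},
        "expireAt": {"N": str(int(time.time()) + FIVE_YEARS_IN_SECONDS)},
    },
)

The TTL attribute must be a number holding the expiration time as a Unix epoch timestamp in seconds; DynamoDB typically deletes expired items within a few days of their expiration.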

Overview of DynamoDB Streams

DynamoDB Streams provides a time-ordered log containing changes to items in a DynamoDB table. When an application creates, updates, or deletes an item in a table, a record of the modification is written to the table’s corresponding stream.

DynamoDB Streams records the following actions performed on DynamoDB items:

  • INSERT – A new item was added to the table
  • MODIFY – One or more of an existing item’s attributes were modified
  • REMOVE – An item was deleted from the table

Users can choose what data to capture from the following options:

  • Key attributes only – Only the key attributes of the modified item
  • New image – The entire item, as it appears after it was modified
  • Old image – The entire item, as it appeared before it was modified
  • New and old images – Both the new and the old images of the item

With DynamoDB Streams, you can natively capture the items that are expired by TTL and use that information for further processing.
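As a brief sketch, you can enable a stream on an existing table with boto3 (the table name is hypothetical). Choosing the NEW_AND_OLD_IMAGES (or OLD_IMAGE) view type makes sure the full contents of each expired item are available to the archival function:

import boto3

dynamodb = boto3.client("dynamodb")

# Enable a stream that captures both the new and old images of each
# modified item, so TTL deletes carry the full expired item.
dynamodb.update_table(
    TableName="StudentScores",
    StreamSpecification={
        "StreamEnabled": True,
        "StreamViewType": "NEW_AND_OLD_IMAGES",
    },
)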

When DynamoDB TTL expires an item, it creates a record in DynamoDB Streams with the following fields:

  • Records[<index>].userIdentity.type – "Service"
  • Records[<index>].userIdentity.principalId – "dynamodb.amazonaws.com"

These properties can then be used in an event filter for Lambda functions. In a Lambda event filter pattern, the values to match are expressed as arrays:

{
  "userIdentity": {
    "type": ["Service"],
    "principalId": ["dynamodb.amazonaws.com"]
  }
}

By using this event filter, customers can make sure their Lambda functions are invoked only for DynamoDB TTL deletes. This results in fewer invocations and lower cost.
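The filter is attached to the event source mapping that connects the stream to the function. The following is a minimal sketch using boto3; the stream ARN, function name, and batch size are hypothetical placeholders.

import json

import boto3

lambda_client = boto3.client("lambda")

# Match only stream records produced by the DynamoDB TTL service.
ttl_delete_pattern = {
    "userIdentity": {
        "type": ["Service"],
        "principalId": ["dynamodb.amazonaws.com"],
    }
}

# The stream ARN and function name below are hypothetical placeholders.
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:dynamodb:us-east-1:111122223333:table/StudentScores/stream/2023-01-01T00:00:00.000",
    FunctionName="archive-expired-items",
    StartingPosition="TRIM_HORIZON",
    BatchSize=100,
    FilterCriteria={"Filters": [{"Pattern": json.dumps(ttl_delete_pattern)}]},
)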

Use Lambda to write to DynamoDB Standard-IA tables

Lambda is a compute service that lets you run code without provisioning or managing servers. Lambda runs code on highly available compute infrastructure in response to an event, and you are charged only for the resources consumed. Lambda has out-of-the-box integrations with a variety of AWS services, including DynamoDB Streams. You can add DynamoDB Streams as a trigger for a Lambda function that puts data in the Standard-IA table. An insert, update, or delete of an item is recorded in DynamoDB Streams and triggers the Lambda function in response. To avoid unnecessary processing, you can use Lambda event filtering to respond only to the events you want. Additionally, you are not charged for the GetRecords API calls that Lambda makes while consuming data from DynamoDB Streams; standard charges for Lambda invocation duration apply as per AWS Lambda pricing.
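As a minimal sketch of the function itself, the following handler copies each expired item's old image into the Standard-IA table. It assumes the stream view type includes old images, and the archive table name (StudentScoresArchive) is hypothetical.

import boto3
from boto3.dynamodb.types import TypeDeserializer

dynamodb = boto3.resource("dynamodb")
# Hypothetical destination table created with the Standard-IA table class.
archive_table = dynamodb.Table("StudentScoresArchive")
deserializer = TypeDeserializer()

def lambda_handler(event, context):
    # The event filter makes sure every record here is a TTL delete,
    # so the expired item is carried in the record's OldImage.
    with archive_table.batch_writer() as batch:
        for record in event["Records"]:
            old_image = record["dynamodb"].get("OldImage")
            if not old_image:
                continue  # Requires OLD_IMAGE or NEW_AND_OLD_IMAGES view type
            # Convert the stream's typed wire format into plain Python values.
            item = {k: deserializer.deserialize(v) for k, v in old_image.items()}
            batch.put_item(Item=item)

The batch writer handles batching and retrying unprocessed items; in production you would also want to handle partial batch failures so that failed records can be retried from the stream.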

Conclusion

In this post, we discussed archiving data from DynamoDB Standard tables to DynamoDB Standard-IA tables using DynamoDB TTL, DynamoDB Streams, and Lambda with event filtering. By taking advantage of AWS services and their native integrations, you can build a fully managed and cost-effective solution to archive data within DynamoDB Standard-IA tables. Customers who want to maintain access to their data through the DynamoDB APIs while saving costs on cold data storage should consider implementing this solution.

Join the conversation! Your feedback and experiences are invaluable to us and our community. Dive into the comments below to share your insights, ask questions, or offer alternative viewpoints. Let’s collaboratively enhance our understanding! For more information about using DynamoDB, please see the developer guide.


About the Authors

Andrew Chen is an EdTech Solutions Architect with an interest in data analytics, machine learning, and virtualization of infrastructure. Andrew has previous experience in management consulting, in which he worked as a technical lead for various cloud migration projects. In his free time, Andrew enjoys fishing, hiking, kayaking, and keeping up with financial markets.

Lee Hannigan is a Sr. DynamoDB Specialist Solutions Architect based in Donegal, Ireland. He brings a wealth of expertise in distributed systems, backed by a strong foundation in big data and analytics technologies. In his role as a DynamoDB Specialist Solutions Architect, Lee excels in assisting customers with the design, evaluation, and optimization of their workloads leveraging DynamoDB's capabilities.