Cross-account replication with Amazon DynamoDB
Hundreds of thousands of customers use Amazon DynamoDB for mission-critical workloads. In some situations, you may want to migrate your DynamoDB tables into a different AWS account, for example, in the eventuality of a company being acquired by another company. Another use case is adopting a multi-account strategy, in which you have a dependent account and want to replicate production data in DynamoDB to this account for development purposes. Finally, for disaster recovery, you can use DynamoDB global tables to replicate your DynamoDB tables automatically across different AWS Regions, thereby achieving sub-minute Recovery Time and Point Objectives (RTO and RPO). However, you might want to replicate not only to a different Region, but also to another AWS account.
In this post, we cover a cost-effective method to migrate and sync DynamoDB tables across accounts while having no impact on the source table performance and availability.
Overview of solution
We split this article into two main sections: initial migration and ongoing replication. We complete the initial migration by using a new feature that allows us to export DynamoDB tables to any Amazon Simple Storage Service (Amazon S3) bucket and use an AWS Glue job to perform the import. For ongoing replication, we use Amazon DynamoDB Streams and AWS Lambda to replicate any subsequent INSERTS, UPDATES, and DELETES. The following diagram illustrates this architecture.
The new native export feature leverages the point in time recovery (PITR) capability in DynamoDB and allows us to export a 1.3 TB table in a matter of minutes without consuming any read capacity units (RCUs), which is considerably faster and more cost-effective than what was possible before its release.
Alternatively, for smaller tables that take less than 1 hour to migrate (from our tests, tables smaller than 140 GB), we can use an AWS Glue job to copy the data between tables without writing into an intermediate S3 bucket. Step-by-step instructions to deploy this solution are available in our GitHub repository.
Exporting the table with the native export feature
To export the DynamoDB table to a different account using the native export feature, we first need to grant the proper permissions by attaching two AWS Identity and Access Management (IAM) policies: one S3 bucket policy and one identity-based policy on the IAM user who performs the export, both allowing write and list permissions.
The following code is the S3 bucket policy (target account):
The following code is the IAM user policy (source account):
For instructions on performing the export, see New – Export Amazon DynamoDB Table Data to Your Data Lake in Amazon S3, No Code Writing Required. When doing the export, you can choose the output format in either DynamoDB JSON or Amazon Ion. In this post, we choose DynamoDB JSON.
The files are exported in the following S3 location:
After the export has finished, the objects are still owned by the user in the source account, so no one in the target account has permissions to access them. To fix this, we can change the owner by using the
bucket-owner-full-control ACL. We use the AWS Command Line Interface (AWS CLI) in the source account and the following command to list all the objects in the target S3 bucket and output the object keys to a file:
Then, we created a bash script to go over every line and update the owner of each object using the put-object-acl command. Edit the script by changing the path of the file, <nameofyourbucket> and run the script.
Importing the table
Now that we have our data exported, we use an AWS Glue job to read the compressed files from the S3 location and write them to the target DynamoDB table. The job requires a schema containing metadata in order to know how to interpret the data. The AWS Glue Data Catalog is a managed service that lets you store, annotate, and share metadata in the AWS Cloud. After the data is cataloged, it’s immediately available for querying and transformation using Amazon Athena, Amazon EMR, Amazon Redshift Spectrum, and AWS Glue. To populate the Data Catalog, we use an AWS Glue crawler to infer the schema and create a logical table on top of our recently exported files. For more information on how configure the crawler, see Defining Crawlers.
Most of the code that the job runs can be generated by AWS Glue Studio, so we don’t have to type all the existing fields manually. For instructions, see Tutorial: Getting started with AWS Glue Studio. In this post, we focus on just two sections of the code: the data transformation and the sink operation. Our GitHub repo has the full version of the code.
The following is the data transformation snippet of the generated code:
Now we have to make sure all the key names, data types, and nested objects have the same values and properties as in the source. For example, we need to change the key name
date type from
map, and so on. The location object contains nested JSON and again, we have to make sure the structure is respected in the target as well. Our snippet looks like the following after all the required code changes are implemented:
Another essential part of our code is the one that allows us to write directly to DynamoDB. Here we need to specify several parameters to configure the sink operation. One of these parameters is dynamodb.throughput.write.percent, which allows us to specify what percentage of write capacity the job should use. For this post, we choose 1.0 for 100% of the available WCUs. Our target table is configured using provisioned capacity and the only activity on the table is this initial import. Therefore, we configure the AWS Glue job to consume all write capacity allocated to the table. For on-demand tables, AWS Glue handles the write capacity of the table as 40,000.
This is the code snippet responsible for the sink operation:
Finally, we start our import operation with an AWS Glue job backed by 17 standard workers. Because the price difference was insignificant even when we used half this capacity, we chose the number of workers that resulted in the shortest import time. The maximum number of workers correlates with the table’s capacity throughput limit, so there is a theoretical ceiling based on the write capacity of the target table.
The following graph shows that we’re using DynamoDB provisioned capacity (which can be seen as the red line) because we know our capacity requirements and can therefore better control cost.
We requested a write capacity limit increase using AWS Service Quota to double the table default limit of 40,000 WCUs so the import finishes faster. DynamoDB account limits are soft limits that can be raised by request if you need to increase the speed at which data is exported and imported. There is virtually no limit on how much capacity you request, but each request is subject to review by the DynamoDB service.
Our AWS Glue job took roughly 9 hours and 20 minutes, leaving us with a total migration time of 9 hours and 35 minutes. It’s considerably faster when compared to the total of 14 hours migration using Data Pipeline, for a lower cost.
After the import finishes, change the target DynamoDB table write capacity to either one of the following options based on the target table’s use case:
- On-demand – Choose this option if you don’t start the ongoing replication immediately after the initial migration or if the target table is a development table.
- The same WCU you have in the source table – If you’re planning to start the ongoing replication immediately after the initial migration finishes, this is the most cost-effective option. Also, if this is a DR use case, use this option to match the target throughput capacity with the source.
To ensure data integrity across both tables, the initial (full load) migration should be completed before enabling ongoing replication. In the ongoing replication process, any item-level modifications that happened in the source DynamoDB table during and after the initial migration are captured by DynamoDB Streams. DynamoDB streams store these time-ordered records for 24 hours. Then, a Lambda function reads records from the stream and replicates those changes to the target DynamoDB table. The following diagram (option 1) depicts the ongoing replication architecture.
However, if the initial migration takes more than 24 hours, we have to use Amazon Kinesis Data Streams instead. In our case, we migrated 1.3 TB in just under 10 hours. Therefore, if the table you’re migrating is bigger than 3 TB (with the DynamoDB table write limit increased to 80,000 WCUs), the initial migration part could take more than 24 hours. In this case, use Kinesis Data Streams as a buffer to capture changes to the source table, thereby extending the retention from 1 day to 365 days. The following diagram (option 2) depicts the ongoing replication architecture if we use Kinesis Data Streams as a buffer.
All updates happening on the source table can be automatically copied to a Kinesis data stream using the new Kinesis Data Streams for DynamoDB feature. A Lambda function reads records from the stream and replicates those changes to the target DynamoDB table.
In this post, we use DynamoDB Streams (option 1) to capture the changes on the source table. The ongoing replication solution is available in the GitHub repo. We use an AWS Serverless Application Model (AWS SAM) template to create and deploy the Lambda function that processes the records in the stream. This function assumes an IAM role in the target account to write modified and new items to the target DynamoDB table. Therefore, before deploying the AWS SAM template, complete the following steps in the target account to create the IAM role:
- On the IAM console, choose Roles in the navigation pane.
- Choose Create role.
- Select Another AWS account.
- For Account ID, enter the source account number.
- Choose Next.
- Create a new policy and copy the following permissions to the policy. Replace <Target_DynamoDB_table_ARN> with the target DynamoDB table ARN.
- Return to the role creation wizard and refresh the list of policies.
- Chose the newly created policy.
- Choose Next.
- Enter the target role name.
- Choose Create role.
Deploying and running the ongoing replication solution
Follow the instructions in the GitHub repo to deploy the template after the initial migration is finished. When deploying the template, you’re prompted to enter several parameters including but not limited to
TargetRoleName (the IAM role created in last step) and
SourceTableStreamARN. For more information, see Parameter Details in the GitHub repo.
One of the most important parameters is the
MaximumRecordAgeInSeconds, which defines the oldest record in the stream that the Lambda function starts processing. If DynamoDB Streams was enabled only a few minutes before starting the initial export, set this parameter to -1 to process all records in the stream.
If you didn’t have to turn on the stream because it was already enabled, set the
MaximumRecordAgeInSeconds parameter to a few minutes (2–5 minutes) before the initial export starts. Otherwise, Lambda processes items that were already copied during the migration step, thereby consuming unnecessary Lambda resources and DynamoDB write capacity. For example, let’s assume you started the initial export at 2:00 PM and it took 1 hour, finishing at 3:00 PM. After that, the import has started and took 7 hours to complete. If you deploy the template at 10:00 PM, set the age to 28,920 seconds (8 hours, 2 minutes).
The deployment creates a Lambda function that reads from the source DynamoDB Streams and writes to the table in the target account. It also creates a disabled DynamoDB event source mapping. The reason why this was disabled is because the moment we enable it, the function starts processing records in the stream automatically. Because we should start the ongoing replication only after the initial migration finishes, we need to control when to enable the trigger.
To start the ongoing replication, enable the event source mapping on the Lambda console, as shown in the following screenshot.
Alternatively, you can also use the update-event-source-mapping command to enable the trigger or change any of the settings, such as
Verifying the number of records in the source and target tables
To verify the number of records in both the source and target tables, check the Item summary section on the DynamoDB console, as shown in the following screenshot.
You can also use the following command to determine the number of items as well as the size of the source and target tables:
DynamoDB updates the size and item count values approximately every 6 hours.
Alternatively, you can run an item count on the DynamoDB console. This operation does a full scan on the table to retrieve the current size and item count, and therefore it’s not recommended to run this action on large tables.
Delete the resources you created if you no longer need them:
- Delete the IAM roles we created.
- Disable DynamoDB Streams.
- Disable PITR in the source table.
- Delete the AWS SAM template to delete the Lambda functions:
- Delete the AWS Glue job, tables, and database.
In this post, we showcased the fastest and most cost-effective way to migrate DynamoDB tables between AWS accounts, using the new DynamoDB export feature along with AWS Glue for the initial migration, and Lambda in conjunction with DynamoDB Streams for the ongoing replication. Should you have any questions or suggestions, feel free to reach out to us on GitHub and we can take the conversation further. Until next time, enjoy your cloud journey!
About the Authors
Ahmed Zamzam is a Solutions Architect with Amazon Web Services. He supports SMB customers in the UK in their digital transformation and their cloud journey to AWS, and specializes in Data Analytics. Outside of work, he loves traveling, hiking, and cycling.
Dragos Pisaroc is a Solutions Architect supporting SMB customers in the UK in their cloud journey, and has a special interest in big data and analytics. Outside of work, he loves playing the keyboard and drums, as well as studying psychology and philosophy.