How can I back up a DynamoDB table to Amazon S3?

Last updated: 2020-06-16

How can I back up an Amazon DynamoDB table using Amazon Simple Storage Service (Amazon S3)?

Short Description

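DynamoDB offers two built-in backup methods:

  • On-demand backups: Create a backup when you choose.
  • Point-in-time recovery: Enable automatic, continuous backups.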

Both of these methods use Amazon S3. However, you don't have access to the S3 buckets that are used for these backups. To create backups that you can download locally or use in another AWS service, use AWS Data Pipeline, Amazon EMR, or AWS Glue.
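For reference, DynamoDB's built-in on-demand backup and point-in-time recovery features can be turned on through the DynamoDB API. The following is a minimal sketch, assuming a boto3 DynamoDB client is passed in. The function names and the timestamp-based backup name are illustrative choices, not part of the original article.

```python
from datetime import datetime, timezone


def create_on_demand_backup(dynamodb_client, table_name):
    """Create an on-demand backup of the table and return the backup ARN."""
    # Assumption: a UTC timestamp suffix is used to keep backup names distinct.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    response = dynamodb_client.create_backup(
        TableName=table_name,
        BackupName=f"{table_name}-{stamp}",
    )
    return response["BackupDetails"]["BackupArn"]


def enable_point_in_time_recovery(dynamodb_client, table_name):
    """Enable point-in-time recovery and return the reported status string."""
    response = dynamodb_client.update_continuous_backups(
        TableName=table_name,
        PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
    )
    description = response["ContinuousBackupsDescription"]
    return description["PointInTimeRecoveryDescription"]["PointInTimeRecoveryStatus"]
```

With boto3 installed and credentials configured, the client would typically be created with boto3.client("dynamodb") and passed to either function along with a table name. As noted above, backups created this way are not stored in a bucket that you can access.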

Resolution

Data Pipeline

Use AWS Data Pipeline to export your table to an S3 bucket in the same account or in a different account. For more information, see Import and Export DynamoDB Data Using AWS Data Pipeline.

  • Pros: This is the easiest method. Choose this method when you want to make a one-time backup using the fewest AWS resources possible. Data Pipeline uses Amazon EMR to create the backup, and the scripting is done for you. You don't have to learn Apache Hive or Apache Spark to accomplish this task.
  • Cons: This method isn't as customizable as the others. If you want to create continuous backups to Amazon S3, choose one of the other methods. It's also not the best choice if you want to use the backup in other AWS services.

Amazon EMR

Use Hive to export your data to an S3 bucket. For more information, see Exporting Data from DynamoDB. Or, use the open-source emr-dynamodb-connector to manage your own custom backup method in Spark or Hive.

  • Pros: These methods are the best choice if you're an active Amazon EMR user and are comfortable with Hive or Spark. These methods offer more control than the Data Pipeline method.
  • Cons: If you're new to Amazon EMR, these methods aren't the best choice. If you don't use Amazon EMR but you want a continuous, customizable solution, the AWS Glue method is the better option, even if you're not familiar with AWS Glue.

AWS Glue

Use AWS Glue to copy your table to Amazon S3. For more information, see How to export an Amazon DynamoDB table to Amazon S3 using AWS Step Functions and AWS Glue.

  • Pros: This is the best choice if you want automated, continuous backups that you can also use in another service, such as Amazon Athena.
  • Cons: If you're not familiar with AWS Glue, this method might be challenging, but probably less so than the Amazon EMR methods. This method is usually more expensive than the Data Pipeline method.
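To give a sense of what the AWS Glue approach involves, the following is a rough sketch of the core of a Glue job script that reads a DynamoDB table and writes it to Amazon S3 as JSON. It is not the solution from the linked post. The function name, the 0.5 read-throughput setting, and the JSON output format are assumptions for illustration. Inside a real Glue job, glue_context would typically be GlueContext(SparkContext.getOrCreate()) from the awsglue library.

```python
def export_table_with_glue(glue_context, table_name, s3_path, read_percent="0.5"):
    """Read a DynamoDB table into a DynamicFrame and write it to S3 as JSON."""
    # Read the whole table. The read percent limits how much of the table's
    # read capacity the job may consume (assumed value, tune as needed).
    frame = glue_context.create_dynamic_frame.from_options(
        connection_type="dynamodb",
        connection_options={
            "dynamodb.input.tableName": table_name,
            "dynamodb.throughput.read.percent": read_percent,
        },
    )
    # Write the records under the given S3 path (for example, s3://bucket/prefix/).
    glue_context.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="s3",
        connection_options={"path": s3_path},
        format="json",
    )
    return frame
```

Taking the Glue context as a parameter keeps the function free of job setup code, so the same logic can be scheduled or triggered by whatever orchestration you choose.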

If none of these options offer the flexibility that you want, use the DynamoDB API to create your own solution.
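As one possible starting point for a custom solution, the following sketch scans a table page by page and uploads the items to an S3 object as newline-delimited JSON. It assumes a boto3 DynamoDB Table resource and a boto3 S3 client are passed in. All function names are hypothetical, binary attributes are not handled, and the whole export is held in memory, so this approach suits small tables only. A Scan also consumes the table's read capacity.

```python
import json
from decimal import Decimal


def _json_default(value):
    """Convert DynamoDB-specific Python types into JSON-friendly ones."""
    if isinstance(value, Decimal):
        # The Table resource returns numbers as Decimal.
        return int(value) if value == value.to_integral_value() else float(value)
    if isinstance(value, set):
        # String and number sets become sorted lists.
        return sorted(value)
    raise TypeError(f"Unsupported type: {type(value).__name__}")


def scan_all_items(table):
    """Yield every item in the table, following Scan pagination."""
    kwargs = {}
    while True:
        page = table.scan(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            break
        # Continue the scan from where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key


def export_table_to_s3(table, s3_client, bucket, key):
    """Write all items to one S3 object, one JSON document per line."""
    lines = [
        json.dumps(item, default=_json_default, sort_keys=True)
        for item in scan_all_items(table)
    ]
    body = ("\n".join(lines) + "\n").encode("utf-8") if lines else b""
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return len(lines)
```

With boto3 installed, the arguments would typically be boto3.resource("dynamodb").Table("YourTable") and boto3.client("s3"), where the table, bucket, and key names are your own. The function returns the number of items written.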

