How can I back up a DynamoDB table to Amazon S3?
Last updated: 2020-12-17
How can I back up an Amazon DynamoDB table using Amazon Simple Storage Service (Amazon S3)?
DynamoDB offers two built-in backup methods:
- On-demand: Create backups when you choose.
- Point-in-time recovery: Enable automatic, continuous backups.
Both of these methods use Amazon S3. However, you don't have access to the S3 buckets that are used for these backups. The DynamoDB Export to S3 feature is the easiest way to create backups that you can download locally or use in another AWS service. If you need more customization, use AWS Data Pipeline, Amazon EMR, or AWS Glue instead.
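For reference, both built-in methods can also be triggered from code. The following is a minimal sketch, assuming the third-party boto3 SDK and a client created with `boto3.client("dynamodb")`; the helper names and the timestamped naming scheme are illustrative and not part of any AWS API.

```python
from datetime import datetime, timezone


def backup_name(table_name, now=None):
    """Build a timestamped backup name such as 'Orders-20240102T030405Z'.

    The naming scheme is an assumption for illustration only.
    """
    now = now or datetime.now(timezone.utc)
    return f"{table_name}-{now.strftime('%Y%m%dT%H%M%SZ')}"


def create_on_demand_backup(dynamodb, table_name, now=None):
    """Create an on-demand backup and return the ARN of the new backup."""
    response = dynamodb.create_backup(
        TableName=table_name,
        BackupName=backup_name(table_name, now),
    )
    return response["BackupDetails"]["BackupArn"]


def enable_point_in_time_recovery(dynamodb, table_name):
    """Turn on continuous backups (point-in-time recovery) for a table."""
    dynamodb.update_continuous_backups(
        TableName=table_name,
        PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
    )
```

With a real client, this might be called as `create_on_demand_backup(boto3.client("dynamodb"), "Orders")`, where `Orders` is a placeholder table name.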
DynamoDB Export to S3 feature
For an example of how to use the Export to S3 feature, see Export Amazon DynamoDB table data to your data lake in Amazon S3, no code writing required.
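If you prefer to start an export from code rather than from the console, the request might look like the following sketch. It assumes the third-party boto3 SDK, a client created with `boto3.client("dynamodb")`, and a table that already has point-in-time recovery turned on, which the export feature requires. The function names, ARN, and bucket values are placeholders.

```python
def start_export(dynamodb, table_arn, bucket, prefix="",
                 export_format="DYNAMODB_JSON", bucket_owner=None):
    """Start an Export to S3 job and return the ARN of the export.

    bucket_owner is the account ID that owns the bucket; it is only
    needed when exporting to a bucket in another account.
    """
    if export_format not in ("DYNAMODB_JSON", "ION"):
        raise ValueError(f"unsupported export format: {export_format}")

    params = {
        "TableArn": table_arn,
        "S3Bucket": bucket,
        "ExportFormat": export_format,
    }
    if prefix:
        params["S3Prefix"] = prefix
    if bucket_owner:
        params["S3BucketOwner"] = bucket_owner

    response = dynamodb.export_table_to_point_in_time(**params)
    return response["ExportDescription"]["ExportArn"]


def export_status(dynamodb, export_arn):
    """Return the current status of an export, for example 'IN_PROGRESS'."""
    response = dynamodb.describe_export(ExportArn=export_arn)
    return response["ExportDescription"]["ExportStatus"]
```

The export runs asynchronously, so a caller would typically poll `export_status` until it reports `COMPLETED` or `FAILED`.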
- Pros: This is the easiest method. This feature allows you to export data across AWS Regions and accounts without building custom applications or writing code. The exports don't affect the read capacity or the availability of your production tables. You can also set up automatic exports to make sure that your data is backed up on a regular schedule.
- Cons: This feature exports table data in DynamoDB JSON or Amazon Ion format only. If you want to export table data in a different format, use one of the following methods instead.
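If the only obstacle is the DynamoDB JSON format, a small post-processing step might be enough. The following sketch converts one exported line to plain JSON-style Python values. It assumes that each line of an export data file is a JSON object with a single `Item` key holding type-tagged attributes; the function names are illustrative. Numbers with a fractional part are converted to `float`, which can lose precision.

```python
import json
from decimal import Decimal


def _number(text):
    """Convert a DynamoDB number string to int when whole, else float."""
    number = Decimal(text)
    return int(number) if number == number.to_integral_value() else float(number)


def from_dynamodb_json(value):
    """Convert one type-tagged value, such as {"S": "abc"}, to a plain value."""
    (type_tag, raw), = value.items()
    if type_tag in ("S", "BOOL", "B"):
        return raw  # binary values stay as the base64 text found in the file
    if type_tag == "N":
        return _number(raw)
    if type_tag == "NULL":
        return None
    if type_tag == "M":
        return {key: from_dynamodb_json(item) for key, item in raw.items()}
    if type_tag == "L":
        return [from_dynamodb_json(item) for item in raw]
    if type_tag in ("SS", "BS"):
        return list(raw)
    if type_tag == "NS":
        return [_number(item) for item in raw]
    raise ValueError(f"unknown DynamoDB type tag: {type_tag}")


def convert_line(line):
    """Convert one line of an export data file to a plain dict."""
    item = json.loads(line)["Item"]
    return {name: from_dynamodb_json(value) for name, value in item.items()}
```

From there, the plain dictionaries can be written out as CSV, Parquet, or any other format with the tooling of your choice.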
AWS Data Pipeline
Use AWS Data Pipeline to export your table to an S3 bucket in the same account or in a different account. For more information, see Import and export DynamoDB data using AWS Data Pipeline.
- Pros: Data Pipeline uses Amazon EMR to create the backup, and the scripting is done for you. You don't have to learn Apache Hive or Apache Spark to accomplish this task.
- Cons: This method isn't as customizable as the others. If you want to create continuous backups to Amazon S3, choose one of the other methods. It's also not the best choice if you want to use the backup in other AWS services.
Amazon EMR
Use Hive on Amazon EMR to export your data to an S3 bucket. For more information, see Exporting data from DynamoDB. Or, use the open-source emr-dynamodb-connector to manage your own custom backup method in Spark or Hive.
- Pros: If you're an active Amazon EMR user and are comfortable with Hive or Spark, these methods offer more control than the Data Pipeline and Export to S3 methods.
- Cons: If you're new to Amazon EMR, these methods aren't recommended. If you don't use Amazon EMR but you want a continuous, customizable solution, use the AWS Glue method instead, even if you're not familiar with AWS Glue.
AWS Glue
Use AWS Glue to copy your table to Amazon S3. For more information, see How to export an Amazon DynamoDB table to Amazon S3 using AWS Step Functions and AWS Glue.
- Pros: This is the recommended method if you want a continuous, customizable solution that doesn't use Amazon EMR.
- Cons: If you're not familiar with AWS Glue, this method might be challenging, though probably less so than the Amazon EMR methods. This method is usually more expensive than the Data Pipeline method.
If none of these options offer the flexibility that you want, use the DynamoDB API to create your own solution.
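As a starting point for a custom solution, the following sketch pages through a table with the `Scan` operation and writes each page to Amazon S3 as a JSON Lines object. It assumes the third-party boto3 SDK with clients created by `boto3.client("dynamodb")` and `boto3.client("s3")`; the function names and the object key layout are illustrative. Unlike the Export to S3 feature, a scan consumes read capacity on the table and doesn't produce a point-in-time consistent snapshot.

```python
import base64
import json


def _encode_bytes(value):
    """JSON fallback: the low-level client returns Binary attributes as bytes."""
    if isinstance(value, (bytes, bytearray)):
        return base64.b64encode(value).decode("ascii")
    raise TypeError(f"cannot serialize {type(value).__name__}")


def scan_pages(dynamodb, table_name):
    """Yield the items of each page of a full table Scan."""
    kwargs = {"TableName": table_name}
    while True:
        page = dynamodb.scan(**kwargs)
        yield page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            break
        # Continue the scan from where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key


def backup_table_to_s3(dynamodb, s3, table_name, bucket, prefix):
    """Write each Scan page to S3 as JSON Lines and return the item count."""
    total = 0
    for page_number, items in enumerate(scan_pages(dynamodb, table_name)):
        if not items:
            continue
        body = "\n".join(
            json.dumps(item, default=_encode_bytes, sort_keys=True)
            for item in items
        )
        key = f"{prefix}/{table_name}/part-{page_number:05d}.jsonl"
        s3.put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))
        total += len(items)
    return total
```

Items are stored in the type-tagged form that the low-level client returns, so a restore script can pass them back to `PutItem` after decoding any base64-encoded Binary values.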