How can I use Data Pipeline to back up a DynamoDB table to an S3 bucket that is in a different account?

Last updated: 2019-07-09

I want to use AWS Data Pipeline to back up an Amazon DynamoDB table to an Amazon Simple Storage Service (Amazon S3) bucket that is in a different AWS account.

Short Description

1.    In the source account, attach an AWS Identity and Access Management (IAM) policy that grants Amazon S3 permissions to the DataPipelineDefaultRole and DataPipelineDefaultResourceRole roles.

2.    In the destination account, create a bucket policy that allows the DataPipelineDefaultRole and DataPipelineDefaultResourceRole roles in the source account to access the S3 bucket.

3.    In the source account, create a pipeline using the Export DynamoDB table to S3 Data Pipeline template.

4.    Add the BucketOwnerFullControl or AuthenticatedRead canned access control list (ACL) to the Step field of the pipeline's EmrActivity object.

5.    Activate the pipeline to back up the DynamoDB table to the S3 bucket in the destination account.

6.    To restore the table in the destination account, create a pipeline using the Import DynamoDB Backup Data from S3 Data Pipeline template.

Resolution

Attach an IAM policy to the Data Pipeline default roles

1.    In the source account, open the IAM console.

2.    Choose Policies, and then choose Create policy.

3.    Choose the JSON tab, and then enter an IAM policy similar to the following. Replace bucket_in_s3_account with the name of the S3 bucket in the destination account.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": [
                "s3:AbortMultipartUpload",
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::bucket_in_s3_account/*",
                "arn:aws:s3:::bucket_in_s3_account"     
            ]
        }
    ]
}

4.    Choose Review policy.

5.    Enter a Name for the policy, and then choose Create policy.

6.    In the list of policies, select the check box next to the name of the policy that you just created. You can use the Filter menu and the search box to filter the list of policies.

7.    Choose Policy actions, and then choose Attach.

8.    Select DataPipelineDefaultRole and DataPipelineDefaultResourceRole, and then choose Attach policy.
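
If you prefer to script this step instead of using the IAM console, the following is a minimal sketch using Python (boto3). It assumes credentials for the source account; the policy name CrossAccountS3BackupPolicy is a placeholder, and bucket_in_s3_account is the destination bucket as above.

import json
import boto3

# Run with credentials for the source account
iam = boto3.client("iam")

# Same policy document as shown above; replace bucket_in_s3_account
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:AbortMultipartUpload",
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::bucket_in_s3_account/*",
                "arn:aws:s3:::bucket_in_s3_account"
            ]
        }
    ]
}

# Create the customer managed policy (the policy name is a placeholder)
policy_arn = iam.create_policy(
    PolicyName="CrossAccountS3BackupPolicy",
    PolicyDocument=json.dumps(policy_document),
)["Policy"]["Arn"]

# Attach the policy to both Data Pipeline default roles
for role_name in ("DataPipelineDefaultRole", "DataPipelineDefaultResourceRole"):
    iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)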

Add a bucket policy to the S3 bucket

In the destination account, create a bucket policy similar to the following. Replace these values in the following example:

  • 111122223333: the ID of the source account (the account where Data Pipeline runs). For more information, see Finding Your AWS Account ID.
  • bucket_in_s3_account: the name of the S3 bucket in the destination account.

{
    "Version": "2012-10-17",
    "Id": "",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": "s3:*",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::111122223333:role/DataPipelineDefaultRole",
                    "arn:aws:iam::111122223333:role/DataPipelineDefaultResourceRole"
                ]
            },
            "Resource": [
                "arn:aws:s3:::bucket_in_s3_account",
                "arn:aws:s3:::bucket_in_s3_account/*"
            ]
        }
    ]
}
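
If you prefer to apply the bucket policy programmatically rather than in the S3 console, here is a minimal sketch using Python (boto3). It assumes credentials for the destination account; replace 111122223333 and bucket_in_s3_account as described above.

import json
import boto3

# Run with credentials for the destination account
s3 = boto3.client("s3")

# Same bucket policy as shown above; replace the account ID and bucket name
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::111122223333:role/DataPipelineDefaultRole",
                    "arn:aws:iam::111122223333:role/DataPipelineDefaultResourceRole"
                ]
            },
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::bucket_in_s3_account",
                "arn:aws:s3:::bucket_in_s3_account/*"
            ]
        }
    ]
}

s3.put_bucket_policy(
    Bucket="bucket_in_s3_account",
    Policy=json.dumps(bucket_policy),
)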

Create and activate the pipeline

1.    In the source account, create a pipeline using the Export DynamoDB table to S3 Data Pipeline template:

  • In the Parameters section, enter the Source DynamoDB table name and the Output S3 folder. For the output folder, use the format s3://bucket_in_s3_account/.
  • In the Security/Access section, for IAM roles, choose Default.

2.    Before you Activate the pipeline, choose Edit in Architect.

3.    Open the Activities section, and then find the EmrActivity object.

4.    In the Step field, add the BucketOwnerFullControl or AuthenticatedRead canned access control list (ACL). These canned ACLs give the Amazon EMR Apache Hadoop job permission to write to the S3 bucket in the destination account. Use the format -Dfs.s3.canned.acl=BucketOwnerFullControl, and place the option between org.apache.hadoop.dynamodb.tools.DynamoDbExport and #{output.directoryPath}. Example:

s3://dynamodb-emr-#{myDDBRegion}/emr-ddb-storage-handler/2.1.0/emr-ddb-2.1.0.jar,org.apache.hadoop.dynamodb.tools.DynamoDbExport,-Dfs.s3.canned.acl=BucketOwnerFullControl,#{output.directoryPath},#{input.tableName},#{input.readThroughputPercent}

5.    Choose Save, and then choose Activate to activate the pipeline and back up the DynamoDB table to the S3 bucket in the destination account.
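
If you manage pipelines from an SDK instead of the console, activation is a single API call. The following is a minimal sketch using Python (boto3); the pipeline ID df-EXAMPLE1234567 is a placeholder for the ID of the pipeline that you created from the template.

import boto3

# Run with credentials for the source account
datapipeline = boto3.client("datapipeline")

# Activate the pipeline created from the "Export DynamoDB table to S3" template
datapipeline.activate_pipeline(pipelineId="df-EXAMPLE1234567")

# Optionally, check the pipeline's current state
description = datapipeline.describe_pipelines(pipelineIds=["df-EXAMPLE1234567"])
for field in description["pipelineDescriptionList"][0]["fields"]:
    print(field)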

Restore the backup in the destination account

In the destination account, create a pipeline using the Import DynamoDB Backup Data from S3 Data Pipeline template:

  • In the Parameters section, for Input S3 folder, enter the S3 folder that contains the DynamoDB backup.
  • In the Security/Access section, for IAM roles, choose Default.

When you activate the pipeline, Data Pipeline creates a DynamoDB table from the backup.
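
After the import pipeline finishes, you can verify the restored table from the destination account. Here is a minimal sketch using Python (boto3), assuming the table created by the import is named MyRestoredTable (a placeholder):

import boto3

# Run with credentials for the destination account
dynamodb = boto3.client("dynamodb")

# MyRestoredTable is a placeholder for the target table name used in the import template
table = dynamodb.describe_table(TableName="MyRestoredTable")["Table"]

# Note: ItemCount is updated periodically by DynamoDB, not in real time
print(table["TableStatus"], table["ItemCount"])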

