How can I use AWS Data Pipeline to back up an Amazon DynamoDB table to an Amazon Simple Storage Service (Amazon S3) bucket that is in a different AWS account?

To set up cross-account access for an Amazon S3 bucket, attach an AWS Identity and Access Management (IAM) policy that grants Amazon S3 permissions to the DataPipelineDefaultRole and DataPipelineDefaultResourceRole roles in the Data Pipeline account. Then, in the account that owns the bucket, create a bucket policy that allows those roles to access the bucket.

Attach an IAM policy to the Data Pipeline default roles

1.    In the Data Pipeline account, open the IAM console.

2.    Choose Policies, and then choose Create policy.

3.    Choose the JSON tab, and then enter an IAM policy similar to the following. Replace bucket_in_s3_account with the name of the S3 bucket that you want to access with Data Pipeline.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": [
                "s3:AbortMultipartUpload",
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::bucket_in_s3_account/*",
                "arn:aws:s3:::bucket_in_s3_account"     
            ]
        }
    ]
}

4.    Choose Review policy.

5.    Enter a Name for the policy, and then choose Create policy.

6.    In the list of policies, select the check box next to the name of the policy that you just created. You can use the Filter menu and the search box to filter the list of policies.

7.    Choose Policy actions, and then choose Attach.

8.    Select DataPipelineDefaultRole and DataPipelineDefaultResourceRole, and then choose Attach policy.
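
If you prefer the AWS CLI, you can create and attach the policy with commands similar to the following. This sketch assumes that you saved the policy JSON from step 3 as datapipeline-s3-policy.json and that you run the commands in the Data Pipeline account; the policy name and file name are examples. Replace 111122223333 with the ID of the Data Pipeline account.

# Create the customer managed policy from the JSON in step 3
aws iam create-policy --policy-name DataPipelineCrossAccountS3 --policy-document file://datapipeline-s3-policy.json

# Attach the policy to both Data Pipeline default roles
aws iam attach-role-policy --role-name DataPipelineDefaultRole --policy-arn arn:aws:iam::111122223333:policy/DataPipelineCrossAccountS3
aws iam attach-role-policy --role-name DataPipelineDefaultResourceRole --policy-arn arn:aws:iam::111122223333:policy/DataPipelineCrossAccountS3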

Add a bucket policy to the S3 bucket

Create a bucket policy similar to the following for the destination bucket. Replace the following values:

  • 111122223333: the ID of the Data Pipeline account. For more information, see Finding Your AWS Account ID.
  • bucket_in_s3_account: the name of the S3 bucket.

{
    "Version": "2012-10-17",
    "Id": "",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": "s3:*",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::111122223333:role/DataPipelineDefaultRole",
                    "arn:aws:iam::111122223333:role/DataPipelineDefaultResourceRole"
                ]
            },
            "Resource": [
                "arn:aws:s3:::bucket_in_s3_account",
                "arn:aws:s3:::bucket_in_s3_account/*"
            ]
        }
    ]
}
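
If you manage the bucket with the AWS CLI instead of the console, you can apply the policy with commands similar to the following. This sketch assumes that you saved the bucket policy as bucket-policy.json (an example file name) and that you run the commands in the account that owns the bucket.

# Apply the bucket policy to the destination bucket
aws s3api put-bucket-policy --bucket bucket_in_s3_account --policy file://bucket-policy.json

# Confirm that the policy is in place
aws s3api get-bucket-policy --bucket bucket_in_s3_account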

When you activate and run your pipeline, you should be able to export DynamoDB tables to the S3 bucket.
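
To confirm that an export reached the bucket, you can list the destination with a command similar to the following, using credentials that have s3:ListBucket permission on the bucket:

aws s3 ls s3://bucket_in_s3_account/ --recursive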

(Optional) Add bucket owner permissions

When Data Pipeline writes files to an S3 bucket in a different account, the bucket owner account doesn't automatically have permission to access those files. To add access permissions for the bucket owner account, complete the following steps in the Data Pipeline account:

1.    Open the Data Pipeline console.

2.    On the List Pipelines page, choose your Pipeline ID, and then choose Edit Pipeline to open the Architect page.

3.    Open the Activities section, and then find the EmrActivity object.

4.    In the Add an optional field list, choose Post Step Command.

5.    Enter a command similar to the following. Replace bucket_in_s3_account with the name of the S3 bucket. The command copies the exported objects over themselves so that each object's ACL grants the bucket owner full control.

aws s3 cp s3://bucket_in_s3_account/ s3://bucket_in_s3_account/ --recursive --acl bucket-owner-full-control --storage-class STANDARD

(You can also use the list-objects and put-object-acl commands to change permissions on each object, as shown in the sketch after these steps. If the bucket contains many objects, the single cp command is faster.)

6.    Choose Save.

When you activate and run your pipeline, Amazon EMR writes the export to the S3 bucket and then runs the post-step command, so the bucket owner can access the files.
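
If you use the list-objects and put-object-acl approach instead of the cp command, a script similar to the following grants the bucket owner full control on each exported object. This sketch assumes a Unix shell, the AWS CLI, and object keys that don't contain whitespace; run it from the Data Pipeline account after the export finishes.

# List every key in the bucket, then grant the bucket owner full control on each object
aws s3api list-objects-v2 --bucket bucket_in_s3_account --query 'Contents[].Key' --output text | tr '\t' '\n' | while read key; do
  aws s3api put-object-acl --bucket bucket_in_s3_account --key "$key" --acl bucket-owner-full-control
done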


Published: 2019-03-13