AWS Storage Blog

How to use AWS DataSync to migrate data between Amazon S3 buckets

Customers often need to move data between Amazon S3 buckets as part of their business operations, account consolidation process, or for business continuity purposes. S3 buckets with millions of existing objects or petabytes of existing data have previously required deploying clusters of compute to move data at scale. This not only required infrastructure to be deployed and managed, but also required continuous monitoring to ensure completion of the transfer.

Data can now be moved easily between S3 buckets for one time migrations or periodic scheduled transfers with AWS DataSync, reducing operational overhead and infrastructure costs. DataSync is a fully managed solution that enables the transfer of data between S3 buckets with just a few clicks in the console, and without managing any additional infrastructure.

In this blog post, I demonstrate setting up an AWS DataSync task to transfer objects from one S3 bucket to another within an AWS account, across AWS Regions, and across different AWS accounts without deploying a DataSync agent on Amazon EC2.

Copying objects within the same AWS account

Log in to the AWS Management Console, navigate to the DataSync page, select Tasks on the left menu bar, then choose Create task. For the source location, select Create a new location, and from the Location type dropdown select Amazon S3.

Select your Region, S3 bucket, S3 storage class, and Folder. For the IAM role, select the Autogenerate button.

For the destination location, select Create a new location, and from the Location type dropdown select Amazon S3.

Select your Region, S3 bucket, S3 storage class, and Folder. For the IAM role, select the Autogenerate button.

Provide your task with a name and configure it to your specifications. When complete, choose Next.

Lastly, review your configurations and select Create task. You’re now ready to execute your task and start copying objects. Note that standard cross-Region data transfer rates still apply. The same approach works for transfers between S3 buckets within a single Region.
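The console steps above can also be scripted for repeatable migrations. The sketch below composes the equivalent AWS CLI calls as argument lists; the bucket names, role ARN, and task name are hypothetical placeholders, and the commands are only composed here, not executed:

```python
import json

# Hypothetical placeholders -- substitute your own values.
SRC_BUCKET = "arn:aws:s3:::my-source-bucket"
DST_BUCKET = "arn:aws:s3:::my-destination-bucket"
ROLE_ARN = "arn:aws:iam::111122223333:role/MyDataSyncS3Role"

def create_location_cmd(bucket_arn, role_arn):
    """Equivalent of the console's 'Create a new location' step."""
    return [
        "aws", "datasync", "create-location-s3",
        "--s3-bucket-arn", bucket_arn,
        "--s3-config", json.dumps({"BucketAccessRoleArn": role_arn}),
    ]

def create_task_cmd(src_loc_arn, dst_loc_arn, name):
    """Equivalent of the console's 'Create task' step."""
    return [
        "aws", "datasync", "create-task",
        "--source-location-arn", src_loc_arn,
        "--destination-location-arn", dst_loc_arn,
        "--name", name,
    ]

src_cmd = create_location_cmd(SRC_BUCKET, ROLE_ARN)
dst_cmd = create_location_cmd(DST_BUCKET, ROLE_ARN)
# Each create-location-s3 call returns a LocationArn; pass both ARNs to
# create_task_cmd, then run the lists with subprocess.run() to execute.
```

Each `create-location-s3` call prints a `LocationArn`, which is what `create-task` expects for its source and destination arguments.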

Copying objects across accounts

When using DataSync to copy objects between S3 buckets across different accounts, you’ll need to create the necessary IAM roles in the account where the destination S3 bucket is located. Doing this allows DataSync to apply the correct object owner as it transfers the objects into the destination bucket.

Log in to the destination account and create an IAM role that the AWS DataSync service can assume to access objects in the source S3 bucket. Attach the following IAM policy, scoped to the source S3 bucket location:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:GetBucketLocation",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::SOURCEBUCKET"
        },
        {
            "Action": [
                "s3:AbortMultipartUpload",
                "s3:DeleteObject",
                "s3:GetObject",
                "s3:ListMultipartUploadParts",
                "s3:PutObjectTagging",
                "s3:GetObjectTagging",
                "s3:PutObject"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::SOURCEBUCKET/*"
        }
    ]
}
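If you create these roles with scripts rather than the console, the policy above can be templated for any source bucket. A minimal sketch using only the Python standard library (the bucket name is a placeholder):

```python
import json

def datasync_source_bucket_policy(bucket):
    """Render the location-role policy shown above for a given source bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {   # bucket-level permissions
                "Action": [
                    "s3:GetBucketLocation",
                    "s3:ListBucket",
                    "s3:ListBucketMultipartUploads",
                ],
                "Effect": "Allow",
                "Resource": bucket_arn,
            },
            {   # object-level permissions
                "Action": [
                    "s3:AbortMultipartUpload",
                    "s3:DeleteObject",
                    "s3:GetObject",
                    "s3:ListMultipartUploadParts",
                    "s3:PutObjectTagging",
                    "s3:GetObjectTagging",
                    "s3:PutObject",
                ],
                "Effect": "Allow",
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }
    return json.dumps(policy, indent=4)

rendered = datasync_source_bucket_policy("SOURCEBUCKET")
```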

Add the following trust relationship to the IAM role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "datasync.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
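The role and its trust relationship can also be created from the CLI. A sketch that composes the equivalent aws iam create-role call (the role name is a hypothetical placeholder; the command is only composed here, not executed):

```python
import json

# The trust relationship shown above, as a Python dict.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "datasync.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Compose the CLI call; run it with subprocess.run() in the destination account.
create_role_cmd = [
    "aws", "iam", "create-role",
    "--role-name", "DataSyncSourceLocationRole",  # placeholder name
    "--assume-role-policy-document", json.dumps(trust_policy),
]
```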

In subsequent steps, you’ll run an AWS CLI command to create the source S3 bucket location for DataSync (configuring a cross-account location is currently not supported in the AWS DataSync console). Note the ARN of the IAM user or role you’ll use to create the DataSync location from the AWS CLI, and enter it in the following source S3 bucket policy. Also copy the ARN of the IAM role you created for the source S3 bucket location. Now, log in to the source account, open the source S3 bucket policy, and apply the following policy to grant the IAM role access to the objects:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "BucketPolicyForDataSync",
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                "arn:aws:iam::DEST-ACCOUNT-ID:role/DEST-ACCOUNT-ROLE",
                "arn:aws:iam::DEST-ACCOUNT-ID:role/DEST-ACCOUNT-USER"]
            },
            "Action": [
                "s3:GetBucketLocation",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:AbortMultipartUpload",
                "s3:DeleteObject",
                "s3:GetObject",
                "s3:ListMultipartUploadParts",
                "s3:PutObject",
                "s3:GetObjectTagging",
                "s3:PutObjectTagging"
            ],
            "Resource": [
                "arn:aws:s3:::SOURCEBUCKET",
                "arn:aws:s3:::SOURCEBUCKET/*"
            ]
        }
    ]
}
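If you’d rather apply this policy from the source account without the console, you can render it programmatically and pass it to aws s3api put-bucket-policy. A sketch with hypothetical account ID, role, and user names:

```python
import json

def cross_account_bucket_policy(bucket, dest_account, role_name, user_name):
    """Build the source-bucket policy shown above, granting the destination
    account's role (and the identity that will create the DataSync location)
    access to the bucket and its objects."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "BucketPolicyForDataSync",
            "Effect": "Allow",
            "Principal": {"AWS": [
                f"arn:aws:iam::{dest_account}:role/{role_name}",
                f"arn:aws:iam::{dest_account}:user/{user_name}",
            ]},
            "Action": [
                "s3:GetBucketLocation", "s3:ListBucket",
                "s3:ListBucketMultipartUploads", "s3:AbortMultipartUpload",
                "s3:DeleteObject", "s3:GetObject", "s3:ListMultipartUploadParts",
                "s3:PutObject", "s3:GetObjectTagging", "s3:PutObjectTagging",
            ],
            "Resource": [bucket_arn, f"{bucket_arn}/*"],
        }],
    }

# Placeholder values for illustration only.
policy = cross_account_bucket_policy("SOURCEBUCKET", "111122223333",
                                     "DataSyncDestRole", "cli-admin")

# Compose the CLI call to apply it (run from the source account):
put_cmd = ["aws", "s3api", "put-bucket-policy",
           "--bucket", "SOURCEBUCKET",
           "--policy", json.dumps(policy)]
```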

Now, launch the AWS CLI and make sure you’re using the same IAM identity you specified in the source S3 bucket policy in the preceding step. For example, if you’re using an access key ID and secret access key with the AWS CLI, they map back to an IAM user. From the AWS CLI, run the following command to get the ARN of the identity being used:

aws sts get-caller-identity
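The response is a small JSON document containing UserId, Account, and Arn fields. When scripting, you can extract the ARN from it; a sketch using placeholder sample values:

```python
import json

# Shape of a typical `aws sts get-caller-identity` response (placeholder values):
sample_response = """
{
    "UserId": "AIDAEXAMPLEUSERID",
    "Account": "111122223333",
    "Arn": "arn:aws:iam::111122223333:user/cli-admin"
}
"""

identity = json.loads(sample_response)
caller_arn = identity["Arn"]  # the ARN to put in the source bucket policy
```

Equivalently, `aws sts get-caller-identity --query Arn --output text` prints just the ARN.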

If you do not have the AWS CLI installed on your local computer, you can run these commands from AWS CloudShell instead. Next, create the source S3 bucket location for DataSync with the following command:

aws datasync create-location-s3 --s3-bucket-arn arn:aws:s3:::SOURCEBUCKET \
    --s3-config '{"BucketAccessRoleArn":"arn:aws:iam::DEST-ACCOUNT-ID:role/DEST-ACCOUNT-ROLE"}'

If the command ran successfully, you’ll get a result similar to the following:

{
    "LocationArn": "arn:aws:datasync:Region:DEST-ACCOUNT-ID:location/loc-xxxxxxxxxxxxxx"
}
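When scripting the cross-account flow, capture the LocationArn from this response; it is the value you later pass as --source-location-arn to aws datasync create-task. A minimal parsing sketch with a placeholder location ID:

```python
import json

# Sample create-location-s3 response (Region, account ID, and location ID are placeholders):
response = '{"LocationArn": "arn:aws:datasync:us-east-1:111122223333:location/loc-0123456789abcdef0"}'

# Extract the ARN to feed into the create-task call.
source_location_arn = json.loads(response)["LocationArn"]
```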

Now that the source S3 bucket location has been created, log in to the destination account, create the destination bucket location, and select Autogenerate to create the IAM policy for this location.

Once both the source and destination locations have been created, navigate to Tasks on the DataSync page and select Create task. First, select the source location, then select Next.

Next, select the destination location.

Provide your task with a name and configure it to your specifications. When complete, choose Next.

Lastly, review your configurations and select Create task. You’re now ready to execute your task and start copying objects from the source S3 bucket to the destination S3 bucket.

Conclusion

In this blog post, I explored a step-by-step configuration of a DataSync task that copies objects from one S3 bucket to another without deploying an agent on EC2. Additional steps provided guidance on how to configure tasks for cross-Region and cross-account use cases.

Customers can easily migrate data between S3 buckets without managing additional infrastructure, saving operational time and reducing the complexity of moving data at any scale. Try migrating objects between your own Amazon S3 buckets using AWS DataSync today.