AWS Storage Blog

How to use AWS DataSync to migrate data between Amazon S3 buckets

Update (6/14/2022): The “Copying objects across accounts” section has been updated to reflect the new Amazon S3 Object Ownership feature, an S3 bucket-level setting that you can use to disable access control lists (ACLs) and take ownership of every object in your bucket. You no longer need to configure your cross-account AWS DataSync task to ensure that your destination account owns all of the objects copied over to its S3 bucket. Now, you can just use S3 Object Ownership to ensure that your destination account automatically owns all of the objects copied over to its S3 bucket.


Customers often need to move data between Amazon S3 buckets as part of their business operations, account consolidation process, or for business continuity purposes. S3 buckets with millions of existing objects or petabytes of existing data have previously required deploying clusters of compute to move data at scale. This not only required infrastructure to be deployed and managed, but also required continuous monitoring to ensure completion of the transfer.

Data can now be moved easily between S3 buckets for one-time migrations or periodic scheduled transfers with AWS DataSync, reducing operational overhead and infrastructure costs. DataSync is a fully managed solution that enables the transfer of data between S3 buckets with just a few clicks in the console, and without managing any additional infrastructure.

In this blog post, I demonstrate setting up an AWS DataSync task to transfer objects from one S3 bucket to another within an AWS account, across AWS Regions, and across different AWS accounts without deploying a DataSync agent on Amazon EC2.

Copying objects within the same AWS account

Log in to the AWS Management Console, navigate to the DataSync page, select Tasks on the left menu bar, then choose Create task. For the source location, select Create a new location, and from the Location type dropdown select Amazon S3.

Select your Region, S3 bucket, S3 storage class, and Folder. For the IAM role, select the Autogenerate button.

For the destination location, select Create a new location, and from the Location type dropdown select Amazon S3.

Select your Region, S3 bucket, S3 storage class, and Folder. For the IAM role, select the Autogenerate button.

Provide your task with a name and configure it to your specifications. When complete, choose Next.

Lastly, review your configuration and select Create task. You’re now ready to run your task and start copying objects. Note that cross-Region data transfer charges still apply when the two buckets are in different Regions. The same steps also work for transferring objects between S3 buckets inside the same Region.
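
If you would rather script these console steps, the following is a minimal sketch of the same flow through the DataSync API, assuming boto3 is available. The function name and its parameters are illustrative, and `datasync` is assumed to be a client created with `boto3.client("datasync")`. Unlike the console's Autogenerate button, the API expects you to pass an existing IAM role that DataSync can assume.

```python
def create_bucket_to_bucket_task(datasync, source_bucket_arn, dest_bucket_arn,
                                 role_arn, task_name):
    """Create source and destination S3 locations, then a task linking them.

    Assumptions: `datasync` is a boto3 DataSync client, and `role_arn` is an
    existing IAM role that DataSync can assume and that grants access to
    both buckets. Returns the ARN of the new task.
    """
    location_arns = []
    for bucket_arn in (source_bucket_arn, dest_bucket_arn):
        response = datasync.create_location_s3(
            S3BucketArn=bucket_arn,
            S3Config={"BucketAccessRoleArn": role_arn},
        )
        location_arns.append(response["LocationArn"])

    task = datasync.create_task(
        SourceLocationArn=location_arns[0],
        DestinationLocationArn=location_arns[1],
        Name=task_name,
    )
    return task["TaskArn"]
```

The client is passed in rather than created inside the function so that you can choose the Region and credentials yourself.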

Copying objects across accounts

In this scenario, we have two Amazon S3 buckets residing in different accounts. Account A contains the source S3 bucket and Account B the destination S3 bucket. When utilizing AWS DataSync to copy objects between S3 buckets across different accounts, you’ll need to create a new AWS Identity and Access Management (IAM) role in Account A to be referenced in the Account B destination-bucket policy.

Log in to Account A and create an IAM role that gives the AWS DataSync service permission to access the bucket in Account B. Attach the following permissions policy to the role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::account-b-bucket"
    },
    {
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:ListMultipartUploadParts",
        "s3:PutObject",
        "s3:GetObjectTagging",
        "s3:PutObjectTagging"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::account-b-bucket/*"
    }
  ]
}

Replace arn:aws:s3:::account-b-bucket with the ARN of the Account B destination bucket. Take note of the ARN of this role as you will reference it in the Account B destination-bucket policy in later steps.
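
To avoid editing the placeholder by hand in two places, the policy can be generated from the bucket name. This is a small sketch using only the Python standard library; the function name is my own, and the actions are copied from the policy above.

```python
import json


def build_datasync_role_policy(bucket_name):
    """Return the permissions policy above with the bucket name filled in."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Action": [
                    "s3:GetBucketLocation",
                    "s3:ListBucket",
                    "s3:ListBucketMultipartUploads",
                ],
                "Effect": "Allow",
                "Resource": bucket_arn,
            },
            {
                # Object-level actions apply to every key in the bucket.
                "Action": [
                    "s3:AbortMultipartUpload",
                    "s3:DeleteObject",
                    "s3:GetObject",
                    "s3:ListMultipartUploadParts",
                    "s3:PutObject",
                    "s3:GetObjectTagging",
                    "s3:PutObjectTagging",
                ],
                "Effect": "Allow",
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


# Print the policy as JSON, ready to paste into the IAM console.
print(json.dumps(build_datasync_role_policy("account-b-bucket"), indent=2))
```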

Add the following trust relationship to the IAM role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "datasync.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
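
The role, its trust relationship, and its permissions policy can also be created in one step from code. The sketch below assumes boto3 and an IAM client created with `boto3.client("iam")` using Account A credentials; the function name and the default inline policy name are hypothetical.

```python
import json

# Trust policy from above: lets the DataSync service assume the role.
DATASYNC_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "datasync.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def create_datasync_role(iam, role_name, permissions_policy,
                         policy_name="datasync-s3-access"):
    """Create the role, attach the permissions as an inline policy,
    and return the role ARN.

    Assumptions: `iam` is a boto3 IAM client, and `permissions_policy`
    is the permissions policy for the destination bucket as a dict.
    """
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(DATASYNC_TRUST_POLICY),
    )
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(permissions_policy),
    )
    return role["Role"]["Arn"]
```

The returned ARN is the value to note down for the Account B destination-bucket policy.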

To ensure that Account B has access to the data AWS DataSync is copying over, disable access control lists (ACLs) on the destination S3 bucket. As DataSync copies new objects to the destination bucket in Account B, Account B will take ownership of the objects.

Log in to Account B and navigate to the Permissions tab on the destination bucket. Under Object Ownership, choose Edit and select ACLs disabled (recommended).
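
The same setting can be applied from code. In the S3 API, the console option ACLs disabled corresponds to the `BucketOwnerEnforced` value of Object Ownership. The sketch below assumes boto3 and an S3 client created with Account B credentials; both function names are illustrative.

```python
def disable_acls(s3, bucket_name):
    """Set Object Ownership to bucket owner enforced, which disables ACLs.

    Assumption: `s3` is a boto3 S3 client using Account B credentials.
    """
    s3.put_bucket_ownership_controls(
        Bucket=bucket_name,
        OwnershipControls={"Rules": [{"ObjectOwnership": "BucketOwnerEnforced"}]},
    )


def acls_disabled(s3, bucket_name):
    """Return True if the bucket currently has ACLs disabled.

    The underlying call raises an error when the bucket has no ownership
    controls configured at all, so callers may want to handle that case.
    """
    response = s3.get_bucket_ownership_controls(Bucket=bucket_name)
    rules = response["OwnershipControls"]["Rules"]
    return any(rule["ObjectOwnership"] == "BucketOwnerEnforced" for rule in rules)
```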

In subsequent steps, you’ll need to run a command via the AWS Command Line Interface (AWS CLI) in Account A to create the Account B destination S3 bucket location for AWS DataSync (configuring a cross-account location is currently not supported in the DataSync console).

Log in to Account B, open the destination S3 bucket policy, and apply the following policy to grant permissions for the IAM role to write the objects DataSync is copying over:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DataSyncCreateS3LocationAndTaskAccess",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::account-a-id:role/name-of-role"
      },
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:AbortMultipartUpload",
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:ListMultipartUploadParts",
        "s3:PutObject",
        "s3:GetObjectTagging",
        "s3:PutObjectTagging"
      ],
      "Resource": [
        "arn:aws:s3:::account-b-bucket",
        "arn:aws:s3:::account-b-bucket/*"
      ]
    },
    {
      "Sid": "DataSyncCreateS3Location",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::account-a-id:user/name-of-user"
      },
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::account-b-bucket"
    }
  ]
}

Replace arn:aws:iam::account-a-id:role/name-of-role with the ARN of the Account A IAM role you created in previous steps. Replace arn:aws:s3:::account-b-bucket with the ARN of the Account B destination bucket. Replace arn:aws:iam::account-a-id:user/name-of-user with the ARN of the Account A IAM user or role you will use in the next step to execute the AWS CLI to create the Account B destination-bucket location.
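
Because three placeholders appear several times in this policy, it can be less error-prone to generate it. The following sketch builds the same two statements from the bucket name and the two Account A ARNs, and optionally applies the result with boto3. The function names are my own, and `s3` is assumed to be a boto3 S3 client using Account B credentials.

```python
import json

# Actions granted to the DataSync role, copied from the bucket policy above.
ROLE_ACTIONS = [
    "s3:GetBucketLocation",
    "s3:ListBucket",
    "s3:ListBucketMultipartUploads",
    "s3:AbortMultipartUpload",
    "s3:DeleteObject",
    "s3:GetObject",
    "s3:ListMultipartUploadParts",
    "s3:PutObject",
    "s3:GetObjectTagging",
    "s3:PutObjectTagging",
]


def build_destination_bucket_policy(bucket_name, datasync_role_arn, cli_identity_arn):
    """Return the destination-bucket policy with all placeholders filled in."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        # Assumption: this sketch uses the current policy language version.
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DataSyncCreateS3LocationAndTaskAccess",
                "Effect": "Allow",
                "Principal": {"AWS": datasync_role_arn},
                "Action": ROLE_ACTIONS,
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            },
            {
                "Sid": "DataSyncCreateS3Location",
                "Effect": "Allow",
                "Principal": {"AWS": cli_identity_arn},
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,
            },
        ],
    }


def apply_bucket_policy(s3, bucket_name, policy):
    """Apply the policy. This replaces any policy already on the bucket."""
    s3.put_bucket_policy(Bucket=bucket_name, Policy=json.dumps(policy))
```

If the bucket already has a policy, merge these statements into it instead of overwriting it.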

Now, log in to Account A. Launch the AWS CLI and make sure you’re using the same IAM identity that you entered in the Account B destination-bucket policy in place of arn:aws:iam::account-a-id:user/name-of-user.

If you’re using an access key ID and secret access key with the AWS CLI, these map back to an IAM user or role. From the AWS CLI, run the following command to get the ARN of the identity being used:

aws sts get-caller-identity
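
When the CLI is running under an assumed role, this command returns a session ARN of the form arn:aws:sts::account-id:assumed-role/role-name/session-name rather than the IAM role ARN you would normally put in a bucket policy. The helper below is a hypothetical convenience, not part of any AWS tool, that rewrites such a session ARN into the corresponding role ARN and leaves other ARNs unchanged.

```python
import re

# Matches the session ARN that STS reports for an assumed role.
_ASSUMED_ROLE = re.compile(
    r"^arn:(aws[\w-]*):sts::(\d{12}):assumed-role/([^/]+)/[^/]+$"
)


def bucket_policy_principal(caller_arn):
    """Map an ARN from get-caller-identity to an IAM ARN for a bucket policy.

    IAM user ARNs are returned unchanged. Assumed-role session ARNs are
    rewritten to the IAM role ARN. Caveat: the session ARN does not carry
    the role's path, so for a role created with a path, look up the full
    role ARN in IAM instead of relying on this helper.
    """
    match = _ASSUMED_ROLE.match(caller_arn)
    if match is None:
        return caller_arn
    partition, account_id, role_name = match.groups()
    return f"arn:{partition}:iam::{account_id}:role/{role_name}"
```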

If you don’t have the AWS CLI installed on your local computer, you can use it through AWS CloudShell.

Next, run the following command to create the DataSync location for the Account B destination bucket. Note: If the bucket in Account B resides in a different Region than the bucket in Account A, add the --region option at the end of the command to specify the Region of the Account B bucket. For example, --region us-west-2.

aws datasync create-location-s3 --s3-bucket-arn arn:aws:s3:::account-b-bucket --s3-config '{"BucketAccessRoleArn":"arn:aws:iam::account-a-id:role/name-of-role"}'

If the command ran successfully, you’ll get a result similar to the following:

{
  "LocationArn": "arn:aws:datasync:Region:ACCOUNT-A-ID:location/loc-xxxxxxxxxxxxxx"
}
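
If you run this command from a script, building it as an argument list avoids shell quoting problems with the JSON value of --s3-config. The following is a small sketch with hypothetical helper names; it assumes the CLI output format is JSON, which is the default.

```python
import json


def build_create_location_command(bucket_arn, role_arn, region=None):
    """Build the create-location-s3 command above as an argument list."""
    command = [
        "aws", "datasync", "create-location-s3",
        "--s3-bucket-arn", bucket_arn,
        "--s3-config", json.dumps({"BucketAccessRoleArn": role_arn}),
    ]
    # Only needed when the Account B bucket is in a different Region.
    if region is not None:
        command += ["--region", region]
    return command


def parse_location_arn(cli_output):
    """Extract the LocationArn from the JSON text that the command prints."""
    return json.loads(cli_output)["LocationArn"]
```

You can pass the list to `subprocess.run(command, capture_output=True, text=True, check=True)` and hand the resulting `stdout` to `parse_location_arn`.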

Now that you have created the Account B destination S3 bucket location, log in to Account A and select the Region that the Account A source bucket resides in. Create the source bucket location, and select Autogenerate to create the IAM role for this location.

Note: If the bucket in Account B resides in a different Region than the bucket in Account A, create the DataSync task inside Account A but from the same Region as the Account B bucket.
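
If you are not sure which Region a bucket is in, you can look it up before deciding where to create the task. This sketch assumes boto3 and an S3 client whose credentials are allowed to call s3:GetBucketLocation on the bucket, for example credentials from the account that owns it. The function name is illustrative.

```python
def region_for_bucket(s3, bucket_name):
    """Return the Region name of a bucket.

    Assumption: `s3` is a boto3 S3 client that is permitted to call
    GetBucketLocation on this bucket.
    """
    constraint = s3.get_bucket_location(Bucket=bucket_name)["LocationConstraint"]
    if constraint is None:
        # S3 reports no location constraint for buckets in us-east-1.
        return "us-east-1"
    if constraint == "EU":
        # Legacy value that some older eu-west-1 buckets still report.
        return "eu-west-1"
    return constraint
```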

Once you have created both the source and destination locations, navigate to Tasks under the DataSync page and select Create task. First, select the source location, then select Next.

Next, select the destination location.

Provide your task with a name and configure it to your specifications. When complete, choose Next.

Lastly, review your configuration and select Create task. You’re now ready to run your task and start copying objects from the source S3 bucket to your destination S3 bucket.

Note: To ensure you have permission to create and execute the task, create the task using the Account A IAM identity of the user or role that is specified in the Account B destination-bucket policy.
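
Once the task exists, you can also start it and wait for the result from code instead of watching the console. The following is a minimal polling sketch, assuming boto3 and a DataSync client created with the same Account A identity. The function name, the polling interval, and the choice to treat only SUCCESS and ERROR as final statuses are my assumptions.

```python
import time

# Assumption: these are the statuses at which this sketch stops polling.
TERMINAL_STATUSES = {"SUCCESS", "ERROR"}


def run_task_and_wait(datasync, task_arn, poll_seconds=30, sleep=time.sleep):
    """Start a task execution and poll until it succeeds or fails.

    Assumption: `datasync` is a boto3 DataSync client. Returns a tuple of
    the task execution ARN and its final status.
    """
    started = datasync.start_task_execution(TaskArn=task_arn)
    execution_arn = started["TaskExecutionArn"]
    while True:
        details = datasync.describe_task_execution(TaskExecutionArn=execution_arn)
        status = details["Status"]
        if status in TERMINAL_STATUSES:
            return execution_arn, status
        # Wait before asking again so the API is not polled in a tight loop.
        sleep(poll_seconds)
```

The `sleep` parameter exists only so the wait can be replaced in tests; in normal use you can leave it at its default.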

Conclusion

In this blog post, I explored a step-by-step configuration of a DataSync task that copies objects from one S3 bucket to another without deploying an agent on EC2. Additional steps provided guidance on how to configure tasks for cross-Region and cross-account use cases.

Customers can benefit from easily migrating data between S3 buckets without managing additional infrastructure, saving operational time, and reducing the complexity of moving data at any scale. Try migrating objects between your own Amazon S3 buckets using AWS DataSync today.