AWS Storage Blog

Cross-account bulk transfer of files using Amazon S3 Batch Operations

As customers scale their business on AWS, they can have millions to billions of objects in their Amazon S3 buckets. Customers often run operations on a large number of these objects in their buckets, including copying objects across accounts, encrypting objects, or tagging objects. Running operations on a large number of objects in S3 involves listing all files and running the operation on each object. This can get complicated and time-consuming as the number of objects scales up. Amazon S3 Batch Operations can be used to perform actions across billions of objects and petabytes of data with a single request.

Many organizations use Amazon S3 for numerous storage needs including data lakes, backup, archive, websites, and analytics. Large enterprises have multiple AWS accounts across departments with each department having their own S3 buckets. Customers also work with vendors and external entities that use Amazon S3. Sometimes, customers must transfer a large number of objects across AWS accounts. This includes transfer of objects to S3 buckets owned by other departments, vendors, or external organizations running on AWS. They can use S3 Batch Operations to build a simple repeatable process to transfer objects across AWS accounts. While tools and scripts exist to do this work, each one requires some development work to set up. S3 Batch Operations gives you a simple solution for copying a large number of objects across accounts. Customers can also create, monitor, and manage their batch operations jobs using the S3 AWS CLI, the S3 console, or the S3 APIs.

In this post, I demonstrate two ways to copy objects in bulk from a bucket in a source AWS account to a bucket in a destination AWS account using S3 Batch Operations:

  • Option 1: Using an S3 inventory report delivered to the destination account to copy objects across AWS accounts
  • Option 2: Using a CSV manifest file stored in the source account to copy objects across AWS accounts

Prerequisites:

To follow along with the process outlined in this post, you need a source AWS account and a destination AWS account. The source account needs at least one Amazon S3 bucket to hold the objects that must be transferred. The destination account needs at least one bucket to receive the copied objects; for Option 1, a bucket in the destination account also stores the S3 inventory report, while for Option 2 the CSV manifest file is stored in the source account. The process to create an S3 inventory report or a CSV manifest file for a source bucket is shared later in the post.

Option 1:

Using an Amazon S3 Inventory report delivered to the destination account to copy objects across AWS accounts:

You can use Amazon S3 inventory to deliver the inventory report of the source account bucket to a bucket in the destination account. This inventory report is used in the destination account during batch job creation. Amazon S3 inventory generates inventories of the objects in a bucket, and the resulting list is published to an output file. The bucket that is inventoried is called the source bucket, and the bucket where the inventory report file is stored is called the destination bucket. Amazon S3 inventory provides comma-separated values (CSV), Apache optimized row columnar (ORC), or Apache Parquet output files that list the objects and their corresponding metadata on a daily or weekly basis.

Amazon S3 inventory can generate a list of 100 million objects for only $0.25 in the N. Virginia Region, making it a very affordable option for creating a bucket inventory. If you already have an S3 inventory report for this bucket, you can skip the following section that contains the steps to generate an S3 inventory report.

Generate an S3 inventory report for the bucket in the source account:

  1. In the Amazon S3 console on the source AWS account, select the bucket with objects you want to copy to the destination AWS account.
  2. On the Management tab, choose Inventory, Add New.
  3. For Output format, select CSV and complete the other optional fields. You might also set the Daily frequency for report deliveries, as doing so delivers the first report to your bucket sooner.
  4. When you enter information for the destination bucket, choose Buckets in another account. Then enter the name of the destination inventory report bucket. Optionally, you can enter the account ID of the destination account. Choose Save.
  5. S3 can take up to 48 hours to deliver the first report, so check back when the first report arrives. If you want an automated notification sent when the first report is delivered, implement the solution from the documentation on knowing when an inventory is complete. Alternatively, you can configure the inventory with the AWS CLI, as sketched after this list.
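The following is a minimal CLI sketch of the same inventory configuration. The bucket names and account number are the placeholders used throughout this post, and the configuration ID daily-inventory is a hypothetical value; run the command with credentials for the source account.

aws s3api put-bucket-inventory-configuration \
    --bucket ObjectSourceBucket \
    --id daily-inventory \
    --inventory-configuration '{
        "Id": "daily-inventory",
        "IsEnabled": true,
        "IncludedObjectVersions": "Current",
        "Schedule": { "Frequency": "Daily" },
        "Destination": {
            "S3BucketDestination": {
                "AccountId": "DestinationAccountNumber",
                "Bucket": "arn:aws:s3:::ObjectDestinationInventoryReportBucket",
                "Format": "CSV"
            }
        }
    }'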

Add a bucket policy to the bucket in the destination account:

  1. After the inventory configuration is saved, the console displays a message that says the following:

Amazon S3 could not create a bucket policy on the destination bucket. Ask the destination bucket owner to add the following bucket policy to allow Amazon S3 to place data in that bucket.

  2. The console displays a bucket policy that you can use for the destination bucket. Copy the destination bucket policy that appears on the console.
  3. In the destination account, add the copied bucket policy to the destination inventory report bucket where the inventory report will be stored. The policy is similar to the example shown below.
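The bucket policy the console generates should look similar to the following example, which allows Amazon S3 to deliver inventory report files to the destination bucket. The bucket names, account number, and Sid are placeholders; if the policy displayed by the console differs, use the console's version.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "InventoryDestinationBucketPolicy",
            "Effect": "Allow",
            "Principal": {
                "Service": "s3.amazonaws.com"
            },
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::ObjectDestinationInventoryReportBucket/*",
            "Condition": {
                "ArnLike": {
                    "aws:SourceArn": "arn:aws:s3:::ObjectSourceBucket"
                },
                "StringEquals": {
                    "aws:SourceAccount": "SourceAccountNumber",
                    "s3:x-amz-acl": "bucket-owner-full-control"
                }
            }
        }
    ]
}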

Create a Batch Operations role in the destination account and update the source bucket policy:

  1. Create a role in the destination account with the name “BatchOperationsDestinationRoleCOPY”. Choose the Amazon S3 service, and then choose the S3 bucket Batch Operations use case, which applies the trust policy to the role in the destination account.
  2. Then choose Create policy to attach the following policy to the role.
{
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowBatchOperationsDestinationObjectCOPY",
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:PutObjectVersionAcl",
                    "s3:PutObjectAcl",
                    "s3:PutObjectVersionTagging",
                    "s3:PutObjectTagging",
                    "s3:GetObject",
                    "s3:GetObjectVersion",
                    "s3:GetObjectAcl",
                    "s3:GetObjectTagging",
                    "s3:GetObjectVersionAcl",
                    "s3:GetObjectVersionTagging"
                ],
                "Resource": [
                    "arn:aws:s3:::ObjectDestinationBucket/*",
                    "arn:aws:s3:::ObjectSourceBucket/*",
                    "arn:aws:s3:::ObjectDestinationInventoryReportBucket/*"
                ]
            }
        ]
}

The role uses the policy to grant batchoperations.s3.amazonaws.com permission to read the inventory report in the destination bucket. It also grants permissions to GET objects, access control lists (ACLs), tags, and versions in the source object bucket. Lastly, it grants permissions to PUT objects, ACLs, tags, and versions into the destination object bucket.
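If you create the role outside the console (for example, with the AWS CLI), the S3 Batch Operations use case will not apply the trust policy for you. In that case, attach a trust relationship like the following so that the S3 Batch Operations service can assume the role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "batchoperations.s3.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}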

  3. In the source account, create a bucket policy for the source bucket that grants the role that you created in the previous steps permission to GET objects, ACLs, tags, and versions in the source bucket. This step allows Amazon S3 Batch Operations to get objects from the source bucket through the trusted role. The following is an example of the bucket policy for the source account (a CLI command for applying it is sketched after the policy):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowBatchOperationsSourceObjectCOPY",
            "Effect": "Allow",
            "Principal": {
          "AWS": "arn:aws:iam::DestinationAccountNumber:role/BatchOperationsDestinationRoleCOPY"
            },
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:GetObjectAcl",
                "s3:GetObjectTagging",
                "s3:GetObjectVersionAcl",
                "s3:GetObjectVersionTagging"
            ],
            "Resource": "arn:aws:s3:::ObjectSourceBucket/*"
        }
    ]
}
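You can attach this policy from the S3 console or with the AWS CLI, as sketched below. The file name source-bucket-policy.json is a hypothetical local file containing the policy above; run the command with credentials for the source account.

aws s3api put-bucket-policy \
    --bucket ObjectSourceBucket \
    --policy file://source-bucket-policy.json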
  4. After the inventory report is available, create an Amazon S3 Batch Operations PUT object copy job in the destination account, choosing the inventory report from the destination inventory report bucket. What follows is the procedure to create an S3 Batch Operations PUT object copy job.

Setting up and running your Amazon S3 Batch Operations job in the destination account using the S3 console:

The following steps provide a tutorial of the procedure to create an Amazon S3 Batch Operations job in the destination account. The batch job reads the source bucket S3 inventory report and copies objects from the source bucket to the destination bucket specified in the batch job.

Create Batch Operations job in the destination account:

  1. In the Amazon S3 console, choose Batch Operations in the left navigation pane, under Buckets.
  2. Choose Create Job.
  3. Choose the appropriate Region for your S3 bucket. Under Manifest, select S3 inventory report (manifest.json) as the manifest format.
  4. Under Manifest object, enter the path to the bucket in the destination account where the inventory report is stored. To enter the path of the bucket, select the Browse S3 button, then navigate to and select the manifest.json file.
  5. Choose Next.

Creating your Amazon S3 Batch Operations job in the S3 console and filling out the Region, Manifest format, Manifest object, and optional info

  6. Under Operation type, choose Copy.
  7. Under Copy destination, enter the path to the bucket in the destination account where you want to copy the objects. To enter the path of the bucket, select the Browse S3 button and navigate to the destination bucket where you want to transfer the objects. Select other options such as Storage Class or Encryption as necessary.

Under Choose operation, select Copy, and under Copy destination, enter the path to the bucket in the destination account where you want to copy the objects.

  8. Choose Next.
  9. Under Completion report, enter the path of the bucket in the destination account where you would like to store the batch job report. To enter the path of the bucket, select the Browse S3 button and navigate to the bucket where you want to store the completion report.
  10. Under Permissions, select the IAM role created earlier (BatchOperationsDestinationRoleCOPY). Choose Next.

Under Completion report, enter the path of the bucket in the destination account where you want to store the batch job report.

  11. Review and verify your job parameters before choosing Create Job.

Run Batch Operations job in the destination account:

  1. After Amazon S3 finishes reading the S3 inventory report, it moves the job to the “Awaiting your confirmation to run” status. For batch jobs that contain a large number of objects, this pre-processing can take a long time, and the job remains in the “Preparing” status while it runs. During pre-processing, you can watch the job's progress (tasks succeeded and tasks failed) increase. Once the job moves to the “Awaiting your confirmation to run” status, check the number of objects in the inventory report and confirm the job to run it.
  2. To run the job, select the job that was created and choose Run Job. This takes you to the detailed page for the job. Verify all the information, scroll to the bottom, and choose Run Job to start execution.
  3. Once the Amazon S3 job status changes to “Completed,” you can navigate to the destination bucket to view the copied objects. If you prefer to work from the command line, a CLI sketch of the equivalent job creation follows.
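A similar job can be created with the s3control create-job command instead of the console. The following is a minimal sketch; the Region, inventory report path, ETag, and priority are placeholder values you must replace (you can retrieve the ETag of the manifest.json object with aws s3api head-object).

aws s3control create-job \
    --account-id DestinationAccountNumber \
    --region us-east-1 \
    --operation '{"S3PutObjectCopy": {"TargetResource": "arn:aws:s3:::ObjectDestinationBucket"}}' \
    --manifest '{
        "Spec": {"Format": "S3InventoryReport_CSV_20161130"},
        "Location": {
            "ObjectArn": "arn:aws:s3:::ObjectDestinationInventoryReportBucket/path/to/manifest.json",
            "ETag": "example-manifest-etag"
        }
    }' \
    --report '{
        "Bucket": "arn:aws:s3:::ObjectDestinationBucket",
        "Format": "Report_CSV_20180820",
        "Enabled": true,
        "Prefix": "batch-op-completion-reports",
        "ReportScope": "AllTasks"
    }' \
    --priority 10 \
    --role-arn arn:aws:iam::DestinationAccountNumber:role/BatchOperationsDestinationRoleCOPY

Depending on whether confirmation is required for the job, you may still need to confirm it (for example, with aws s3control update-job-status --requested-job-status Ready) before it runs.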

Option 2:

Using a CSV manifest stored in the source account to copy objects across AWS accounts

The following procedure shows how to set up permissions when using an Amazon S3 Batch Operations job to copy objects from a source account to a destination account with the CSV manifest file stored in the source account.

A manifest file is an Amazon S3 object that lists object keys that you want Amazon S3 to act upon. Each row in the file must include the bucket name, object key, and optionally, the object version. The following is an example manifest CSV file without version IDs:

An example of a manifest CSV file without version IDs
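For reference, the contents of a manifest without version IDs look like the following. The bucket name and object keys are illustrative:

ObjectSourceBucket,objectkey1.csv
ObjectSourceBucket,objectkey2.csv
ObjectSourceBucket,photos/objectkey3.jpg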

Create a batch operations role in the destination account:

  1. Create a role in the destination account with the name “BatchOperationsDestinationRoleCOPY”. Choose the Amazon S3 service, and then choose the S3 bucket batch operations use case, which applies the trust policy to the role in the destination account.
  2. Then choose Create policy to attach the following policy to the role.
{
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowBatchOperationsDestinationObjectCOPY",
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:PutObjectVersionAcl",
                    "s3:PutObjectAcl",
                    "s3:PutObjectVersionTagging",
                    "s3:PutObjectTagging",
                    "s3:GetObject",
                    "s3:GetObjectVersion",
                    "s3:GetObjectAcl",
                    "s3:GetObjectTagging",
                    "s3:GetObjectVersionAcl",
                    "s3:GetObjectVersionTagging"
                ],
                "Resource": [
                    "arn:aws:s3:::ObjectDestinationBucket/*",
                    "arn:aws:s3:::ObjectSourceBucket/*",
                    "arn:aws:s3:::ObjectSourceMainfestBucket/*"
                ]
            }
        ]
}

The role uses the policy to grant batchoperations.s3.amazonaws.com permission to read the CSV manifest file stored in the source account. It also grants permissions to GET objects, access control lists (ACLs), tags, and versions in the source object bucket. Lastly, it grants permissions to PUT objects, ACLs, tags, and versions into the destination object bucket.

  3. In the preceding policy, the arn:aws:s3:::ObjectSourceManifestBucket/* item is the bucket in the source account containing the CSV manifest file named “manifest.csv.”

Create a bucket policy in the source bucket to allow access to the batch operations role and console user in the destination account:

  1. In the source account, create a bucket policy for the bucket that contains the “manifest.csv.” Grant the role created in the previous section permission to GET objects and versions in the source manifest bucket. This step allows Amazon S3 Batch Operations to read the manifest using the trusted role. Apply the bucket policy to the bucket that contains the manifest. The following is an example of the bucket policy to apply to the source manifest bucket:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowBatchOperationsSourceManfiestRead",
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                   "arn:aws:iam::DestinationAccountNumber:user/ConsoleUserCreatingJob",
                "arn:aws:iam::DestinationAccountNumber:role/BatchOperationsDestinationRoleCOPY"
                ]
            },
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion"
            ],
            "Resource": "arn:aws:s3:::ObjectSourceManifestBucket/*"
        }
    ]
}

This bucket policy also grants the console user who creates the job in the destination account the same read permissions on the source manifest bucket.

  2. In the source account, create a bucket policy for the source bucket that grants the role you created earlier (BatchOperationsDestinationRoleCOPY) permission to GET objects, ACLs, tags, and versions in the source object bucket. Amazon S3 Batch Operations can then get objects from the source bucket through the trusted role. The following is an example of the bucket policy for the bucket that contains the source objects:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowBatchOperationsSourceObjectCOPY",
            "Effect": "Allow",
            "Principal": {
      "AWS": "arn:aws:iam::DestinationAccountNumber:role/BatchOperationsDestinationRoleCOPY"
            },
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:GetObjectAcl",
                "s3:GetObjectTagging",
                "s3:GetObjectVersionAcl",
                "s3:GetObjectVersionTagging"
            ],
            "Resource": "arn:aws:s3:::ObjectSourceBucket/*"
        }
    ]
}
  3. Create an S3 Batch Operations job in the destination account by following the steps in the earlier section on setting up and running your Amazon S3 Batch Operations job. When creating the batch job, select CSV as the manifest format, and for the Manifest object, enter the path to the bucket in the source account where the manifest.csv file is stored. A CLI sketch of creating this job with a CSV manifest follows.
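As with Option 1, you can create this job with the AWS CLI instead of the console. The following sketch reuses the placeholder names from this post; the Region, ETag, and priority are hypothetical values you must replace. Note the manifest Spec, which tells S3 Batch Operations that the manifest is a CSV file with Bucket and Key fields rather than an S3 inventory report.

aws s3control create-job \
    --account-id DestinationAccountNumber \
    --region us-east-1 \
    --operation '{"S3PutObjectCopy": {"TargetResource": "arn:aws:s3:::ObjectDestinationBucket"}}' \
    --manifest '{
        "Spec": {"Format": "S3BatchOperations_CSV_20180820", "Fields": ["Bucket", "Key"]},
        "Location": {
            "ObjectArn": "arn:aws:s3:::ObjectSourceManifestBucket/manifest.csv",
            "ETag": "example-manifest-csv-etag"
        }
    }' \
    --report '{
        "Bucket": "arn:aws:s3:::ObjectDestinationBucket",
        "Format": "Report_CSV_20180820",
        "Enabled": true,
        "Prefix": "batch-op-completion-reports",
        "ReportScope": "FailedTasksOnly"
    }' \
    --priority 10 \
    --role-arn arn:aws:iam::DestinationAccountNumber:role/BatchOperationsDestinationRoleCOPY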

Cleaning Up

Once you have completed the examples, you may want to delete the example resources to avoid incurring unexpected future costs. The following are the resources you may want to clean up after completing the example:

  • S3 Inventory report: To stop receiving the S3 inventory report, navigate to Inventory on the Management tab of the source bucket and either delete the inventory configuration or reduce the report frequency from daily to weekly (a CLI sketch for this follows the list).
  • Source and Destination buckets: To remove the source and destination account buckets and objects used in this example, first empty the buckets and then delete them by following the procedures in the Amazon S3 documentation.
  • Batch Operations jobs: All batch operations jobs are automatically deleted 90 days after they finish or fail.
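A couple of AWS CLI commands can help with this cleanup. The configuration ID daily-inventory is the hypothetical ID used earlier in this post; replace the bucket names with your own. Note that aws s3 rb --force does not remove old object versions from versioned buckets.

# Delete the example inventory configuration (run in the source account).
aws s3api delete-bucket-inventory-configuration \
    --bucket ObjectSourceBucket \
    --id daily-inventory

# Empty and delete an example bucket (run in the account that owns the bucket).
aws s3 rb s3://ObjectDestinationBucket --force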

Conclusion

This post reviewed two methods for setting up Amazon S3 Batch Operations to copy a large number of objects across AWS accounts in bulk, using either an Amazon S3 inventory report or a CSV manifest file. Running operations on a large number of objects in S3 can get complicated and time-consuming as the number of objects scales up. The purpose of this post was to show an example of using S3 Batch Operations to easily run operations, such as copying, on a very large number of objects. This eliminates the need to write scripts or custom functions that list objects and run an operation on each object, which can become time-consuming and complex.

As organizations scale up, they have to perform operations on a large number of objects in their buckets in bulk. This is where Amazon S3 Batch Operations is helpful, as AWS manages the scheduling and execution of your job. The example of copying objects in bulk helps businesses easily automate the transfer of a large number of objects across AWS accounts, between their internal departments and external vendors, without the need to create a long-running custom job on the client side. In addition to copying objects in bulk, you can use S3 Batch Operations to perform custom operations on objects by triggering a Lambda function. You can also replace tags, lock objects, replace access control lists (ACLs), and restore archived objects from S3 Glacier for many objects at once.

Thanks for reading this blog post! If you have any comments or questions, please don’t hesitate to leave them in the comments section.