AWS Storage Blog

Replicating existing objects between S3 buckets

UPDATE (8/25/2021): The walkthrough in this blog post for setting up a replication rule in the Amazon S3 console has changed to reflect the updated Amazon S3 console.


Customers commonly have business requirements or enterprise policies that call for additional copies of their existing Amazon S3 objects. While Amazon S3 Replication is widely used to replicate newly uploaded objects between S3 buckets, the simplest way of replicating large numbers of existing objects between S3 buckets is not obvious to many customers. In this post, we show you how to trigger Cross-Region Replication (CRR) for existing objects by using Amazon S3 Replication.

Amazon S3 Replication is a managed, low-cost, elastic solution for copying objects from one Amazon S3 bucket to another. With Amazon S3 Replication, you can set up rules to automatically replicate S3 objects across different AWS Regions by using Amazon S3 Cross-Region Replication (CRR). Alternatively, you can set up rules to replicate objects between buckets in the same AWS Region by using Amazon S3 Same-Region Replication (SRR). For information on what is and is not replicated when using S3 Replication, see the S3 Replication documentation.

Customers can copy existing objects to another bucket in the same or different AWS Region by contacting AWS Support to add this functionality to the source bucket. Once support for replication of existing objects has been enabled on a source bucket, customers are able to use S3 Replication for all existing objects, in addition to newly uploaded objects. Once the replication process completes, customers have two buckets containing all objects, and newly uploaded objects are replicated to the destination bucket.

Existing object replication is an extension of the existing S3 Replication feature and includes all the same functionality. This includes the ability to replicate objects while retaining metadata (such as object creation date and time), replicate objects into different storage classes, and maintain object copies under different ownership. Amazon S3 Replication Time Control (S3 RTC) can also be enabled when configuring existing object replication. However, note that S3 RTC only applies to the replication of newly uploaded objects and not existing objects.

In this blog post, we show you how to enable and configure S3 Replication for existing objects.

Configure replication for existing objects

To enable existing object replication for your account, you must contact AWS Support. This is required to ensure that replication is configured correctly. To prevent your request from being delayed, give your AWS Support case the subject “Replication for Existing Objects” and be sure to include the following information:

  • Source bucket
  • Destination bucket
  • Estimated storage volume to replicate (in terabytes)
  • Estimated storage object count to replicate

NOTE: Once the support ticket is created, AWS Support will work with the S3 team to allow-list your bucket for existing object replication.
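One way to produce the two estimates requested above is to total a bucket listing. The following is a minimal sketch, not an official tool: the function name summarize_listing is hypothetical, it assumes pages shaped like the responses of the S3 ListObjectsV2 API (which lists current object versions only), and it treats a terabyte as 10^12 bytes.

```python
from typing import Iterable, Mapping, Tuple


def summarize_listing(pages: Iterable[Mapping]) -> Tuple[int, float]:
    """Return (object_count, terabytes) for ListObjectsV2-style pages.

    Assumption: each page is a dict with an optional "Contents" list whose
    entries carry a "Size" in bytes. Noncurrent versions are not included.
    """
    count = 0
    total_bytes = 0
    for page in pages:
        for obj in page.get("Contents", []):
            count += 1
            total_bytes += obj["Size"]
    # Decimal terabytes (10**12 bytes); adjust if you report in TiB.
    return count, total_bytes / 1e12
```

With boto3, the pages could come from boto3.client("s3").get_paginator("list_objects_v2").paginate(Bucket="s3-replication-source1"). For very large buckets, listing every object can be slow, so an S3 Inventory report or the bucket's storage metrics may be a more practical source for these numbers.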

Once AWS Support has enabled support for replicating existing objects for your bucket, it is a best practice to verify your replication configuration. Before applying the configuration to a larger dataset, you can test its accuracy, along with the permissions, by setting up a small test bucket or prefix. This verification step is important because there is no easy way to retrigger replication for failed objects.

Also, remember to review the requirements before enabling replication. This includes ensuring that both the source and destination buckets have versioning enabled.
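The versioning requirement can be checked programmatically before you create the rule. The sketch below assumes a boto3-style S3 client, whose get_bucket_versioning call returns a "Status" of "Enabled" or "Suspended" (the key is absent if versioning was never configured). The helper names are hypothetical.

```python
from typing import List


def versioning_enabled(s3_client, bucket: str) -> bool:
    """True if the bucket reports versioning Status == "Enabled"."""
    response = s3_client.get_bucket_versioning(Bucket=bucket)
    # "Status" is missing when versioning has never been configured.
    return response.get("Status") == "Enabled"


def buckets_missing_versioning(s3_client, source: str, destination: str) -> List[str]:
    """Return the buckets (if any) that still need versioning enabled."""
    return [b for b in (source, destination) if not versioning_enabled(s3_client, b)]
```

For the example in this post, you might call buckets_missing_versioning(boto3.client("s3"), "s3-replication-source1", "s3-replication-destination1") and expect an empty list before proceeding.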

Getting started replicating existing objects with S3 Replication

In this example, we are replicating the entire source bucket (s3-replication-source1) in the us-east-1 Region to the destination bucket (s3-replication-destination1) in the us-west-1 Region. We also set the destination object storage class to S3 Standard-Infrequent Access.


Once your source bucket has been allow-listed, you can configure a replication rule as follows:

Part 1: Set up a replication rule in the Amazon S3 console

Here we begin the process of creating a replication rule on the source bucket. This involves selecting which objects we would like to replicate and enabling the replication of existing objects.

1. Sign in to the AWS Management Console and open the Amazon S3 console.

2. From the buckets list, choose the source bucket that has been allow-listed (by AWS Support) for existing object replication.

3. Navigate to the Management tab of the bucket. Under Replication Rules, choose Create Replication Rule. Creating this rule also enables standard CRR or SRR on the bucket.

4. Give your replication rule a name and select whether you want the rule to be enabled or disabled when created. The rule name is required and must be unique within the bucket.

Note: If the bucket has existing replication rules, you are asked to set a priority for the rule. This is used to avoid conflicts caused by objects that are included in the scope of more than one rule. In the case of overlapping rules, Amazon S3 uses the rule priority to determine which rule to apply. The higher the number, the higher the priority.
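To illustrate how priority resolves overlapping rules, here is a simplified model, not Amazon S3's actual implementation. It assumes all rules share the same destination bucket, that rule scope is expressed only as a key prefix, and that rules are plain dicts with hypothetical "ID", "Prefix", "Priority", and "Status" fields.

```python
from typing import Mapping, Optional, Sequence


def select_rule(rules: Sequence[Mapping], object_key: str) -> Optional[Mapping]:
    """Pick the enabled rule with the highest Priority whose prefix matches.

    Simplified model: tag filters are ignored, and an empty or missing
    prefix matches every object in the bucket.
    """
    matching = [
        rule
        for rule in rules
        if rule["Status"] == "Enabled"
        and object_key.startswith(rule.get("Prefix", ""))
    ]
    # The higher the number, the higher the priority.
    return max(matching, key=lambda rule: rule["Priority"], default=None)
```

For example, given a bucket-wide rule with priority 1 and a rule scoped to the prefix logs/ with priority 2, an object named logs/app.txt falls under both rules and the priority 2 rule is chosen.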

Replication rule configuration

5. Under Source bucket, select a rule scope. In this example, we’re applying the rule to all objects in the bucket. You can also limit the scope of the rule by prefix or tags if desired.


6. In the Destination section, choose whether you are replicating objects to a bucket in the same AWS account or a different AWS account. If the destination bucket is in the same account, click Browse S3 and select your destination bucket from the list. If the destination bucket is in another account, you’ll also need to specify its 12-digit AWS account ID. You will also have the option to change object ownership to the destination bucket owner. In this example the destination bucket, s3-replication-destination1, is in the same AWS account as the source bucket.


7. Next, create or select an existing AWS Identity and Access Management (AWS IAM) role that Amazon S3 can assume to replicate objects on your behalf. If you choose to create a new role in the IAM role field, S3 creates a role (s3crr_role_for_<SourceBucket>_to_<DestinationBucket>) with the following permissions:

    • Get and List permission on source bucket
    • ReplicateObject, ReplicateDelete, ReplicateTags, GetObjectVersionTagging permissions on destination bucket

Note: If the destination bucket is in a different AWS account, then the owner of the destination account must grant the source bucket permissions to store the replicas. More information can be found in this documentation.

In this example, we are creating a new IAM role.
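If you prefer to create the role yourself, the permissions listed above can be written as IAM policy documents. The sketch below is an approximation and may differ from the exact policy the console generates: the expansion of "Get and List" into specific action names follows the S3 replication permissions documentation, and the function names are hypothetical.

```python
import json


def replication_trust_policy() -> dict:
    """Trust policy that lets the Amazon S3 service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "s3.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def replication_role_policy(source_bucket: str, destination_bucket: str) -> dict:
    """Permissions policy sketch: read from source, replicate to destination."""
    src = f"arn:aws:s3:::{source_bucket}"
    dst = f"arn:aws:s3:::{destination_bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # List permission on the source bucket itself
                "Effect": "Allow",
                "Action": ["s3:GetReplicationConfiguration", "s3:ListBucket"],
                "Resource": [src],
            },
            {   # Get permissions on the source object versions
                "Effect": "Allow",
                "Action": [
                    "s3:GetObjectVersionForReplication",
                    "s3:GetObjectVersionAcl",
                    "s3:GetObjectVersionTagging",
                ],
                "Resource": [f"{src}/*"],
            },
            {   # Replication permissions on destination objects, as listed above
                "Effect": "Allow",
                "Action": [
                    "s3:ReplicateObject",
                    "s3:ReplicateDelete",
                    "s3:ReplicateTags",
                    "s3:GetObjectVersionTagging",
                ],
                "Resource": [f"{dst}/*"],
            },
        ],
    }


if __name__ == "__main__":
    policy = replication_role_policy(
        "s3-replication-source1", "s3-replication-destination1"
    )
    print(json.dumps(policy, indent=2))
```

Keeping the source and destination statements separate makes it easier to see which bucket each permission applies to, and to adjust the destination statement for a cross-account setup.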


Part 2: Configure additional replication options in the Amazon S3 console

Now we are ready to specify the destination storage class options and additional replication options.

1. Select the Replicate objects encrypted with AWS KMS check box if you would like to also replicate objects encrypted with AWS KMS. You’ll then choose which AWS KMS key you’d like to use for encrypting the destination objects. For more information, see the documentation on replicating objects encrypted with AWS KMS.

2. You can change the storage class of replicated objects by checking the Change the storage class for the replicated objects box and selecting the destination storage class. In this example, we are setting the replicated object storage class to S3 Standard-Infrequent Access as shown in the following screenshot:

Specify the destination storage class options

3. Under Additional Replication Options, we choose to replicate all objects. Selecting All objects also enables standard Cross-Region Replication (CRR) or Same-Region Replication (SRR) on the bucket for new objects. Note that if you do not see the All objects option (and only see the New objects option), your source bucket has not yet been allow-listed by AWS Support.

Notes regarding the additional replication options you can enable:

When you use S3 RTC or S3 replication metrics, additional fees apply.

Additional replication options - including Replicate New objects and All objects

4. Save your rule. After you save your rule, you can edit, enable, disable, or delete your rule on the Replication rules page in the S3 console.


Note: Unlike the replication of newly uploaded objects, replication of existing objects can take up to 48 hours to start after the replication rule is set up.

Set up a replication policy using AWS CLI

To configure the replication rule using the AWS CLI, follow the steps in the S3 documentation on replication configuration examples. However, when you create the replication configuration (JSON document), you must add ExistingObjectReplication and set its Status value to Enabled. This is shown in the following example:

{
    "Role": "<IAM-Role-ARN>",
    "Rules": [
        {
            "Status": "Enabled", 
            "Filter": {}, 
            "DeleteMarkerReplication": {
                "Status": "Disabled"
            }, 
            "Destination": {
                "Account": "<Destination-Account-ID>", 
                "AccessControlTranslation": {
                    "Owner": "Destination"
                }, 
                "Bucket": "arn:aws:s3:::<Destination-Bucket-Name>", 
                "StorageClass": "<Destination-Storage-Class>"
            }, 
            "Priority": <Rule-Priority>, 
            "ExistingObjectReplication": {
                "Status": "Enabled"
            }, 
            "ID": "<Replication-Rule-Name>"
        }
    ]
}
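The same configuration can also be built and applied from Python. This is a minimal sketch under stated assumptions: it targets the same-account example from this post (so it omits Account and AccessControlTranslation), it assumes a boto3-style S3 client whose put_bucket_replication call accepts this structure, and the helper names and default values are hypothetical.

```python
def build_replication_config(
    role_arn: str,
    destination_bucket: str,
    rule_id: str,
    priority: int = 1,
    storage_class: str = "STANDARD_IA",
) -> dict:
    """Build a replication configuration with existing object replication on."""
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": rule_id,
                "Priority": priority,
                "Status": "Enabled",
                "Filter": {},  # empty filter: the rule applies to the whole bucket
                "DeleteMarkerReplication": {"Status": "Disabled"},
                # Takes effect only once AWS Support has allow-listed the bucket.
                "ExistingObjectReplication": {"Status": "Enabled"},
                "Destination": {
                    "Bucket": f"arn:aws:s3:::{destination_bucket}",
                    "StorageClass": storage_class,
                },
            }
        ],
    }


def apply_replication_config(s3_client, source_bucket: str, config: dict) -> None:
    """Attach the configuration to the source bucket."""
    s3_client.put_bucket_replication(
        Bucket=source_bucket, ReplicationConfiguration=config
    )
```

A call might look like apply_replication_config(boto3.client("s3"), "s3-replication-source1", build_replication_config("<IAM-Role-ARN>", "s3-replication-destination1", "<Replication-Rule-Name>")), with the placeholders replaced by your own values.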

Monitoring Replication

To monitor the replication status of your existing objects, configure Amazon S3 Inventory on the source bucket at least 48 hours before enabling replication. For detailed instructions on setting this up, see the user guide on configuring Amazon S3 Inventory.

You can query S3 Inventory using the AWS CLI or by using Amazon Athena. The replication status field in the inventory report indicates whether replication of an object is pending, completed, or failed.
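For a quick look without Athena, a CSV inventory file can be tallied locally. The following is a rough sketch under stated assumptions: the input is an already-decompressed CSV inventory data file (these files have no header row), the column order is taken from the fileSchema string in the inventory's manifest.json, and the inventory was configured to include the ReplicationStatus field. The function name is hypothetical.

```python
import csv
import io
from collections import Counter


def tally_replication_status(csv_text: str, file_schema: str) -> Counter:
    """Count objects per replication status in one inventory CSV file.

    file_schema is the comma-separated column list from manifest.json,
    for example "Bucket, Key, VersionId, ReplicationStatus".
    """
    columns = [name.strip() for name in file_schema.split(",")]
    status_index = columns.index("ReplicationStatus")
    counts = Counter()
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        # An empty value means the object is not in scope of any rule.
        counts[row[status_index] or "NONE"] += 1
    return counts
```

Objects reported as FAILED are the ones to investigate first, since there is no easy way to retrigger replication for them.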

Cleaning up

If you followed along with us for testing purposes, remember to delete all objects and buckets that are no longer required to avoid incurring any unnecessary costs.

Conclusion

In this blog post, we demonstrated how you can enable existing object replication for your S3 buckets. We also showed how to configure S3 Replication for existing objects to a bucket in the same or a different AWS Region, or to a bucket owned by a different AWS account. This enables you to easily replicate large numbers of existing objects, which can assist you in adhering to business policies that require additional copies of your S3 objects.


Thanks for reading. If you have any questions, leave a comment in the comments section.