AWS Storage Blog

Replicating existing objects between S3 buckets

Customers commonly have business requirements or enterprise policies that call for additional copies of their existing Amazon S3 objects. While Amazon S3 Replication is widely used to replicate newly uploaded objects between S3 buckets, the simplest way of replicating large numbers of existing objects between S3 buckets is not obvious to many customers. In this post we show you how to trigger cross-region replication for existing objects by using Amazon S3 Replication.

Amazon S3 Replication is a managed, low cost, elastic solution for copying objects from one Amazon S3 bucket to another. With Amazon S3 Replication, you can set up rules to automatically replicate S3 objects across different AWS Regions by using Amazon S3 Cross-Region Replication (CRR). Alternatively, you can set up rules to replicate objects between buckets in the same AWS Region by using Amazon S3 Same-Region Replication (SRR).

Customers can copy existing objects to another bucket in the same or different AWS Region by contacting AWS Support to add this functionality to the source bucket. Once support for replication of existing objects has been enabled on a source bucket, customers are able to use S3 Replication for all existing objects, in addition to newly uploaded objects. Once the replication process completes, customers have two buckets containing all objects, and newly uploaded objects are replicated to the destination bucket.

Existing object replication is an extension of the existing S3 Replication feature and includes all the same functionality. This includes the ability to replicate objects while retaining metadata (such as object creation date and time), replicate objects into different storage classes, and maintain object copies under different ownership. Amazon S3 Replication Time Control (S3 RTC) can also be enabled when configuring existing object replication. However, note that S3 RTC only applies to the replication of newly uploaded objects and not existing objects.

In this blog post, we show you how to enable and configure S3 Replication for existing objects.

Configure replication for existing objects

To enable existing object replication for your account, you must contact AWS Support and create a technical support case (service: Amazon S3). This is required to ensure that replication is configured correctly. To prevent your request from being delayed, give your AWS Support case the subject “Replication for Existing Objects” and be sure to include the following information:

  • Source bucket
  • Destination bucket
  • Estimated storage volume to replicate (in terabytes)
  • Estimated storage object count to replicate

NOTE: Once the support ticket is created, AWS Support will work with the S3 team and whitelist your bucket for existing object replication.

Once AWS Support has enabled replication of existing objects for your bucket, it is a best practice to verify your replication configuration. Before applying the configuration to a larger dataset, you can test both the configuration and the permissions by setting up a small test bucket or prefix. This verification step is important because there is no easy way to retrigger replication for failed objects.

Also, remember to review the requirements before enabling replication. This includes ensuring that both the source and destination buckets have versioning enabled.
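The versioning requirement can be checked before you create the rule. The following is a minimal Python sketch, not an official tool: the helper names `versioning_enabled` and `buckets_missing_versioning` are hypothetical, and the dictionaries are assumed to have the shape returned by the S3 GetBucketVersioning API (for example, through the boto3 call `get_bucket_versioning`, shown only in a comment so the sketch runs without AWS credentials).

```python
def versioning_enabled(response: dict) -> bool:
    """Return True if a GetBucketVersioning-style response reports versioning on.

    A bucket that never had versioning configured returns no "Status" key,
    and a suspended bucket returns "Suspended". Only "Enabled" is acceptable
    for replication.
    """
    return response.get("Status") == "Enabled"


def buckets_missing_versioning(responses: dict) -> list:
    """Return the names of buckets whose versioning is not enabled."""
    return [name for name, resp in responses.items() if not versioning_enabled(resp)]


if __name__ == "__main__":
    # With boto3 installed and credentials configured, the responses could be
    # collected like this (assumption, not executed here):
    #   import boto3
    #   s3 = boto3.client("s3")
    #   responses = {b: s3.get_bucket_versioning(Bucket=b) for b in bucket_names}
    sample = {
        "s3replication-source": {"Status": "Enabled"},
        "s3replication-destination": {},  # versioning never configured
    }
    print(buckets_missing_versioning(sample))
```

Any bucket name the helper returns needs versioning turned on before the replication rule can be saved.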

Getting started replicating existing objects with S3 Replication

In this example, we are replicating the entire source bucket (s3replication-source) in the us-east-1 Region to the destination bucket (s3replication-destination) in the us-west-1 Region. We also set the destination object storage class to S3 Standard-Infrequent Access.


Once your source bucket has been whitelisted, you can configure a replication rule as follows:

Part 1: Set up a replication rule in the S3 Management Console

Here we begin the process of creating a replication rule on the source bucket. This involves selecting which objects we would like to replicate and enabling the replication of existing objects.

  1. Sign in to the AWS Management Console and open the Amazon S3 console.
  2. In the Bucket name list, choose the source bucket that has been whitelisted for existing object replication.
  3. Navigate to the Management tab of the bucket and choose Replication. This is where you create a replication rule to migrate the existing objects.
  4. Choose Add rule. Creating this rule also enables standard CRR or SRR on the bucket.
  5. In the Replication rule wizard, under Set source, choose Entire bucket to copy all existing objects and new objects.
    • NOTE: You can also choose Prefix or tags if your use case is to copy objects with a specific tag or prefix.
  6. To replicate existing objects, under Replication criteria, check the Replicate existing objects box, which enables S3 replication for existing objects. Note that if you do not see the Replicate existing objects box, your source bucket has not yet been whitelisted.


Part 2: Set up a replication rule in the S3 Management Console

Now we are ready to specify the destination options and rule options, which include storage class, object ownership, and Replication Time Control settings. Note that S3 RTC applies only to new objects and not to existing objects. We also select the IAM role used for replication and the rule's priority value.

  1. In the Set destination section of the wizard, select the destination bucket from the drop-down list. In this example, the destination bucket is the s3replication-destination bucket.
  2. You can change the storage class of replicated objects by checking the Change the storage class for the replicated objects box and selecting the destination storage class. As discussed earlier, in this example, we are setting the replicated object storage class to S3 Standard-Infrequent Access as shown in the following screenshot:

Change the storage class of replicated objects by checking the Change the storage class for the replicated objects box and selecting the destination storage class

  3. Create a new AWS Identity and Access Management (IAM) role, or provide an existing one, that Amazon S3 can assume to replicate objects on your behalf. If you choose to create a new role in the IAM role field, S3 creates a role (s3crr_role_for_<SourceBucket>_to_<DestinationBucket>) with the following permissions:
    • Get and List permission on source bucket
    • ReplicateObject, ReplicateDelete, ReplicateTags, GetObjectVersionTagging permissions on destination bucket

NOTE: If the destination bucket is in a different AWS account, then the owner of the destination account must grant the source bucket permissions to store the replicas. More information can be found in this documentation.

  4. Under Rule name, enter a name for your rule. The name is required and must be unique within the bucket.

NOTE: If the bucket has existing replication rules, you are asked to set a priority for the rule. This is used to avoid conflicts caused by objects that are included in the scope of more than one rule. In the case of overlapping rules, Amazon S3 uses the rule priority to determine which rule to apply. The higher the number, the higher the priority.


  5. Check the configuration on the Review page. If you have enabled replication for existing objects, under Source, it shows Replicate set to All Objects, as shown in the following screenshot:

Check the configuration on the Review page. If you have enabled replication for existing objects, under Source, it shows Replicate set to All Objects

  6. After you save your rule, you can edit, enable, disable, or delete your rule on the Replication page in the S3 console.


NOTE: Unlike the replication of newly uploaded objects, replication of existing objects can take up to 48 hours to start after the replication rule is set up.
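For readers who prefer to provide their own role rather than let the console create one, the permissions listed in the IAM role step above can be written out as policy documents. The following Python sketch is an illustration under assumptions, not the exact policy the console generates: the function name `replication_role_policies` is hypothetical, the "Get and List" permissions on the source are expanded into the specific S3 actions commonly used for replication, and the destination actions mirror the list given earlier.

```python
import json


def replication_role_policies(source_bucket, destination_bucket):
    """Return (trust_policy, permissions_policy) dicts for a replication role."""
    src = f"arn:aws:s3:::{source_bucket}"
    dst = f"arn:aws:s3:::{destination_bucket}"

    # Trust policy: lets the Amazon S3 service assume the role.
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "s3.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }

    permissions_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {   # List the source bucket and read its replication configuration
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetReplicationConfiguration"],
                "Resource": src,
            },
            {   # Read object versions, ACLs, and tags from the source bucket
                "Effect": "Allow",
                "Action": [
                    "s3:GetObjectVersionForReplication",
                    "s3:GetObjectVersionAcl",
                    "s3:GetObjectVersionTagging",
                ],
                "Resource": f"{src}/*",
            },
            {   # Write replicas, delete markers, and tags to the destination
                "Effect": "Allow",
                "Action": [
                    "s3:ReplicateObject",
                    "s3:ReplicateDelete",
                    "s3:ReplicateTags",
                    "s3:GetObjectVersionTagging",
                ],
                "Resource": f"{dst}/*",
            },
        ],
    }
    return trust_policy, permissions_policy


if __name__ == "__main__":
    trust, permissions = replication_role_policies(
        "s3replication-source", "s3replication-destination"
    )
    print(json.dumps(trust, indent=2))
    print(json.dumps(permissions, indent=2))
```

The two printed documents can be pasted into the IAM console or saved to files when creating the role. Cross-account setups additionally need the destination bucket policy mentioned in the note above.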

Set up a replication policy using the AWS CLI

To configure the replication rule using the AWS CLI, follow the steps in the S3 documentation on replication configuration examples. When you create the replication configuration (a JSON document), you must add the ExistingObjectReplication element and set its Status value to Enabled. This is shown in the following example:

{
    "Role": "<IAM-Role-ARN>",
    "Rules": [
        {
            "Status": "Enabled", 
            "Filter": {}, 
            "DeleteMarkerReplication": {
                "Status": "Disabled"
            }, 
            "Destination": {
                "Account": "<Destination-Account-ID>", 
                "AccessControlTranslation": {
                    "Owner": "Destination"
                }, 
                "Bucket": "arn:aws:s3:::<Destination-Bucket-Name>", 
                "StorageClass": "<Destination-Storage-Class>"
            }, 
            "Priority": <Rule-Priority>, 
            "ExistingObjectReplication": {
                "Status": "Enabled"
            }, 
            "ID": "<Replication-Rule-Name>"
        }
    ]
}
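If you generate this document from a script, a small builder keeps the placeholders in one place. The following Python sketch is one way to do it under stated assumptions: the function name `build_replication_config`, the role ARN, the account ID, and the rule name are all hypothetical, and the sketch covers the same-account case, so it omits the Account and AccessControlTranslation elements from the template above.

```python
import json


def build_replication_config(role_arn, destination_bucket, rule_id,
                             priority=1, storage_class=None):
    """Build a replication configuration with existing object replication enabled."""
    destination = {"Bucket": f"arn:aws:s3:::{destination_bucket}"}
    if storage_class:
        destination["StorageClass"] = storage_class

    rule = {
        "ID": rule_id,
        "Status": "Enabled",
        "Priority": priority,
        "Filter": {},  # an empty filter applies the rule to the entire bucket
        "DeleteMarkerReplication": {"Status": "Disabled"},
        "ExistingObjectReplication": {"Status": "Enabled"},
        "Destination": destination,
    }
    return {"Role": role_arn, "Rules": [rule]}


if __name__ == "__main__":
    config = build_replication_config(
        role_arn="arn:aws:iam::111122223333:role/example-replication-role",  # hypothetical
        destination_bucket="s3replication-destination",
        rule_id="replicate-existing-objects",  # hypothetical rule name
        storage_class="STANDARD_IA",
    )
    # Write the document so it can be passed to the AWS CLI.
    with open("replication.json", "w") as handle:
        json.dump(config, handle, indent=4)
    print(json.dumps(config, indent=4))
```

The resulting file can then be applied with `aws s3api put-bucket-replication --bucket s3replication-source --replication-configuration file://replication.json`.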

Monitoring Replication

In order to monitor the replication status of your existing objects, configure Amazon S3 Inventory on the source bucket at least 48 hours prior to enabling the replication. For detailed instructions on setting this up, see the user guide on configuring Amazon S3 Inventory.

You can query S3 Inventory using AWS CLI as described here or by using Athena as shown in this blog post. Replication status provides information on whether the object replication is pending, completed, or failed.
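As a lightweight alternative for small inventories, the replication status column can be tallied with a few lines of Python. This is a sketch under assumptions: the function name `replication_status_counts` is hypothetical, the inventory is assumed to be in CSV format with the ReplicationStatus field selected, and the column order is taken from the `fileSchema` string in the inventory's manifest.json, because the CSV data files carry no header row. The sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def replication_status_counts(csv_text, file_schema):
    """Count inventory rows per replication status.

    csv_text    -- decompressed contents of one inventory CSV data file
    file_schema -- the "fileSchema" string from the inventory manifest.json
    """
    columns = [name.strip() for name in file_schema.split(",")]
    status_index = columns.index("ReplicationStatus")

    counts = Counter()
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        # Objects that are not subject to replication have an empty status.
        counts[row[status_index] or "NONE"] += 1
    return counts


if __name__ == "__main__":
    schema = "Bucket, Key, VersionId, IsLatest, IsDeleteMarker, ReplicationStatus"
    sample = (
        '"s3replication-source","a.txt","v1","true","false","COMPLETED"\n'
        '"s3replication-source","b.txt","v2","true","false","PENDING"\n'
        '"s3replication-source","c.txt","v3","true","false","FAILED"\n'
        '"s3replication-source","d.txt","v4","true","false","COMPLETED"\n'
    )
    for status, count in sorted(replication_status_counts(sample, schema).items()):
        print(status, count)
```

Inventory data files are delivered gzip-compressed, so a real run would decompress each file (for example, with the standard `gzip` module) before passing its text to the function.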

Cleaning up

If you followed along with us for testing purposes, remember to delete all objects and buckets that are no longer required to avoid incurring any unnecessary costs.

Conclusion

In this blog post, we demonstrated how you can enable existing object replication for your S3 buckets. We also showed how to configure S3 Replication for existing objects to a bucket in the same or different region, or to a bucket owned by a different AWS account. This enables you to easily replicate large numbers of existing objects, which can assist you in adhering to business policies that require additional copies of your S3 objects.


Thanks for reading. If you have any questions, leave a comment in the comments section.