AWS Storage Blog

Managing delete marker replication in Amazon S3

Customers use Amazon S3 Replication to create a copy of their data within the same AWS Region or in another AWS Region for compliance, lower latency, or sharing data across accounts. In environments where data is constantly changing, customers have different replication needs for objects that have been, or will be, deleted. For some use cases, customers must delete the replicated objects, while for others, they must keep them. In this blog post, we cover the replication behavior of two configurations, V1 and V2. We also provide guidance on how to select a configuration that meets your specific compliance and governance needs.

S3 Replication overview

S3 Replication delivers elastic, fully managed, low-cost enterprise-ready replication features for any Amazon S3 storage class to protect against accidental deletion or provide data protection across different Regions. With Amazon S3 Replication, you can automatically and asynchronously replicate data between buckets in the same or different AWS Regions.

Same-Region Replication (SRR) and Cross-Region Replication (CRR) can be used to address a variety of use cases. For example, CRR helps you meet compliance requirements and minimize latency by keeping copies of your data in different geographical locations. SRR can be used to configure replication between developer and test accounts, and to meet data sovereignty requirements. In either configuration, Amazon S3 replicates all objects in the source bucket to a destination bucket. Optionally, you can replicate only a subset of objects by filtering on prefixes and tags.

Soft delete operations and delete markers

S3 Replication requires versioning to be enabled on both the source bucket and destination bucket. For versioned buckets, when an object is deleted without specifying its version ID, the delete operation is commonly referred to as a “soft delete.” A soft delete does not remove any data. Instead, it adds a new placeholder version with no data, called a “delete marker,” which becomes the current version of the object.

Note that objects can also be deleted because of lifecycle expiration policies. When a current object version expires, a delete marker is added. In contrast, when a non-current object version expires, it is permanently deleted.
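The soft-delete behavior described above can be sketched with a small in-memory model. This is a toy illustration, not the S3 API: the class name `VersionedKey`, its methods, and the `v1`, `v2` style version IDs are all hypothetical, and `None` stands in for the 404 response that S3 returns when the current version is a delete marker.

```python
import itertools


class VersionedKey:
    """Toy model of a single key in a versioning-enabled bucket (hypothetical)."""

    def __init__(self):
        self.versions = []                  # oldest first; last entry is the current version
        self._counter = itertools.count(1)  # source of made-up version IDs

    def _new_id(self):
        return f"v{next(self._counter)}"

    def put(self, data):
        """Store a new object version and return its version ID."""
        version_id = self._new_id()
        self.versions.append({"id": version_id, "data": data, "delete_marker": False})
        return version_id

    def delete(self, version_id=None):
        """Soft delete when no version ID is given, permanent delete otherwise."""
        if version_id is None:
            # Soft delete: nothing is removed; a delete marker becomes the current version.
            marker_id = self._new_id()
            self.versions.append({"id": marker_id, "data": None, "delete_marker": True})
            return marker_id
        # Delete with a version ID: that specific version is permanently removed.
        self.versions = [v for v in self.versions if v["id"] != version_id]
        return None

    def get(self, version_id=None):
        """Return the data, or None where S3 would answer with a 404."""
        if version_id is None:
            if not self.versions or self.versions[-1]["delete_marker"]:
                return None
            return self.versions[-1]["data"]
        for version in self.versions:
            if version["id"] == version_id and not version["delete_marker"]:
                return version["data"]
        return None


key = VersionedKey()
first = key.put(b"report")
marker = key.delete()            # soft delete: adds a delete marker
hidden = key.get()               # None: the current version is a delete marker
still_there = key.get(first)     # b"report": the older version is intact
key.delete(marker)               # removing the delete marker itself
restored = key.get()             # b"report": the object is visible again
```

The last three lines show why a soft delete is recoverable: the earlier version stays in the bucket until it is deleted by version ID or expired by a lifecycle rule.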

This brings up an interesting question: what should the replication behavior be when an object is soft-deleted? There are two possible outcomes in this case:

  • The delete marker is replicated (V1 configuration). A subsequent GET request to the deleted object in both the source and the destination bucket does not return the object.
  • The delete marker is not replicated (V2 configuration). A subsequent GET request to the deleted object returns the object only in the destination bucket.
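The two outcomes above can be compared with a short simulation. It is a sketch under simplifying assumptions: each bucket is reduced to the version history of one key, the function names `replicate` and `get_current` are hypothetical, and `None` stands in for a 404 response.

```python
def replicate(source_versions, replicate_delete_markers):
    """Return the versions a destination bucket would hold for one key.

    source_versions is a list of ("object", data) or ("delete_marker", None)
    tuples, oldest first.
    """
    destination = []
    for kind, data in source_versions:
        if kind == "delete_marker" and not replicate_delete_markers:
            continue  # V2 behavior in this post: the delete marker stays in the source only
        destination.append((kind, data))
    return destination


def get_current(versions):
    """Mimic a GET without a version ID; None stands in for a 404."""
    if not versions or versions[-1][0] == "delete_marker":
        return None
    return versions[-1][1]


# One object that was uploaded and then soft-deleted in the source bucket.
source = [("object", b"invoice-2020"), ("delete_marker", None)]

v1_destination = replicate(source, replicate_delete_markers=True)
v2_destination = replicate(source, replicate_delete_markers=False)

print(get_current(source))          # None: deleted in the source
print(get_current(v1_destination))  # None: the delete marker was replicated
print(get_current(v2_destination))  # b'invoice-2020': the object is still served
```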

When you enable S3 Replication from the console, V2 configuration is enabled by default. However, if your use case requires you to delete replicated objects whenever they are deleted from the source bucket, you need the V1 configuration. Some common scenarios that are addressed by a V1 replication configuration include:

  • You must comply with standards such as GDPR. A V1 replication configuration can be used together with suitable lifecycle configurations for the source and destination buckets, to ensure that deleted objects are permanently expired.
  • The source bucket is frequently updated, and your application workflow requires the source and destination bucket to be in sync.
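As a rough illustration of the "suitable lifecycle configurations" mentioned in the first scenario, the following sketch builds a lifecycle configuration that permanently removes noncurrent versions after a number of days and cleans up delete markers that no longer have any versions behind them. The function name, the rule ID, and the 30-day value are assumptions chosen for the example, not recommendations from this post.

```python
import json


def build_cleanup_lifecycle(noncurrent_days):
    """Build a lifecycle configuration that purges soft-deleted data (illustrative)."""
    return {
        "Rules": [
            {
                "ID": "Purge soft-deleted objects",  # hypothetical rule name
                "Filter": {"Prefix": ""},            # empty prefix: apply to every object
                "Status": "Enabled",
                # After a soft delete the previous version becomes noncurrent;
                # this action permanently deletes it after the given number of days.
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
                # Remove a delete marker once no other versions of the object remain.
                "Expiration": {"ExpiredObjectDeleteMarker": True},
            }
        ]
    }


lifecycle_json = json.dumps(build_cleanup_lifecycle(30), indent=4)
print(lifecycle_json)
```

The printed JSON could be saved to a file and applied with `aws s3api put-bucket-lifecycle-configuration --bucket <bucket> --lifecycle-configuration file://<file>.json`. Because the source and destination buckets each have their own lifecycle configuration, a similar rule would be applied to both.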

How to configure S3 Replication for V1 or V2

You can configure S3 Replication using the Amazon S3 console, AWS Command Line Interface (AWS CLI), and AWS SDKs. In this example, we are configuring replication for buckets in the same account using AWS CLI. Follow the CLI instructions to set up replication. You create source and destination buckets, enable versioning on them, create an IAM role that gives Amazon S3 permission to replicate objects, and add the replication configuration to the source bucket.

For V1 replication configuration (to replicate delete marker):

{
    "Role": "IAM-role-ARN",
    "Rules": [
        {
            "ID": "Replication V1 Rule",
            "Prefix": "",
            "Status": "Enabled",
            "Destination": {
                "Bucket": "arn:aws:s3:::<destination-bucket>"
            }
        }
    ]
}

The use of the “Prefix” field at the top level of a rule indicates this is a V1 configuration. V1 replication configurations replicate delete markers by default. A top-level “Prefix” field is not supported in V2, where prefixes are specified inside the “Filter” field instead.

You can test this replication configuration by saving the configuration as s3_replication_rule_v1.json and applying it:

$ aws s3api put-bucket-replication --bucket <sourcebucket> --replication-configuration file://s3_replication_rule_v1.json

For V2 replication configuration (this does not replicate delete markers):

{
    "Role": "IAM-role-ARN",
    "Rules": [
        {
            "ID": "Replication V2 Rule",
            "Priority": 1,
            "Filter": {},
            "Status": "Enabled",
            "Destination": {
                "Bucket": "arn:aws:s3:::<destination-bucket>"
            },
            "DeleteMarkerReplication": {
                "Status": "Disabled"
            }
        }
    ]
}

The use of the “Filter” field in a replication configuration indicates this is a V2 configuration. V1 does not have this field.
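The distinction between the two schemas can be expressed as a small check. This is a minimal sketch based only on the rule described in this post (a top-level "Prefix" means V1, a "Filter" means V2); the function name `replication_schema_version` is hypothetical, and the two dictionaries are abbreviated copies of the examples above.

```python
def replication_schema_version(config):
    """Classify a replication configuration as "V1", "V2", or "unknown"."""
    rules = config.get("Rules", [])
    if any("Filter" in rule for rule in rules):
        return "V2"   # the "Filter" field only exists in the V2 schema
    if any("Prefix" in rule for rule in rules):
        return "V1"   # a top-level "Prefix" field marks the V1 schema
    return "unknown"


v1_config = {
    "Role": "IAM-role-ARN",
    "Rules": [
        {
            "ID": "Replication V1 Rule",
            "Prefix": "",
            "Status": "Enabled",
            "Destination": {"Bucket": "arn:aws:s3:::<destination-bucket>"},
        }
    ],
}

v2_config = {
    "Role": "IAM-role-ARN",
    "Rules": [
        {
            "ID": "Replication V2 Rule",
            "Priority": 1,
            "Filter": {},
            "Status": "Enabled",
            "Destination": {"Bucket": "arn:aws:s3:::<destination-bucket>"},
            "DeleteMarkerReplication": {"Status": "Disabled"},
        }
    ],
}

print(replication_schema_version(v1_config))  # V1
print(replication_schema_version(v2_config))  # V2
```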

Note: you cannot omit the “DeleteMarkerReplication” field in a V2 configuration, and you cannot set it to anything other than “Disabled.”

You can test this replication configuration by saving the configuration as s3_replication_rule_v2.json and applying it:

$ aws s3api put-bucket-replication --bucket <sourcebucket> --replication-configuration file://s3_replication_rule_v2.json

You can check which replication configuration you have (if any) with this command:

$ aws s3api get-bucket-replication --bucket <sourcebucket>
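The JSON that this command prints wraps the configuration in a "ReplicationConfiguration" element. As a sketch, the output could be summarized rule by rule as follows. The function name `summarize_replication` is hypothetical, and `SAMPLE_OUTPUT` is a hand-written stand-in for real command output rather than a captured response.

```python
import json

# Hand-written sample shaped like the command output (an assumption, not a real capture).
SAMPLE_OUTPUT = """
{
    "ReplicationConfiguration": {
        "Role": "IAM-role-ARN",
        "Rules": [
            {
                "ID": "Replication V2 Rule",
                "Priority": 1,
                "Filter": {},
                "Status": "Enabled",
                "Destination": {"Bucket": "arn:aws:s3:::<destination-bucket>"},
                "DeleteMarkerReplication": {"Status": "Disabled"}
            }
        ]
    }
}
"""


def summarize_replication(cli_output):
    """Return (rule ID, schema version, delete markers replicated?) for each rule."""
    config = json.loads(cli_output)["ReplicationConfiguration"]
    summary = []
    for rule in config["Rules"]:
        if "Filter" in rule:
            # V2 rule: read the explicit DeleteMarkerReplication setting.
            status = rule.get("DeleteMarkerReplication", {}).get("Status", "Disabled")
            summary.append((rule.get("ID", ""), "V2", status == "Enabled"))
        else:
            # V1 rule: as described in this post, delete markers are replicated by default.
            summary.append((rule.get("ID", ""), "V1", True))
    return summary


for rule_id, version, replicates_markers in summarize_replication(SAMPLE_OUTPUT):
    print(f"{rule_id}: {version}, delete markers replicated: {replicates_markers}")
```

With the sample above, the loop reports one V2 rule that does not replicate delete markers.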

You can use any valid replication configuration to replicate to a bucket owned by a different account. For detailed instructions, see the replication documentation.

The examples here address the most common use cases that require replicating every object in your source bucket. Refer to the documentation for more detailed configurations that you can use to selectively replicate only objects with specified prefixes or tags. Keep in mind that certain replication features, such as tag-based filtering and Replication Time Control (RTC), are only available in V2 configurations.
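For selective replication, a V2 rule carries its prefix and tags inside the "Filter" field. The sketch below builds such a rule. The helper names `build_filter` and `build_v2_rule`, the rule ID, the `logs/` prefix, and the `team=finance` tag are hypothetical values chosen for illustration.

```python
import json


def build_filter(prefix=None, tags=None):
    """Build the "Filter" element of a V2 rule from an optional prefix and tags."""
    tag_list = [{"Key": k, "Value": v} for k, v in sorted((tags or {}).items())]
    if (prefix is not None and tag_list) or len(tag_list) > 1:
        # Several criteria must be combined under an "And" element.
        combined = {}
        if prefix is not None:
            combined["Prefix"] = prefix
        combined["Tags"] = tag_list
        return {"And": combined}
    if tag_list:
        return {"Tag": tag_list[0]}
    if prefix is not None:
        return {"Prefix": prefix}
    return {}  # empty filter: the rule applies to every object


def build_v2_rule(rule_id, priority, destination_bucket_arn, prefix=None, tags=None):
    """Build one V2 replication rule that does not replicate delete markers."""
    return {
        "ID": rule_id,
        "Priority": priority,
        "Filter": build_filter(prefix, tags),
        "Status": "Enabled",
        "Destination": {"Bucket": destination_bucket_arn},
        "DeleteMarkerReplication": {"Status": "Disabled"},
    }


rule = build_v2_rule(
    "Replicate finance logs",              # hypothetical rule name
    1,
    "arn:aws:s3:::<destination-bucket>",
    prefix="logs/",
    tags={"team": "finance"},
)
print(json.dumps({"Role": "IAM-role-ARN", "Rules": [rule]}, indent=4))
```

The printed document has the same shape as the V2 example earlier in this post, so it can be saved to a file and applied with the same `put-bucket-replication` command.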

Conclusion

Amazon S3 provides the ability to control the behavior of deletes in a replication configuration. This flexibility allows customers to meet their disaster recovery and regulatory requirements. We have highlighted two common replication configurations, V1 and V2. V1 configuration soft deletes the replicated object in the destination bucket, while V2 configuration does not. In this blog post, we have outlined criteria that guide you in deciding which configuration to select for your use case when enabling replication.

If you have any questions or suggestions, leave your feedback in the comment section. If you need any further assistance on disaster recovery and compliance, contact your AWS account team or a trusted APN Partner.

For more information, see the AWS replication configuration page.

Vikas Shah

Vikas Shah is an Enterprise Solutions Architect at Amazon Web Services. He is a technology enthusiast who enjoys helping customers find innovative solutions to complex business challenges. His areas of interest are ML, IoT, robotics, and storage. In his spare time, Vikas enjoys building robots, hiking, and traveling.

Ganesh Sundaresan

Ganesh Sundaresan is a Senior Storage Specialist with Amazon Web Services. While based in Boston, he works with AWS customers globally to help with their enterprise data storage challenges. Outside of work, Ganesh likes to spend time with his family exploring the New England countryside.

Mike Burbey

Mike Burbey is a Senior Storage Specialist at Amazon Web Services.