AWS Storage Blog

Designing a resilient and cost-effective backup strategy for Amazon S3

Many organizations protect important business data against disasters such as fires, floods, or ransomware events. Proper backup and disaster recovery strategies help safeguard critical data and ensure business continuity in a disaster scenario. Maintaining normal operations through a disaster recovery situation can save both time and money.

AWS services like Amazon S3 and AWS Backup can help organizations protect their data from these data loss events. Amazon S3 is a foundational AWS service that provides flexible, scalable, and durable object storage in the cloud. Beyond the resilience built into Amazon S3 through the AWS global infrastructure, the service also offers several features to help support your data resiliency and backup needs.

In this post, we describe how to design your backup and restore strategies for your data in Amazon S3 using AWS Backup for Amazon S3 and Amazon S3 Cross-Region Replication (CRR). We walk you through use cases and differences between the two methods. Finally, we wrap up with performance and cost best practices on this topic.

Backup and replication design patterns for Amazon S3

AWS Backup for Amazon S3 enables you to copy your Amazon S3 backups across AWS Regions and AWS accounts. With backups of Amazon S3 in multiple AWS Regions, you can maintain separate, protected copies of your backup data to help meet resilience and compliance requirements for data protection and disaster recovery. Amazon S3 CRR enables automatic replication of objects across Amazon S3 buckets. You can replicate to a different Amazon S3 storage class and/or fail over to a bucket in another AWS Region. Amazon S3 Same-Region Replication (SRR) is used to copy objects across Amazon S3 buckets in the same AWS Region. It helps with single bucket log aggregation and live replication between different environments.

The logical place to start would be to understand the failure modes you are looking to recover from. For example, are you looking to protect your data in Amazon S3 from region-level failures or ransomware attacks? Next, understand what is considered acceptable recovery for your business. Do you recover all the data in Amazon S3, or a subset? What is the desired recovery time objective (RTO) and recovery point objective (RPO) for your data in Amazon S3?

Both AWS Backup for Amazon S3 and S3 CRR provide recovery options in the event of accidental deletion or data corruption. S3 CRR asynchronously copies objects across Amazon S3 buckets in different AWS Regions, while AWS Backup for Amazon S3 provides a single-click restore experience for Amazon S3 in a cost-effective way. You can also use S3 Replication Time Control (S3 RTC) to replicate data within the same AWS Region or between different AWS Regions within a predictable period: it replicates most objects uploaded to Amazon S3 within seconds, and 99.99% of objects within 15 minutes. Refer to the prerequisites for replication and to replicating encrypted objects for more details.
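As an illustration, a CRR rule with S3 RTC enabled can be expressed as a replication configuration and applied with the AWS SDK for Python (boto3). This is a minimal sketch: the bucket names, prefix, and IAM role ARN are placeholders, not values from this post, and versioning must already be enabled on both buckets.

```python
def build_crr_with_rtc(destination_bucket_arn: str, role_arn: str, prefix: str = "") -> dict:
    """Build an S3 replication configuration with S3 Replication Time Control.

    S3 RTC commits to replicating 99.99% of objects within 15 minutes and
    requires replication metrics to be enabled alongside it.
    """
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": "crr-with-rtc",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": prefix},  # empty prefix replicates the whole bucket
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": destination_bucket_arn,
                    # 15 minutes is the only supported S3 RTC threshold.
                    "ReplicationTime": {"Status": "Enabled", "Time": {"Minutes": 15}},
                    "Metrics": {"Status": "Enabled", "EventThreshold": {"Minutes": 15}},
                },
            }
        ],
    }


def apply_replication(source_bucket: str, config: dict) -> None:
    """Apply the configuration to the source bucket (requires AWS credentials)."""
    import boto3  # imported here so the pure builder above needs no AWS SDK

    boto3.client("s3").put_bucket_replication(
        Bucket=source_bucket, ReplicationConfiguration=config
    )
```

Note that `put_bucket_replication` replaces any existing configuration on the bucket, so read and merge existing rules first if the bucket already replicates.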

The following table gives a quick comparison between the two approaches:

  • Per-bucket limits — AWS Backup for Amazon S3: supports approximately 30 billion objects per bucket. Amazon S3 CRR: no limit on the number of objects.
  • Traceability — AWS Backup for Amazon S3: built-in controls let you track backup and restore operations and generate auditor-ready reports to demonstrate backup compliance to auditors. Amazon S3 CRR: Amazon S3 Storage Lens provides detailed metrics for S3 Replication, including replication rule count metrics; S3 Storage Lens is a cloud-storage analytics feature that gives organization-wide visibility into object-storage usage and activity. Additionally, four Amazon CloudWatch metrics give good visibility into the ongoing replication process.
  • Point-in-time recovery — AWS Backup for Amazon S3: you can restore your Amazon S3 objects to a specific state from a previous backup. Amazon S3 CRR: you can combine Amazon S3 Versioning with Amazon EventBridge to build a near real-time event stream from Amazon S3 for a point-in-time restore process at the bucket level.
  • Central management — AWS Backup for Amazon S3: an automated solution to centrally configure backup policies, simplifying backup lifecycle management and helping ensure that your application data across AWS services (such as Amazon S3) is centrally backed up. Amazon S3 CRR: not an available feature today.
  • Retaining metadata — AWS Backup for Amazon S3: for periodic backups, AWS Backup makes a best effort to track all changes to your object metadata; however, if you update a tag or ACL multiple times within one minute, AWS Backup might not capture all intermediate states. Amazon S3 CRR: replication makes copies of your objects that retain all metadata, such as the original object creation times and version IDs, which is important if your replica must be identical to the source object. If you enable S3 replica modification sync, metadata changes made to replica objects are also captured and replicated back to the original source object, making replication bidirectional.
  • Latency — AWS Backup for Amazon S3: designed to protect your application data, not to improve application performance. Amazon S3 CRR: if your users are in two geographic locations, you can minimize latency in accessing objects by maintaining object copies in AWS Regions geographically closer to your users.

Table 1: Comparison of AWS Backup for Amazon S3 and Amazon S3 CRR

AWS Backup for Amazon S3 has three modes of operation:

  • Snapshot backups: Snapshot backups scan the entire bucket and perform a GET request on every object, thereby incurring Amazon S3 GET request costs. The backups are incremental with unlimited retention and are point-in-time consistent. However, subsequent backups may take longer due to full bucket scans, especially for large buckets.
  • Continuous backups: After an initial backup of existing data, any subsequent changes to the data are continuously recorded and can be used for point-in-time restores within a retention period of up to 35 days. For example, if the initial backup takes 10 days to complete, the system also tracks changes made during this initial period and applies them to the backup.
  • Combined continuous and snapshot backups: Combine continuous and snapshot backups for retention periods beyond 35 days. Snapshots are taken from the continuous backups, which eliminates extra requests and full bucket rescans, reducing costs. Recovery points share data lineage when they use the same vault, avoiding duplicate storage between snapshots and continuous recovery points.
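The combined mode above can be sketched as a backup plan with two rules, one continuous and one periodic snapshot. This is a hedged example with placeholder vault name, schedule, and retention values, not a prescribed configuration.

```python
def build_combined_backup_plan(vault_name: str) -> dict:
    """Build an AWS Backup plan combining continuous and snapshot rules.

    The continuous rule supports point-in-time restore for up to 35 days;
    the snapshot rule retains periodic recovery points for longer.
    """
    return {
        "BackupPlanName": "s3-combined-plan",
        "Rules": [
            {
                "RuleName": "continuous",
                "TargetBackupVaultName": vault_name,
                "EnableContinuousBackup": True,
                # Continuous recovery points support at most 35 days of retention.
                "Lifecycle": {"DeleteAfterDays": 35},
            },
            {
                "RuleName": "monthly-snapshot",
                "TargetBackupVaultName": vault_name,
                "ScheduleExpression": "cron(0 5 1 * ? *)",  # 05:00 UTC on the 1st of each month
                "Lifecycle": {"DeleteAfterDays": 365},
            },
        ],
    }


def create_plan(plan: dict) -> str:
    """Create the plan and return its ID (requires AWS credentials)."""
    import boto3

    return boto3.client("backup").create_backup_plan(BackupPlan=plan)["BackupPlanId"]
```

After creating the plan, you still assign S3 resources to it with a backup selection and an IAM role that AWS Backup can assume.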

Backup and restore

Effective upfront planning of your backup and restore strategy is essential. Make sure that you review your service limits; if you need to raise them to accommodate your Amazon S3 usage during restore, you can raise a support case with AWS. Have you determined which data is critical to your business? Restoring that data set, identified by prefix, should be the first restore task you plan. The time taken to back up and to restore can differ, because Amazon S3 actively shards the data as it is restored.

One of the features of AWS Backup for Amazon S3 is the vault, which simplifies data protection with no management overhead. The AWS Backup logically air-gapped vault (currently in preview) stores immutable backup copies that are locked by default and isolated with encryption using AWS owned keys. Vault Lock can be enabled for added security to prevent backups from being deleted early. Note that the initial backup must be taken in the same AWS Region and the same account as the source bucket. Once the initial backup is complete, you can schedule a copy job to copy the data to a bunker vault in another AWS Region, both for isolation and for protection against Region-level disasters. You need at least two copies of your data to protect against AWS Region-level failures.

Backup data must be restored before an Amazon S3 client can access it; unlike CRR, AWS Backup cannot point an Amazon S3 client directly at a replica copy. As of today, AWS Backup supports a warm storage option. Although it is possible to keep multiple versions and archives of the Amazon S3 data, without the point-in-time recovery feature of AWS Backup for Amazon S3, reproducing the bucket at a desired point in time can be operationally time consuming.
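Scheduling the cross-Region copy described above comes down to a copy job from the source vault to the bunker vault. A minimal sketch follows; the recovery point ARN, vault names, and role ARN are placeholders for illustration.

```python
def build_copy_job_params(recovery_point_arn: str, source_vault: str,
                          destination_vault_arn: str, role_arn: str) -> dict:
    """Parameters for copying a recovery point to a vault in another Region.

    The destination vault ARN carries the target Region, so copying to a
    bunker vault in, say, us-west-2 only requires that vault's ARN.
    """
    return {
        "RecoveryPointArn": recovery_point_arn,
        "SourceBackupVaultName": source_vault,
        "DestinationBackupVaultArn": destination_vault_arn,
        "IamRoleArn": role_arn,
        # Optionally expire the copy on a schedule independent of the source.
        "Lifecycle": {"DeleteAfterDays": 365},
    }


def start_cross_region_copy(params: dict) -> str:
    """Start the copy job and return its ID (requires AWS credentials)."""
    import boto3

    return boto3.client("backup").start_copy_job(**params)["CopyJobId"]
```

In practice you would usually let the backup plan's copy actions schedule this rather than calling `start_copy_job` by hand; the API call is shown to make the moving parts explicit.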

Amazon S3 CRR allows you to replicate S3 objects at the bucket, prefix, or object level using object tags. It supports object versioning and failover to a bucket in the same or a different AWS Region. You can restore objects from the replica bucket using Amazon S3 Batch Replication. If you need a more aggressive recovery time objective (RTO), the replica can be configured in an active-standby mode, allowing clients to be redirected to the replica in the event of primary bucket failures. For recovery point objectives (RPO), you can use S3 Replication Time Control (S3 RTC), which replicates most objects within seconds of upload and 99.99% within 15 minutes. If you want to achieve point-in-time recovery, you need to build a list of the relevant object versions through Amazon S3 Inventory and feed it into a batch operation. This can be complex and is not out-of-the-box, compared to AWS Backup for Amazon S3.

Cost and performance considerations

As you choose your backup plan, consider additional charges, such as storage, API costs, Amazon EventBridge, Amazon S3 Versioning, and S3 GET requests on your Amazon S3 objects. For more information about the related costs, review AWS Backup pricing. For AWS Backup for Amazon S3, track and delete expired recovery points and partial backups in your backup vault, which add to your storage costs even though they are not used. When using snapshot mode, you incur additional charges for GET requests and EventBridge events, as well as encryption charges if you use a customer managed key (CMK). Moreover, set your S3 Lifecycle policies to enable "Delete expired object delete markers"; this helps reduce costs and improve performance. When using Amazon S3 CRR, consider the appropriate Amazon S3 storage class for the replica.
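The delete-marker cleanup just mentioned is a single flag in a lifecycle rule. A minimal sketch, with a placeholder bucket name and an assumed 90-day noncurrent-version retention that you should adjust to your own requirements:

```python
def build_delete_marker_cleanup_config() -> dict:
    """Lifecycle configuration that removes expired object delete markers.

    An "expired" delete marker is one with no remaining noncurrent versions;
    removing it keeps versioned buckets lean and list operations fast.
    """
    return {
        "Rules": [
            {
                "ID": "purge-expired-delete-markers",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "Expiration": {"ExpiredObjectDeleteMarker": True},
                # Optionally also age out old noncurrent versions (assumed value).
                "NoncurrentVersionExpiration": {"NoncurrentDays": 90},
            }
        ]
    }


def apply_lifecycle(bucket: str, config: dict) -> None:
    """Apply the lifecycle configuration (requires AWS credentials)."""
    import boto3

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )
```

Note that `ExpiredObjectDeleteMarker` cannot be combined with a `Days` or `Date` expiration in the same rule; keep object expiration, if you need it, in a separate rule.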

For immediate access and redundancy, choose Amazon S3 Standard, Amazon S3 Standard-IA, or Amazon S3 Intelligent-Tiering, with Amazon S3 Standard ideal for active-active disaster recovery. For cost-optimized backups that do not require immediate access, use Amazon Simple Storage Service Glacier (Amazon S3 Glacier) or S3 Glacier Deep Archive. To reduce costs further, Amazon S3 One Zone-Infrequent Access (Amazon S3 One Zone-IA) can be used in the secondary AWS Region; it stores data in a single Availability Zone at a lower price while still providing high durability, so combining Amazon S3 One Zone-IA with Amazon S3 CRR can balance resilience, compliance, and cost savings. Be aware of additional charges that may apply when performing replication, such as replication PUT requests on the destination account, S3 retrieval charges, inter-Region data transfer charges (applicable to S3 CRR only), S3 Replication Time Control (S3 RTC) premium charges, and CloudWatch charges if metrics collection is enabled for replication monitoring. Refer to our pricing page and FAQs for full details on cross-account data replication. For more information on Amazon S3 cost optimization, refer to Replicate Existing Objects with Amazon S3 Batch Replication, Amazon S3 Storage Lens, and Analyzing API operations on Amazon S3.

Performance considerations apply at both the backup and restore stages. For Amazon S3 copy jobs in AWS Backup for Amazon S3, performance depends on the number of concurrent jobs, bucket size, object count, and backup type. Avoid adjusting retention periods until the first copy job has completed; subsequent incremental copies are faster. For better performance, replace old AWS Identity and Access Management (IAM) policies with AWS managed ones, and implement combined continuous and snapshot backup rules. These measures lead to faster performance, fewer full bucket rescans, and cost benefits with AWS Backup for Amazon S3. S3 CRR can take anywhere from a few seconds up to several hours, depending on object size and quantity; with S3 RTC enabled, 99.99% of objects replicate within 15 minutes. Filters such as prefixes or tags optimize replication by focusing on specific object subsets. Amazon S3 supports high request rates, such as 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix, so with Amazon S3 CRR, parallelizing across prefixes improves performance.

Cleaning up

Do not forget to clean up your resources, such as restored AWS resources, backup plans, recovery points, and backup vaults, to avoid future charges.

Conclusion

In this post, we reviewed how AWS Backup for Amazon S3 and Amazon S3 CRR can help you design backup and restore strategies for your data in Amazon S3. Implementing these services can help protect your organization’s data against loss or corruption while meeting your performance and compliance needs in a cost-effective manner. If you are interested in exploring further, see the developer guide for AWS Backup for Amazon S3 and the feature guide for Amazon S3 Cross-Region Replication.

Let us know if you have any other questions! We’re also interested to hear your own experiences implementing backup and replication strategies. Please leave a comment below to continue the discussion.

Mojgan Toth

Mojgan is a Sr. Technical Account Manager who proactively helps public sector customers with strategic technical guidance and AWS Cloud best practices. She loves putting together solutions around well-architected design, security, and resiliency. In her personal life, she loves cooking, painting, and spending time with her family, who enjoy outdoor activities such as bike rides and hikes.

Harish Mandhadi

As a Senior Technical Account Manager and Enterprise Support Lead at Amazon Web Services (AWS), Harish provides strategic technical guidance to retail and consumer packaged goods (CPG) customers. He specializes in resilience and engineering practices, working closely with enterprise customers to build solutions that can withstand failures and continue operating through disasters. Harish enjoys outdoor activities such as motorcycle riding, CrossFit, and competing in races, as well as spending time with his family.