Save 20% on storage costs for replicated data in multi-region applications
Customers are increasingly building multi-region applications where they keep multiple copies of their data in geographically isolated locations for reduced latency, compliance, security, disaster recovery, and other use cases. One common use case is compliance. While Amazon S3 stores your data across multiple physically separated AWS Availability Zones (AZs) in the same geographic area by default, compliance requirements might dictate that you also store data in a different geographic location hundreds of miles away. You can achieve this by storing a copy of backup data in a different AWS Region than production data, while still getting immediate access to the copies when needed. Because primary data is already stored in a Region consisting of multiple, isolated, and physically separate Availability Zones within a geographic area, backup or compliance copies stored in a secondary Region often do not require Multi-AZ availability or resiliency. Customers can store these copies in a single Availability Zone in a secondary Region with S3 One Zone-Infrequent Access (S3 One Zone-IA), saving up to 20% on storage costs while meeting their latency, resiliency, or geographic compliance requirements.
You can build these multi-region applications by replicating data across Regions with S3 Cross-Region Replication (CRR). S3 CRR creates one or more copies of your data in more than one Region. If you want to store data in a single Availability Zone in a secondary Region, S3 One Zone-IA provides a 99% availability SLA and 99.999999999% (11 9s) data durability within a single Availability Zone, and delivers storage cost savings of 20% compared to S3 Standard-Infrequent Access (S3 Standard-IA). You can store your replicated data in S3 One Zone-IA and have millisecond access to your data when it must be accessed, while achieving your resiliency or geographic compliance requirements.
In this post, we start with an example of how you can use S3 One Zone-IA in a multi-region photo application to optimize storage costs. In this example, we review the architecture, performance characteristics, and S3 Replication capabilities. Then, we discuss alternative architectures for multi-region applications, where you can choose and mix different storage classes for your primary and secondary Regions. Finally, we show you how to set up S3 Cross-Region Replication (S3 CRR) to store your copies in S3 One Zone-IA in a secondary Region.
Using S3 One Zone-IA for multi-region applications
In this example, we review a fictional customer, PictureCo, with a photo application where millions of users across the world upload and share pictures with each other. They require that production data be stored in one Region, with a backup of all data in a second Region. This provides a way to restore data in the event of a disaster. Because end users of the application expect immediate access to their photos, PictureCo needs millisecond latency for their backup copies. To optimize costs, they can mix and match storage classes with different availability and cost characteristics. Below is a sample architecture PictureCo built to store their photos.
In this architecture, PictureCo stores data in bucket A in their primary Region, US East (Northern Virginia), and uses S3 CRR to replicate data into bucket B in the secondary Region, US West (Northern California). Photos are uploaded by end users into the primary Region and stored in S3 Intelligent-Tiering because the data often has unpredictable access patterns. All data stored in the primary Region is automatically replicated into the secondary Region using S3 CRR. In the secondary Region, data is stored in S3 One Zone-IA because backup copies are rarely accessed, at most once or twice per quarter, but when the data is needed, PictureCo expects immediate access.
By storing photos in S3 One Zone-IA in the secondary Region, PictureCo saved 20% on storage costs for replicated data compared to S3 Standard-IA. S3 One Zone-IA is designed for 99.5% availability and has 99.999999999% (11 9s) durability within a single Availability Zone. While S3 Standard-IA redundantly stores data across multiple Availability Zones, S3 One Zone-IA is cost-optimized by redundantly storing data within a single Availability Zone. Because these photos are already replicated across multiple Availability Zones in the primary Region, PictureCo does not require that backups be stored in multiple Availability Zones in the secondary Region. An Availability Zone is a collection of one or more discrete data centers, each with redundant power, cooling, and physical security, housed in separate facilities. Amazon S3 maintains redundancy even within one of the facilities in a single Availability Zone. For example, Amazon S3 replicates data across multiple disks, so even if one of them fails, PictureCo can still access their data with no downtime.
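The 20% figure can be checked with simple arithmetic. The following is a minimal sketch, not an official pricing calculator: the two per-GB prices and the 500 TB replica size are illustrative assumptions, and the function names are hypothetical. Check the current Amazon S3 pricing page for the rates in your Region.

```python
# Illustrative prices in USD per GB-month. These are assumptions for the
# sketch, not quoted rates; actual prices vary by Region and over time.
STANDARD_IA_PRICE = 0.0125
ONE_ZONE_IA_PRICE = 0.0100


def monthly_storage_cost(size_gb: float, price_per_gb_month: float) -> float:
    """Storage-only cost for one month (no request, retrieval, or transfer fees)."""
    return size_gb * price_per_gb_month


def savings_fraction(baseline_cost: float, candidate_cost: float) -> float:
    """Fraction saved by the candidate relative to the baseline."""
    return (baseline_cost - candidate_cost) / baseline_cost


# Hypothetical replica size for PictureCo's secondary Region: 500 TB.
replica_size_gb = 500 * 1024

standard_ia = monthly_storage_cost(replica_size_gb, STANDARD_IA_PRICE)
one_zone_ia = monthly_storage_cost(replica_size_gb, ONE_ZONE_IA_PRICE)

print(f"S3 Standard-IA: ${standard_ia:,.2f} per month")
print(f"S3 One Zone-IA: ${one_zone_ia:,.2f} per month")
print(f"Savings:        {savings_fraction(standard_ia, one_zone_ia):.0%}")
```

This sketch covers storage charges only. Retrieval fees and the minimum object size and storage duration that apply to infrequent access classes are left out, so a full comparison for a real workload would need those terms as well.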
By using S3 CRR, PictureCo gets the most flexibility and functionality for replicating data. They can choose any storage class in the source Region and any Region as the destination. With S3 CRR, they also have the controls to protect their data and meet backup and compliance requirements. Because S3 Versioning is required to use S3 Replication, they keep multiple versions of an object for both copies. They also have the flexibility to store copies under a different AWS account and select a Region that meets physical distance requirements. With S3 Replication Time Control, PictureCo can replicate their data within 15 minutes, obtain predictable, SLA-backed replication times for their backups, and meet internal business compliance requirements. They can then use S3 Replication metrics to monitor minute-by-minute replication progress, or use S3 Replication Event Notifications to receive replication failure events that help troubleshoot configuration issues. To use S3 CRR, PictureCo pays applicable retrieval charges on the source data, replication PUT requests, and inter-Region data transfer OUT from S3 to the destination Region. When they use S3 Replication Time Control, they also pay a Replication Time Control data transfer fee and S3 Replication metrics charges that are billed at the same rate as Amazon CloudWatch custom metrics.
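As a rough illustration of how S3 Replication Time Control and replication metrics appear in a replication rule, the sketch below adds both settings to a rule expressed as a Python dictionary in the shape the S3 `PutBucketReplication` API accepts. The helper name `with_replication_time_control`, the rule ID, and the bucket ARN are hypothetical placeholders, not part of PictureCo's actual setup.

```python
import copy


def with_replication_time_control(rule: dict, minutes: int = 15) -> dict:
    """Return a copy of a replication rule with S3 RTC and metrics enabled.

    The 15-minute default mirrors the replication window described above.
    """
    updated = copy.deepcopy(rule)  # leave the caller's rule untouched
    destination = updated.setdefault("Destination", {})
    destination["ReplicationTime"] = {
        "Status": "Enabled",
        "Time": {"Minutes": minutes},
    }
    # Metrics are enabled together with RTC so replication progress
    # can be monitored.
    destination["Metrics"] = {
        "Status": "Enabled",
        "EventThreshold": {"Minutes": minutes},
    }
    return updated


# Hypothetical base rule: replicate everything into bucket B as S3 One Zone-IA.
base_rule = {
    "ID": "replicate-photos",  # placeholder rule name
    "Priority": 1,
    "Status": "Enabled",
    "Filter": {"Prefix": ""},  # empty prefix matches all objects
    "DeleteMarkerReplication": {"Status": "Disabled"},
    "Destination": {
        "Bucket": "arn:aws:s3:::example-bucket-b",  # placeholder ARN
        "StorageClass": "ONEZONE_IA",
    },
}

rtc_rule = with_replication_time_control(base_rule)
print(rtc_rule["Destination"]["ReplicationTime"])
```

The helper copies the rule rather than mutating it, so the same base rule can be reused for destinations that do not need predictable replication times and the additional charges that come with them.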
You might have a use case with different requirements than the one discussed above. There are a couple of alternative architectures you can use. For backup use cases that do not require immediate access to data and where you want the lowest storage cost in the cloud, the S3 Glacier and S3 Glacier Deep Archive storage classes are a good choice to optimize costs, while storing data in multiple Availability Zones. For example, with S3 Glacier Deep Archive you can store data at a storage cost 91% lower than with S3 One Zone-IA, but you retrieve the data within hours instead of milliseconds. In this use case, you should also be mindful that with S3 Glacier storage classes, the costs to retrieve and access the data are higher than with S3 One Zone-IA. Typically, object size, object lifetime, and frequency of access are the characteristics to consider when deciding between S3 One Zone-IA and the S3 Glacier storage classes to optimize costs. If you require immediate access and multiple Availability Zones for copies, you can choose to store data in S3 Standard or S3 Standard-IA, or in S3 Intelligent-Tiering if your data has unknown or changing access patterns. For example, you can choose S3 Standard in your secondary Region to mirror your primary Region if you require an active-active disaster recovery (DR) architecture where your workload is deployed to, and actively serving traffic from, multiple Regions. For more information on how to architect multi-region applications in AWS, see this blog post and this CloudFormation template.
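The guidance above can be summarized as a small decision helper. This is only a sketch of the reasoning in this post: the function name and its three boolean inputs are hypothetical simplifications, and a real decision should also weigh object size, object lifetime, and frequency of access.

```python
def suggest_replica_storage_class(
    immediate_access: bool,
    multi_az: bool,
    unknown_access_pattern: bool = False,
) -> str:
    """Suggest storage classes for copies in a secondary Region.

    Encodes only the rules of thumb from the text; it is not an
    exhaustive or official selection guide.
    """
    if not immediate_access:
        # Retrieval takes longer and costs more, but storage is cheapest.
        return "S3 Glacier or S3 Glacier Deep Archive"
    if not multi_az:
        # Millisecond access in a single Availability Zone.
        return "S3 One Zone-IA"
    if unknown_access_pattern:
        return "S3 Intelligent-Tiering"
    return "S3 Standard or S3 Standard-IA"


# PictureCo's backup copies: immediate access, single AZ is acceptable.
print(suggest_replica_storage_class(immediate_access=True, multi_az=False))

# Backups that can tolerate retrieval times measured in hours.
print(suggest_replica_storage_class(immediate_access=False, multi_az=True))
```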
Setting up S3 Cross-Region Replication to use S3 One Zone-IA for your copies in a secondary Region
S3 Replication is simple to set up. Start by adding a replication configuration to the source bucket in the primary Region. To replicate data to another Region, select the destination bucket in your target Region.
Finally, select S3 One Zone-IA as your destination storage class.
After you create or update the replication rule, S3 Replication begins as soon as you add or update objects to your source bucket. For more details on how to set up S3 CRR, check out this other blog post.
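The same setup can also be scripted. The sketch below builds a replication configuration that stores copies in S3 One Zone-IA and applies it through an S3 client. It assumes `s3_client` is an AWS SDK for Python (boto3) S3 client, that an IAM role for replication already exists, and that versioning is already enabled on the destination bucket. The function names, rule ID, bucket names, and ARNs are hypothetical placeholders.

```python
def build_replication_configuration(
    role_arn: str,
    destination_bucket_arn: str,
    rule_id: str = "replicate-to-one-zone-ia",  # placeholder rule name
) -> dict:
    """Build a replication configuration whose copies land in S3 One Zone-IA."""
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": rule_id,
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix matches all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": destination_bucket_arn,
                    "StorageClass": "ONEZONE_IA",
                },
            }
        ],
    }


def enable_replication(s3_client, source_bucket: str, configuration: dict) -> None:
    """Enable versioning on the source bucket, then attach the configuration.

    `s3_client` is assumed to be created with boto3.client("s3").
    """
    # S3 Replication requires S3 Versioning on the source bucket.
    s3_client.put_bucket_versioning(
        Bucket=source_bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
    s3_client.put_bucket_replication(
        Bucket=source_bucket,
        ReplicationConfiguration=configuration,
    )


# Placeholder identifiers; substitute your own role and bucket.
configuration = build_replication_configuration(
    role_arn="arn:aws:iam::111122223333:role/example-replication-role",
    destination_bucket_arn="arn:aws:s3:::example-bucket-b",
)
print(configuration["Rules"][0]["Destination"]["StorageClass"])
```

With a real client, the call would look like `enable_replication(boto3.client("s3"), "example-bucket-a", configuration)`. Keeping the configuration builder separate from the API calls makes it possible to review or test the rule before it is applied.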
In this post, we covered how you can save on storage costs for replicated data by using S3 One Zone-IA and S3 CRR for multi-region applications, where you can store copies in a single AZ while meeting your resilience or compliance requirements. In the example, we showed the architecture of the application, the performance and storage cost savings that S3 One Zone-IA can provide, and the functionality and flexibility of S3 Replication.
The combination of S3 One Zone-IA and S3 CRR is ideal for multi-region use cases such as backups, or where a secondary copy is needed for geographic compliance requirements. For these use cases, you can store copies in a single AZ while meeting your resilience or compliance requirements, saving on storage costs with S3 One Zone-IA. Meanwhile, S3 CRR gives you the flexibility to mix and match multiple storage classes and Regions, so your data is optimized for cost and stored in the Regions of your choice to meet resiliency or industry compliance requirements. With S3 CRR, you also get the controls to protect your data and the functionality to monitor replication status and replicate data with predictable times.