AWS Storage Blog

Amazon S3 cost optimization for predictable and dynamic access patterns

Many customers of all sizes and industries tell us they are growing at an unprecedented scale – both their business and their data. Amazon S3 enables any developer to leverage the same highly scalable, reliable, and secure storage infrastructure that Amazon uses, with no up-front investment or performance compromises. Customers love the elasticity and agility they get with S3 because they can focus on creating entirely new applications and experiences for their customers to drive business growth. And with that growth, cost control and cost optimization are essential. Customers want to know the best approach to optimizing their S3 costs without impacting application performance or adding operational overhead.

Every imaginable workload runs on top of S3, from data lakes, machine learning, satellite imagery, DNA sequencing, and call center logs, to autonomous vehicle simulations and even your favorite media and in-home fitness content. At the same time, the fundamentals of optimizing your S3 storage costs across your entire organization are not hard to implement once you learn them. Whether you have raw data that must be moved, analyzed, or archived, S3 has a storage class that is optimized for your data access, performance, and cost requirements. All S3 storage classes are designed for 99.999999999% (11 9's) of data durability and offer high availability.

Our goal with this blog post is that you walk away with an understanding of how to control your storage costs for workloads that have predictable and changing access patterns, and how to take action to implement changes that realize storage cost savings.

Optimizing workloads with predictable access patterns

Do you have data that becomes infrequently accessed after a definite period of time? Take, for example, user-generated content on social media apps. We share videos and pictures with our network that get accessed frequently right after we upload them, but become infrequently accessed a few weeks, days, or even hours later. For use cases like these, many customers know when data becomes infrequently accessed, or can usually pinpoint the right time to move data from S3 Standard to a lower-cost storage class optimized for infrequent or archive access.

Many customers with predictable access patterns get started with S3 Storage Lens to gain a detailed understanding of their usage across all of the buckets within an account. If you have enabled S3 Storage Lens advanced metrics, you have access to activity metrics that identify datasets (buckets) that are frequently, infrequently, or rarely accessed. Metrics like GET requests and download bytes indicate how often your datasets are accessed each day. You can trend this data over several months (extended data retention is available with the advanced tier) to understand the consistency of the access pattern and to spot datasets that become infrequently accessed.

Once you know when a dataset becomes infrequently accessed or can be archived, you can easily configure an Amazon S3 Lifecycle rule to automatically transition objects stored in the S3 Standard storage class to the S3 Standard-Infrequent Access, S3 One Zone-Infrequent Access, and/or S3 Glacier storage classes based on the age of the data. You can set up and manage Lifecycle policies using the AWS Management Console, the S3 REST API, the AWS SDKs, or the AWS Command Line Interface (CLI), and you can specify a policy at the prefix or at the bucket level. For example, you might choose to transition objects to the S3 Standard-IA storage class 30 days after you created them, or archive objects to the S3 Glacier storage class one year after creating them.
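As a minimal sketch of what such a rule can look like, the following example uses the AWS SDK for Python (boto3) to transition objects under a prefix to S3 Standard-IA after 30 days and archive them to S3 Glacier after one year; the bucket name and prefix are hypothetical placeholders.

import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and prefix, shown for illustration only.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-media-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-down-media-assets",
                "Filter": {"Prefix": "media/"},
                "Status": "Enabled",
                "Transitions": [
                    # Move to S3 Standard-IA 30 days after object creation.
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    # Archive to S3 Glacier one year after object creation.
                    {"Days": 365, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)

Note that a Lifecycle configuration replaces any existing configuration on the bucket, so include all of your rules each time you update it.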

Optimizing workloads with unknown or changing access patterns

What if you have data with unknown or changing access patterns? Customers use Amazon S3 Intelligent-Tiering (S3 Intelligent-Tiering) to store shared datasets, where data is aggregated and accessed by different applications, teams, and individuals, whether for analytics, machine learning, real-time monitoring, or other data lake use cases. With these use cases, it's common for many users within an organization to access S3 with different tools. For instance, they may use Amazon Athena to analyze data in S3 using standard SQL, or Amazon Redshift Spectrum to run queries against petabytes of data in S3 without having to load or transform any data. The access patterns for many of these use cases are highly variable over the course of the year, and can range from little to no access to data being read multiple times in a single month, which can lead to retrieval fees if the data is stored in S3 Standard-IA.

Customers whose use cases have changing access patterns, or who don't want to manage customized tiering rules, use S3 Intelligent-Tiering as their default storage class. Consider using the S3 Intelligent-Tiering storage class because it gives you a way to save money even under changing access patterns, with no performance impact, no operational overhead, and no retrieval fees. It works by storing objects in four access tiers: two low-latency access tiers optimized for frequent and infrequent access, and two optional archive access tiers designed for asynchronous access.

Once you upload or transition objects into S3 Intelligent-Tiering, they are automatically stored in the Frequent Access tier. S3 Intelligent-Tiering works by monitoring access patterns and then moving objects that have not been accessed for 30 consecutive days to the Infrequent Access tier. You can also choose to activate one or both of the optional archive access tiers. If you activate the archive access tiers, S3 Intelligent-Tiering automatically moves objects that have not been accessed for 90 consecutive days to the Archive Access tier, and after 180 consecutive days of no access, to the Deep Archive Access tier. If the objects are accessed again later, they are moved back to the Frequent Access tier.
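For example, here is a minimal sketch (again using boto3, with hypothetical bucket, key, and file names) that uploads an object directly into S3 Intelligent-Tiering by setting the storage class on the PUT request; the object starts in the Frequent Access tier and is tiered automatically from there.

import boto3

s3 = boto3.client("s3")

# Hypothetical bucket, key, and local file names.
with open("part-0000.parquet", "rb") as data:
    s3.put_object(
        Bucket="example-data-lake-bucket",
        Key="datasets/events/2021/01/part-0000.parquet",
        Body=data,
        # Store the object in S3 Intelligent-Tiering from day one.
        StorageClass="INTELLIGENT_TIERING",
    )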

With this approach, you can reduce your storage costs by up to 95% without thinking about it. S3 Intelligent-Tiering has a small monthly per-object monitoring and automation fee, which means that the larger your objects are, the greater the relative savings. We recommend that you keep objects smaller than 128 KB in S3 Standard because they are not eligible for auto-tiering. Customers love S3 Intelligent-Tiering because there are no retrieval fees or unexpected increases in storage bills when access patterns change.

Storage savings scenarios

To recap, you can get the highest storage cost savings when you know, or can easily identify, the right time to move objects to less expensive storage classes using a combination of S3 Storage Lens and S3 Lifecycle policies. For data with unknown or changing access patterns, you can get the highest storage cost savings using the S3 Intelligent-Tiering storage class. In the following, we walk through two different workloads for a bucket totaling 1 PB of storage and 100 million objects in the US East (N. Virginia) Region to show you the potential storage savings from choosing the storage class that fits your access patterns. For pricing details, refer to Amazon S3 pricing; you can also estimate the S3 storage costs for your workload using the AWS Pricing Calculator for Amazon S3.

In our first scenario, let's say you move media assets for an online video platform to the S3 Standard-IA storage class 30 days after you created them, and that 10% of the objects are accessed once per month. Over the course of a year, this use case would achieve storage cost savings of $90,583 (or 33.3%) using S3 Standard-IA, or $89,711 (or 33.0%) using S3 Intelligent-Tiering. If you know that your access patterns will not change, then you can achieve the lowest storage costs for this use case using S3 Standard-IA. If you are not sure how predictable your access patterns are, you can still achieve significant storage cost savings using S3 Intelligent-Tiering without having to worry about spikes in retrieval fees.

Predictable access patterns

Yearly costs | S3 Standard | S3 Standard-IA | S3 Intelligent-Tiering (archive not enabled)
Storage costs | $270,999 | $166,762 | $178,288
PUT requests* | $500 | $500 | $500
GET requests | $48 | $120 | $48
Lifecycle requests* | NA | $1,000 | NA
Data retrievals | Free | $12,582 | Free
Monitoring and automation | NA | NA | $3,000
Total yearly storage cost | $271,547 | $180,964 | $181,836

* PUT requests and Lifecycle requests are one-time fees when uploading or transitioning objects
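To make the arithmetic behind the S3 Standard-IA column more concrete, here is a rough cost-model sketch in Python. The per-GB and per-request rates are illustrative US East (N. Virginia) list prices at the time of writing (and the S3 Standard rate is a simplified blended figure), so treat the output as an approximation and check the Amazon S3 pricing page or the AWS Pricing Calculator for current numbers.

# Scenario 1: 1 PB, 100 million objects, 10% of the data read once per month,
# objects transitioned to S3 Standard-IA 30 days after creation.
TOTAL_GB = 1024 * 1024          # 1 PB expressed in GB
OBJECTS = 100_000_000
ACCESSED_SHARE = 0.10

# Illustrative rates (assumptions; confirm against current S3 pricing).
STANDARD_GB_MONTH = 0.022       # blended S3 Standard rate at ~1 PB
IA_GB_MONTH = 0.0125            # S3 Standard-IA storage
IA_RETRIEVAL_GB = 0.01          # S3 Standard-IA retrieval
PUT_PER_1000 = 0.005            # PUT requests (one-time upload)
GET_PER_1000 = 0.001            # S3 Standard-IA GET requests
LIFECYCLE_PER_1000 = 0.01       # Lifecycle transition requests (one-time)

# Roughly one month in S3 Standard before the 30-day transition,
# then eleven months in S3 Standard-IA.
storage = TOTAL_GB * (1 * STANDARD_GB_MONTH + 11 * IA_GB_MONTH)
retrievals = TOTAL_GB * ACCESSED_SHARE * IA_RETRIEVAL_GB * 12
gets = OBJECTS * ACCESSED_SHARE * 12 * GET_PER_1000 / 1000
puts = OBJECTS * PUT_PER_1000 / 1000
lifecycle = OBJECTS * LIFECYCLE_PER_1000 / 1000

total = storage + retrievals + gets + puts + lifecycle
print(f"Estimated yearly cost with S3 Standard-IA: ${total:,.0f}")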

In our second scenario, let's say you upload your data warehousing datasets to S3 Intelligent-Tiering. In this use case, we assume that 30% of all of the objects are accessed at least four times per month for unplanned analyses or reporting workflows across your organization. Over the course of a year, this use case would achieve storage cost savings of $67,273 (or 24.7%) using S3 Intelligent-Tiering. Use cases like these aren't a good fit for S3 Standard-IA because the retrieval fees can increase your costs to more than what you'd pay in S3 Standard. With S3 Intelligent-Tiering, you can save money even under changing access patterns, with no performance impact, no operational overhead, and no retrieval fees.

Changing access patterns

Yearly costs | S3 Standard | S3 Intelligent-Tiering (archive not enabled) | S3 Standard-IA
Storage costs | $270,999 | $200,198 | $166,762
PUT requests* | $500 | $500 | $500
GET requests | $576 | $576 | $1,320
Lifecycle requests* | NA | NA | $1,000
Data retrievals | Free | Free | $138,412
Monitoring and automation | NA | $3,000 | NA
Total yearly storage cost | $271,547 | $204,274 | $307,994

* PUT requests and Lifecycle requests are one-time fees when uploading or transitioning objects

Oftentimes, customers want to further reduce their storage costs automatically for subsets of datasets that are not accessed for long periods of time. Take, for example, historical datasets that are only occasionally used for research projects or for retraining machine learning models. For these use cases, you may be okay archiving this data and waiting minutes to hours before it becomes accessible again. If this sounds like your use case, we recommend turning on the S3 Intelligent-Tiering archive access tiers to save even more.

Yearly costs | S3 Intelligent-Tiering (archive not enabled) | S3 Intelligent-Tiering (archive enabled)
Storage costs | $200,198 | $158,501
PUT requests* | $500 | $500
GET requests | $576 | $576
Lifecycle requests* | NA | NA
Data retrievals | Free | Free
Monitoring and automation | $3,000 | $3,000
Total yearly storage cost | $204,274 | $162,577

* PUT requests and Lifecycle requests are one-time fees when uploading or transitioning objects

*S3 Intelligent-Tiering (archive enabled) assumptions: 30% of objects stored in the Frequent Access tier, 20% in the Infrequent Access tier, 25% in the Archive Access tier, and 25% in the Deep Archive Access tier
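Enabling the archive access tiers is a one-time, bucket-level configuration. The sketch below (boto3, with hypothetical bucket and configuration names) opts a bucket into both optional tiers using the default 90-day and 180-day thresholds; you can also scope the configuration to a prefix or to object tags with a filter.

import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and configuration Id.
s3.put_bucket_intelligent_tiering_configuration(
    Bucket="example-data-lake-bucket",
    Id="archive-historical-data",
    IntelligentTieringConfiguration={
        "Id": "archive-historical-data",
        "Status": "Enabled",
        "Tierings": [
            # Move objects not accessed for 90 days to the Archive Access tier.
            {"Days": 90, "AccessTier": "ARCHIVE_ACCESS"},
            # Move objects not accessed for 180 days to the Deep Archive Access tier.
            {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"},
        ],
    },
)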

Some last things to keep in mind:

  • Object size: You can use S3 Intelligent-Tiering for objects of any size, but objects smaller than 128 KB are kept in the Frequent Access tier. For each object archived to the Archive Access tier or Deep Archive Access tier, Amazon S3 uses 8 KB of storage for the name of the object and other metadata (billed at S3 Standard storage rates) and 32 KB of storage for index and related metadata (billed at S3 Glacier and S3 Glacier Deep Archive storage rates). Keeping this metadata enables you to get a real-time list of all of your S3 objects or to use the S3 Inventory report.
  • Object life: S3 Intelligent-Tiering is suited for objects with a life longer than 30 days, and all the objects that use this storage class will be billed for a minimum of 30 days.
  • Durability and availability: Amazon S3 Intelligent-Tiering is designed for 99.9% availability and 99.999999999% durability.
  • Pricing: You pay for monthly storage, requests, and data transfer. When using S3 Intelligent-Tiering, you pay a small monthly per-object fee for monitoring and automation. There is no retrieval fee in S3 Intelligent-Tiering and no fee for moving data between tiers. Objects in the Frequent Access tier are billed at the same rate as S3 Standard, objects stored in the Infrequent Access tier are billed at the same rate as S3 Standard-IA, objects stored in the Archive Access tier are billed at the same rate as S3 Glacier, and objects stored in the Deep Archive Access tier are billed at the same rate as S3 Glacier Deep Archive.
  • API and CLI access: You can use S3 Intelligent-Tiering from the S3 CLI and S3 APIs with the INTELLIGENT_TIERING storage class. You can also configure the S3 Intelligent-Tiering archive settings for a specific bucket using the PUT, GET, and DELETE configuration APIs (see the sketch after this list).
  • Feature support: S3 Intelligent-Tiering supports features like S3 Inventory to report on the access tier of objects, and S3 Replication to replicate data to any AWS Region.
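As a quick way to check where an individual object currently sits, the following sketch (boto3, reusing the hypothetical names from the earlier examples) reads the storage class and archive status from a HEAD request and retrieves the bucket's archive configuration.

import boto3

s3 = boto3.client("s3")

# HEAD the object: StorageClass reports INTELLIGENT_TIERING, and ArchiveStatus
# is present only when the object is in an archive access tier.
head = s3.head_object(
    Bucket="example-data-lake-bucket",
    Key="datasets/events/2021/01/part-0000.parquet",
)
print(head.get("StorageClass"), head.get("ArchiveStatus"))

# Read back the bucket's archive configuration by its Id.
config = s3.get_bucket_intelligent_tiering_configuration(
    Bucket="example-data-lake-bucket",
    Id="archive-historical-data",
)
print(config["IntelligentTieringConfiguration"]["Status"])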

Conclusion

This blog post covers how to optimize storage costs for use cases with predictable access patterns and use cases with changing access patterns. Often, we find that choosing the right storage class can help customers realize storage cost savings of up to 95%. To do this, some customers are comfortable building customized tiering rules with S3 Lifecycle policies across different buckets, while many others just use S3 Intelligent-Tiering because it's the easy button for automatic storage cost savings.

What’s important is that you choose the right approach based on your data access patterns and your business needs. As you get started, consider the following properties of your objects to help you decide what the best approach is for your workload:

Decision criteria | Consider S3 Intelligent-Tiering | Consider storage classes and lifecycle management
Overhead | Willing to allocate minimal to no engineering resources to optimize storage | Willing to allocate resources to optimize storage
Access frequency | Unknown or changing access patterns, such as data lakes, business intelligence, and machine learning | Predictable access patterns, such as long-tail media, backups, and disaster recovery
Size of objects | S3 Intelligent-Tiering charges a small monthly monitoring and automation fee and has a minimum eligible object size of 128 KB for auto-tiering. Smaller objects may be stored, but will always be charged at the Frequent Access tier rates. | S3 Standard-IA and S3 One Zone-IA have a minimum billable object size of 128 KB; S3 Glacier Deep Archive has a minimum billable object size of 40 KB.
Object life | Ideal for long-lived objects stored longer than 30 days. | S3 Standard is ideal for short-lived objects deleted within 30 days. S3 Standard-IA and S3 One Zone-IA are ideal for long-lived objects stored longer than 30 days, S3 Glacier for 90 days, and S3 Glacier Deep Archive for 180 days.

Thanks for reading this blog post on how to control and optimize your storage costs for workloads that have predictable and changing access patterns. If you have an increasing number of Amazon S3 buckets, spread across tens or even hundreds of accounts, you should also read the blog post “5 Ways to reduce data storage costs using Amazon S3 Storage Lens” to gain a basic understanding of how to use S3 Storage Lens to identify typical cost savings opportunities, and how to take action to implement changes to realize those cost savings.

If you have any comments or questions, please leave them in the comments section.

Nabil Mohamed

Nabil is a Senior Technical Account Manager at AWS focused on driving operational excellence and cost optimization for his customers. He is based out of Seattle. He loves hiking, traveling around the world with his wife, and spending time with their two Maine Coon cats.

Bruce Walker

Bruce is a Senior Technical Account Manager with Amazon Web Services based out of San Francisco. He focuses on reliability, optimization, and improving operational mechanisms with his customers. Bruce loves to travel with his wife and spend time with family in both Australia and Cyprus.

Jessie Felix

Jessie is a Senior Product Manager with Amazon S3, focused on S3's storage technology, such as S3 Intelligent-Tiering, and on data science. Jessie joined Amazon in 2015 to build the analytics and data infrastructure for the AWS Recruiting organization. Jessie's passion for big data analytics led him to join the Amazon S3 team in 2017. Outside of work, he enjoys going to local cafés and hanging out at the lake with his wife and Labrador Retriever.