Automate S3 Lifecycle rules at scale to transition data to S3 Intelligent-Tiering
The vast majority of data customers store on Amazon S3 has unknown or changing access patterns, as is typical for data lakes, analytics, and new applications. With these use cases, a dataset can become infrequently or even rarely accessed at specific points in time. The problem is that customers don’t know how data access patterns will change in the future. To get the highest storage cost savings for data with unknown or changing access patterns, you should use the S3 Intelligent-Tiering storage class. At the same time, we often see customers who have accumulated petabytes of objects in the S3 Standard storage class across tens to hundreds of buckets, and who are now looking for an easier way to apply a single S3 Lifecycle configuration across all of their buckets to transition data from S3 Standard into S3 Intelligent-Tiering.
In this blog post, we provide you with an easy-to-use mechanism to automate the creation of S3 Lifecycle rules at scale for all the buckets in your account to automatically transition your objects from S3 Standard to the S3 Intelligent-Tiering storage class. After reading this blog post, you should walk away with a basic understanding of how to estimate your storage savings from S3 Intelligent-Tiering after accounting for S3 Lifecycle request charges, and with the right tools to take cost-saving actions across all of your buckets in a few minutes. Automation can save you time, and when automating cost savings, you get back both time and funds to put toward innovation and critical workloads.
S3 Intelligent-Tiering: Recap of new cost optimizations you should know about
On November 30, 2021 at re:Invent, AWS announced the new S3 Intelligent-Tiering Archive Instant Access tier with cost savings of up to 68% for rarely accessed data that needs millisecond retrieval and high throughput performance. The Archive Instant Access tier is optimized for data that is not accessed for months at a time but, when it is needed, is available within milliseconds. S3 Intelligent-Tiering now automatically stores objects in three access tiers (Frequent, Infrequent, Archive Instant) that deliver the same performance as the S3 Standard storage class.
In addition, AWS announced (September 2, 2021) that S3 Intelligent-Tiering now automates storage cost savings for a wider range of workloads by eliminating the minimum storage duration and removing the low per-object monitoring and automation charges for objects smaller than 128 KB. Previously, S3 Intelligent-Tiering was optimized for long-lived objects stored for a minimum of 30 days and objects larger than 128 KB. What this means to you is that you can now use S3 Intelligent-Tiering for virtually any workload, independent of object size or retention period.
With these launches, we hear from customers across every industry vertical how S3 Intelligent-Tiering is helping them quickly optimize storage costs without any analysis or additional development cycles. For example, Stripe, a technology company that builds economic infrastructure for the internet, uses S3 Intelligent-Tiering. Businesses of every size, from new startups to public companies, use Stripe software to accept payments and manage their businesses online. Kalyana Chadalavada, Head of Efficiency at Stripe, had this to say:
“Since the launch of S3 Intelligent-Tiering in 2018, we’ve automatically saved ~30% per month on our storage costs without any impact on performance or need to analyze our data. With the new Archive Instant Access tier, we anticipate automatically realizing the benefit of archive storage pricing, while retaining the ability to access our data instantly when needed.”
The number of customers transitioning the majority of their data to S3 Intelligent-Tiering has accelerated. S3 Intelligent-Tiering combines the power of automatic storage cost savings with the lowest storage cost for data that is rarely accessed, independent of object size or object lifetime. That said, keep in mind that, for datasets that have predictable access patterns, you can get the highest storage cost savings by fine-tuning S3 Lifecycle policies based on detailed analysis of access patterns in S3 Storage Lens. Read more about our mental model for optimizing storage costs in this blog on Amazon S3 predictable and dynamic access patterns.
How quickly will I save on S3 Intelligent-Tiering after transitioning my data from S3 Standard?
Customers often ask us how their savings potential is affected when they have accumulated petabytes of data in the S3 Standard storage class that need to be transitioned using an S3 Lifecycle policy.
We typically see that customers start seeing the benefits of S3 Intelligent-Tiering a month after transitioning their data. It’s important to highlight that your first-month cost on S3 Intelligent-Tiering will be slightly higher than in the S3 Standard storage class. This is because objects uploaded to the S3 Intelligent-Tiering storage class are automatically stored in the Frequent Access tier and incur a small per-object monitoring and automation charge. In addition, if you are transitioning objects from S3 Standard using S3 Lifecycle policies, you also pay one-time transition request charges. Once the data is moved into S3 Intelligent-Tiering, you can expect the cost savings to kick in during the second month, when objects that have not been accessed for 30 consecutive days automatically move down to the S3 Intelligent-Tiering Infrequent Access tier. You should then expect to see substantial storage cost savings in the fourth month, when objects not accessed for 90 consecutive days move to the Archive Instant Access tier.
To show you what is achievable at scale, let’s take a real-world example. In the following, we assume we have a bucket with 4,500,000,000 objects that sum to 40 petabytes (PB) of data stored in S3 Standard in the US East (N. Virginia) Region. To simulate the savings in S3 Intelligent-Tiering, we assume that 30% of all of the objects are accessed each month, 30% of the objects are infrequently accessed, and 40% of the objects are rarely accessed. Table 1 shows that the first-month costs, including the S3 Lifecycle requests, are offset by a single month of storage cost savings in S3 Intelligent-Tiering. From the fourth month onward, you can expect meaningful storage cost savings from data that is rarely accessed. In this example, we model storage cost savings of over 40% per year compared to using the S3 Standard storage class. You can model the expected savings for your individual use case using the Amazon S3 pricing calculator.
Table 1: S3 Intelligent-Tiering savings far exceed cost of transitioning data from S3 Standard
*S3 Lifecycle requests only apply when you move data from S3 Standard to the S3 Intelligent-Tiering storage class. We recommend uploading objects directly to S3 Intelligent-Tiering to realize the highest cost savings.
**30% of GB Month stored in the Frequent Access tier, 30% stored in the Infrequent Access tier, and 40% stored in the Archive Instant Access tier to estimate ongoing monthly costs.
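The timeline described above can be turned into a rough back-of-the-envelope model. The following is a minimal sketch, not the calculation behind Table 1: the names `Prices`, `intelligent_tiering_month_cost`, and `first_year_savings` are hypothetical, it uses a single flat per-GB price per tier (ignoring volume-based pricing), it charges the monitoring fee on every object (an overestimate if many objects are smaller than 128 KB), and it assumes all data sits in the Frequent Access tier in month 1, splits between Frequent and Infrequent in months 2 and 3, and reaches the 30/30/40 split from month 4. The prices in the example are illustrative placeholders; check the Amazon S3 pricing page for current values in your Region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """Flat USD prices; every field is an input you must supply yourself."""
    standard_gb: float          # S3 Standard, per GB-month
    frequent_gb: float          # Intelligent-Tiering Frequent Access, per GB-month
    infrequent_gb: float        # Intelligent-Tiering Infrequent Access, per GB-month
    archive_instant_gb: float   # Intelligent-Tiering Archive Instant Access, per GB-month
    monitoring_per_1000: float  # monitoring and automation, per 1,000 objects per month
    transition_per_1000: float  # one-time S3 Lifecycle transition requests, per 1,000


def intelligent_tiering_month_cost(month, total_gb, objects, prices,
                                   infrequent=0.30, archive=0.40):
    """Estimated S3 Intelligent-Tiering cost for a month (1 = transition month)."""
    frequent = 1.0 - infrequent - archive
    if month == 1:
        shares = (1.0, 0.0, 0.0)                       # everything starts in Frequent
    elif month <= 3:
        shares = (frequent, infrequent + archive, 0.0)  # idle data reaches Infrequent
    else:
        shares = (frequent, infrequent, archive)        # rarely accessed data archived
    storage = total_gb * (shares[0] * prices.frequent_gb
                          + shares[1] * prices.infrequent_gb
                          + shares[2] * prices.archive_instant_gb)
    monitoring = objects / 1000 * prices.monitoring_per_1000
    transition = objects / 1000 * prices.transition_per_1000 if month == 1 else 0.0
    return storage + monitoring + transition


def first_year_savings(total_gb, objects, prices):
    """Fraction saved over 12 months compared with staying in S3 Standard."""
    standard = 12 * total_gb * prices.standard_gb
    tiered = sum(intelligent_tiering_month_cost(m, total_gb, objects, prices)
                 for m in range(1, 13))
    return 1 - tiered / standard


# Illustrative placeholder prices (USD), NOT taken from this post.
example_prices = Prices(0.021, 0.021, 0.0125, 0.004, 0.0025, 0.01)
# Roughly 40 PB expressed in GB, with 4.5 billion objects, as in the example above.
example_savings = first_year_savings(40_000_000, 4_500_000_000, example_prices)
```

Because of the simplifications listed above, this sketch will not reproduce the figures in Table 1 exactly; it is meant only to show how the one-time transition charge and the per-object monitoring charge weigh against the per-tier storage savings.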
Automating the creation of S3 Lifecycle rules at scale for all the buckets
Given the broad scope of use cases and opportunities to save costs with Amazon S3 Intelligent-Tiering, the follow-up question customers ask is how to get started. There are two ways to get data into the S3 Intelligent-Tiering storage class:
- Direct PUT (this will require a change to an application to PUT the objects directly into Amazon S3 Intelligent-Tiering).
- Use Amazon S3 Lifecycle rules to automate the movement of objects from S3 Standard to S3 Intelligent-Tiering.
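For the first option, the application change is typically small. The following is a minimal sketch assuming the application uses boto3; the helper names are hypothetical, and `s3_client` is expected to be a client created with `boto3.client("s3")`. The `StorageClass` parameter of `put_object` and the `ExtraArgs` argument of `upload_file` are existing boto3 features.

```python
def put_object_intelligent_tiering(s3_client, bucket, key, body):
    """Upload bytes directly into the S3 Intelligent-Tiering storage class."""
    return s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        StorageClass="INTELLIGENT_TIERING",
    )


def upload_file_intelligent_tiering(s3_client, filename, bucket, key):
    """Upload a local file (multipart if large) directly into S3 Intelligent-Tiering."""
    return s3_client.upload_file(
        filename,
        bucket,
        key,
        ExtraArgs={"StorageClass": "INTELLIGENT_TIERING"},
    )
```

Objects written this way never incur S3 Lifecycle transition request charges, which is why direct upload gives the highest savings for new data.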
One of the easiest ways for you to realize the benefits of S3 Intelligent-Tiering for existing objects is to apply S3 Lifecycle transition rules to your buckets. S3 Lifecycle rules can be used to automate the transition of objects to the S3 Intelligent-Tiering storage class from S3 Standard and S3 Standard-Infrequent Access (S3 Standard-IA). To do this, you can use the Python script we have provided in GitHub to automate the creation of S3 Intelligent-Tiering transition action rules for all the buckets in a given account (even if you have hundreds of buckets). We cover three scenarios of adding Lifecycle rules, and the script documents the results in an Excel sheet that is generated at the end as output for your review.
Scenario #1: If S3 Lifecycle rules exist, the script checks whether any of them is a transition action rule to another storage class such as Amazon S3 Glacier, Amazon S3 Standard-IA, Amazon S3 Glacier Deep Archive, or even Amazon S3 Intelligent-Tiering. If there are no transition action rules present, the script adds a new rule to the existing Lifecycle configuration that transitions the current and previous versions of the objects in that bucket to the S3 Intelligent-Tiering storage class “0” days after object creation. It also records an entry in the output Excel sheet with the name of the bucket and the status “Updated the existing Lifecycle with Transition rule to S3 INT.”
Scenario #2: If S3 Lifecycle rules exist and include a transition action rule to another storage class such as Amazon S3 Glacier, S3 Standard-IA, Amazon S3 Glacier Deep Archive, or even Amazon S3 Intelligent-Tiering, the script does not add or update the existing Lifecycle rules. It simply records an entry in the output Excel sheet with the name of the bucket and the status “No changes made to S3 Lifecycle configuration.”
Scenario #3: If there are no S3 Lifecycle rules for the bucket, the script adds a new transition action rule that transitions the current and previous versions of the objects in that bucket to the S3 Intelligent-Tiering storage class “0” days after object creation. It also records an entry in the output Excel sheet with the name of the bucket and the status “Added a new S3 Lifecycle Transition Rule to S3 INT.”
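The three scenarios reduce to one decision per bucket. The following is a minimal sketch of that decision, written from the description above rather than copied from the script in GitHub, so the function names, the timestamp format, and the empty-prefix filter are assumptions. The rule dictionaries follow the shape that boto3 uses for `put_bucket_lifecycle_configuration`; the 0-day transition for previous (noncurrent) versions mirrors the scenarios as described and is worth confirming against the S3 documentation for your buckets.

```python
from datetime import datetime, timezone

INT_CLASS = "INTELLIGENT_TIERING"


def has_transition_action(rules):
    """True if any rule already transitions current or noncurrent versions."""
    return any(
        rule.get("Transitions") or rule.get("NoncurrentVersionTransitions")
        for rule in rules
    )


def build_int_rule(now=None):
    """Whole-bucket rule moving current and previous versions after 0 days."""
    now = now or datetime.now(timezone.utc)
    return {
        # Name pattern from the post; the exact timestamp format is an assumption.
        "ID": f"Added S3 INT Transition LC by automated script-{now:%Y%m%d%H%M%S}",
        "Filter": {"Prefix": ""},  # empty prefix applies the rule to all objects
        "Status": "Enabled",
        "Transitions": [{"Days": 0, "StorageClass": INT_CLASS}],
        "NoncurrentVersionTransitions": [
            {"NoncurrentDays": 0, "StorageClass": INT_CLASS}
        ],
    }


def plan_lifecycle(existing_rules, now=None):
    """Return (rules_to_put, status); rules_to_put is None when nothing changes."""
    if existing_rules and has_transition_action(existing_rules):
        # Scenario #2: a transition rule exists, so leave the bucket alone.
        return None, "No changes made to S3 Lifecycle configuration"
    new_rule = build_int_rule(now)
    if existing_rules:
        # Scenario #1: keep the existing rules and append the new one.
        status = "Updated the existing Lifecycle with Transition rule to S3 INT"
        return list(existing_rules) + [new_rule], status
    # Scenario #3: no Lifecycle configuration at all.
    return [new_rule], "Added a new S3 Lifecycle Transition Rule to S3 INT"
```

Scenario #1 returns the existing rules together with the new one because writing a Lifecycle configuration to a bucket replaces the whole configuration, so any rule left out of the list would be removed.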
The following screenshot is an example of the S3 Lifecycle rule added by the Python script. You will notice that the Lifecycle rule is named “Added S3 INT Transition LC by automated script-timestamp,” so that you can easily distinguish the newly added rule. You can customize the name as needed by updating the Python script.
The Python script can be run from your laptop, an Amazon EC2 instance, or AWS Lambda. You can download the script along with the associated requirements.txt file and IAM policy. The script requires several Python libraries to run, which are listed in the requirements.txt file. We have also provided a sample minimal IAM policy that the IAM user or role needs in order to query the S3 buckets and create or update S3 Lifecycle transition rules as needed. One key note is that this script is designed to run from a single account for ease of use and security. If you have a large multi-account structure, it may be best to add additional automation to run the script with appropriate users or roles in each individual account.
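To illustrate what such a run involves within a single account, the following is a minimal sketch of the account-wide loop. It is not the script from GitHub: the function names are hypothetical, `plan` stands for any function that maps a bucket's existing rules to a `(new_rules_or_None, status)` pair, and the report is written as CSV with the standard library instead of an Excel sheet. The client methods `list_buckets`, `get_bucket_lifecycle_configuration`, and `put_bucket_lifecycle_configuration` are existing boto3 S3 calls.

```python
import csv


def get_existing_rules(s3_client, bucket):
    """Return the bucket's Lifecycle rules, or [] if it has no configuration."""
    try:
        response = s3_client.get_bucket_lifecycle_configuration(Bucket=bucket)
    except Exception as exc:
        # botocore's ClientError exposes the service error code on `.response`.
        error = getattr(exc, "response", {}).get("Error", {})
        if error.get("Code") == "NoSuchLifecycleConfiguration":
            return []
        raise
    return response.get("Rules", [])


def apply_to_all_buckets(s3_client, plan):
    """Apply `plan` to every bucket in the account; return (bucket, status) rows."""
    rows = []
    # Accounts with very many buckets may need to page through list_buckets.
    for bucket in s3_client.list_buckets().get("Buckets", []):
        name = bucket["Name"]
        new_rules, status = plan(get_existing_rules(s3_client, name))
        if new_rules is not None:
            # This call replaces the bucket's entire Lifecycle configuration.
            s3_client.put_bucket_lifecycle_configuration(
                Bucket=name,
                LifecycleConfiguration={"Rules": new_rules},
            )
        rows.append((name, status))
    return rows


def write_report(rows, stream):
    """Write the per-bucket results as CSV to an open text stream."""
    writer = csv.writer(stream)
    writer.writerow(["Bucket", "Status"])
    writer.writerows(rows)
```

To try it, create a client with `boto3.client("s3")` under credentials that allow listing buckets and reading and writing Lifecycle configurations, pass it to `apply_to_all_buckets` with your planning function, and write the returned rows to a file opened with `newline=""`. Running it first with a `plan` that always returns `None` for the rules gives a read-only report of each bucket's current state.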
S3 Intelligent-Tiering is the first cloud storage that automatically reduces your storage costs on a granular object level by automatically moving data to the most cost-effective access tier based on access frequency, without performance impact, retrieval fees, or operational overhead. S3 Intelligent-Tiering delivers milliseconds latency and high throughput performance for frequently, infrequently, and now rarely accessed data in the Frequent, Infrequent, and new Archive Instant Access tiers. Now, you can use S3 Intelligent-Tiering as the default storage class for virtually any workload, especially data lakes, data analytics, new applications, and user-generated content.
In this blog post, we highlighted the costs and benefits of moving petabytes of data from the S3 Standard storage class to the S3 Intelligent-Tiering storage class by using an S3 Lifecycle policy. It’s important to highlight that your first-month cost on S3 Intelligent-Tiering will be slightly higher than in the S3 Standard storage class, primarily because of one-time S3 Lifecycle request charges. After that first month, you will start seeing storage cost savings as objects that have not been accessed for 30 consecutive days move down to the S3 Intelligent-Tiering Infrequent Access tier, and you will see meaningful storage cost savings after three months, when objects that have not been accessed for 90 consecutive days move to the Archive Instant Access tier. Furthermore, you can use the Python script we have provided in GitHub to automate the creation of S3 Intelligent-Tiering transition action rules for all the buckets in your account. To get the highest storage cost savings for new data, we recommend uploading objects directly to S3 Intelligent-Tiering.
Thanks for reading this Amazon S3 cost optimization blog. By saving on time and on cloud storage costs using automation and storage classes, you can focus spending on other projects and initiatives. We welcome your feedback or comments on the blog in the comments section.