AWS Storage Blog

Manage Amazon S3 storage costs granularly and at scale using S3 Intelligent-Tiering

Cost-effective data storage is critical when building and scaling data lakes that manage and hold growing datasets. By choosing the right storage architecture, customers are empowered to quickly experiment with and migrate to AWS. Amazon S3 Intelligent-Tiering is a storage class that automatically optimizes storage costs as data access patterns change, without performance impact or operational overhead, across all stages of data lake workflows.

In this blog, we explain how developers and cloud operations managers can use S3 Intelligent-Tiering to optimize storage costs. We start by breaking down the S3 Intelligent-Tiering access tiers. We then focus on individual buckets, starting with uploading objects directly to S3 Intelligent-Tiering. Following that, we explain how to transition existing objects from S3 Standard or S3 Standard-IA to S3 Intelligent-Tiering using an S3 Lifecycle policy.

Later, we explain how to enable an S3 Intelligent-Tiering Lifecycle policy at scale, on a large number of buckets. Here we cover two scenarios: transitioning objects between S3 Intelligent-Tiering access tiers based on access patterns for existing buckets, and doing the same for newly created buckets. These use cases enable developers and cloud operations managers to manage S3 Intelligent-Tiering storage class configurations on individual S3 buckets or at scale across multiple S3 buckets in an AWS account, optimizing storage costs automatically as data access patterns change.

S3 Intelligent-Tiering access tiers

S3 Intelligent-Tiering automatically stores objects in three access tiers:

  • Frequent Access tier optimized for frequently accessed data
  • Lower-cost Infrequent Access tier optimized for infrequently accessed data
  • Very-low-cost Archive Instant Access tier optimized for rarely accessed data

To save even more on storage that doesn’t require immediate retrieval, you can activate the optional asynchronous Archive Access and Deep Archive Access tiers. When turned on, objects not accessed for 90 days move directly to the Archive Access tier (bypassing the automatic Archive Instant Access tier), and to the Deep Archive Access tier after 180 days.

There are no data retrieval charges in S3 Intelligent-Tiering. Customers can implement S3 Intelligent-Tiering for a small monthly per-object monitoring and automation fee, with a minimum eligible object size of 128 KB for auto-tiering. S3 Intelligent-Tiering further optimizes storage cost savings by waiving the minimum storage duration and the monitoring and automation charge for objects smaller than 128 KB.

  • Frequent Access tier (automatic): This is the default access tier that any object created or transitioned to S3 Intelligent-Tiering begins its lifecycle in.
  • Infrequent Access tier (automatic): If an object is not accessed for 30 consecutive days, the object moves to the Infrequent Access tier.
  • Archive Instant Access tier (automatic): If an object is not accessed for 90 consecutive days, the object moves to the Archive Instant Access tier.
  • Archive Access tier (optional): You can activate the Archive Access tier for data that can be accessed asynchronously. After activation, the Archive Access tier automatically archives objects that have not been accessed for a minimum of 90 consecutive days. You can extend the last-access time for archiving to a maximum of 730 days. Standard retrieval times for this access tier range from 3-5 hours. Expedited retrieval is an option if you need faster access to an object.
  • Deep Archive Access tier (optional): You can activate the Deep Archive Access tier for data that can be accessed asynchronously. After activation, the Deep Archive Access tier automatically archives objects that have not been accessed for a minimum of 180 consecutive days. You can extend the last-access time for archiving to a maximum of 730 days. Standard retrieval of objects in this access tier occurs within 12 hours. Expedited retrieval is an option if you need faster access to an object.

Solution overview (individual S3 buckets)

In this section, we walk through:

  1. Directly uploading objects to S3 Intelligent-Tiering storage class.
  2. Transitioning existing objects from S3 Standard or S3 Standard-IA to S3 Intelligent-Tiering using an S3 Lifecycle policy.

For this walkthrough, you need an AWS account, an existing S3 bucket, and the AWS CLI installed and configured with permissions to manage that bucket.

1. Uploading objects directly to S3 Intelligent-Tiering

To upload an object directly to the S3 Intelligent-Tiering storage class, you must specify INTELLIGENT_TIERING as the storage class. Run the following AWS CLI command to accomplish this.

aws s3api put-object --bucket <bucket_name> --key dir-1/my_images.tar --body my_images.tar --storage-class INTELLIGENT_TIERING

To upload an object to the S3 Intelligent-Tiering storage class using the PUT API operation, you must specify the storage class in the x-amz-storage-class request header.
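If you use an SDK instead of the CLI, you set the same storage class on the request. The following is a minimal sketch using Boto3 (the bucket name is illustrative); Boto3 translates the StorageClass parameter into the x-amz-storage-class request header for you.

import boto3

s3 = boto3.client('s3')

# Upload the object directly into the S3 Intelligent-Tiering storage class
with open('my_images.tar', 'rb') as body:
    s3.put_object(
        Bucket='amzn-s3-demo-bucket',   # illustrative bucket name
        Key='dir-1/my_images.tar',
        Body=body,
        StorageClass='INTELLIGENT_TIERING')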

2. Transitioning existing objects to S3 Intelligent-Tiering

Here, we walk you through the steps to create an S3 Lifecycle policy. Based on access patterns, objects move automatically from one access tier to another as follows.

  • When the objects are placed into S3 Intelligent-Tiering, they are first stored in the Frequent Access tier.
  • 30 consecutive days (not accessed): Objects will be moved to the Infrequent Access tier.
  • 90 consecutive days (not accessed): Objects will be moved to the Archive Instant Access tier.

Figure: Storage class transitions based on access patterns

An S3 Lifecycle configuration is a set of rules that define actions Amazon S3 applies to a group of objects. The following steps walk through how to automatically transition objects from the S3 Standard storage class to S3 Intelligent-Tiering.

Step 1: The following Lifecycle rule transitions all objects to the S3 Intelligent-Tiering storage class based on object creation date. Create an intelligent-tier.json file as follows.

{
    "Rules": [
        {
            "ID": "Intelligent_Tier_lifecycle",
            "Filter": {
                "Prefix": ""
            },
            "Status": "Enabled",
            "Transitions": [
                {
                    "Days": 0,
                    "StorageClass": "INTELLIGENT_TIERING"
                }
            ]
        }
    ]
}

Step 2: Run the following command to create a new lifecycle rule for the bucket.

aws s3api put-bucket-lifecycle-configuration --bucket <bucket_name> --lifecycle-configuration file://intelligent-tier.json

Run the following command to retrieve and verify the Lifecycle rule set on the bucket.

aws s3api get-bucket-lifecycle-configuration --bucket <bucket_name>

Figure: The S3 Intelligent-Tiering Lifecycle rule

Figure: Storage class after the transition to S3 Intelligent-Tiering

Configuring the opt-in asynchronous Archive Access tiers

To save more on storage costs for data that doesn’t require immediate retrieval, you can activate the optional asynchronous Archive Access and Deep Archive Access tiers for individual buckets by following the steps in this section.

  • 90 consecutive days (not accessed): Objects will be moved to the Archive Access tier (bypassing the automatic Archive Instant Access tier).
  • 180 consecutive days (not accessed): Objects will be moved to the Deep Archive Access tier.

Figure: Storage class transitions based on access patterns with the archive tiers activated

Step 1: The following S3 Intelligent-Tiering Archive configuration moves objects to the Archive Access tier and the Deep Archive Access tier, which are optimized for objects that are rarely accessed for long periods of time. You can apply the configuration rule to all objects in the bucket or limit its scope by defining filters; the two available filter options are object prefix and object tags. Create an archive-tier.json as shown in the following snippet:

{
   "Id":"Archive_Tier",
   "Status":"Enabled",
   "Tierings":[
      {
         "Days":90,
         "AccessTier":"ARCHIVE_ACCESS"
      },
      {
         "Days":180,
         "AccessTier":"DEEP_ARCHIVE_ACCESS"
      }
   ]
}

Step 2: Run the following command to create the S3 Intelligent-Tiering Archive configuration.

aws s3api put-bucket-intelligent-tiering-configuration --bucket <bucket_name> --id Archive_Tier --intelligent-tiering-configuration file://archive-tier.json

Run the following command to retrieve and verify the S3 Intelligent-Tiering Archive configuration on the bucket.

aws s3api get-bucket-intelligent-tiering-configuration --bucket <bucket_name> --id Archive_Tier

Figure: The S3 Intelligent-Tiering Archive configuration

Step 3 (optional): The following S3 Lifecycle configuration specifies two rules, each with one action.

  • The Transition action requests Amazon S3 to transition all objects to the S3 Intelligent-Tiering storage class, on the object creation date.
  • The Expiration action requests Amazon S3 to delete all objects with the prefix “logs/”, 365 days after creation.
    • Using object expiration rules to schedule periodic removal of objects eliminates the need to build processes to identify objects for deletion and submit delete requests to Amazon S3.
    • When an object reaches the end of its lifetime based on its lifecycle policy, Amazon S3 queues it for removal and removes it asynchronously.

Create an intelligent-tier_logs_expire.json as follows.

{
   "Rules":[
      {
         "ID":"Intelligent_Tier_lifecycle",
         "Filter":{
            "Prefix":""
         },
         "Status":"Enabled",
         "Transitions":[
            {
               "Days":0,
               "StorageClass":"INTELLIGENT_TIERING"
            }
         ]
      },
      {
         "ID":"Logs_Expire_lifecycle",
         "Filter":{
            "Prefix":"logs/"
         },
         "Status":"Enabled",
         "Expiration":{
            "Days":365
         }
      }
   ]
}

Step 4 (optional): Run the following command to create the lifecycle configuration for the bucket.

aws s3api put-bucket-lifecycle-configuration --bucket <bucket_name> --lifecycle-configuration file://intelligent-tier_logs_expire.json

Run the following command to retrieve and verify the Lifecycle configuration set on the bucket.

aws s3api get-bucket-lifecycle-configuration --bucket <bucket_name>

Figure: The S3 Intelligent-Tiering Lifecycle rules

HeadObject

The HEAD action retrieves metadata from an object without returning the object itself. Among other attributes, the response includes the object’s ArchiveStatus.

Run the following command to retrieve the metadata of an object.

aws s3api head-object --bucket <bucket_name> --key <object_key> 
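For an object that has moved into one of the opt-in archive tiers, a representative response looks like the following (all values are illustrative). The ArchiveStatus field only appears while an object is archived in, or being restored from, the Archive Access or Deep Archive Access tier.

{
    "AcceptRanges": "bytes",
    "LastModified": "2023-06-01T10:15:00+00:00",
    "ContentLength": 52428800,
    "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
    "ContentType": "application/x-tar",
    "StorageClass": "INTELLIGENT_TIERING",
    "ArchiveStatus": "ARCHIVE_ACCESS"
}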

Figure: The ArchiveStatus of an S3 object

Solution overview (S3 buckets at scale)

Here, we walk you through how S3 objects transition from one access tier to another based on access patterns, as follows.

  • When the objects are placed into S3 Intelligent-Tiering, they are first stored in the Frequent Access tier.
  • 30 consecutive days (not accessed): objects will be moved to the Infrequent Access tier.
  • 90 consecutive days (not accessed): objects will be moved to the Archive Access tier (bypassing the automatic Archive Instant Access tier).
  • 180 consecutive days (not accessed): objects will be moved to the Deep Archive Access tier.

Our solution addresses how to create an S3 Lifecycle configuration and opt in to the asynchronous Archive Access tiers for:

  1. Existing S3 buckets in an AWS account
  2. Newly created S3 buckets in an AWS account

1. Existing S3 buckets

For existing S3 buckets, you can update the S3 Lifecycle configuration of specific S3 buckets based on a resource tag filter using a Python script.

For this, you need Python with the Boto3 SDK installed, and AWS credentials configured (for example, through the AWS CLI) with access to the target account.

At a high level, the process can be summarized as follows:

  • The following S3 Lifecycle configuration and archive policy are used.

S3 Lifecycle configuration:

lifecycle_config_settings_it = {
    'Rules': [
        {'ID': 'S3 Intelligent Tier Transition Rule',
         'Filter': {'Prefix': ''},
         'Status': 'Enabled',
         'Transitions': [
             {'Days': 0,
              'StorageClass': 'INTELLIGENT_TIERING'}
         ]}
    ]}

Archive policy:

archive_policy = {
    'Id': 'Archive_Tier',
    'Status': 'Enabled',
    'Tierings': [
        {
            'Days': 90,
            'AccessTier': 'ARCHIVE_ACCESS'
        },
        {
            'Days': 180,
            'AccessTier': 'DEEP_ARCHIVE_ACCESS'
        }
    ]
}

  • The configuration is applied to S3 buckets based on specific resource tags on the bucket.

Tag-based filtering:

bucket_tag_key = "storage.class"
bucket_tag_value = "s3.it"

Note: The filter key and value can be changed to match your organization’s tagging conventions.

Ensure that the AWS IAM user or role that runs the Python script has the appropriate permissions to call ListBuckets, GetBucketTagging, PutBucketLifecycleConfiguration, and PutBucketIntelligentTieringConfiguration.

The Python script is available to download here.
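For reference, here is a minimal sketch of what such a script can look like, assuming Boto3 and the lifecycle_config_settings_it, archive_policy, bucket_tag_key, and bucket_tag_value definitions shown above.

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')

def bucket_matches_tag(bucket_name):
    """Return True if the bucket carries the expected tag key and value."""
    try:
        tag_set = s3.get_bucket_tagging(Bucket=bucket_name)['TagSet']
    except ClientError:
        # Buckets with no tags at all raise a NoSuchTagSet error
        return False
    return any(tag['Key'] == bucket_tag_key and tag['Value'] == bucket_tag_value
               for tag in tag_set)

for bucket in s3.list_buckets()['Buckets']:
    name = bucket['Name']
    if not bucket_matches_tag(name):
        continue
    # Transition objects to S3 Intelligent-Tiering on creation
    s3.put_bucket_lifecycle_configuration(
        Bucket=name,
        LifecycleConfiguration=lifecycle_config_settings_it)
    # Opt the bucket in to the asynchronous archive tiers
    s3.put_bucket_intelligent_tiering_configuration(
        Bucket=name,
        Id=archive_policy['Id'],
        IntelligentTieringConfiguration=archive_policy)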

2. New S3 buckets

For new S3 buckets, we walk you through creating a Lambda function that applies the S3 Lifecycle configuration to a bucket, triggered by the bucket creation event notification.

Figure: Solution overview

At a high level, the steps can be summarized as follows:

  1. Create a Lambda function that transitions the storage class from S3 Standard to S3 Intelligent-Tiering.
  2. The Lambda function includes the logic to apply the transition based on specific resource tags on the S3 bucket.
  3. Create an EventBridge rule that captures the S3 bucket creation event and triggers the Lambda function.

Creating an AWS Lambda function

Here, we create a Lambda function that applies the storage class transition to S3 buckets based on their tags. The Lambda function uses the same S3 bucket tag filter and Lifecycle configuration described in use case 1.

You can set up an Amazon EventBridge rule to trigger this Lambda function. When triggered, the Lambda function handler processes the EventBridge event and extracts the S3 bucket name. Only buckets that match the resource tag filter are updated with the S3 Lifecycle configuration.
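A minimal sketch of such a handler is shown below, reusing the bucket_matches_tag helper and the configuration objects from use case 1 (the event shape is that of an S3 CreateBucket API call delivered through CloudTrail).

import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Bucket-level S3 events reach EventBridge via CloudTrail, so the
    # bucket name is nested under detail.requestParameters
    bucket_name = event['detail']['requestParameters']['bucketName']
    if not bucket_matches_tag(bucket_name):
        return {'status': 'skipped', 'bucket': bucket_name}
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration=lifecycle_config_settings_it)
    s3.put_bucket_intelligent_tiering_configuration(
        Bucket=bucket_name,
        Id=archive_policy['Id'],
        IntelligentTieringConfiguration=archive_policy)
    return {'status': 'updated', 'bucket': bucket_name}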

The AWS Lambda function execution role

Following the principle of least privilege, the Lambda function execution role needs only the minimum permissions to apply the new Lifecycle configuration to an S3 bucket. At a minimum, it should have permissions for PutBucketLifecycleConfiguration, PutBucketIntelligentTieringConfiguration, and GetBucketTagging. It’s also good practice to enable logging for your Lambda functions; the AWS managed policy AWSLambdaBasicExecutionRole grants permission to upload logs to CloudWatch.
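For example, an execution role policy granting these S3 permissions could look like the following; the IAM action names corresponding to the API operations above are s3:PutLifecycleConfiguration, s3:PutIntelligentTieringConfiguration, and s3:GetBucketTagging. Scope the Resource element more narrowly if your bucket naming convention allows it.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutLifecycleConfiguration",
                "s3:PutIntelligentTieringConfiguration",
                "s3:GetBucketTagging"
            ],
            "Resource": "arn:aws:s3:::*"
        }
    ]
}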

The Lambda function is available here to download.

Create an Amazon EventBridge rule

Here’s how to create an EventBridge rule that invokes the Lambda function, using the AWS Management Console. An equivalent event pattern is shown after the steps.

  1. Open the CloudWatch console and, from the left-hand navigation pane, choose Events > Rules, then choose Create rule.
  2. Under Event Source, verify that Event Pattern is selected.
  3. From the Service Name dropdown, choose S3, and for Event Type, choose Bucket Level Operations.
  4. Select Specific operation(s) and choose CreateBucket.
  5. Under Targets, choose the name of the AWS Lambda function that you just created.
  6. Provide a Name and an optional Description for the rule. Leave the Enabled box selected so the rule becomes active immediately.
  7. Finally, choose Create rule.
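For reference, these console selections produce an event pattern equivalent to the following. Note that bucket-level API events are delivered to EventBridge through AWS CloudTrail, so a trail that logs S3 management events must be enabled in the account.

{
    "source": ["aws.s3"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["s3.amazonaws.com"],
        "eventName": ["CreateBucket"]
    }
}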

Figure: The Amazon EventBridge rule

Cleaning up

To avoid ongoing charges in your AWS account, delete the AWS Lambda resources you created while following along.

Conclusion

In this blog post, we covered different ways you could use S3 Intelligent-Tiering on individual S3 buckets or at scale across multiple S3 buckets to optimize storage costs depending on your specific situation. We provided guidance on how to optimize S3 storage costs when data access patterns change, without performance impact or operational overhead. Companies of any size can adopt this proactive approach to storage cost savings as part of the broader cloud cost optimization strategy.

You can enable S3 Intelligent-Tiering via the AWS Management Console, AWS Command Line Interface (CLI), and through the AWS SDKs. Customers can implement S3 Intelligent-Tiering with a small monthly per-object fee for monitoring and automation. For more information, refer to the AWS Well-Architected Framework, Architecture Best Practices for Storage, and Architecture Best Practices for Cost Optimization. We are here to help, and if you need further assistance in storage cost-optimization strategy, reach out to AWS Support and your AWS account team.

Arun Chandapillai

Arun Chandapillai is a Senior Cloud Architect who is a diversity and inclusion champion. He is passionate about helping his customers accelerate IT modernization through business-first cloud adoption strategies and successfully build, deploy, and manage applications and infrastructure in the cloud. Arun is an automotive enthusiast, an avid speaker, and a philanthropist who believes in ‘you get (back) what you give’.

Shak Kathir

Shak Kathirvel is a Senior Cloud Application Architect with AWS ProServe. He enjoys working with customers, helping them with application modernization and optimization efforts, guiding their enterprise cloud management and governance strategies, and migrating their workloads to the cloud. He is passionate about enterprise architecture, serverless technologies, and AWS cost and usage optimization. He loves his job for its challenges and the opportunity to work with inspiring customers and colleagues.