AWS Storage Blog

Simplify your data lifecycle by using object tags with Amazon S3 Lifecycle

Managing storage costs effectively at scale can become complex when multiple applications or users access your data with different patterns and frequencies. S3 Lifecycle helps you optimize storage costs through lifecycle configurations that move your data to more cost-effective storage classes, or expire it, based on object age. With large-scale workloads, multi-tenant buckets, and growing numbers of objects, creating and managing many rules in S3 Lifecycle configurations can become a management burden. In this blog post, we cover how to simplify your data lifecycle management by using object tagging to reduce the number of rules in an S3 Lifecycle configuration.

How can I simplify my S3 Lifecycle rules in my configuration?

An S3 Lifecycle configuration is a set of rules that define the actions Amazon S3 applies to a group of objects. There are two primary types of actions: transition actions that move objects to another storage class, and expiration actions that delete objects. Customers can define an entire bucket or a subset of objects to transition or expire with rules in the lifecycle configuration.

S3 Lifecycle rules contain filters such as prefixes and object tags to specify the objects eligible for the specific lifecycle action. Each rule can contain one prefix and/or set of object tags. Many workloads use multiple prefixes within an S3 bucket. As the number of distinct prefixes and use cases in your bucket grows, the number of rules you need grows along with it.

Before we learn how to simplify lifecycle rules, let’s first look at the components of an S3 Lifecycle configuration.

What are the components of an S3 Lifecycle configuration?

An S3 Lifecycle configuration has the following elements: an ID element, a status element, a filter element, and elements that describe lifecycle actions. S3 Lifecycle configurations can be specified as XML, consisting of one or more lifecycle rules:

<LifecycleConfiguration>
  <Rule>
    <ID>Transition and Expiration Rule</ID>
    <Filter>
       <Prefix>tax/</Prefix>
    </Filter>
    <Status>Enabled</Status>
    <Transition>
      <Days>365</Days>
      <StorageClass>GLACIER</StorageClass>
    </Transition>
    <Expiration>
      <Days>3650</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>

Each rule consists of the following:

  • Rule metadata that includes a rule ID, and status indicating whether the rule is enabled or disabled. If a rule is disabled, Amazon S3 doesn’t perform any actions specified in the rule.
  • Filter identifying objects to which the rule applies. You can specify a filter by using an object key prefix, one or more object tags, or a conjunction of both.
  • One or more transition or expiration actions with a date or a time period in the object’s lifetime when you want Amazon S3 to perform the specified action.

For more examples, see the Amazon S3 documentation on lifecycle configuration examples.
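The three parts of a rule (metadata, filter, actions) can also be assembled programmatically. The following is a minimal sketch that rebuilds the sample configuration above with Python's standard library; the helper names `build_rule` and `build_configuration` are illustrative and not part of any AWS SDK.

```python
import xml.etree.ElementTree as ET


def build_rule(rule_id, prefix, transition=None, expiration_days=None, enabled=True):
    """Build one <Rule> element: metadata, then filter, then lifecycle actions."""
    rule = ET.Element("Rule")
    # Rule metadata: an ID and a status.
    ET.SubElement(rule, "ID").text = rule_id
    # Filter: a key name prefix in this sketch.
    rule_filter = ET.SubElement(rule, "Filter")
    ET.SubElement(rule_filter, "Prefix").text = prefix
    ET.SubElement(rule, "Status").text = "Enabled" if enabled else "Disabled"
    # Actions: an optional transition and an optional expiration.
    if transition is not None:
        days, storage_class = transition
        node = ET.SubElement(rule, "Transition")
        ET.SubElement(node, "Days").text = str(days)
        ET.SubElement(node, "StorageClass").text = storage_class
    if expiration_days is not None:
        node = ET.SubElement(rule, "Expiration")
        ET.SubElement(node, "Days").text = str(expiration_days)
    return rule


def build_configuration(rules):
    """Wrap the rules in a <LifecycleConfiguration> root and serialize to a string."""
    root = ET.Element("LifecycleConfiguration")
    root.extend(rules)
    return ET.tostring(root, encoding="unicode")


config_xml = build_configuration(
    [build_rule("Transition and Expiration Rule", "tax/", (365, "GLACIER"), 3650)]
)
print(config_xml)
```

Generating the XML from data rather than editing it by hand makes it harder to leave an element unclosed as the number of rules grows.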

What does an S3 Lifecycle configuration look like with multiple prefixes?

You can specify multiple rules if you want different lifecycle actions for different objects. The following lifecycle configuration has two rules:

  • Rule 1 applies to objects with the key name prefix classA/. It directs Amazon S3 to transition objects to the S3 Glacier storage class one year after creation and expire these objects 10 years after creation.
  • Rule 2 applies to objects with key name prefix classB/. It directs Amazon S3 to transition objects to the S3 Standard-IA storage class 90 days after creation and delete them one year after creation.
<LifecycleConfiguration>
    <Rule>
        <ID>ClassADocRule</ID>
        <Filter>
           <Prefix>classA/</Prefix>        
        </Filter>
        <Status>Enabled</Status>
        <Transition>        
           <Days>365</Days>        
           <StorageClass>GLACIER</StorageClass>       
        </Transition>    
        <Expiration>
             <Days>3650</Days>
        </Expiration>
    </Rule>
    <Rule>
        <ID>ClassBDocRule</ID>
        <Filter>
            <Prefix>classB/</Prefix>
        </Filter>
        <Status>Enabled</Status>
        <Transition>        
           <Days>90</Days>        
           <StorageClass>STANDARD_IA</StorageClass>       
        </Transition>    
        <Expiration>
             <Days>365</Days>
        </Expiration>
    </Rule>
</LifecycleConfiguration>

For each prefix in your bucket, a new lifecycle rule is required for transition and expiration actions for objects within that prefix. Buckets with hundreds of prefixes, as a result, need many rules to set up the appropriate lifecycle actions. In order to limit the overall number of lifecycle rules needed for all of your prefixes, we recommend using object tags.

Specifying a filter based on object tags

In the following example, the lifecycle rule specifies a filter based on a tag key (tag1) and a tag value (value1). The rule then applies only to the subset of objects that carry that specific tag.

<LifecycleConfiguration>
  <Rule>
    <ID>Rule 1</ID>
    <Filter>
      <Tag>
         <Key>tag1</Key>
         <Value>value1</Value>
      </Tag>
    </Filter>
    <Status>Enabled</Status>
    <Transition>
      <StorageClass>GLACIER</StorageClass>
      <Days>365</Days>
    </Transition>
  </Rule>
</LifecycleConfiguration>

You can specify a filter based on multiple tags. Wrap the tags in the <And> element shown in the following example. The rule directs Amazon S3 to perform lifecycle actions on objects with two tags (with these specific tag keys and values).

<LifecycleConfiguration>
    <Rule>
      <Filter>
         <And>
            <Tag>
               <Key>key1</Key>
               <Value>value1</Value>
            </Tag>
            <Tag>
               <Key>key2</Key>
               <Value>value2</Value>
            </Tag>
             ...
          </And>
      </Filter>
      <Status>Enabled</Status>
      <Transition>
         <StorageClass>GLACIER</StorageClass>
         <Days>365</Days>
      </Transition>
    </Rule>
</LifecycleConfiguration>

The lifecycle rule applies to objects that have both of the tags specified. Amazon S3 performs a logical AND operation on the tags inside the <And> element. Note the following:

  • Each tag must match both key and value exactly.
  • The rule applies to a subset of objects that has all the tags specified in the rule. If an object has additional tags specified, the rule still applies.
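The matching behavior described above can be modeled in a few lines. This is only a sketch of the semantics for reasoning about your own rules, not code that Amazon S3 runs; `rule_applies` is a hypothetical helper, and the `team` tag is made up for the example.

```python
def rule_applies(object_tags, rule_tags):
    """Return True when the object carries every tag listed in the rule's filter.

    Keys and values must match exactly; extra tags on the object are ignored.
    """
    return all(object_tags.get(key) == value for key, value in rule_tags.items())


rule_tags = {"key1": "value1", "key2": "value2"}

# Both tags present: the rule applies.
print(rule_applies({"key1": "value1", "key2": "value2"}, rule_tags))  # True
# An additional tag on the object does not prevent a match.
print(rule_applies({"key1": "value1", "key2": "value2", "team": "finance"}, rule_tags))  # True
# Only one of the two tags is present: no match.
print(rule_applies({"key1": "value1"}, rule_tags))  # False
# Values are compared exactly, including case: no match.
print(rule_applies({"key1": "Value1", "key2": "value2"}, rule_tags))  # False
```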

Why should I use object tags?

You can associate multiple key-value pairs (tags) with each of your S3 objects, with the ability to change them at any time. The tags can be used to manage and control access, set up lifecycle rules, customize S3 Storage Class Analysis, and filter CloudWatch metrics. You can think of the bucket as a data lake, and use tags to create a taxonomy of the objects within the lake. This is more flexible than using the bucket and prefixes, and allows you to make semantic-style changes without renaming, moving, or copying objects.

Simplifying your S3 Lifecycle configurations with object tags is most helpful if you currently have tens or hundreds of prefix-filtered rules in your lifecycle configuration. We recommend consolidating those rules by using object tags. To demonstrate the effectiveness of object tags in lifecycle configurations, consider a bucket whose key name prefixes have the lifecycle actions shown in the following table:

Rule | Filter (Prefix) | Transition action 1 | Transition action 2 | Expiration action
Rule 1 | Prefix 1 | S3 Standard-IA after 45 days | S3 Glacier after 90 days | After 200 days
Rule 2 | Prefix 2 | S3 Glacier after 90 days | - | -
Rule 3 | Prefix 3 | S3 Intelligent-Tiering after 30 days | - | -
Rule 4 | Prefix 4 | S3 Standard-IA after 90 days | - | After 200 days
Rule 5 | Prefix 5 | S3 Intelligent-Tiering after 90 days | - | After 200 days
Rule 6 | Prefix 6 | S3 Standard-IA after 45 days | - | After 200 days
Rule 7 | Prefix 7 | S3 Glacier after 90 days | - | -
Rule 8 | Prefix 8 | S3 Intelligent-Tiering after 30 days | - | -
Rule 9 | Prefix 9 | S3 Standard-IA after 90 days | - | After 200 days
Rule 10 | Prefix 10 | S3 Intelligent-Tiering after 90 days | - | After 200 days
Rule 11 | Prefix 11 | S3 Standard-IA after 45 days | - | After 200 days
Rule 12 | Prefix 12 | S3 Glacier after 90 days | S3 Glacier Deep Archive after 200 days | -
Rule 13 | Prefix 13 | S3 Intelligent-Tiering after 30 days | - | -
Rule 14 | Prefix 14 | S3 Standard-IA after 90 days | - | After 200 days
Rule 15 | Prefix 15 | S3 Intelligent-Tiering after 90 days | - | After 200 days
Rule 16 | Prefix 16 | S3 Standard-IA after 45 days | - | After 200 days
Rule 17 | Prefix 17 | S3 Glacier after 90 days | - | -
Rule 18 | Prefix 18 | S3 Intelligent-Tiering after 30 days | - | -
Rule 19 | Prefix 19 | S3 Standard-IA after 90 days | - | After 200 days
Rule 20 | Prefix 20 | S3 Intelligent-Tiering after 90 days | - | After 200 days

Notice that there are 20 different prefixes with lifecycle actions. As a result, the lifecycle configuration needs 20 different rules if the only filter element is a prefix. We can reduce the number of rules significantly by defining one object tag for each unique lifecycle action and filtering on those tags instead.

Analyzing this specific example, we recommend creating seven different object tags (key-value pairs), one for each unique lifecycle action:

Tag key | Tag value | Lifecycle action for the tag
TransitionInfrequent | 45 | Transition tagged objects to S3 Standard-IA after 45 days
TransitionInfrequent | 90 | Transition tagged objects to S3 Standard-IA after 90 days
TransitionArchive | 90 | Transition tagged objects to S3 Glacier after 90 days
TransitionIntelligent | 30 | Transition tagged objects to S3 Intelligent-Tiering after 30 days
TransitionIntelligent | 90 | Transition tagged objects to S3 Intelligent-Tiering after 90 days
TransitionDeepArchive | 200 | Transition tagged objects to S3 Glacier Deep Archive after 200 days
Expiration | 200 | Expire tagged objects after 200 days

We create one tag for each unique transition element and one tag for each unique expiration element. Objects that only need to be transitioned OR expired need only one of the tags. Objects that expire after transition should be tagged with both transition and expiration element tags.

For example, objects in Prefix 3, which only transition to S3 Intelligent-Tiering after 30 days, need only one tag. However, objects in Prefix 1, which have two transition actions and an expiration action, need a tag for each of those three actions.
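The mapping from a prefix's lifecycle actions to its tags can be written down as a small function. This sketch assumes the tag keys from the tag table above and the storage class identifiers used in lifecycle XML; `tags_for_policy` is a hypothetical helper name.

```python
# Tag keys from the tag table, indexed by the storage class each one stands for.
TRANSITION_TAG_KEYS = {
    "STANDARD_IA": "TransitionInfrequent",
    "GLACIER": "TransitionArchive",
    "INTELLIGENT_TIERING": "TransitionIntelligent",
    "DEEP_ARCHIVE": "TransitionDeepArchive",
}


def tags_for_policy(transitions, expiration_days=None):
    """Return one tag per lifecycle action: the value is the action's day count."""
    tags = {}
    for storage_class, days in transitions:
        tags[TRANSITION_TAG_KEYS[storage_class]] = str(days)
    if expiration_days is not None:
        tags["Expiration"] = str(expiration_days)
    return tags


# Prefix 3: a single transition, so a single tag.
print(tags_for_policy([("INTELLIGENT_TIERING", 30)]))
# Prefix 1: two transitions plus an expiration, so three tags.
print(tags_for_policy([("STANDARD_IA", 45), ("GLACIER", 90)], expiration_days=200))
```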

As a result, our new and improved lifecycle configuration has the following structure:

Rule | Filter (tag key = value) | Transition action | Expiration action
Rule 1 | TransitionInfrequent = 45 AND Expiration = 200 | S3 Standard-IA after 45 days | After 200 days
Rule 2 | TransitionArchive = 90 | S3 Glacier after 90 days | -
Rule 3 | TransitionIntelligent = 30 | S3 Intelligent-Tiering after 30 days | -
Rule 4 | TransitionInfrequent = 45 AND TransitionArchive = 90 AND Expiration = 200 | S3 Standard-IA after 45 days, then S3 Glacier after 90 days | After 200 days
Rule 5 | TransitionArchive = 90 AND TransitionDeepArchive = 200 | S3 Glacier after 90 days, then S3 Glacier Deep Archive after 200 days | -
Rule 6 | TransitionInfrequent = 90 AND Expiration = 200 | S3 Standard-IA after 90 days | After 200 days
Rule 7 | TransitionIntelligent = 90 AND Expiration = 200 | S3 Intelligent-Tiering after 90 days | After 200 days

We have simplified the lifecycle configuration by reducing the number of rules. As you add new datasets that need similar transition and expiration policies, you can tag them based on their retention periods. As there is a limit of 1000 rules per bucket, finding ways to reduce your lifecycle rules will help when managing large shared datasets.
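The size of the reduction can be checked by grouping the 20 prefixes from the first table by their combination of lifecycle actions. The sketch below encodes that table compactly: most prefixes follow a repeating cycle of five policies, and Prefix 1 and Prefix 12 are the two exceptions. The data layout is an illustration, not an AWS format.

```python
from collections import defaultdict

# Each policy is (transitions, expiration_days); transitions are (storage class, days).
CYCLE = [
    ((("STANDARD_IA", 45),), 200),
    ((("GLACIER", 90),), None),
    ((("INTELLIGENT_TIERING", 30),), None),
    ((("STANDARD_IA", 90),), 200),
    ((("INTELLIGENT_TIERING", 90),), 200),
]
# The two prefixes in the table that have a second transition action.
OVERRIDES = {
    1: ((("STANDARD_IA", 45), ("GLACIER", 90)), 200),
    12: ((("GLACIER", 90), ("DEEP_ARCHIVE", 200)), None),
}

# Policy for Prefix 1 through Prefix 20, as listed in the first table.
policies = {n: OVERRIDES.get(n, CYCLE[(n - 1) % 5]) for n in range(1, 21)}

# Prefixes that share a policy can share one tag-filtered rule.
prefixes_by_policy = defaultdict(list)
for n, policy in policies.items():
    prefixes_by_policy[policy].append(f"Prefix {n}")

print(len(policies), "prefix rules ->", len(prefixes_by_policy), "tag rules")
for policy, prefixes in prefixes_by_policy.items():
    print(policy, prefixes)
```

The first line printed is `20 prefix rules -> 7 tag rules`. Adding a new prefix that reuses an existing policy adds an entry to one of the lists but no new rule.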

Great, so how do I get started?

To start replacing your prefix-based lifecycle rules with rules that use object tags, we recommend three steps: have your application add object tags to new objects automatically, add object tags to your existing objects based on their lifecycle, and finally update the lifecycle configuration with the new rule filters.

Step 1: Adding object tags to new objects automatically

Object tagging works with many Amazon S3 API operations. For example, you can specify tags when you create objects, and the tagging action itself is free of charge when added as a part of the PutObject request. You specify tags using the x-amz-tagging request header.
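The value of the x-amz-tagging header is a URL-encoded list of key=value pairs. As a rough sketch, it can be built with the standard library as shown below. `tagging_header` is a hypothetical helper, and the bucket and key names in the commented-out boto3 call are placeholders.

```python
from urllib.parse import urlencode


def tagging_header(tags):
    """Encode a dict of tags as URL query parameters, e.g. 'k1=v1&k2=v2'."""
    return urlencode(tags)


header_value = tagging_header({"TransitionInfrequent": "45", "Expiration": "200"})
print(header_value)  # TransitionInfrequent=45&Expiration=200

# With boto3, the same string is passed through the Tagging parameter of put_object
# (not executed here; bucket and key are placeholders):
#
#   import boto3
#   boto3.client("s3").put_object(
#       Bucket="amzn-s3-demo-bucket",
#       Key="Prefix6/report.csv",
#       Body=b"...",
#       Tagging=header_value,
#   )
```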

Alternatively, you could add an AWS Lambda trigger that adds the tags to the object when uploaded. Adding tags via Lambda would incur additional Lambda and S3 request fees.
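A Lambda function for this purpose might look like the following sketch. It assumes the function is subscribed to the bucket's object-created event notifications and that tags are chosen by key name prefix. The `TAGS_BY_PREFIX` mapping and the optional `s3_client` argument (used to exercise the handler without AWS access) are illustrative assumptions.

```python
from urllib.parse import unquote_plus

# Hypothetical mapping from key name prefix to the lifecycle tags for that prefix.
TAGS_BY_PREFIX = {
    "Prefix2/": {"TransitionArchive": "90"},
    "Prefix6/": {"TransitionInfrequent": "45", "Expiration": "200"},
}


def tags_for_key(key):
    """Return the tags for the first configured prefix that the key starts with."""
    for prefix, tags in TAGS_BY_PREFIX.items():
        if key.startswith(prefix):
            return tags
    return {}


def handler(event, context, s3_client=None):
    """Tag each newly created object named in an S3 event notification."""
    if s3_client is None:
        import boto3  # imported lazily so the sketch can be tried without AWS access

        s3_client = boto3.client("s3")
    tagged = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        tags = tags_for_key(key)
        if not tags:
            continue
        # Note: put_object_tagging replaces the object's entire tag set.
        s3_client.put_object_tagging(
            Bucket=bucket,
            Key=key,
            Tagging={"TagSet": [{"Key": k, "Value": v} for k, v in tags.items()]},
        )
        tagged.append(key)
    return tagged
```

Because the handler overwrites the tag set, this approach fits best when the lifecycle tags are the only tags your objects carry at upload time.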

Step 2: Applying object tags to existing objects

You can add object tags to individual objects directly from the console, or use S3 Batch Operations to add or replace object tags on millions of objects. For example, using S3 Inventory reports for multiple prefixes, you can generate prefix-level manifests and then use S3 Batch Operations to add the appropriate tags to the objects in each prefix. In the preceding example, the S3 Inventory report manifest for Prefix 1 can be used as the input for an S3 Batch Operations job that adds the TransitionInfrequent tag with a value of 45. The lifecycle configuration can then use that tag to transition those objects to the S3 Standard-IA storage class 45 days after they were created.

Step 3: Changing your S3 Lifecycle configuration to include object tags as filters

For the prefix structure in the first table, the XML input of a lifecycle configuration that uses only prefixes as the filter element looks like this:

<LifecycleConfiguration>
    <Rule>
        <ID>Rule1</ID>
        <Filter>
            <Prefix>Prefix1/</Prefix>
        </Filter>
        <Status>Enabled</Status>
        <Transition>
            <Days>45</Days>
            <StorageClass>STANDARD_IA</StorageClass>
        </Transition>
        <Transition>
            <Days>90</Days>
            <StorageClass>GLACIER</StorageClass>
        </Transition>
        <Expiration>
            <Days>200</Days>
        </Expiration>
    </Rule>
    <Rule>
        <ID>Rule2</ID>
        <Filter>
            <Prefix>Prefix2/</Prefix>
        </Filter>
        <Status>Enabled</Status>
        <Transition>        
           <Days>90</Days>        
           <StorageClass>GLACIER</StorageClass>       
        </Transition>
    </Rule>
    ...
    ...
    ...
    ...
    <Rule>
        <ID>Rule20</ID>
        <Filter>
            <Prefix>Prefix20/</Prefix>
        </Filter>
        <Status>Enabled</Status>
        <Transition>        
           <Days>90</Days>        
           <StorageClass>INTELLIGENT_TIERING</StorageClass>       
        </Transition>    
        <Expiration>
             <Days>200</Days>
        </Expiration>
    </Rule>
</LifecycleConfiguration>

After tagging all the objects in these prefixes using step 1 and step 2, the new lifecycle configuration looks like this:

<LifecycleConfiguration>
  <Rule>
    <ID>Rule 1</ID>
    <Filter>
      <And>
        <Tag>
          <Key>TransitionInfrequent</Key>
          <Value>45</Value>
        </Tag>
        <Tag>
          <Key>Expiration</Key>
          <Value>200</Value>
        </Tag>
      </And>
    </Filter>
    <Status>Enabled</Status>
    <Transition>
      <StorageClass>STANDARD_IA</StorageClass>
      <Days>45</Days>
    </Transition>
    <Expiration>
      <Days>200</Days>
    </Expiration>
  </Rule>
   ...
   ...
   ...
   ...
  <Rule>
    <ID>Rule 7</ID>
    <Filter>
      <And>
        <Tag>
          <Key>TransitionIntelligent</Key>
          <Value>90</Value>
        </Tag>
        <Tag>
          <Key>Expiration</Key>
          <Value>200</Value>
        </Tag>
      </And>
    </Filter>
    <Status>Enabled</Status>
    <Transition>
      <StorageClass>INTELLIGENT_TIERING</StorageClass>
      <Days>90</Days>
    </Transition>
    <Expiration>
      <Days>200</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>

As a result of the consolidation, we have successfully reduced the number of rules in the lifecycle configuration from 20 to just 7.
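If you manage the configuration from code rather than XML, the same tag-filtered rules can be expressed as the dictionary that boto3's put_bucket_lifecycle_configuration accepts. The sketch below builds three of the seven rules; `tag_rule` is a hypothetical helper, and the bucket name in the commented-out call is a placeholder.

```python
def tag_rule(rule_id, tags, transitions=(), expiration_days=None):
    """Build one lifecycle rule that filters on object tags.

    A single tag goes directly under Filter; several tags are wrapped in And.
    """
    if not tags:
        raise ValueError("a tag-filtered rule needs at least one tag")
    tag_list = [{"Key": key, "Value": value} for key, value in tags.items()]
    rule_filter = {"Tag": tag_list[0]} if len(tag_list) == 1 else {"And": {"Tags": tag_list}}
    rule = {"ID": rule_id, "Filter": rule_filter, "Status": "Enabled"}
    if transitions:
        rule["Transitions"] = [
            {"Days": days, "StorageClass": storage_class}
            for storage_class, days in transitions
        ]
    if expiration_days is not None:
        rule["Expiration"] = {"Days": expiration_days}
    return rule


lifecycle = {
    "Rules": [
        tag_rule("Rule 1", {"TransitionInfrequent": "45", "Expiration": "200"},
                 [("STANDARD_IA", 45)], expiration_days=200),
        tag_rule("Rule 2", {"TransitionArchive": "90"}, [("GLACIER", 90)]),
        tag_rule("Rule 7", {"TransitionIntelligent": "90", "Expiration": "200"},
                 [("INTELLIGENT_TIERING", 90)], expiration_days=200),
    ]
}
print(lifecycle["Rules"][0])

# Applying it (not executed here; this call replaces the bucket's existing
# lifecycle configuration, so include every rule you want to keep):
#
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="amzn-s3-demo-bucket",
#       LifecycleConfiguration=lifecycle,
#   )
```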

Is there anything else I should know?

There are a couple of things to be careful of while consolidating your lifecycle rules with object tags.

  • Adjusting your applications to tag objects during PUT operations helps you create the tags without a request charge. Replacing or adding new tags to your existing objects will incur standard costs for tagging. Tags cost $0.01 per 10,000 tags per month. Requests that add or update tags are PUT requests, which are charged at the Tier 1 request rates. For more information, see the Amazon S3 pricing page.
  • When tagging multiple objects from a manifest using Batch Operations, changes are made to the full set of tags rather than to individual tags. As a result, Batch Operations replaces any existing tags on the objects. For more information on replacing all tags, please refer to the documentation.
  • When you have multiple rules in an S3 Lifecycle configuration, an object can become eligible for multiple lifecycle actions. In such cases, Amazon S3 follows these general rules: permanent deletion takes precedence over transition, and transition takes precedence over creation of delete markers. For example, when an object is eligible for both an S3 Glacier and an S3 Standard-IA (or S3 One Zone-IA) transition, Amazon S3 chooses the S3 Glacier transition. For examples, see the documentation on overlapping filters, conflicting lifecycle actions, and what Amazon S3 does.
  • When specifying the AbortIncompleteMultipartUpload or ExpiredObjectDeleteMarker lifecycle actions, the rule cannot specify a tag-based filter. We recommend turning these on at the bucket level to optimize your storage further and improve performance.
  • You can associate up to 10 tags with an object. Tags that are associated with an object must have unique tag keys. A tag key can be up to 128 Unicode characters in length, and tag values can be up to 256 Unicode characters in length. The key and values are case-sensitive. For more information about tag restrictions, see the documentation on user-defined tag restrictions.
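Because tagging operations replace the full tag set and objects are limited to 10 tags, it can help to compute and validate the final tag set before writing it. The following sketch merges existing tags with new lifecycle tags and checks the limits listed above; `merge_tags` is a hypothetical helper, and the `team` tag is made up for the example.

```python
MAX_TAGS_PER_OBJECT = 10
MAX_KEY_LENGTH = 128    # Unicode characters
MAX_VALUE_LENGTH = 256  # Unicode characters


def merge_tags(existing, new):
    """Return existing tags updated with new ones, enforcing the object tag limits.

    Keys are case-sensitive, so 'Expiration' and 'expiration' are different tags.
    """
    merged = {**existing, **new}
    if len(merged) > MAX_TAGS_PER_OBJECT:
        raise ValueError(f"{len(merged)} tags exceeds the limit of {MAX_TAGS_PER_OBJECT}")
    for key, value in merged.items():
        if len(key) > MAX_KEY_LENGTH:
            raise ValueError(f"tag key longer than {MAX_KEY_LENGTH} characters: {key!r}")
        if len(value) > MAX_VALUE_LENGTH:
            raise ValueError(f"tag value longer than {MAX_VALUE_LENGTH} characters for {key!r}")
    return merged


existing_tags = {"team": "finance", "Expiration": "365"}
lifecycle_tags = {"TransitionInfrequent": "45", "Expiration": "200"}
# The new value for Expiration wins; the unrelated team tag is preserved.
print(merge_tags(existing_tags, lifecycle_tags))
```

The merged dictionary is what you would then write back as the object's complete tag set.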

Conclusion

In this post, we demonstrated how you can use object tags to reduce and consolidate your S3 Lifecycle rules. In particular, you can simplify how you manage your data lifecycle by analyzing your current S3 Lifecycle configuration, identifying lifecycle actions that are common to multiple prefixes, and using object tags to mark all objects across those prefixes that share the same lifecycle actions. As you scale your applications, your datasets grow. When objects are tagged based on their retention needs, S3 Lifecycle can automatically transition or expire them based on your configuration. We hope you can use the examples covered in this blog post to reduce the number of rules in your S3 Lifecycle configurations across your accounts and buckets, optimize your storage costs, and simplify your data management.

Thanks for reading this post and using S3 Lifecycle to manage your objects in Amazon S3. If you have any comments, questions, or feedback, please leave a comment in the comments section.

Souvik Bhattacharya

Souvik Bhattacharya is a technical product manager on the Amazon S3 team at AWS. Souvik enjoys hearing from customers on how they use S3, and new ideas for future blog posts. Prior to coming to AWS, Souvik built tech solutions for K-12 schools to improve learning outcomes. He is based in Seattle and enjoys brewing espressos at home.

Gregory Benjamin

Gregory Benjamin is a Senior Software Development Engineer on the Amazon S3 team. He rides his bike to work even when it’s raining, which is most of the time in Seattle.

Vignesh Natarajan

Vignesh Natarajan is a Software Engineer on the Amazon S3 team. Depending on the weather, he likes to spend his spare time reading or playing ultimate frisbee.