Amazon S3 Lifecycle Management for Versioned Objects
Today I would like to tell you about a powerful new AWS feature that bridges a pair of existing AWS services and makes another pair of existing features far more useful! Let’s start with a quick review.
S3 & Versioned Objects
I’m sure that you already know about Amazon S3. First launched in 2006, S3 now processes over a million requests per second and stores trillions of documents, images, backups, and other data, all with high availability and eleven 9’s (99.999999999%) of durability. Since the initial launch, we have added many features and locations and have repeatedly reduced the price of storage (conveniently measured in pennies per Gigabyte per month). One notable and popular S3 feature is object versioning. After you enable versioning for an S3 bucket, successive uploads (PUTs) of a particular object create distinct, named, individually addressable versions of the object, protecting you against overwrites and deletes. You can preserve, retrieve, and restore every version of every object in an S3 bucket that has versioning enabled.
For example, you can retrieve a previous version of an object to recover from a human or programmatic error.
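To make the versioning behavior concrete, here is a minimal in-memory sketch of a versioning-enabled bucket. It is a toy model for illustration only, not how S3 is implemented. The class names are hypothetical, and the counter-based version IDs are a simplification (real S3 version IDs are opaque strings). In the model, every PUT appends a new version, and a plain DELETE appends a delete marker, so the earlier versions remain retrievable by ID.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Version:
    version_id: str
    data: Optional[bytes]  # None represents a delete marker in this toy model


@dataclass
class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket, not S3's implementation."""
    versions: Dict[str, List[Version]] = field(default_factory=dict)
    counter: int = 0

    def _new_version(self, key: str, data: Optional[bytes]) -> str:
        # Hypothetical IDs: real S3 version IDs are opaque strings, not counters.
        self.counter += 1
        version = Version(f"v{self.counter}", data)
        self.versions.setdefault(key, []).append(version)
        return version.version_id

    def put(self, key: str, data: bytes) -> str:
        # Each PUT appends a new version; earlier versions are never overwritten.
        return self._new_version(key, data)

    def delete(self, key: str) -> str:
        # A plain DELETE adds a delete marker and keeps all earlier versions.
        return self._new_version(key, None)

    def get(self, key: str, version_id: Optional[str] = None) -> bytes:
        history = self.versions.get(key, [])
        if version_id is None:
            # Without a version ID, only the current (newest) version is visible.
            if not history or history[-1].data is None:
                raise KeyError(key)
            return history[-1].data
        for version in history:
            if version.version_id == version_id and version.data is not None:
                return version.data
        raise KeyError((key, version_id))

    def list_versions(self, key: str) -> List[str]:
        # Oldest first; the last entry is the current version.
        return [v.version_id for v in self.versions.get(key, [])]
```

Suppose you call `put` twice and then `delete` once on the same key. A plain `get("backup.tar")` then raises `KeyError`, because the current version is a delete marker. However, `get("backup.tar", first_id)` still returns the first upload. This is the recovery path described above.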
Glacier & Lifecycle Rules
You have probably heard about Amazon Glacier as well. Glacier offers the same eleven 9’s of data durability as S3 at a lower price per Gigabyte per month, in exchange for a retrieval time that is typically three to five hours. Glacier is ideal for long-term storage of important data that you don’t need to access within seconds or minutes.
S3’s Lifecycle Management integrates S3 and Glacier and exposes the details through the Storage Class of each object. The data for objects with a Storage Class of Standard or RRS (Reduced Redundancy Storage) is stored in S3. If the Storage Class is Glacier, then the data is stored in Glacier. Regardless of the Storage Class, the objects are accessible through the S3 API and other S3 tools. Lifecycle Management allows you to define time-based rules that can trigger Transition (changing the Storage Class to Glacier) and Expiration (deletion of objects). Expiration rules let you delete objects (or versions of objects) that are older than a particular age. You can use these rules to keep objects available in case of an accidental or planned delete, while limiting your storage costs by deleting them once they are older than your preferred rollback window.
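The sketch below shows how a single day-based rule maps an object's age onto a state. It is a simplification: the function name and the example thresholds are hypothetical, and the code counts whole calendar days, ignoring exactly when during the day S3 carries out an action.

```python
from datetime import date
from typing import Optional


def current_version_state(created: date, today: date,
                          transition_days: Optional[int],
                          expiration_days: Optional[int]) -> str:
    """Return 'STANDARD', 'GLACIER', or 'EXPIRED' for an object governed by a
    simplified day-based rule (None means the rule has no such action)."""
    age = (today - created).days
    # Expiration wins once the object is old enough to be deleted.
    if expiration_days is not None and age >= expiration_days:
        return "EXPIRED"
    # Otherwise the object moves to Glacier once it passes the transition age.
    if transition_days is not None and age >= transition_days:
        return "GLACIER"
    return "STANDARD"
```

With a 30-day transition and a 365-day expiration, an object created on 2014-05-01 is still in the standard class on 2014-05-05. By 2014-06-15 (45 days old) it is in Glacier, and on 2015-05-01 (365 days old) it has expired.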
S3 & Glacier & Versioned Objects & Lifecycle Rules
With all of that out of the way, I am finally ready to share today’s news! You can now create and apply Lifecycle rules to buckets that use versioned objects. This seemingly simple change makes S3, Glacier, and versioned objects a lot more useful. For example, you can arrange to keep the current version of an object in S3 and to transition older versions to Glacier. You can get to the current version (the one that you are most likely to need) immediately, with older versions accessible within three to five hours. Depending on your use case, you might want to transition all of the versions, including the current one, to Glacier. You might also want to expire each version a few days after it was created (using a rule for the current version) or a few days after it was overwritten or expired (using a rule based on the successor time for previous versions). In other words, this new feature combines the flexibility of S3 versioned objects with the extremely low cost of storage in Glacier, helping you to reduce your overall storage costs.
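The key idea for previous versions is that their clock starts at the successor time, when a newer version replaces them, and not at their own creation time. The following sketch works through that arithmetic for a list of version creation dates. The function name and the sample dates are hypothetical, and the code counts whole days only.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

Schedule = Tuple[date, Optional[date], Optional[date]]


def noncurrent_schedule(created: List[date], transition_days: int,
                        expiration_days: int) -> List[Schedule]:
    """For each version (oldest first), return (created, transition, expiration).

    A version becomes noncurrent when its successor is created, so both dates
    are measured from the successor's creation time. The newest version is
    still current and gets no noncurrent dates."""
    schedule: List[Schedule] = []
    for i, created_on in enumerate(created):
        if i + 1 < len(created):
            became_noncurrent = created[i + 1]  # successor time
            schedule.append((created_on,
                             became_noncurrent + timedelta(days=transition_days),
                             became_noncurrent + timedelta(days=expiration_days)))
        else:
            schedule.append((created_on, None, None))
    return schedule
```

Take weekly backups written on 2014-05-01, 2014-05-08, and 2014-05-15, with a 2-day transition and a 100-day expiration. The first version is replaced on 2014-05-08, so it moves to Glacier on 2014-05-10 and expires on 2014-08-16. The newest version has no noncurrent dates until another backup replaces it.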
Lifecycle Management in the Console
Let’s set up a simple Lifecycle rule using the AWS Management Console. I will create a fresh bucket to store some backups:
In this example, my backup app is very simple and writes its output to the same file every time. I’ll enable versioning for the bucket. This will allow me to upload fresh backups without having to move or rename any files, while gaining all of the advantages of versioning, including protection against overwrites and deletions. It will also allow me to archive the previous versions of the file in Glacier. Here’s how I enable versioning:
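If you prefer to script this step, the following sketch builds the keyword arguments for the `put_bucket_versioning` method of the boto3 S3 client. It also guards against one change that cannot be made. The versioning API only accepts the statuses "Enabled" and "Suspended", so a bucket never returns to the unversioned state. The helper and its state table are hypothetical. The table only allows a bucket to leave the unversioned state by enabling versioning, because that is what this example does.

```python
from typing import Any, Dict

# Versioning changes this sketch permits. The API only accepts "Enabled" or
# "Suspended", so nothing ever leads back to "Unversioned".
ALLOWED_CHANGES = {
    "Unversioned": {"Enabled"},
    "Enabled": {"Enabled", "Suspended"},
    "Suspended": {"Enabled", "Suspended"},
}


def versioning_request(bucket: str, current: str, target: str) -> Dict[str, Any]:
    """Check a versioning change, then build keyword arguments shaped like
    those accepted by boto3's S3 client method put_bucket_versioning."""
    if target not in ALLOWED_CHANGES.get(current, set()):
        raise ValueError(f"cannot change versioning from {current!r} to {target!r}")
    return {"Bucket": bucket, "VersioningConfiguration": {"Status": target}}
```

With boto3 installed and credentials configured, you could then call `boto3.client("s3").put_bucket_versioning(**versioning_request("my-backup-bucket", "Unversioned", "Enabled"))`. The bucket name in that call is a placeholder.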
Now I need to set up the appropriate Transition and Expiration rules:
The console now includes a wizard to simplify this process! In the first step, I can choose to create a rule that addresses all of the objects in the bucket, or a subset of objects that share a common name prefix within the bucket.
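A rule's scope is expressed as a filter. In the lifecycle configuration format used by the current S3 API, this is a `Filter` element containing a `Prefix`, and an empty prefix addresses every object. The helper names in this sketch are hypothetical. It also illustrates a detail that is easy to miss: the prefix is matched as a plain string, since S3 keys have no real folders.

```python
from typing import Dict, List


def prefix_filter(prefix: str = "") -> Dict[str, str]:
    """Build the filter for a lifecycle rule; an empty prefix matches every object."""
    return {"Prefix": prefix}


def matching_keys(keys: List[str], flt: Dict[str, str]) -> List[str]:
    """Return the keys a prefix-based rule would address (plain string prefix match)."""
    prefix = flt.get("Prefix", "")
    return [k for k in keys if k.startswith(prefix)]
```

Because matching is purely textual, the prefix `backups` also matches a key such as `backups-old/db.tar`. Including the trailing slash (`backups/`) limits the rule to keys under that path.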
Next, I specify the transitions and expirations for the current and previous versions of the object. Let’s say that I want to transition the current version of each backup to Glacier after a week, and the previous versions two days after they have been overwritten. Further, I would like to permanently delete the previous versions 100 days after they are no longer current. Here’s how I would set that up (you can also click on See an Example to get an even better understanding of the Lifecycle rules):
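For reference, here is a sketch of how the same rule might be expressed through the API rather than the console. It uses the lifecycle configuration structure accepted by boto3's `put_bucket_lifecycle_configuration`. The rule `ID` is a hypothetical name, and the values mirror the console example: 7 days to Glacier for the current version, 2 days to Glacier after a version becomes noncurrent, and deletion 100 days after it becomes noncurrent.

```python
import json
from typing import Any, Dict


def backup_lifecycle_config(prefix: str = "") -> Dict[str, Any]:
    """Lifecycle configuration mirroring the console example above."""
    return {
        "Rules": [
            {
                "ID": "archive-backups",  # hypothetical rule name
                "Filter": {"Prefix": prefix},  # empty prefix: whole bucket
                "Status": "Enabled",
                # Current version: move to Glacier 7 days after creation.
                "Transitions": [{"Days": 7, "StorageClass": "GLACIER"}],
                # Previous versions: move to Glacier 2 days after being superseded.
                "NoncurrentVersionTransitions": [
                    {"NoncurrentDays": 2, "StorageClass": "GLACIER"}
                ],
                # Previous versions: delete 100 days after being superseded.
                "NoncurrentVersionExpiration": {"NoncurrentDays": 100},
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(backup_lifecycle_config(), indent=2))
```

You would pass the returned dictionary as the `LifecycleConfiguration` argument, together with your bucket name. Note that this call replaces any lifecycle configuration already on the bucket, so it is worth reading the existing rules first.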
The console confirms my intent and then creates and activates the rule:
Once the rules have been established, transitions and expirations will happen automatically. I can see the current state of each version of an object from the console:
Important Note: In order to see the versions of my backup file, I clicked the Show button.
The example shown above is a good starting point, but the interaction between versioning status, current and previous versions, and lifecycle actions is somewhat complex, so you should plan to spend some time learning more about this feature before you start using it. In fact, you may want to create a bucket just for testing and use it to try out your proposed rules.
Here are some things to think about when you design your strategy for versioning, transitions, and expirations:
- Versioning Status – This value is maintained on a per-bucket basis. Each bucket can be unversioned (the default) or versioned, and you also have the option to suspend versioning. Once versioning has been enabled, a bucket cannot return to the unversioned state; it can only be suspended. With versioning suspended, you will stop accruing new versions of an object. Also, deleting an object when versioning is suspended creates a special Delete Marker with a NULL version and makes this the current version.
- Actions – The current and previous versions of an object each have transition and expiration actions, each of which has behavior that depends on the versioning status of the associated bucket. Based on your use case, you can choose to just transition, just expire, or transition and then expire.
- Days and Dates – Rules can specify a date or a number of days since the creation of an object. Rules created in the console must be day-based; you must use the API to create a rule that includes a date. The lifecycle rules for previous versions take effect from the time a version becomes a previous version, that is, when a newer version supersedes it. You can control how long a superseded version remains in S3 before it is transitioned to Glacier or expired.
- Existing Rules – The rules that you created prior to the introduction of rules for versioning will still apply and will behave as expected. If they reference specific dates, you will need to use the API to edit them (you can still view, disable, and delete them in the console). You can set up rules for previous versions before you actually enable versioning for a bucket. The rules will become applicable only after you do so.
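Rules that include a date must be created, and later edited, through the API. The following sketch shows one way to build such a rule. The S3 documentation describes lifecycle dates as falling at midnight UTC, so this helper rejects naive datetimes and any time other than midnight UTC before building the rule. The function name, rule ID, and prefix in the usage are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def date_expiration_rule(rule_id: str, prefix: str, when: datetime) -> Dict[str, Any]:
    """Build a lifecycle rule that expires current versions on a fixed date.

    Lifecycle dates are interpreted at midnight UTC, so this sketch refuses
    naive datetimes and any other time of day."""
    if when.tzinfo is None:
        raise ValueError("use a timezone-aware datetime in UTC")
    when_utc = when.astimezone(timezone.utc)
    if (when_utc.hour, when_utc.minute, when_utc.second, when_utc.microsecond) != (0, 0, 0, 0):
        raise ValueError("lifecycle dates must fall at midnight UTC")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Expiration": {"Date": when_utc},  # date-based: API only, not the console
    }
```

For example, `date_expiration_rule("expire-scratch", "scratch/", datetime(2015, 1, 1, tzinfo=timezone.utc))` produces a rule that can be appended to the `Rules` list of a lifecycle configuration. Converting to UTC first means that midnight in another time zone is rejected instead of silently shifting the date.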
Give it a Try
This new feature is available now and you can start using it today. Give it a spin and let me know what you think.