How AWS Storage partners utilize Amazon S3 Glacier Instant Retrieval to meet customers’ data archiving needs

Last year at re:Invent 2021, Amazon Simple Storage Service (Amazon S3) released a new storage class called Amazon S3 Glacier Instant Retrieval, along with additional improvements to the Amazon S3 Glacier offerings. Many AWS Storage partners were excited about the new storage class, as detailed in the blog “Storing data with AWS Partner solutions and Amazon S3 Glacier Instant Retrieval”. S3 Glacier Instant Retrieval gives partners the ability to use low-cost storage for long-lived, rarely accessed data that requires retrieval in milliseconds. This provides new options and capabilities for customers that require faster access to their archived data. Today, many AWS storage partners have added support for S3 Glacier Instant Retrieval, with more planning to add support in the coming months.

In this blog, I’ll explore how partners are making Amazon S3 Glacier Instant Retrieval available to customers, which lets customers combine the benefits of partner solutions with seamless access to low-cost archive storage. I’ll also dive into some common S3 Glacier Instant Retrieval use cases, like cost-effective storage tiering from primary storage and long-term retention of backups.

Data archiving customer requirements

Customers use Amazon S3 for a wide variety of use cases including data lakes, web hosting, backup, archiving, and many more. Amazon S3 provides storage classes that address the business and technical requirements for different use cases, and for data at different points in its data lifecycle. Customers can determine which storage class is right for them based on the data access, resiliency, and cost requirements of their workloads. Before the launch of S3 Glacier Instant Retrieval, S3 Standard was used for frequently accessed data, S3 Standard-Infrequent Access was used for infrequently accessed data, and for rarely accessed data, Amazon S3 offered the Amazon S3 Glacier Flexible Retrieval (formerly ‘Amazon S3 Glacier’) and Amazon S3 Glacier Deep Archive storage classes (Figure 1). In addition to the storage classes in Figure 1, Amazon S3 offers storage classes like Amazon S3 Intelligent-Tiering which can reduce storage costs by automatically moving data to the most cost-effective access tier when access patterns change. See the Amazon S3 storage classes web page for a complete range of the Amazon S3 storage classes.

Figure 1: Amazon S3 storage classes

With Amazon S3 Glacier Flexible Retrieval and Amazon S3 Glacier Deep Archive, customers can access their data in minutes to hours for a very low cost. These storage classes meet the needs of rarely accessed data that isn’t time-critical to access. In some cases, however, customers need to store data that is rarely accessed but must be available in milliseconds when it is needed. This requirement may have prevented some customers from using S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive to store time-sensitive, rarely accessed data.

Prior to Amazon S3 Glacier Instant Retrieval, customers who needed rapid access to their less frequently accessed data would store it in an infrequent access storage class, such as S3 Standard-Infrequent Access, rather than in the lower-cost S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive storage classes. Customers were also looking for a place to store archive data that needed millisecond access, such as medical images, news media assets, or genomics data.

Amazon S3 Glacier Instant Retrieval as a solution

Amazon S3 Glacier Instant Retrieval addresses customers’ archive needs by allowing customers to cost-effectively store archive data while having access to that data immediately.

Now, let’s take a quick look at how easy it is to work with the new Amazon S3 Glacier Instant Retrieval storage class.

First, we upload a file to S3 Glacier Instant Retrieval using the AWS CLI v2. The storage class is also available via the S3 console, S3 API, or the AWS SDK. The “aws s3 cp” command below passes the “--storage-class” option to specify that we want to upload this object directly to S3 Glacier Instant Retrieval. If you are unsure of the name to use with the storage class option, you can use “aws s3 cp help” to see the list of possible values.

aws s3 cp myfile.mov s3://aws-s3-storageclass-bucket/ --storage-class=GLACIER_IR

upload: ./myfile.mov to s3://aws-s3-storageclass-bucket/myfile.mov

As you can see, uploading to S3 Glacier Instant Retrieval is as easy as uploading to any other Amazon S3 storage class. Now that we have uploaded the file, we can perform a head-object operation to get the object metadata. Below I am using the “--query” option, which is a standard option across the AWS CLI, to specify that I want only the StorageClass property returned. As you can see, the storage class returns as GLACIER_IR.

aws s3api head-object --bucket aws-s3-storageclass-bucket --key myfile.mov --query StorageClass

"GLACIER_IR"

We can also see the same thing if we look at the S3 console.

Figure 2: Amazon S3 bucket with uploaded object

The most important aspect of using S3 Glacier Instant Retrieval is being able to get immediate access to your archived data, whenever and wherever you need it. With S3 Glacier Instant Retrieval, you get access to your data in just milliseconds. S3 Glacier Instant Retrieval is a synchronous storage class, which means that when you issue a command to access or download an object, the request is served right away, with no separate restore step. To demonstrate, I once again use the AWS CLI with “aws s3 cp” to copy the object down to my local machine, and the command starts downloading the file immediately.

aws s3 cp s3://aws-s3-storageclass-bucket/myfile.mov myfile.mov

download: s3://aws-s3-storageclass-bucket/myfile.mov to ./myfile.mov
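
The same three steps (upload, metadata check, and download) can also be sketched with the AWS SDK for Python (boto3), one of the SDK options mentioned earlier. This is a minimal sketch rather than a definitive implementation. The function names are illustrative and not part of any AWS library, and the S3 client is passed in as a parameter so the calls can be exercised without credentials.

```python
# Minimal sketch: the helper names below are hypothetical.
# `s3_client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").


def upload_to_glacier_ir(s3_client, local_path, bucket, key):
    """Upload a local file directly into S3 Glacier Instant Retrieval."""
    # ExtraArgs forwards StorageClass to the underlying upload request.
    s3_client.upload_file(
        local_path, bucket, key, ExtraArgs={"StorageClass": "GLACIER_IR"}
    )


def get_storage_class(s3_client, bucket, key):
    """Return the storage class reported by a HeadObject call."""
    response = s3_client.head_object(Bucket=bucket, Key=key)
    # S3 omits StorageClass from the response for S3 Standard objects.
    return response.get("StorageClass", "STANDARD")


def download(s3_client, bucket, key, local_path):
    """Download the object; no restore request is issued beforehand."""
    s3_client.download_file(bucket, key, local_path)
```

With a real client (this assumes boto3 is installed and credentials are configured), calling `get_storage_class(s3, "aws-s3-storageclass-bucket", "myfile.mov")` on the object uploaded above should return the same GLACIER_IR value that the CLI reported.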

Amazon S3 Glacier Instant Retrieval storage partner use cases

For partners, Amazon S3 Glacier Instant Retrieval makes it possible to offer solutions that meet customers’ archive needs no matter how they need to access the data. Partners can add support to existing solutions that require both archive capabilities and instant access by specifying the storage class with the S3 API or AWS SDK. Let’s look at a few of these use cases: tiering storage from a primary storage system, backing up data for long-term retention, and maintaining large storage archives.

First, let’s talk through the primary storage tiering use case. Primary storage partners offer solutions that provide file, block, or object storage, or tools that work with primary data. Many primary storage partners have supported tiering of some kind from their primary storage solution to Amazon S3 for many years. In addition to built-in functionality, AWS Storage Partners offer software that can help tier data from multiple storage systems, allowing for a unified, cost-effective way to manage data lifecycle. While many of these solutions support Amazon S3 storage classes that provide rapid access, adding support for archive storage classes presented some challenges.

One of the challenges is that this data was often presented to customers over standard protocols like SMB and NFS. Partners had to either ensure the data could be retrieved within the protocol time-out limit or provide a special client to access the data. In addition, partners had to build workflows for retrieving data. With S3 Glacier Instant Retrieval, primary storage partners can now send data that needs to be tiered to an archive storage class and still meet all their other requirements. For example, Komprise is an AWS Storage Partner that provides customers with the ability to tier Network Attached Storage (NAS) data from different storage systems to AWS. Komprise announced plans to support S3 Glacier Instant Retrieval shortly after the storage class was announced. Today, customers can take advantage of S3 Glacier Instant Retrieval to move rarely accessed data from their primary storage system to Amazon S3 and get that data back immediately whenever they need to access it. In addition to Komprise, CTERA, Nasuni, NetApp, and Weka have added support for S3 Glacier Instant Retrieval. Some of these partners offer replication capabilities between on-premises and AWS and between AWS Regions. S3 Glacier Instant Retrieval allows the replicated copy to be stored cost-effectively while remaining immediately available should a failover be needed.
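
The difference in retrieval workflows mentioned above can be sketched in Python. This is an illustrative sketch only, not any partner's actual implementation. The names `read_object` and `ASYNC_ARCHIVE_CLASSES` are hypothetical, the client is assumed to be a boto3 S3 client passed in by the caller, and the one-day restore window and Standard retrieval tier are arbitrary example values.

```python
# Storage classes whose objects must be restored before they can be read.
# (Hypothetical constant name; the class identifiers are the S3 API values.)
ASYNC_ARCHIVE_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def read_object(s3_client, bucket, key):
    """Return the object's bytes, or None if the caller must retry after a restore."""
    head = s3_client.head_object(Bucket=bucket, Key=key)
    storage_class = head.get("StorageClass", "STANDARD")

    if storage_class in ASYNC_ARCHIVE_CLASSES:
        restore_status = head.get("Restore", "")
        if 'ongoing-request="false"' not in restore_status:
            if not restore_status:
                # No restore has been requested yet, so start one.
                s3_client.restore_object(
                    Bucket=bucket,
                    Key=key,
                    RestoreRequest={
                        "Days": 1,  # example value
                        "GlacierJobParameters": {"Tier": "Standard"},
                    },
                )
            # The restore is asynchronous: the data is not readable yet.
            return None

    # GLACIER_IR (like STANDARD and STANDARD_IA) is read with a plain GET.
    return s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
```

For an object in GLACIER_IR, the function goes straight to the GET request, which is why a file presented over SMB or NFS can be served without a separate retrieval workflow or a wait for a restore to complete.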

Amazon S3 Glacier Instant Retrieval storage backup, restore, and archiving use cases

Next, let’s dive into the backup and restore use case. It may seem obvious, but backup and restore go hand-in-hand. Usually, data gets backed up more often than it gets restored, and this is especially true as data ages. However, customers often need to store backup data for long periods of time, whether for business-specific policies or compliance requirements. This data is distinct from operational backups, which often have a 30- to 90-day retention period. Operational backups are also more likely to be needed for restores, especially in the early days after the backups are taken. Retention periods for long-term backups, on the other hand, can range from a few months to many years. While these backups are rarely accessed, in some cases customers also want to be able to restore them quickly. Partners like Clumio, Cohesity, Commvault, Druva, MSP360, Rubrik, and Veritas have added support for Amazon S3 Glacier Instant Retrieval as part of their existing Amazon S3 support to address this customer need. When customers create their connection to Amazon S3 buckets, they can simply specify Amazon S3 Glacier Instant Retrieval as the storage class. Whenever they need to restore data, whether it’s a single file, a complete virtual machine, or an entire NAS share, they are able to do so with ease.
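
One way to picture the split between operational and long-term retention backups is a small helper that picks a storage class from the retention period. This is a hypothetical sketch, not how any of the partners listed above implement it. The function names and the choice of S3 Standard for short-lived backups are assumptions. The 90-day threshold is also an assumption, chosen because S3 Glacier Instant Retrieval has a 90-day minimum storage duration and operational backups typically fall in the 30- to 90-day window described above.

```python
# Minimum storage duration for S3 Glacier Instant Retrieval, in days.
GLACIER_IR_MIN_STORAGE_DAYS = 90


def storage_class_for_backup(retention_days):
    """Pick a storage class for a backup based on its retention period."""
    if retention_days < 0:
        raise ValueError("retention_days must be non-negative")
    if retention_days >= GLACIER_IR_MIN_STORAGE_DAYS:
        # Long-term retention: rarely read, but still restorable immediately.
        return "GLACIER_IR"
    # Short-lived operational backup (assumed here to stay in S3 Standard).
    return "STANDARD"


def write_backup(s3_client, bucket, key, data, retention_days):
    """Write a backup object using the storage class chosen above.

    `s3_client` is assumed to be a boto3 S3 client supplied by the caller.
    """
    storage_class = storage_class_for_backup(retention_days)
    s3_client.put_object(
        Bucket=bucket, Key=key, Body=data, StorageClass=storage_class
    )
    return storage_class
```

A seven-year compliance backup (`retention_days=7 * 365`) would be written to GLACIER_IR under this rule, while a 30-day operational backup would not.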

The last use case we are going to explore in this blog is data archiving. Archiving can resemble both primary storage tiering and backup and restore, but it is generally distinct from each. Unlike primary storage tiering, the data is often no longer needed on the primary storage system. Unlike backup and restore, the data is usually not a copy of data stored elsewhere but the authoritative, only copy. Archive data is by nature rarely accessed, but that doesn’t mean it isn’t needed immediately when it is requested. For example, in the Media and Entertainment industry, customers often store large media archives called active archives. While these might be petabytes in size, only a small portion of the data may be needed at any given time, and that portion could be any file within the archive. Even so, in some cases that data needs to be accessed immediately. With Amazon S3 Glacier Instant Retrieval, customers can now cost-effectively store that data and access any of the media objects with performance similar to that of Amazon S3 storage classes built for more frequently accessed data.

Summary

In this blog we explored the use cases Amazon S3 Glacier Instant Retrieval addresses for customers and partners. We also reviewed an update on partners currently providing support for S3 Glacier Instant Retrieval. While this blog should not be considered a comprehensive list of partner support, it provides a good understanding of some of the ways partners are enabling customers to use S3 Glacier Instant Retrieval.

As a next step, you should determine whether your needs align with any of these use cases, or whether there are other use cases for S3 Glacier Instant Retrieval within your organization. Many AWS Partners, especially storage competency partners, have added support. Your solution provider can give you details on using S3 Glacier Instant Retrieval with their solution. Your AWS account team can also help determine if Amazon S3 Glacier Instant Retrieval is a good fit for your use case and help you set up a proof of concept.

Thanks for reading this blog post. If you have any comments or questions, leave them in the comments section.