Category: Amazon Glacier


Amazon Glacier Now in the Asia Pacific (Sydney) Region

by Jeff Barr | in Amazon Glacier

Amazon Glacier is a storage service designed for backup and archiving. Glacier provides secure, durable storage at an extremely low cost. Introduced a little over a year ago, Glacier has been growing quickly.

Our customers are using Glacier for digital preservation, online backup of historical assets, archiving of cold media data, storage of operational data for online services, redundant storage for disaster recovery, and more.

Glacier Down Under
Today we are making Glacier available in the Asia Pacific (Sydney) Region.

Glacier is now available in six AWS Regions:

  • US East (Northern Virginia)
  • US West (Oregon)
  • US West (Northern California)
  • EU (Ireland)
  • Asia Pacific (Tokyo)
  • Asia Pacific (Sydney)

Pricing for Glacier storage starts at $0.012 per Gigabyte per month in the new Region.

 

You can use Glacier in two different ways. You can upload archives directly, or you can use Amazon S3’s lifecycle rules to migrate objects in an S3 bucket to the Glacier storage class using a relative or absolute time specifier.

Customers Using Glacier
As I mentioned earlier, usage of Glacier has been growing quickly. Here are a few stories to give you a better idea of what our customers are doing with it:

DuraCloud is a digital preservation service launched in conjunction with the Library of Congress. It supports Glacier as one of four storage providers, including Amazon S3.

Illumina’s BaseSpace is a cloud-based sequencing environment. It allows researchers to upload massive DNA data sets and sequences to the cloud for analysis, with long-term data storage in Glacier.

Backupify provides archiving, search, and restore functions for cloud-based data stored in other online applications and services. All customer data is stored in Glacier.

Scribd backs up all customer and internal data, including database snapshots and log files, to Glacier.

— Jeff;

 

New – Range Retrieval for Amazon Glacier

by Jeff Barr | in Amazon Glacier

Amazon Glacier is designed for storing data that is infrequently accessed. Once you have stored your data, you can retrieve up to 5% of it (prorated daily) each month at no charge.

Today we are making it easier for you to remain within the 5% retrieval band by introducing Range Retrievals. You can use this new feature to fetch only data you need from a larger file or to spread the retrieval of a large archive over a longer period of time.
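As a rough illustration of the arithmetic, the sketch below estimates the free daily allowance and the number of days over which a large archive would have to be spread to stay inside it. The post only says the 5% is "prorated daily", so the 30-day divisor, the function names, and the sample sizes are all assumptions.

```python
import math

FREE_MONTHLY_FRACTION = 0.05  # 5% of stored data per month, from the post
DAYS_IN_MONTH = 30            # assumption: the post only says "prorated daily"


def free_daily_allowance(stored_bytes: int) -> float:
    """Bytes retrievable per day at no charge, under the assumptions above."""
    return stored_bytes * FREE_MONTHLY_FRACTION / DAYS_IN_MONTH


def days_to_spread(archive_bytes: int, stored_bytes: int) -> int:
    """Minimum whole days to spread a retrieval over to stay within the allowance."""
    return math.ceil(archive_bytes / free_daily_allowance(stored_bytes))


if __name__ == "__main__":
    stored = 12 * 1024**4    # hypothetical: 12 TiB stored in Glacier
    archive = 100 * 1024**3  # hypothetical: one 100 GiB archive
    print(free_daily_allowance(stored) / 1024**3)  # daily allowance in GiB
    print(days_to_spread(archive, stored))         # days needed for this archive
```

With these sample numbers the allowance is roughly 20 GiB per day, so the 100 GiB archive would be fetched in ranges over five days.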

Range Retrieval
Glacier’s existing archive retrieval function now accepts an optional RetrievalByteRange parameter. If you don’t provide this parameter, Glacier will retrieve the entire archive.

If you choose to provide this parameter, it must be in the form StartByte-EndByte. The value provided for StartByte must be megabyte aligned (a multiple of 1,048,576). If the range ends somewhere within the archive, EndByte + 1 must also be megabyte aligned. To retrieve data from StartByte up to the end of the archive, set EndByte to one less than the archive size.
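The alignment rules can be sketched as a small helper that splits an archive into StartByte-EndByte strings. The function name and the part-size parameter are hypothetical; only the range format and the 1,048,576-byte alignment come from the text above.

```python
MIB = 1024 * 1024  # 1,048,576 bytes


def byte_ranges(archive_size: int, part_mib: int) -> list[str]:
    """Split an archive into StartByte-EndByte strings of at most part_mib MiB."""
    if archive_size <= 0 or part_mib <= 0:
        raise ValueError("archive_size and part_mib must be positive")
    step = part_mib * MIB
    ranges = []
    for start in range(0, archive_size, step):
        # Every start is a multiple of 1 MiB; every end + 1 is too, except for
        # the final range, which ends at archive_size - 1.
        end = min(start + step, archive_size) - 1
        ranges.append(f"{start}-{end}")
    return ranges


if __name__ == "__main__":
    # A 2.5 MiB archive split into 1 MiB ranges.
    for r in byte_ranges(2_621_440, 1):
        print(r)
```

Each string could then be passed as the RetrievalByteRange value of a separate retrieval job, for example one per day.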

When you upload data to Glacier, you must also compute and supply a tree hash. Glacier checks the hash against the data to ensure that it has not been altered en route. A tree hash is generated by computing a hash for each megabyte-sized segment of the data, and then combining the hashes in tree fashion to represent ever-growing adjacent segments of the data.
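A minimal sketch of that computation follows. It assumes SHA-256 as the hash function (the post does not name one; the Glacier documentation specifies SHA-256) and holds the whole payload in memory, which a real uploader handling large archives would avoid. The function name is hypothetical.

```python
import hashlib

MIB = 1024 * 1024  # segment size used for the leaf hashes


def tree_hash(data: bytes) -> str:
    """Return the hex SHA-256 tree hash of data, built over 1 MiB segments."""
    # Leaf level: one digest per 1 MiB segment (a single digest for empty input).
    level = [hashlib.sha256(data[i:i + MIB]).digest()
             for i in range(0, len(data), MIB)] or [hashlib.sha256(b"").digest()]
    # Combine adjacent pairs until one root remains; an unpaired digest is
    # carried up to the next level unchanged.
    while len(level) > 1:
        paired = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                paired.append(hashlib.sha256(level[i] + level[i + 1]).digest())
            else:
                paired.append(level[i])
        level = paired
    return level[0].hex()


if __name__ == "__main__":
    print(tree_hash(b"a" * (3 * MIB)))  # root over three 1 MiB segments
```

For data of one megabyte or less, the tree hash is simply the SHA-256 digest of the data.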

If you would like to use tree hashes to confirm the integrity of the data that you download from Glacier (and you definitely should), then the range that you specify must also be tree-hash aligned. In other words, a tree hash must exist (at some level of the tree of hashes) for the exact range of bytes retrieved. If you specify such a range, Glacier will provide you with the corresponding tree hash when the retrieval job completes.
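One way to test a range for tree-hash alignment is sketched below. It assumes the tree is built bottom-up by pairing adjacent 1 MiB segments, as described above, so that every node covers a block of 2^k segments starting at a multiple of 2^k (cut short at the end of the archive). The function name is hypothetical, and this is an interpretation of the rule rather than Glacier's own check.

```python
MIB = 1024 * 1024


def is_tree_hash_aligned(start: int, end: int, archive_size: int) -> bool:
    """True if bytes start..end (inclusive) match one node of the archive's hash tree."""
    if not (0 <= start <= end < archive_size) or start % MIB != 0:
        return False
    # The range must end on a megabyte boundary or at the end of the archive.
    if (end + 1) % MIB != 0 and end != archive_size - 1:
        return False
    total = -(-archive_size // MIB)   # number of 1 MiB segments, rounded up
    first = start // MIB              # index of the first segment in the range
    last_excl = -(-(end + 1) // MIB)  # one past the last segment in the range
    span = 1                          # segments covered by a node at this level
    while True:
        if first % span == 0 and last_excl == min(first + span, total):
            return True
        if span >= total:
            return False
        span *= 2


if __name__ == "__main__":
    size = 6 * MIB + MIB // 2  # a 6.5 MiB archive (7 segments)
    print(is_tree_hash_aligned(2 * MIB, 4 * MIB - 1, size))  # segments 2-3
    print(is_tree_hash_aligned(1 * MIB, 3 * MIB - 1, size))  # segments 1-2
```

In the demo, segments 2-3 form a node of the tree, while segments 1-2 straddle two different nodes and so have no tree hash of their own.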

This new feature is available now and you can start using it today. The AWS SDK for Java and the AWS SDK for .NET have been updated and now include support for Range Retrievals.

For More Information
Here are some quick links that you can use to learn more about Range Retrievals in Glacier:

— Jeff;

 

Archiving Amazon S3 Data to Amazon Glacier

by Jeff Barr | in Amazon Glacier, Amazon S3

AWS provides you with a number of data storage options. Today I would like to focus on Amazon S3 and Amazon Glacier and a new and powerful way for you to use both of them together.

Both of the services offer dependable and highly durable storage for the Internet. Amazon S3 was designed for rapid retrieval. Glacier, in contrast, trades off retrieval time for cost, providing storage for as little as $0.01 per Gigabyte per month while retrieving data within three to five hours.

How would you like to have the best of both worlds? How about rapid retrieval of fresh data stored in S3, with automatic, policy-driven archiving to lower-cost Glacier storage as your data ages, along with easy, API-driven or console-powered retrieval?

Sound good? Awesome, because that’s what we have! You can now use Amazon Glacier as a storage option for Amazon S3.

There are four aspects to this feature — storage, archiving, listing, and retrieval. Let’s look at each one in turn.

Storage
First, you need to tell S3 which objects are to be archived to the new Glacier storage option, and under what conditions. You do this by setting up a lifecycle rule using the following elements:

  • A prefix to specify which objects in the bucket are subject to the policy.
  • A relative or absolute time specifier and a time period for transitioning objects to Glacier. The time periods are interpreted with respect to the object’s creation date. They can be relative (migrate items that are older than a certain number of days) or absolute (migrate items on a specific date).
  • An object age at which the object will be deleted from S3. This is measured from the original PUT of the object into the service, and the clock is not reset by a transition to Glacier.
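The three elements above can be sketched as a lifecycle configuration built in code. The dictionary follows the shape accepted by the put_bucket_lifecycle_configuration call in boto3, a Python SDK that this post does not mention; the helper name, rule ID, prefix, bucket name, and day counts are hypothetical.

```python
import json


def glacier_lifecycle_rule(prefix: str, archive_after_days: int,
                           delete_after_days: int) -> dict:
    """Build a lifecycle configuration with one rule: archive, then expire."""
    # Both ages are counted from object creation, so deletion must come later.
    if delete_after_days <= archive_after_days:
        raise ValueError("objects must be archived before they are deleted")
    return {
        "Rules": [{
            "ID": f"archive-{prefix.strip('/') or 'all'}",  # hypothetical rule name
            "Filter": {"Prefix": prefix},                   # which objects are covered
            "Status": "Enabled",
            "Transitions": [{"Days": archive_after_days, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": delete_after_days},
        }]
    }


if __name__ == "__main__":
    config = glacier_lifecycle_rule("logs/", 30, 365)
    print(json.dumps(config, indent=2))
    # With boto3 installed and credentials configured, this could be applied as:
    # boto3.client("s3").put_bucket_lifecycle_configuration(
    #     Bucket="example-bucket", LifecycleConfiguration=config)
```

This sketch uses the relative form (a number of days); the absolute form would use a date in place of the day count.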

You can create a lifecycle rule in the AWS Management Console.

Archiving
Every day, S3 will evaluate the lifecycle policies for each of your buckets and will archive objects in Glacier as appropriate. After the object has been successfully archived using the Glacier storage option, the object’s data will be removed from S3 but its index entry will remain as-is. The S3 storage class of an object that has been archived in Glacier will be set to GLACIER.

Listing
As with Amazon S3’s other storage options, all S3 objects that are stored using the Glacier option have an associated user-defined name. You can get a real-time list of all of your S3 object names, including those stored using the Glacier option, by using S3’s LIST API. If you list a bucket that contains objects that have been archived in Glacier, what will you see?

As I mentioned above, each S3 object has an associated storage class. There are three possible values:

  • STANDARD – 99.999999999% durability. S3’s default storage option.
  • RRS – 99.99% durability. S3’s Reduced Redundancy Storage option (reported by the S3 API as REDUCED_REDUNDANCY).
  • GLACIER – 99.999999999% durability, object archived in Glacier option.

If you archive objects using the Glacier storage option, you must inspect the storage class of an object before you attempt to retrieve it. The customary GET request will work as expected if the object is stored in S3 Standard or Reduced Redundancy (RRS) storage. It will fail (with a 403 error) if the object is archived in Glacier. In this case, you must use the RESTORE operation (described below) to make your data available in S3.
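The check described above might look like the following sketch, which sorts the entries of a bucket listing by storage class. The entries are plain dictionaries with "Key" and "StorageClass" fields, modeled on what a LIST response contains; the function name and sample keys are hypothetical, and the sketch ignores archived objects that already have a restored copy.

```python
def split_by_retrievability(contents: list[dict]) -> tuple[list[str], list[str]]:
    """Return (keys a plain GET can read, keys that need a RESTORE first)."""
    readable, archived = [], []
    for entry in contents:
        # Defensive default: treat a missing storage class as STANDARD.
        storage_class = entry.get("StorageClass", "STANDARD")
        if storage_class == "GLACIER":
            archived.append(entry["Key"])
        else:
            readable.append(entry["Key"])
    return readable, archived


if __name__ == "__main__":
    listing = [
        {"Key": "reports/2013-q1.pdf", "StorageClass": "STANDARD"},
        {"Key": "thumbs/cat.jpg", "StorageClass": "REDUCED_REDUNDANCY"},
        {"Key": "logs/2011-01-01.gz", "StorageClass": "GLACIER"},
    ]
    readable, archived = split_by_retrievability(listing)
    print("GET directly:", readable)
    print("RESTORE first:", archived)
```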

Retrieval
You use S3’s new RESTORE operation to access an object archived in Glacier. As part of the request, you need to specify a retention period in days. Restoring an object will generally take 3 to 5 hours. Your restored object will remain in both Glacier and S3’s Reduced Redundancy Storage (RRS) for the duration of the retention period. At the end of the retention period the object’s data will be removed from S3; the object will remain in Glacier.
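The restore flow can be sketched as below. The method names and response fields mirror the boto3 S3 client (head_object, restore_object, get_object), which is an assumption since the post names no SDK; FakeS3 is a hypothetical in-memory stand-in so that the example runs without an AWS account.

```python
class FakeS3:
    """In-memory stand-in for an S3 client, for illustration only."""

    def __init__(self):
        self.restore_requests = []

    def head_object(self, Bucket, Key):
        return {"StorageClass": "GLACIER"}  # archived, no restore requested yet

    def restore_object(self, Bucket, Key, RestoreRequest):
        self.restore_requests.append((Bucket, Key, RestoreRequest["Days"]))

    def get_object(self, Bucket, Key):
        raise RuntimeError("GET on an archived object fails with a 403")


def get_or_request_restore(s3, bucket, key, retention_days):
    """Return the object's bytes, or None after asking S3 to restore it."""
    head = s3.head_object(Bucket=bucket, Key=key)
    archived = head.get("StorageClass") == "GLACIER"
    # The Restore field reads ongoing-request="false" once a restored copy exists.
    restored = 'ongoing-request="false"' in head.get("Restore", "")
    if archived and not restored:
        if "Restore" not in head:  # do not re-request while a restore is running
            s3.restore_object(Bucket=bucket, Key=key,
                              RestoreRequest={"Days": retention_days})
        return None  # the caller should try again in a few hours
    return s3.get_object(Bucket=bucket, Key=key)["Body"].read()


if __name__ == "__main__":
    client = FakeS3()
    print(get_or_request_restore(client, "example-bucket", "logs/2011.gz", 7))
    print(client.restore_requests)
```

Against the fake client the first call returns None and records a single restore request with a seven-day retention period; a later call, once the restore has completed, would fall through to the ordinary GET.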

Although the objects are archived in Glacier, you can’t get to them via the Glacier APIs. Objects stored directly in Amazon Glacier using the Amazon Glacier API cannot be listed in real-time, and have a system-generated identifier rather than a user-defined name. Because Amazon S3 maintains the mapping between your user-defined object name and the Amazon Glacier system-defined identifier, Amazon S3 objects that are stored using the Amazon Glacier option are only accessible through the Amazon S3 API or the Amazon S3 Management Console.

Archiving in Action
We expect to see Amazon Glacier storage put to use in a variety of different ways. Toshiba’s Cloud & Solutions Division will be using it to store medical imaging. Tetsuro Muranaga, Chief Technology Executive of the division, is very excited about it. Here’s what he told us:

We currently provide a service enabling medical institutions to securely store patients’ medical images in Japan. We are excited about using Amazon Glacier through Amazon S3 to affordably and cost-effectively archive these images in large volumes for each of our customers. We will combine Toshiba’s cloud computing technology with Amazon Glacier’s low costs and Amazon S3’s lifecycle policies to provide a unique offering tailored to the needs of medical institutions. In addition, we expect we can build similarly tailored integrated solutions for our wide range of customers so that they can archive massive amounts of data in various business areas.

Pricing
You will pay standard Glacier pricing for data stored using S3’s new Glacier storage option.

Learn More
Learn how to archive your Amazon S3 data to Glacier by reading the Object Lifecycle Management topic in the Amazon S3 Developer Guide or check out the new Archiving Amazon S3 Data to Amazon Glacier video.

— Jeff;

The AWS Report – Colin Lazier, Amazon Glacier

by Jeff Barr | in Amazon Glacier

For today’s episode of The AWS Report, I spoke to Colin Lazier, a Senior Development Manager on the AWS Storage Team. Colin and I talked about Amazon Glacier and how it can be used to archive data for long periods of time. I learned that Glacier uses anti-entropy techniques to guard against data loss.

We also talked about Glacier’s retrieval model, and our expectation that third parties will build archiving and indexing tools around Glacier’s storage and retrieval functions.

Jeff