Amazon Glacier Select Makes Big Data Analytics of Archive Data Possible

Posted on: Nov 29, 2017

Amazon Glacier Select is a new way to query archived data in Amazon Glacier. Glacier Select runs queries directly on data stored in Amazon Glacier and retrieves only the data you need from your archives for analytics. Because you no longer have to restore entire archives before analyzing them, you can reduce total cost of ownership while extending your data lake into cost-effective archive storage.

With Amazon Glacier Select, you provide a SQL query and the Amazon Glacier archive to run it against. You specify how soon you need results by choosing one of three options: Expedited Retrievals take 1-5 minutes, Standard Retrievals take 3-5 hours, and Bulk Retrievals take up to 12 hours. Amazon Simple Notification Service (SNS) notifies you when a query is complete, and you specify the Amazon S3 bucket where the output results are stored.
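
As a rough illustration of the request described above, the following Python sketch assembles the parameters for a select job and hands them to the `initiate_job` call of the boto3 Glacier client. The helper names, vault name, archive ID, SNS topic ARN, bucket name, and the CSV input and output settings are all placeholder assumptions, not values from this announcement. Check the parameter layout against the current Amazon Glacier API reference before relying on it.

```python
# Sketch only: helper names and all argument values are hypothetical.
RETRIEVAL_TIERS = {"Expedited", "Standard", "Bulk"}


def build_select_job(archive_id, sql, tier, sns_topic_arn, bucket, prefix):
    """Build jobParameters for a select job (CSV in and CSV out are assumed)."""
    if tier not in RETRIEVAL_TIERS:
        raise ValueError(f"tier must be one of {sorted(RETRIEVAL_TIERS)}")
    return {
        "Type": "select",
        "ArchiveId": archive_id,        # archive the query is applied to
        "Tier": tier,                   # how soon results are needed
        "SNSTopic": sns_topic_arn,      # notified when the query completes
        "SelectParameters": {
            "InputSerialization": {"csv": {}},   # assumption: archive holds CSV
            "ExpressionType": "SQL",
            "Expression": sql,
            "OutputSerialization": {"csv": {}},
        },
        # S3 location where the query results are written
        "OutputLocation": {"S3": {"BucketName": bucket, "Prefix": prefix}},
    }


def submit_select_job(vault_name, job_parameters):
    """Submit the job; needs boto3 and AWS credentials, so import lazily."""
    import boto3

    glacier = boto3.client("glacier")
    response = glacier.initiate_job(
        vaultName=vault_name, jobParameters=job_parameters
    )
    return response["jobId"]
```

Keeping the parameter construction separate from the network call makes it possible to inspect or log the request before any retrieval charges are incurred.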

Using Amazon Glacier Select, you can now easily perform operations such as auditing and pattern matching over large amounts of data archived in Amazon Glacier. For example, you can use Amazon Glacier Select to find and retrieve only the records that match a particular account, or only the billing data for a particular customer. You can also integrate the Amazon Glacier Select APIs into your application to extend query-over-archive capability to many more use cases, such as machine learning and Big Data.
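
The account example above might translate into a query expression along these lines. This is a minimal sketch under stated assumptions: the function name is hypothetical, the archive is assumed to be a headerless CSV whose columns are addressed by position (`_1`, `_2`, and so on), and the table is assumed to be referenced as `archive`. Verify the exact SQL dialect in the Amazon Glacier documentation.

```python
def account_filter_query(column_index, account_id):
    """Return a SQL expression selecting rows whose column equals account_id.

    Hypothetical helper: assumes positional CSV columns and a table
    named "archive".
    """
    if column_index < 1:
        raise ValueError("column_index is 1-based")
    # Double any single quotes so the value stays inside the SQL literal.
    escaped = account_id.replace("'", "''")
    return f"SELECT * FROM archive s WHERE s._{column_index} = '{escaped}'"


query = account_filter_query(2, "1234-5678")
```

Only the rows that satisfy the predicate are written to the output location, which is what keeps the amount of retrieved data small.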

Amazon Glacier Select is generally available today in all AWS commercial regions where Amazon Glacier is offered. To learn more about Amazon Glacier Select, visit the Amazon Glacier details page.