Data is stored in Amazon S3 Glacier in "archives." An archive can contain any data, such as photos, videos, or documents. You can upload a single file as an archive, or aggregate multiple files into a TAR or ZIP file and upload it as one archive.
A single archive can be as large as 40 terabytes. You can store an unlimited number of archives and an unlimited amount of data in Amazon S3 Glacier. Each archive is assigned a unique archive ID at the time of creation, and the content of the archive is immutable, meaning that after an archive is created it cannot be updated.
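As a rough illustration of aggregating several files into one archive before upload, the sketch below packs files into a gzip-compressed TAR file using only the Python standard library. The function names and the choice of gzip compression are assumptions made for this example; they are not part of the Amazon S3 Glacier API.

```python
import tarfile
from pathlib import Path


def bundle_files(paths, archive_path):
    """Pack several files into one gzip-compressed TAR file.

    Returns the size of the resulting file in bytes.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            # arcname keeps only the file name, dropping local directories
            tar.add(path, arcname=Path(path).name)
    return Path(archive_path).stat().st_size


def list_members(archive_path):
    """Return the sorted member names stored in the TAR file."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())
```

Because an archive cannot be updated after it is created, it can help to record the member list (for example, in the archive description or a local index) at bundling time.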
Amazon S3 Glacier uses "vaults" as containers to store archives. You can view a list of your vaults in the AWS Management Console and use the AWS SDKs to perform a variety of vault operations, such as creating, deleting, and locking vaults, listing vault metadata, retrieving vault inventory, tagging vaults for filtering, and configuring vault notifications. You can also set access policies for each vault to grant or deny specific activities to users. Under a single AWS account, you can have up to 1,000 vaults.
Data retrieval features
Amazon S3 Glacier provides three retrieval options for your archives to meet varying access time and cost requirements: Expedited, Standard, and Bulk retrievals. Archives requested using Expedited retrievals are typically available within 1 – 5 minutes, allowing you to quickly access your data when occasional urgent requests for a subset of archives are required. With Standard retrievals, archives typically become accessible within 3 – 5 hours. Bulk retrievals let you cost-effectively access significant portions of your data, even petabytes, for just a quarter of a cent per GB.
Amazon S3 Glacier Select
Amazon S3 Glacier Select allows queries to run directly on data stored in Amazon S3 Glacier without having to retrieve the entire archive. Amazon S3 Glacier Select changes the value of archive storage by allowing you to process and find only the bytes you need out of the archive to use for analytics.
Your analytics application can call the Amazon S3 Glacier Select API to retrieve only the data relevant to your query from an Amazon S3 Glacier archive. Amazon S3 Glacier Select will soon integrate with Amazon Athena and Amazon Redshift Spectrum, so you will be able to treat S3 Glacier archives as part of your data lake.
Prior to S3 Glacier Select, an Amazon S3 Glacier archive had to be completely restored before the data could be used. Now customers can use S3 Glacier Select to lower their costs and uncover more insights from their archive data.
AWS Snowball and Direct Connect integration
AWS Snowball can accelerate movement of large amounts of data into and out of AWS using portable storage devices for transport. AWS transfers your data directly onto and off the storage devices using Amazon’s high-speed internal network, bypassing the Internet. For significant data sets, AWS Snowball is often faster than Internet transfer and more cost-effective than upgrading your connectivity. You can use AWS Snowball for migrating data into the cloud, distributing content to your customers, sending backups to AWS, and disaster recovery.
AWS Direct Connect makes it easy to establish a high-bandwidth, dedicated network connection from your premises to AWS. With AWS Direct Connect, you can transfer your business critical data directly from your datacenter into AWS, bypassing your Internet service provider and removing network congestion. Further, AWS Direct Connect makes it easy to scale your connection to meet your data transfer needs. AWS Direct Connect provides 1 Gbps and 10 Gbps connections, and you can easily provision multiple connections if you need more capacity.
Amazon S3 Glacier Vault Lock allows you to easily deploy and enforce compliance controls on individual S3 Glacier vaults via a lockable policy. You can specify controls such as “Write Once Read Many” (WORM) in a Vault Lock policy and lock the policy from future edits. Once locked, the policy becomes immutable and Amazon S3 Glacier will enforce the prescribed controls to help achieve your compliance objectives. To learn more, please read Amazon S3 Glacier Vault Lock in the Amazon S3 Glacier developer’s guide.
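To make the WORM control more concrete, here is a minimal sketch of a Vault Lock policy document built as a Python dictionary. It assumes the `glacier:ArchiveAgeInDays` condition key and the `glacier:DeleteArchive` action as described in the developer's guide; the vault ARN, account number, and retention period are hypothetical values chosen for illustration.

```python
import json


def worm_lock_policy(vault_arn, retention_days):
    """Build a policy that denies archive deletion until archives reach a minimum age."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-delete-before-retention-ends",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                "Condition": {
                    # Archives younger than retention_days cannot be deleted
                    "NumericLessThan": {
                        "glacier:ArchiveAgeInDays": str(retention_days)
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical vault ARN, for illustration only
    arn = "arn:aws:glacier:us-west-2:123456789012:vaults/examplevault"
    print(json.dumps(worm_lock_policy(arn, 365), indent=2))
```

With boto3, a policy like this would typically be passed as a JSON string to `initiate_vault_lock` and then confirmed with `complete_vault_lock`; treat that workflow as an assumption to verify against the SDK documentation before relying on it, since a completed lock cannot be undone.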
Amazon S3 Glacier uses AWS Identity and Access Management (IAM) to help you securely control access to AWS and your Amazon S3 Glacier data. You can create users in IAM, assign them individual security credentials (access keys, passwords, and multi-factor authentication devices), and attach IAM policies that grant only the permitted activities on each Amazon S3 Glacier vault to the intended users.
Amazon S3 Glacier allows you to tag your S3 Glacier vaults for easier resource and cost management. Tags are labels that you define and associate with your vaults, and they add filtering capabilities to operations such as AWS cost reports. For example, you can use tags to allocate S3 Glacier costs and usage across multiple departments in your organization or by any other categorization. For more information, see Tagging Your Amazon S3 Glacier Vaults.
Amazon S3 Glacier supports audit logging with AWS CloudTrail, which records Amazon S3 Glacier API calls for your account and delivers the log files to you. These log files provide visibility into actions performed on your Amazon S3 Glacier assets. For instance, you can determine which users have accessed a vault over the last month, or identify who deleted a particular archive and when. Audit logging can help you meet compliance and governance objectives for your cloud-based archival system. To learn more, read Using Audit Logging with Amazon S3 Glacier.
Vault access policies
Vault access policies allow you to easily manage access to your individual S3 Glacier vaults. You can define an access policy directly on a vault to grant vault access to users and business groups internal to your organization, as well as to your external business partners. To learn more, please read Managing Vault Access Policies in the Amazon S3 Glacier developer’s guide.
Amazon S3 Glacier maintains an inventory of all archives in each of your vaults for disaster recovery or occasional reconciliation. The vault inventory is updated approximately once a day. You can request a vault inventory as either a JSON or CSV file that contains details about the archives, including the size, creation date, and the archive description if one was provided during upload. The inventory represents the state of the vault as of the most recent inventory update.
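For reconciliation, a JSON inventory can be summarized with a few lines of Python. The sketch below assumes the inventory layout described in the developer's guide (top-level `VaultARN`, `InventoryDate`, and `ArchiveList`, with `Size` and `ArchiveDescription` per archive); the sample document is made up for illustration and is not real vault data.

```python
import json

# Hypothetical sample inventory, for illustration only
SAMPLE_INVENTORY = """
{
  "VaultARN": "arn:aws:glacier:us-west-2:123456789012:vaults/examplevault",
  "InventoryDate": "2024-01-15T08:00:00Z",
  "ArchiveList": [
    {"ArchiveId": "id-1", "ArchiveDescription": "photos 2023",
     "CreationDate": "2023-12-31T23:59:00Z", "Size": 1048576},
    {"ArchiveId": "id-2", "ArchiveDescription": "",
     "CreationDate": "2024-01-10T12:00:00Z", "Size": 2097152}
  ]
}
"""


def summarize_inventory(inventory_json):
    """Return the archive count, total size, and snapshot date of an inventory."""
    inventory = json.loads(inventory_json)
    archives = inventory.get("ArchiveList", [])
    return {
        "vault_arn": inventory["VaultARN"],
        "as_of": inventory["InventoryDate"],
        "archive_count": len(archives),
        "total_bytes": sum(a["Size"] for a in archives),
        # Archives uploaded without a description are harder to identify later
        "undescribed": [a["ArchiveId"] for a in archives
                        if not a.get("ArchiveDescription")],
    }


if __name__ == "__main__":
    print(summarize_inventory(SAMPLE_INVENTORY))
```

Because the inventory is refreshed only about once a day, the `as_of` value matters: archives uploaded after that time will not appear in the summary.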
Data retrieval policies
Amazon S3 Glacier data retrieval policies let you define your own data retrieval limits with a few clicks in the AWS console. You can limit retrievals to “Free Tier Only”, or, if you wish to retrieve more than the free tier allows, you can specify a “Max Retrieval Rate” to limit your retrieval speed and establish a retrieval cost ceiling. In both cases, Amazon S3 Glacier will not accept retrieval requests that would exceed the retrieval limits you defined. To learn more, please read Configuring Data Retrieval Policies in the Amazon S3 Glacier developer’s guide.
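The same two choices can also be expressed programmatically. The sketch below builds the policy document in the shape that the Glacier API's data retrieval policy operation is understood to accept, with a `FreeTier` strategy or a `BytesPerHour` strategy; the helper name and the example rate are assumptions made for illustration.

```python
def retrieval_policy(max_bytes_per_hour=None):
    """Build a data retrieval policy document.

    With no argument, retrievals are limited to the free tier.
    With a byte count, retrievals are capped at that hourly rate.
    """
    if max_bytes_per_hour is None:
        rule = {"Strategy": "FreeTier"}
    else:
        if max_bytes_per_hour <= 0:
            raise ValueError("max_bytes_per_hour must be positive")
        rule = {"Strategy": "BytesPerHour",
                "BytesPerHour": int(max_bytes_per_hour)}
    return {"Rules": [rule]}


if __name__ == "__main__":
    # Cap retrievals at 10 GiB per hour (an arbitrary example rate)
    print(retrieval_policy(10 * 1024**3))
    # With boto3, this document would be passed along the lines of:
    #   boto3.client("glacier").set_data_retrieval_policy(Policy=policy)
```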
AWS Management Console
Amazon S3 Glacier can be accessed using the AWS Management Console, an easy-to-use web interface that provides the capability to create vaults, configure vault-level access permissions, and set up SNS notifications for data retrieval. The console also presents a storage usage summary for each vault as well as the last refresh time for the vault inventory.
AWS software development kits (SDKs)
Data upload and retrieval are done using the AWS SDKs or the underlying Amazon S3 Glacier API. Amazon S3 Glacier is supported by the AWS SDKs for Java, .NET, PHP, and Python (Boto). The SDK libraries wrap the underlying Amazon S3 Glacier REST API, so you can construct requests and process responses without working with the raw API directly. The AWS SDKs for Java and .NET offer both high-level and low-level API libraries.
The low-level wrapper libraries map closely to the underlying Amazon S3 Glacier API and provide the most complete implementation of the underlying Amazon S3 Glacier operations.
The high-level APIs further simplify application development with a higher level of abstraction for some of the operations. For example, when uploading an archive, the high-level API will automatically compute the checksum for you.
Integrated lifecycle management with Amazon S3
Amazon S3 Glacier works together with Amazon S3 lifecycle rules to help you automate archiving of Amazon S3 data and reduce your overall storage costs. You can easily set up a rule that stores all your previous Amazon S3 object versions in the lower cost S3 Glacier storage class and deletes them from S3 Glacier storage after 100 days. This example would provide a 100-day window to roll back any changes made to your data and automatically lower your storage costs. For more information about lifecycle configuration and transitioning objects to Amazon S3 Glacier, go to Object Lifecycle Management in the Amazon Simple Storage Service Developer Guide.
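The rule described above can be sketched as an S3 lifecycle configuration. This example assumes the field names used by the S3 lifecycle API (`NoncurrentVersionTransitions`, `NoncurrentVersionExpiration`, and the `GLACIER` storage class) and assumes versioning is enabled on the bucket; the rule ID and the one-day transition delay are illustrative choices, not values from the text.

```python
import json


def previous_versions_rule(delete_after_days=100, transition_after_days=1):
    """Move previous object versions to the Glacier storage class, then expire them."""
    if delete_after_days <= transition_after_days:
        raise ValueError("expiration must come after the transition")
    return {
        "Rules": [
            {
                "ID": "archive-previous-versions",  # hypothetical rule name
                "Filter": {"Prefix": ""},            # apply to every object
                "Status": "Enabled",
                "NoncurrentVersionTransitions": [
                    {"NoncurrentDays": transition_after_days,
                     "StorageClass": "GLACIER"}
                ],
                "NoncurrentVersionExpiration": {
                    "NoncurrentDays": delete_after_days
                },
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(previous_versions_rule(), indent=2))
    # With boto3, the configuration would be applied along the lines of:
    #   boto3.client("s3").put_bucket_lifecycle_configuration(
    #       Bucket="example-bucket", LifecycleConfiguration=config)
```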
Protecting your data
Data stored in Amazon S3 Glacier is protected by default; only vault owners have access to the Amazon S3 Glacier resources they create. Amazon S3 Glacier encrypts your data at rest by default and supports secure data transit with SSL. It also supports access control mechanisms with Identity and Access Management (IAM) policies. With Amazon S3 Glacier’s data protection features, you can protect your data from both logical and physical failures, guarding against data loss from unintended user actions, application errors, and infrastructure breakdown. For customers who must comply with regulatory standards such as PCI and HIPAA, Amazon S3 Glacier’s data protection features can be used as part of an overall strategy to achieve compliance. The various data security and reliability features offered by Amazon S3 Glacier are described in detail below.
Encryption by default
Amazon S3 Glacier automatically encrypts data at rest using Advanced Encryption Standard (AES) 256-bit symmetric keys and supports secure transfer of your data over Secure Sockets Layer (SSL).
Data stored in Amazon S3 Glacier is immutable, meaning that after an archive is created it cannot be updated. This ensures that data such as compliance and regulatory records cannot be altered after they have been archived.
Flexible access control with IAM policies
Amazon S3 Glacier supports Identity and Access Management (IAM) policies, which enable organizations with multiple employees to create and manage multiple users under a single AWS account. With IAM, you create fine-grained policies to control access to your Amazon S3 Glacier vaults. You can write IAM policies to selectively grant or revoke certain permissions and actions on each Amazon S3 Glacier vault.
Mandatory request signing
Amazon S3 Glacier requires all requests to be signed for authentication protection. To sign a request, you calculate a digital signature using a cryptographic hash function and include the resulting hash value in the request as your signature. After receiving your request, Amazon S3 Glacier recalculates the signature using the same hash function and input that you used. If the recalculated signature matches the one in the request, Amazon S3 Glacier processes the request; otherwise, it rejects the request.
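The sign-then-recompute idea can be sketched with the Python standard library. This example assumes the key derivation chain of AWS Signature Version 4 (the text above does not name the signing scheme), and it deliberately skips building the canonical request: `string_to_sign` here is a placeholder, and the secret key is a made-up value. In practice the AWS SDKs sign requests for you.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key, date_stamp, region, service="glacier"):
    """Derive a signing key scoped to one day, region, and service."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(string_to_sign, signing_key):
    """Client side: compute the hex signature to attach to the request."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def verify(string_to_sign, signing_key, received_signature):
    """Service side: recompute the signature and compare in constant time."""
    expected = sign(string_to_sign, signing_key)
    return hmac.compare_digest(expected, received_signature)


if __name__ == "__main__":
    key = derive_signing_key("EXAMPLE-SECRET-KEY", "20240115", "us-west-2")
    signature = sign("placeholder string to sign", key)
    print(verify("placeholder string to sign", key, signature))
```

Any change to the signed input, or a signature made with a different key, causes the comparison to fail, which is why a tampered request is rejected.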
Data durability and reliability
Amazon S3 Glacier provides a highly durable storage infrastructure designed for long-term data archival storage. It is designed to provide average annual durability of 99.999999999% for an archive. The service redundantly stores data in multiple AWS Availability Zones (AZ) and on multiple devices within each AZ. To increase durability, Amazon S3 Glacier synchronously stores your data across multiple AZs before confirming a successful upload.
To detect corruption of data in transit, a checksum of the data is sent along with each upload. Amazon S3 Glacier compares the checksum it receives with a checksum it computes over the received data to detect bit flips over the wire. Similarly, it uses checksums to validate data integrity during data retrieval. Unlike traditional systems, which can require laborious data verification and manual repair, Amazon S3 Glacier performs regular, systematic data integrity checks and is built to be automatically self-healing.
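The text does not specify the checksum algorithm. As an illustration, the sketch below assumes the SHA-256 tree hash described in the Amazon S3 Glacier developer's guide: the data is hashed in 1 MiB chunks, and adjacent hashes are combined pairwise until one root hash remains. It works on in-memory bytes for brevity; a real uploader would stream the file.

```python
import hashlib

MIB = 1024 * 1024


def tree_hash(data: bytes) -> str:
    """Compute a SHA-256 tree hash over 1 MiB chunks and return it as hex."""
    # Leaf level: one SHA-256 digest per 1 MiB chunk
    hashes = [hashlib.sha256(data[i:i + MIB]).digest()
              for i in range(0, len(data), MIB)]
    if not hashes:
        hashes = [hashlib.sha256(b"").digest()]

    # Combine adjacent pairs until a single root digest remains
    while len(hashes) > 1:
        next_level = []
        for i in range(0, len(hashes), 2):
            if i + 1 < len(hashes):
                combined = hashes[i] + hashes[i + 1]
                next_level.append(hashlib.sha256(combined).digest())
            else:
                # An unpaired digest is carried up to the next level unchanged
                next_level.append(hashes[i])
        hashes = next_level
    return hashes[0].hex()


if __name__ == "__main__":
    print(tree_hash(b"example archive contents"))
```

For data of 1 MiB or less, the tree hash equals the plain SHA-256 digest of the data, since there is only one leaf.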
Managing your data
Archive operations in Amazon S3 Glacier
Amazon S3 Glacier supports the following archive operations: Upload, Download, and Delete. Archives are immutable and cannot be modified.
Uploading an archive to Amazon S3 Glacier
Uploading an archive is a synchronous operation. You can upload an archive in a single operation, or upload larger archives in parts with the MultipartUpload API to improve throughput and fault tolerance. You can upload archives as small as 1 byte and as large as 40 TB. You receive a unique archive ID once the archive has been durably stored. For recommendations on when to use MultipartUpload, see Uploading an Archive in Amazon S3 Glacier.
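When uploading in parts, the part size has to be chosen up front. The following sketch picks the smallest workable part size and lists the byte ranges to upload. It assumes the limits given in the developer's guide (part sizes of 1 MiB multiplied by a power of two, up to 4 GiB, and at most 10,000 parts per upload); the function names are hypothetical.

```python
MIB = 1024 ** 2
MAX_PART_SIZE = 4 * 1024 ** 3   # assumed upper limit for one part
MAX_PARTS = 10_000              # assumed maximum number of parts


def choose_part_size(archive_size):
    """Return the smallest power-of-two part size (in bytes) that fits the archive."""
    if archive_size < 1:
        raise ValueError("archive must contain at least 1 byte")
    part_size = MIB
    while part_size * MAX_PARTS < archive_size:
        part_size *= 2
        if part_size > MAX_PART_SIZE:
            raise ValueError("archive is too large for a multipart upload")
    return part_size


def part_ranges(archive_size, part_size):
    """Return inclusive (first_byte, last_byte) ranges, one per part."""
    return [(start, min(start + part_size, archive_size) - 1)
            for start in range(0, archive_size, part_size)]


if __name__ == "__main__":
    size = 50 * 1024 ** 3  # a 50 GiB archive, as an example
    part = choose_part_size(size)
    print(part // MIB, "MiB parts,", len(part_ranges(size, part)), "parts")
```

Smaller parts mean less data to resend when a single part fails, while larger parts reduce the number of requests; choosing the smallest size that stays within the part-count limit favors fault tolerance.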
Downloading an archive from Amazon S3 Glacier
Downloading an archive is an asynchronous operation. You must first initiate a retrieval job of a specific archive. After receiving the job request, Amazon S3 Glacier prepares your archive for download. After the job completes, you have 24 hours to download the data from the staging location.
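The initiate, wait, then download sequence can be sketched as follows. The function takes an already constructed client object and assumes it exposes `initiate_job`, `describe_job`, and `get_job_output` with the parameter and response names used by the boto3 Glacier client; the polling interval is an arbitrary choice, and a production system might rely on notifications instead of polling.

```python
import time


def retrieve_archive(client, vault_name, archive_id, tier="Standard",
                     poll_seconds=900, sleep=time.sleep):
    """Start a retrieval job, wait for it to complete, and return the archive bytes."""
    # Step 1: ask the service to prepare the archive for download
    job = client.initiate_job(
        accountId="-",  # "-" means the account that owns the credentials
        vaultName=vault_name,
        jobParameters={
            "Type": "archive-retrieval",
            "ArchiveId": archive_id,
            "Tier": tier,  # "Expedited", "Standard", or "Bulk"
        },
    )
    job_id = job["jobId"]

    # Step 2: poll until the job reports completion
    while True:
        status = client.describe_job(
            accountId="-", vaultName=vault_name, jobId=job_id)
        if status["Completed"]:
            break
        sleep(poll_seconds)

    if status.get("StatusCode") != "Succeeded":
        raise RuntimeError(f"retrieval job {job_id} did not succeed")

    # Step 3: download the staged data before the download window closes
    output = client.get_job_output(
        accountId="-", vaultName=vault_name, jobId=job_id)
    return output["body"].read()
```

A caller would typically pass `boto3.client("glacier")` as `client`. For large archives, reading the whole body into memory is impractical, so a real implementation would stream the output to disk in chunks.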
Deleting an archive in Amazon S3 Glacier
To delete an archive, use the Amazon S3 Glacier REST API or the AWS SDKs and specify the archive ID. You can also use a number of third-party tools to delete archives. For more information, see Deleting an Archive in Amazon S3 Glacier.
Intended usage and restrictions
Your use of this service is subject to the Amazon Web Services Customer Agreement.