Amazon S3 stores data as objects within resources called "buckets". You can store as many objects as you want within a bucket, and write, read, and delete objects in your bucket. Objects can be up to 5 terabytes in size.
You can control access to both the bucket and the objects (who can create, delete, and retrieve objects in the bucket for example), view access logs for the bucket and its objects, and choose the AWS Region where a bucket is stored to optimize for latency, minimize costs, or address regulatory requirements.
You can select from four different storage classes to store your data in Amazon S3: S3 Standard, S3 Standard-IA, S3 One Zone-IA, and Amazon Glacier. You can learn more about each of these storage classes on the Amazon S3 Storage Classes page. Objects can be automatically moved between storage classes using S3 Lifecycle policies.
Security & Access Management
By default, only bucket and object owners have access to the Amazon S3 resources they create. S3 supports multiple access control mechanisms, as well as encryption for both secure transit and secure storage at rest. With Amazon S3’s data protection features, you can help protect your data from both logical and physical failures, and guard against data loss from unintended user actions, application errors, and infrastructure failures. For customers who must comply with regulatory standards such as PCI and HIPAA, Amazon S3’s data protection features can be used as part of an overall strategy to achieve compliance. The various data security and reliability features offered by Amazon S3 are described in detail below.
Flexible Access Control Mechanisms
Amazon S3 supports several mechanisms that give you flexibility to control who can access your data, as well as how, when, and where they can access it. Amazon S3 provides four different access control mechanisms: AWS Identity and Access Management (IAM) policies, Access Control Lists (ACLs), bucket policies, and Query String Authentication. IAM enables organizations to create and manage multiple users under a single AWS account. With IAM policies, you can grant IAM users fine-grained control to your Amazon S3 bucket or objects. You can use ACLs to selectively add (grant) certain permissions on individual objects. Amazon S3 bucket policies can be used to add or deny permissions across some or all of the objects within a single bucket. With Query String Authentication, you have the ability to share Amazon S3 objects through URLs that are valid for a specified period of time.
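As a concrete illustration of the IAM mechanism, the sketch below builds a minimal IAM policy document granting a user read/write access to a single bucket prefix; the bucket name and prefix are placeholders, and the policy would be attached to a user or group through the IAM console or API.

```python
import json

def s3_prefix_policy(bucket, prefix):
    """Build a minimal IAM policy allowing object read/write under one prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
        }],
    }

policy = s3_prefix_policy("example-bucket", "reports/")
print(json.dumps(policy, indent=2))
```

The same document shape, with a `Principal` element added, can serve as a bucket policy instead of an IAM policy.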
The S3 console highlights your publicly accessible S3 buckets and also warns you if your changes to bucket policies and bucket ACLs will make that bucket publicly accessible.
You can access Amazon S3 from your Amazon Virtual Private Cloud (Amazon VPC) using VPC endpoints. VPC endpoints are easy to configure and provide reliable connectivity to Amazon S3 without requiring an Internet gateway or a Network Address Translation (NAT) instance. With VPC endpoints, the data between an Amazon VPC and Amazon S3 is transferred within the Amazon network, helping protect your instances from Internet traffic. Amazon VPC endpoints for Amazon S3 provide multiple levels of security controls to help limit access to S3 buckets. First, you can require that requests to your Amazon S3 buckets originate from a VPC using a VPC endpoint. Additionally, you can control what buckets, requests, users, or groups are allowed through a specific VPC endpoint.
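The VPC-endpoint restriction described above is typically expressed as a bucket policy that denies requests not arriving through a given endpoint. A minimal sketch, assuming boto3 and a placeholder endpoint ID (`vpce-1a2b3c4d`):

```python
import json

def vpce_only_policy(bucket, vpce_id):
    """Bucket policy denying all S3 actions unless the request comes
    through the named VPC endpoint (aws:sourceVpce condition key)."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyUnlessFromVpce",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            "Condition": {"StringNotEquals": {"aws:sourceVpce": vpce_id}},
        }],
    }

policy = vpce_only_policy("example-bucket", "vpce-1a2b3c4d")
# Apply with boto3 (requires credentials at runtime):
# s3.put_bucket_policy(Bucket="example-bucket", Policy=json.dumps(policy))
```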
You can securely upload or download your data to Amazon S3 via the SSL-encrypted endpoints using the HTTPS protocol. Amazon S3 can automatically encrypt your data at rest and gives you several choices for key management. You can configure your S3 buckets to automatically encrypt objects before storing them in S3 if the incoming storage requests do not have the encryption information. Alternatively, you can use a client encryption library such as the Amazon S3 Encryption Client to encrypt your data before uploading to Amazon S3.
If you choose to have Amazon S3 encrypt your data at rest with server-side encryption (SSE), Amazon S3 will automatically encrypt your data on write and decrypt your data on retrieval. When Amazon S3 SSE encrypts data at rest, it uses Advanced Encryption Standard (AES) 256-bit symmetric keys. If you choose server-side encryption with Amazon S3, there are three ways to manage the encryption keys.
SSE with Amazon S3 Key Management (SSE-S3)
With SSE-S3, Amazon S3 will encrypt your data at rest and manage the encryption keys for you.
SSE with Customer-Provided Keys (SSE-C)
With SSE-C, Amazon S3 will encrypt your data at rest using the custom encryption keys that you provide. To use SSE-C, simply include your custom encryption key in your upload request, and Amazon S3 encrypts the object using that key and securely stores the encrypted data at rest. Similarly, to retrieve an encrypted object, provide your custom encryption key, and Amazon S3 decrypts the object as part of the retrieval. Amazon S3 doesn’t store your encryption key anywhere; the key is immediately discarded after Amazon S3 completes your requests.
SSE with AWS KMS (SSE-KMS)
With SSE-KMS, Amazon S3 will encrypt your data at rest using keys that you manage in the AWS Key Management Service (KMS). Using AWS KMS for key management provides several benefits. With AWS KMS, there are separate permissions for the use of the master key, providing an additional layer of control, as well as protection against unauthorized access to your objects stored in Amazon S3. AWS KMS provides an audit trail so you can see who used your key to access which object and when, as well as view failed attempts to access data from users without permission to decrypt the data. AWS KMS also provides security controls that support customer efforts to comply with PCI-DSS, HIPAA/HITECH, and FedRAMP requirements.
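The three server-side encryption modes map to a small set of extra upload parameters. The sketch below shows them using the parameter names of the boto3 S3 client; the KMS key ARN is a placeholder, and the SSE-C key is generated locally for illustration (boto3 base64-encodes an SSE-C key and adds its MD5 digest on your behalf).

```python
import os

# SSE-S3: S3 manages the keys.
sse_s3 = {"ServerSideEncryption": "AES256"}

# SSE-KMS: you manage the key in AWS KMS (placeholder ARN below).
sse_kms = {
    "ServerSideEncryption": "aws:kms",
    "SSEKMSKeyId": "arn:aws:kms:us-east-1:111122223333:key/example-key-id",
}

# SSE-C: you provide (and must retain) your own 256-bit key;
# S3 discards it after completing the request.
raw_key = os.urandom(32)
sse_c = {"SSECustomerAlgorithm": "AES256", "SSECustomerKey": raw_key}

# With boto3, merge one of these into the upload call, e.g.:
# s3.put_object(Bucket="example-bucket", Key="doc.txt", Body=b"...", **sse_kms)
```

For SSE-C, the same key parameters must accompany the matching `get_object` call, since S3 keeps no copy of the key.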
Amazon S3 also supports logging of requests made against your Amazon S3 resources. You can configure your Amazon S3 bucket to create access log records for the requests made against it. These server access logs capture all requests made against a bucket or the objects in it and can be used for auditing purposes.
For more information on the security features available in Amazon S3, please refer to the Access Control topic in the Amazon S3 Developer Guide. For an overview of security on AWS, including Amazon S3, please refer to the Amazon Web Services: Overview of Security Processes document.
Versioning
Amazon S3 provides further protection with versioning capability. You can use versioning to preserve, retrieve, and restore every version of every object stored in your Amazon S3 bucket. This allows you to easily recover from both unintended user actions and application failures. By default, requests will retrieve the most recently written version. Older versions of an object can be retrieved by specifying a version in the request. Storage rates apply for every version stored. You can configure S3 Lifecycle rules to automatically control the lifetime and cost of storing multiple versions.
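A minimal sketch of enabling versioning and retrieving a specific older version, assuming boto3 and placeholder bucket, key, and version-ID values:

```python
# Versioning is a bucket-level setting; "Suspended" stops creating
# new versions but keeps existing ones.
versioning = {"Status": "Enabled"}

# With boto3 (requires credentials at runtime):
# s3.put_bucket_versioning(Bucket="example-bucket",
#                          VersioningConfiguration=versioning)
#
# Most recent version (default behavior):
# s3.get_object(Bucket="example-bucket", Key="doc.txt")
#
# A specific older version, by its VersionId:
# s3.get_object(Bucket="example-bucket", Key="doc.txt",
#               VersionId="example-version-id")
```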
Multi-Factor Authentication Delete
Amazon S3 provides additional security with Multi-Factor Authentication (MFA) Delete. When enabled, this feature requires the use of a multi-factor authentication device to delete objects stored in Amazon S3 to help protect previous versions of your objects.
Once MFA Delete is enabled on your Amazon S3 bucket, you can change the versioning state of the bucket or permanently delete an object version only by providing two forms of authentication together:
- Your AWS account credentials
- The concatenation of a valid serial number, a space, and the six-digit code displayed on an approved authentication device
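The concatenation format above is simple to build. A sketch, assuming boto3, where the device serial number and six-digit code are placeholders:

```python
def mfa_token(serial_number: str, code: str) -> str:
    """Join the MFA device serial number and its current six-digit code
    with a single space, as required by MFA-protected S3 requests."""
    return f"{serial_number} {code}"

mfa = mfa_token("arn:aws:iam::111122223333:mfa/example-user", "123456")

# With boto3, pass the value via the MFA argument, e.g.:
# s3.delete_object(Bucket="example-bucket", Key="doc.txt",
#                  VersionId="example-version-id", MFA=mfa)
```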
Time-Limited Access to Objects
Amazon S3 supports query string authentication, which allows you to provide a URL that is valid only for a length of time that you define. This time-limited URL can be useful for scenarios such as software downloads or other applications where you want to restrict the length of time users have access to an object.
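The sketch below illustrates the shape of such a time-limited URL: a signed query string carrying an issue timestamp, a validity window, and an HMAC signature. This is an illustrative simplification, not the real SigV4 algorithm; in practice you would call an AWS SDK method such as boto3's `generate_presigned_url`.

```python
import hashlib
import hmac
from datetime import datetime, timezone
from urllib.parse import urlencode

def sketch_presigned_url(bucket, key, secret_key, expires_in=3600):
    """Build a presigned-style URL. Simplified stand-in for SigV4 signing."""
    issued = datetime(2018, 1, 1, tzinfo=timezone.utc)  # fixed for the sketch
    params = {
        "X-Amz-Date": issued.strftime("%Y%m%dT%H%M%SZ"),
        "X-Amz-Expires": str(expires_in),  # validity window in seconds
    }
    # Simplified stand-in for the SigV4 string-to-sign:
    to_sign = f"{bucket}/{key}?" + urlencode(sorted(params.items()))
    params["X-Amz-Signature"] = hmac.new(
        secret_key.encode(), to_sign.encode(), hashlib.sha256
    ).hexdigest()
    return f"https://{bucket}.s3.amazonaws.com/{key}?" + urlencode(params)

url = sketch_presigned_url("my-bucket", "report.pdf", "secret", expires_in=900)
```

Anyone holding the URL can fetch the object until the window lapses, after which S3 rejects the request.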
Automated, Machine Learning-Powered Security
Amazon Macie uses machine learning to automatically discover, classify, and protect sensitive data in AWS. Amazon Macie recognizes sensitive data such as personally identifiable information (PII) or intellectual property, and provides you with dashboards and alerts that give visibility into how this data is being accessed or moved. The fully managed service continuously monitors data access activity for anomalies, and generates detailed alerts when it detects risk of unauthorized access or inadvertent data leaks.
Query in Place
Amazon has a suite of tools that make analyzing and processing large amounts of data in the cloud faster, including ways to optimize and integrate existing workflows with Amazon S3.
S3 Select
Amazon S3 Select is designed to help you analyze and process the data within an object in Amazon S3, faster and at lower cost. It works by providing the ability to retrieve a subset of data from an object in Amazon S3 using simple SQL expressions. Your applications no longer have to use compute resources to scan and filter the data from an object, potentially increasing query performance by up to 400% and reducing query costs by as much as 80%. You simply change your application to use SELECT instead of GET to take advantage of S3 Select.
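A sketch of such a request, using the parameter names of the S3 SelectObjectContent API as exposed by boto3; the bucket, object key, and column names are placeholders:

```python
def select_params(bucket, key, sql):
    """Request parameters for an S3 Select query over a CSV object
    with a header row, returning results as JSON records."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Expression": sql,
        "ExpressionType": "SQL",
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"JSON": {}},
    }

params = select_params(
    "example-bucket", "sales.csv",
    "SELECT s.region, s.total FROM S3Object s WHERE s.total > '100'",
)

# With boto3: response = s3.select_object_content(**params)
# The matching records arrive as an event stream in response["Payload"].
```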
Amazon Athena
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL expressions. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries you run.
Athena is easy to use. Simply point to your data in Amazon S3, define the schema, and start querying using standard SQL expressions. Most results are delivered within seconds. With Athena, there’s no need for complex ETL jobs to prepare your data for analysis. This makes it easy for anyone with SQL skills to quickly analyze large-scale datasets.
Amazon Redshift Spectrum
Amazon Redshift also includes Redshift Spectrum, allowing you to directly run SQL queries against exabytes of unstructured data in Amazon S3. No loading or transformation is required, and you can use open data formats, including Avro, CSV, Grok, ORC, Parquet, RCFile, RegexSerDe, SequenceFile, TextFile, and TSV. Redshift Spectrum automatically scales query compute capacity based on the data being retrieved, so queries against Amazon S3 run fast, regardless of data set size.
Amazon S3 makes it easy to manage your data by giving you actionable insight into your data usage patterns and the tools to manage your storage with management policies. All of these management capabilities can be easily administered using the Amazon S3 APIs or the AWS Management Console. The various data management features offered by Amazon S3 are described in detail below.
S3 Object Tagging
With Amazon S3 Object Tagging, you can manage and control access for Amazon S3 objects. S3 object tags are key-value pairs applied to S3 objects, and they can be created, updated, or deleted at any time during the lifetime of the object. With these tags, you can create Identity and Access Management (IAM) policies, set up S3 Lifecycle policies, and customize storage metrics. These object-level tags can then drive transitions between storage classes and expire objects in the background.
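A sketch of tagging an object, using the TagSet shape of the PutObjectTagging API; the tag keys and values are placeholder choices:

```python
# Up to ten tags per object; each tag is a Key/Value pair.
tagging = {
    "TagSet": [
        {"Key": "project", "Value": "alpha"},
        {"Key": "classification", "Value": "confidential"},
    ]
}

# With boto3 (requires credentials at runtime):
# s3.put_object_tagging(Bucket="example-bucket", Key="doc.txt",
#                       Tagging=tagging)
# s3.get_object_tagging(Bucket="example-bucket", Key="doc.txt")
```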
S3 Inventory
You can simplify and speed up business workflows and Big Data jobs using S3 Inventory, which provides a scheduled alternative to Amazon S3's synchronous List API. S3 Inventory provides a CSV (Comma Separated Values) or ORC (Optimized Row Columnar) output of your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or prefix. S3 Inventory also makes it easy for you to audit and report on object encryption status for your business, compliance, and regulatory needs.
Storage Class Analysis
With Storage Class Analysis, you can monitor the access frequency of the objects within your S3 bucket in order to transition less frequently accessed storage to a lower cost storage class. Storage Class Analysis observes usage patterns to detect infrequently accessed storage to help you transition the right objects from the S3 Standard storage class to the S3 Standard-IA, S3 One Zone-IA, or Amazon Glacier storage classes. You can configure a Storage Class Analysis policy to monitor an entire bucket, a prefix, or an object tag. Once Storage Class Analysis detects that data is a candidate for transition to another storage class, you can easily create a new S3 Lifecycle policy based on these results. This feature also includes a detailed daily analysis of your storage usage at the specified bucket, prefix, or tag level that you can export to an S3 bucket.
Amazon CloudWatch Metrics for Amazon S3
Amazon S3’s integration with Amazon CloudWatch helps you improve your end-user experience by providing integrated monitoring and alarming on a host of different metrics. You can receive 1-minute CloudWatch Metrics, set CloudWatch alarms, and access CloudWatch dashboards to view real-time operations and performance of your Amazon S3 storage. For web and mobile applications that depend on cloud storage, these let you quickly identify and act on operational issues. These 1-minute metrics are available at the S3 bucket level. Additionally, you have the flexibility to define a filter for the metrics collected using a shared prefix or object tag, allowing you to align metrics filters to specific business applications, workflows, or internal organizations.
AWS CloudTrail Management & Data Events for Amazon S3
You can use AWS CloudTrail to capture bucket-level (Management Events) and object-level API activity (Data Events) on S3 objects. Data Events include read operations such as GET, HEAD, and Get Object ACL, as well as write operations such as PUT and POST. The detail captured provides support for many types of security, auditing, governance, and compliance use cases. Visit the AWS CloudTrail page for more information on S3 Data Events.
Amazon S3 Data Lifecycle Management
Amazon S3 can automatically assign and change cost and performance characteristics as your data evolves. It can even automate common data lifecycle management tasks, including capacity provisioning, automatic migration to lower cost tiers, regulatory compliance policies, and eventual scheduled deletions.
As your data ages, Amazon S3 automatically and transparently migrates your data to new hardware as hardware fails or reaches its end of life. This eliminates the need for you to perform expensive, time-consuming, and risky hardware migrations. You can set S3 Lifecycle policies directly on Amazon S3 to automatically migrate your data to lower-cost storage as it ages. You can define rules to automatically migrate Amazon S3 objects to the S3 Standard-IA, S3 One Zone-IA, or Amazon Glacier storage classes based on the age of the data. You can set lifecycle policies by bucket, prefix, or object tags, allowing you to specify the granularity most suited to your use case.
When your data reaches its end of life, Amazon S3 provides programmatic options for recurring and high-volume deletions. For recurring deletions, rules can be defined to remove sets of objects after a predefined time period. These rules can be applied to objects stored in S3 Standard, S3 Standard-IA, or S3 One Zone-IA, and objects that have been archived to Amazon Glacier.
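The transition and expiration rules described above can be sketched as a single lifecycle configuration (PutBucketLifecycleConfiguration shape); the rule ID, prefix, and day counts below are placeholder choices:

```python
# Transition objects under "logs/" to Standard-IA after 30 days,
# to Glacier after 90, and delete them after a year.
lifecycle = {
    "Rules": [{
        "ID": "archive-logs",
        "Filter": {"Prefix": "logs/"},
        "Status": "Enabled",
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": 365},
    }]
}

# With boto3 (requires credentials at runtime):
# s3.put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=lifecycle)
```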
You can also define lifecycle rules on versions of your Amazon S3 objects to reduce storage costs. For example, you can create rules to automatically and cleanly delete older versions of your objects when those versions are no longer needed, saving money and improving performance. Alternatively, you can create rules to automatically migrate older versions to S3 Standard-IA, S3 One Zone-IA, or Amazon Glacier in order to further reduce your storage costs.
Cross-Region Replication
Cross-region replication (CRR) makes it simple to replicate new objects into any other AWS Region for reduced latency, compliance, security, disaster recovery, and a number of other use cases. CRR replicates every object uploaded to your source bucket to a destination bucket in a different AWS Region that you choose. The metadata, ACLs, and object tags associated with the object are also replicated. Once you configure CRR on your source bucket, any change to the data, metadata, ACLs, or object tags of an object triggers a new replication to the destination bucket.
CRR is a bucket-level configuration: you enable CRR on your bucket by specifying a destination bucket in a different AWS Region. With CRR, you can select any AWS commercial Region as the target region and any S3 storage class for your replicated storage, according to your needs. You can set up CRR across accounts, so that the source and destination buckets have different owners. You can create these settings using the AWS Management Console, the REST API, the AWS CLI, or the AWS SDKs. Versioning must be turned on for both the source and destination buckets to enable CRR.
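A sketch of the replication configuration (PutBucketReplication shape); the IAM role ARN and bucket names are placeholders, and versioning must already be enabled on both buckets:

```python
# Replicate every new object (empty prefix) to a bucket in another Region,
# storing the replicas in Standard-IA.
replication = {
    "Role": "arn:aws:iam::111122223333:role/example-replication-role",
    "Rules": [{
        "ID": "replicate-all",
        "Prefix": "",  # empty prefix = replicate all new objects
        "Status": "Enabled",
        "Destination": {
            "Bucket": "arn:aws:s3:::example-destination-bucket",
            "StorageClass": "STANDARD_IA",
        },
    }]
}

# With boto3 (requires credentials at runtime):
# s3.put_bucket_replication(Bucket="example-source-bucket",
#                           ReplicationConfiguration=replication)
```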
Cost Monitoring and Controls
Amazon S3 offers several features for managing and controlling your costs. You can use the AWS Management Console or the Amazon S3 APIs to apply tags to your Amazon S3 buckets, enabling you to allocate your costs across multiple business dimensions, including cost centers, application names, or owners. You can then view breakdowns of these costs using Amazon Web Services’ Cost Allocation Reports, which show your usage and costs aggregated by your bucket tags. For more information on Cost Allocation and tagging, please visit About AWS Account Billing. For more information on tagging your Amazon S3 buckets, please see the Bucket Tagging topic in the Amazon S3 Developer Guide.
You can use Amazon CloudWatch to receive S3 billing alerts that help you monitor the Amazon S3 charges on your bill. You can set up an alert to be notified automatically via e-mail when estimated charges reach a threshold that you choose. For additional information on billing alerts, you can visit the billing alerts page or see the Monitor Your Estimated Charges topic in the Amazon CloudWatch Developer Guide.
Event Notifications
Amazon S3 Event Notifications can be sent in response to actions taken on objects uploaded or stored in Amazon S3. Notification messages can be sent through Amazon SNS or Amazon SQS, or delivered directly to AWS Lambda to invoke a Lambda function.
Amazon S3 event notifications enable you to run workflows, send alerts, or perform other actions in response to changes in your objects stored in Amazon S3. You can use Amazon S3 Event Notifications to set up triggers to perform actions including transcoding media files when they are uploaded, processing data files when they become available, and synchronizing Amazon S3 objects with other data stores. You can also set up event notifications based on object name prefixes and suffixes. For example, you can choose to receive notifications only for object names that start with "images/". Event notifications can also be used to keep a secondary index of Amazon S3 objects in sync.
Amazon S3 Event Notifications are set up at the bucket level, and you can configure them through the Amazon S3 console, through the REST API, or by using an AWS SDK.
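A sketch of a bucket notification configuration (PutBucketNotificationConfiguration shape) that invokes a placeholder Lambda function for every object created under the "images/" prefix:

```python
notification = {
    "LambdaFunctionConfigurations": [{
        "LambdaFunctionArn":
            "arn:aws:lambda:us-east-1:111122223333:function:process-image",
        "Events": ["s3:ObjectCreated:*"],
        "Filter": {"Key": {"FilterRules": [
            {"Name": "prefix", "Value": "images/"},
        ]}},
    }]
}

# With boto3 (requires credentials at runtime; the Lambda function must
# also grant S3 permission to invoke it):
# s3.put_bucket_notification_configuration(
#     Bucket="example-bucket", NotificationConfiguration=notification)
```

Analogous `TopicConfigurations` and `QueueConfigurations` entries route the same events to Amazon SNS topics or Amazon SQS queues.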
Transferring Large Amounts of Data
Amazon has a suite of data migration tools that make migrating data into the cloud faster, including ways to optimize or replace your network, and ways to integrate existing workflows with S3.
S3 Transfer Acceleration
Amazon S3 Transfer Acceleration is designed to maximize transfer speeds to Amazon S3 buckets over long distances. It works by carrying HTTP and HTTPS traffic over a highly optimized network bridge that runs between the AWS Edge Location nearest to your clients and your Amazon S3 bucket. There are no gateway servers to manage, no firewalls to open, no special ports or clients to integrate, and no upfront fees to pay. You simply change the Amazon S3 endpoint that your application uses to transfer data, and acceleration is automatically applied. Use S3 Transfer Acceleration if you:
- Need faster uploads from clients that are located far away from your bucket, for instance across countries or continents.
- Have clients located outside of your own data centers, who rely on the public internet to reach Amazon S3. For clients inside your own data centers, consider AWS Direct Connect.
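Enabling acceleration and switching endpoints can be sketched as follows, assuming boto3 and a placeholder bucket name:

```python
# Transfer Acceleration is a bucket-level setting.
accelerate = {"Status": "Enabled"}  # "Suspended" turns it back off

# With boto3 (requires credentials at runtime):
# s3.put_bucket_accelerate_configuration(
#     Bucket="example-bucket", AccelerateConfiguration=accelerate)
#
# Then route transfers through the accelerate endpoint:
# from botocore.config import Config
# fast_s3 = boto3.client(
#     "s3", config=Config(s3={"use_accelerate_endpoint": True}))
# fast_s3.upload_file("local.dat", "example-bucket", "remote.dat")
```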
AWS Snowball, Snowball Edge, and Snowmobile
From petabytes to exabytes, AWS data migration services use secure devices to transfer large amounts of data into and out of Amazon S3. AWS Snowball, AWS Snowball Edge, and AWS Snowmobile address common challenges with large-scale data transfers, including high network costs, long transfer times, and security concerns. Transferring data with these services is simple, fast, and secure, and can cost as little as one-fifth of what a high-speed Internet transfer would.
AWS Storage Gateway
Data or storage systems that exist on-premises can be easily linked to Amazon S3 using the AWS Storage Gateway for hybrid cloud storage. This means your existing systems, software, processes, and data can be streamlined into the cloud for backup, migration, tiering, or bursting with minimal disruption.
Third-Party Partner Integration
A number of ISV partners are integrated with Amazon S3 for simplified data transfer and retrieval. Visit the AWS Storage Partner Solutions page for a list of approved AWS partner solutions.
Intended Usage and Restrictions
Your use of this service is subject to the Amazon Web Services Customer Agreement.