Amazon S3 Documentation

Amazon Simple Storage Service (Amazon S3) has various features you can use to organize and manage your data in ways that support specific use cases, enable cost efficiencies, enforce security, and meet compliance requirements. Data is stored as objects within resources called “buckets.” S3 features include capabilities to append metadata tags to objects, move and store data across the S3 Storage Classes, configure and enforce data access controls, help secure data against unauthorized users, run big data analytics, and monitor data at the object, bucket levels, and view storage usage and activity trends across your organization. Objects can be accessed through S3 Access Points or directly through the bucket hostname.

Storage management and monitoring

Amazon S3’s flat, non-hierarchical structure and various management features are helping customers of all sizes and industries organize their data in ways that are valuable to their businesses and teams. All objects are stored in S3 buckets and can be organized with shared names called prefixes. You can also append key-value pairs called S3 object tags to each object, which can be created, updated, and deleted throughout an object’s lifecycle. To keep track of objects and their respective tags, buckets, and prefixes, you can use an S3 Inventory report that lists your stored objects within an S3 bucket or with a specific prefix, and their respective metadata and encryption status. S3 Inventory can be configured to generate reports on a daily or a weekly basis.

Storage management

With S3 bucket names, prefixes, object tags, and S3 Inventory, you have a range of ways to categorize and report on your data, and subsequently can configure other S3 features to take action. S3 Batch Operations makes it simple, whether you store thousands of objects or a billion, to manage your data in Amazon S3 at any scale. With S3 Batch Operations, you can copy objects between buckets, replace object tag sets, modify access controls, and restore archived objects from Amazon S3 Glacier, with a single S3 API request or a few clicks in the Amazon S3 Management Console. You can also use S3 Batch Operations to run AWS Lambda functions across your objects to execute custom business logic, such as processing data or transcoding image files. To get started, select a source bucket and filters or specify a list of target objects by using an S3 Inventory report or by providing a custom list, and then select the desired operation from a pre-populated menu. When an S3 Batch Operation request is done, you will receive a notification and a completion report of all changes made.

Amazon S3 also supports features that help maintain data version control, prevent accidental deletions, and replicate data to the same or different AWS Region. S3 Versioning is designed to help you preserve, retrieve, and restore every version of an object stored in Amazon S3, which allows you to recover from unintended user actions and application failures. To help prevent accidental deletions, enable Multi-Factor Authentication (MFA) Delete on an S3 bucket. If you try to delete an object stored in an MFA Delete-enabled bucket, it will require two forms of authentication: your AWS account credentials and the concatenation of a valid serial number, a space, and the code displayed on an approved authentication device, like a hardware key fob or a Universal 2nd Factor (U2F) security key.

With S3 Replication, you can replicate objects (and their respective metadata and object tags) to one or more destination buckets into the same or different AWS Regions for reduced latency, compliance, security, disaster recovery, and other use cases. S3 Cross-Region Replication (CRR) can be configured to replicate from a source S3 bucket to one or more destination buckets in different AWS Region. Amazon S3 Same-Region Replication (SRR), replicates objects between buckets in the same AWS Region. Amazon S3 Replication Time Control (S3 RTC) helps you meet compliance requirements for data replication and provides visibility into Amazon S3 replication activity.

Amazon S3 Multi-Region Access Points can accelerate performance when accessing data sets that are replicated across multiple AWS Regions. Based on AWS Global Accelerator, S3 Multi-Region Access Points consider factors like network congestion and the location of the requesting application to dynamically route your requests over the AWS network to the lowest latency copy of your data. S3 Multi-Region Access Points provide a single global endpoint that you can use to access a replicated data set, spanning multiple buckets in S3. This allows you to build multi-region applications with the same simple architecture that you would use in a single region, and then to run those applications anywhere in the world.

You can also enforce write-once-read-many (WORM) policies with S3 Object Lock. This S3 management feature is designed to block object version deletion during a customer-defined retention period so that you can enforce retention policies as an added layer of data protection or to your meet compliance obligations. You can migrate workloads from existing WORM systems into Amazon S3, and configure S3 Object Lock at the object- and bucket-levels to prevent object version deletions prior to a pre-defined Retain Until Date or Legal Hold Date. Objects with S3 Object Lock retain WORM protection, even if they are moved to different storage classes with an S3 Lifecycle policy. To track what objects have S3 Object Lock, you can refer to an S3 Inventory report that includes the WORM status of objects.

Storage monitoring

In addition to these management capabilities, you can use S3 features and other AWS services to monitor and control how your S3 resources are being used. You can apply tags to S3 buckets in order to allocate costs across multiple business dimensions (such as cost centers, application names, or owners), and then use AWS Cost Allocation Reports to view usage and costs aggregated by the bucket tags. You can also use Amazon CloudWatch to track the operational health of your AWS resources and configure billing alerts that are sent to you when estimated charges reach a user-defined threshold. Another AWS monitoring service is AWS CloudTrail, which tracks and reports on bucket-level and object-level activities. You can configure S3 Event Notifications to trigger workflows, alerts, and invoke AWS Lambda when a specific change is made to certain S3 resources. S3 Event Notifications can be used to transcode media files as they are uploaded to Amazon S3, process data files as they become available, or synchronize objects with other data stores.

Storage analytics and insights

S3 Storage Lens

S3 Storage Lens delivers organization-wide visibility into object storage usage, activity trends, and makes actionable recommendations you can use to improve cost-efficiency and apply data protection best practices. S3 Storage Lens is the first cloud storage analytics solution to provide a single view of object storage usage and activity across hundreds, or even thousands, of accounts in an organization, with drill-downs to generate insights at the account, bucket, or even prefix level. Drawing from years of experience helping customers optimize their storage, S3 Storage Lens analyzes organization-wide metrics to deliver contextual recommendations to find ways to help you reduce storage costs and apply best practices on data protection.

S3 Storage Class Analysis

Amazon S3 Storage Class Analysis analyzes storage access patterns to help you decide when to transition the right data to the right storage class. This feature can help you determine when to transition less frequently accessed storage to a lower-cost storage class. You can use the results to help improve your S3 Lifecycle policies. You can configure storage class analysis to analyze all the objects in a bucket. Or, you can configure filters to group objects together for analysis by common prefix, by object tags, or by both prefix and tags.

Storage classes

With Amazon S3, you can store data across a range of different S3 Storage Classes: S3 Standard, S3 Intelligent-Tiering, S3 Express One Zone, S3 Standard-Infrequent Access (S3 Standard-IA), S3 One Zone-Infrequent Access (S3 One Zone-IA), S3 Glacier Instant Retrieval, S3 Glacier Flexible Retrieval, S3 Glacier Deep Archive, and S3 Outposts.

Every S3 Storage Class supports a specific data access level at corresponding costs or geographic location.

For data with changing, unknown, or unpredictable access patterns, such as data lakes, analytics, or new applications, use S3 Intelligent-Tiering, which is designed to optimize your storage costs. S3 Intelligent-Tiering is designed to move your data between three low latency access tiers optimized for frequent, infrequent, and rare access. When subsets of objects become archived over time, you can activate the archive access tier designed for asynchronous access.

For more predictable access patterns, you can store mission-critical production data in S3 Standard for frequent access, accelerate performance-critical applications by storing your most frequently accessed data in S3 Express One Zone, save costs by storing infrequently accessed data in S3 Standard-IA or S3 One Zone-IA, and archive data at the lowest costs in the archival storage classes — S3 Glacier Instant Retrieval, S3 Glacier Flexible Retrieval, and S3 Glacier Deep Archive. You can use S3 Storage Class Analysis to monitor access patterns across objects to discover data that should be moved to lower-cost storage classes. Then you can use this information to configure an S3 Lifecycle policy that makes the data transfer. S3 Lifecycle policies can also be used to expire objects at the end of their lifecycles.

If you have data residency requirements that can’t be met by an existing AWS Region, you can use the S3 Outposts storage class to store your S3 data on-premises using S3 on Outposts.

Access management and security

Access management

To help protect your data in Amazon S3, by default, users only have access to the S3 resources they create. You can grant access to other users by using one or a combination of the following access management features: AWS Identity and Access Management (IAM) to create users and manage their respective access; Access Control Lists (ACLs) to make individual objects accessible to authorized users; bucket policies to configure permissions for all objects within a single S3 bucket; S3 Access Points to simplify managing data access to shared data sets by creating access points with names and permissions specific to each application or sets of applications; S3 Access Grants to manage data permissions at scale by granting S3 access to end-users based on their corporate identity; and Query String Authentication to grant time-limited access to others with temporary URLs. Amazon S3 also supports Audit Logs that list the requests made against your S3 resources for visibility into who is accessing what data.

Security

Amazon S3 offers flexible security features designed to block unauthorized users from accessing your data. You may use VPC endpoints to connect to S3 resources from your Amazon Virtual Private Cloud (Amazon VPC) and from on-premises. Amazon S3 supports both server-side encryption (with various key management options) and client-side encryption for data uploads. You may use S3 Inventory to check the encryption status of your S3 objects.

S3 Block Public Access is a set of security controls that helps you configure your S3 buckets and objects to block public access. With a few clicks in the Amazon S3 Management Console, you can apply the S3 Block Public Access settings to all buckets within your AWS account or to specific S3 buckets. Once the settings are applied to an AWS account, any existing or new buckets and objects associated with that account inherit the settings that prevent public access. S3 Block Public Access settings override other S3 access permissions, making it easy for the account administrator to enforce a “no public access” policy regardless of how an object is added, how a bucket is created, or if there are existing access permissions. S3 Block Public Access controls are auditable, provide a further layer of control, and use AWS Trusted Advisor bucket permission checks, AWS CloudTrail logs, and Amazon CloudWatch alarms. You should enable Block Public Access for all accounts and buckets that you do not want publicly accessible.

S3 Object Ownership is a feature that disables Access Control Lists (ACLs), changing ownership for all objects in a bucket to the bucket owner and helping simplify access management for data stored in S3. When you configure the S3 Object Ownership Bucket owner enforced setting, ACLs will no longer affect permissions for your bucket and the objects in it. The services is designed so that access control will be defined using resource-based policies, user policies, or some combination of these.

Using S3 Access Points that are restricted to a Virtual Private Cloud (VPC), you can easily firewall your S3 data within your private network. Additionally, you can use AWS Service Control Policies to require that any new S3 Access Point in your organization is restricted to VPC-only access.

IAM Access Analyzer for S3 is a feature that helps you simplify permissions management as you set, verify, and refine policies for your S3 buckets and access points. Access Analyzer for S3 is designed to monitor your existing bucket access policies to help you verify that they provide only the required access to your S3 resources. Access Analyzer for S3 is designed to evaluate your bucket access policies to help you remediate any buckets with access that is not required. When reviewing results that show potentially shared access to a bucket, you can Block Public Access to the bucket with a single click in the S3 console. For auditing purposes, you can download Access Analyzer for S3 findings as a CSV report. Additionally, the S3 console is designed to report security warnings, errors, and suggestions from IAM Access Analyzer as you author your S3 policies. The console is designed to run more than 100 policy checks to validate your policies. These checks can help save you time, guide you to resolve errors, and apply security best practices.

IAM makes it easier for you to analyze access and reduce permissions to achieve least privilege by providing the timestamp when a user or role last used S3 and the associated actions. Use this “last accessed” information to analyze S3 access, identify unused permissions, and remove them.

AWS PrivateLink for S3 provides private connectivity between Amazon S3 and on-premises. You can provision interface VPC endpoints for S3 in your VPC to connect your on-premises applications directly with S3 over AWS Direct Connect or AWS VPN. Requests to interface VPC endpoints for S3 are automatically routed to S3 over the Amazon network. You can set security groups and configure VPC endpoint policies for your interface VPC endpoints for additional access controls.

Data processing

S3 Object Lambda

With S3 Object Lambda you can add your own code to S3 GET requests to modify and process data as it is returned to an application. You can use custom code to modify the data returned by standard S3 GET requests to filter rows, dynamically resize images, redact confidential data, and much more. Powered by AWS Lambda functions, your code runs on infrastructure that is fully managed by AWS, eliminating the need to create and store derivative copies of your data or to run expensive proxies, all with no changes required to applications.

S3 Object Lambda uses AWS Lambda functions to process the output of a standard S3 GET request. AWS Lambda is a serverless compute service that runs customer-defined code without requiring management of underlying compute resources. With just a few clicks in the AWS Management Console, you can configure a Lambda function and attach it to a S3 Object Lambda Access Point. From that point forward, S3 will automatically call your Lambda function to process any data retrieved through the S3 Object Lambda Access Point, returning a transformed result back to the application. You can author and execute your own custom Lambda functions, tailoring S3 Object Lambda’s data transformation to your specific use case.

Query in place

Amazon S3 has a built-in feature and complimentary services that query data without needing to copy and load it into a separate analytics platform or data warehouse. This means you can run big data analytics directly on your data stored in Amazon S3. S3 Select is an S3 feature designed to increase query performance by up to 400%, and reduce querying costs by as much as 80%. It works by retrieving a subset of an object’s data (using simple SQL expressions) instead of the entire object.

Amazon S3 is also compatible with AWS analytics services Amazon Athena and Amazon Redshift Spectrum.

Data transfer

AWS provides a portfolio of data transfer services to provide the right solution for any data migration project. The level of connectivity is a major factor in data migration, and AWS has offerings that can address your hybrid cloud storage, online data transfer, and offline data transfer needs.

Hybrid cloud storage: AWS Storage Gateway is a hybrid cloud storage service that lets you seamlessly connect and extend your on-premises applications to AWS Storage. Customers use Storage Gateway to replace tape libraries with cloud storage, provide cloud storage-backed file shares, or create a low-latency cache to access data in AWS for on-premises applications.

Online data transfer: AWS DataSync makes it easy to transfer large data sets into Amazon S3. DataSync allows you to automate many manual tasks, including scripting copy jobs, scheduling and monitoring transfers, validating data, and optimizing network utilization. Additionally, you can use AWS DataSync to copy objects between a bucket on S3 on Outposts and a bucket stored in an AWS Region. The AWS Transfer Family provides fully managed file transfer to Amazon S3 using SFTP, FTPS, and FTP. Amazon S3 Transfer Acceleration is designed for fast transfers of files over long distances between your client and your Amazon S3 bucket.

Offline data transfer: The AWS Snow Family is purpose-built for use in edge locations where network capacity is constrained or nonexistent and provides storage and computing capabilities in harsh environments. The AWS Snowball service uses ruggedized, portable storage and edge computing devices for data collection, processing, and migration. Customers can ship the physical Snowball device for offline data migration to AWS. AWS Snowmobile is an exabyte-scale data transfer service used to move massive volumes of data to the cloud, including video libraries, image repositories, or even a complete data center migration.

Customers can also work with third-party providers from the AWS Partner Network (APN) to deploy hybrid storage architectures, integrate Amazon S3 into existing applications and workflows, and transfer data to and from the AWS Cloud.

Performance

Amazon S3 provides industry leading performance for cloud object storage. Amazon S3 supports parallel requests, which means you can scale your S3 performance by the factor of your compute cluster, without making any customizations to your application. Performance scales per prefix, so you can use as many prefixes as you need in parallel to achieve the required throughput. There are no limits to the number of prefixes. Amazon S3 performance supports at least 3,500 requests per second to add data and at least 5,500 requests per second to retrieve data. Each S3 prefix can support these request rates, making it simple to increase performance significantly.

To achieve this S3 request rate performance you do not need to randomize object prefixes to achieve faster performance. That means you can use logical or sequential naming patterns in S3 object naming without any performance implications.

Consistency

Amazon S3 is designed to deliver strong read-after-write consistency for all applications, without changes to performance or availability, without sacrificing regional isolation for applications, and at no additional cost. With strong consistency, S3 is designed to simplify the migration of on-premises analytics workloads by removing the need to make changes to applications, and can help you reduce costs by removing the need for extra infrastructure to provide strong consistency.

Any request for S3 storage is designed to be strongly consistent. After a successful write of a new object or an overwrite of an existing object, any subsequent read request immediately receives the latest version of the object. S3 is also designed to provide strong consistency for list operations, so after a write, you can immediately perform a listing of the objects in a bucket with any changes reflected.

Additional Information

For additional information about service controls, security features and functionalities, including, as applicable, information about storing, retrieving, modifying, restricting, and deleting data, please see https://docs.aws.amazon.com/index.html. This additional information does not form part of the Documentation for purposes of the AWS Customer Agreement available at http://aws.amazon.com/agreement, or other agreement between you and AWS governing your use of AWS’s services.