Storage management and monitoring
Amazon S3’s flat, non-hierarchical structure and various management features are helping customers of all sizes and industries organize their data in ways that are valuable to their businesses and teams. All objects are stored in S3 buckets and can be organized with shared names called prefixes. You can also append up to 10 key-value pairs called S3 object tags to each object, which can be created, updated, and deleted throughout an object’s lifecycle. To keep track of objects and their respective tags, buckets, and prefixes, you can use an S3 Inventory report that lists your stored objects within an S3 bucket or with a specific prefix, and their respective metadata and encryption status. S3 Inventory can be configured to generate reports on a daily or a weekly basis.
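As an illustration, object tags can be read and written through the AWS SDKs. The following minimal boto3 sketch assumes placeholder bucket and key names:

```python
import boto3

s3 = boto3.client("s3")

# Attach up to 10 key-value tags to an existing object (names are placeholders).
s3.put_object_tagging(
    Bucket="example-bucket",
    Key="reports/2024/q1.csv",
    Tagging={"TagSet": [
        {"Key": "department", "Value": "finance"},
        {"Key": "classification", "Value": "internal"},
    ]},
)

# Read the tags back at any point in the object's lifecycle.
tags = s3.get_object_tagging(Bucket="example-bucket", Key="reports/2024/q1.csv")
print(tags["TagSet"])
```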
Storage management
With S3 bucket names, prefixes, object tags, S3 Metadata (Preview), and S3 Inventory, you have a range of ways to categorize and report on your data, and subsequently can configure other S3 features to take action. Whether you store thousands of objects or a billion, S3 Batch Operations makes it simple to manage your data in Amazon S3 at any scale. With S3 Batch Operations, you can copy objects between buckets, replace object tag sets, modify access controls, and restore archived objects from the S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive storage classes, with a single S3 API request or a few steps in the S3 console. You can also use S3 Batch Operations to run AWS Lambda functions across your objects to execute custom business logic, such as processing data or transcoding image files. To get started, select a source bucket and filters, or specify a list of target objects by using an S3 Inventory report or by providing a custom list, and then select the desired operation from a pre-populated menu. When an S3 Batch Operations job completes, you receive a notification, and you can choose to receive a completion report of all changes made. Learn more about S3 Batch Operations by watching the video tutorials.
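For those who prefer the API over the console, a job like the tag-replacement operation above can also be created through the S3 Control API. The boto3 sketch below is illustrative only; the account ID, role ARN, manifest location, and ETag are placeholders you would substitute with your own:

```python
import boto3

s3control = boto3.client("s3control")

# Create a Batch Operations job that replaces the tag set on every object
# listed in a CSV manifest. All identifiers below are placeholders.
s3control.create_job(
    AccountId="111122223333",
    ConfirmationRequired=True,
    Priority=10,
    RoleArn="arn:aws:iam::111122223333:role/batch-operations-role",
    Operation={"S3PutObjectTagging": {"TagSet": [{"Key": "status", "Value": "archived"}]}},
    Manifest={
        "Spec": {"Format": "S3BatchOperations_CSV_20180820", "Fields": ["Bucket", "Key"]},
        "Location": {
            "ObjectArn": "arn:aws:s3:::example-manifests/manifest.csv",
            "ETag": "60e460c9d1046e73f7dde5043ac3ae85",  # ETag of the manifest object
        },
    },
    Report={
        "Bucket": "arn:aws:s3:::example-reports",
        "Format": "Report_CSV_20180820",
        "Enabled": True,
        "ReportScope": "AllTasks",  # the completion report covers every task
    },
)
```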
Amazon S3 Metadata (Preview) delivers queryable object metadata in near real-time to organize your data and accelerate data discovery. This helps you to curate, identify, and use your S3 data for business analytics, real-time inference applications, and more. S3 Metadata supports object metadata, which includes system-defined details like size and the source of the object, and custom metadata, which allows you to use tags to annotate your objects with information like product SKU, transaction ID, or content rating, for example. S3 Metadata is designed to automatically capture metadata from objects as they are uploaded into a bucket, and to make that metadata queryable in a read-only table. As data in your bucket changes, S3 Metadata updates the table within minutes to reflect the latest changes.
Amazon S3 also supports features that help maintain data version control, prevent accidental deletions, and replicate data to the same or a different AWS Region. With S3 Versioning, you can preserve, retrieve, and restore every version of an object stored in Amazon S3, which allows you to recover from unintended user actions and application failures. To prevent accidental deletions, enable Multi-Factor Authentication (MFA) Delete on an S3 bucket. If you try to delete an object stored in an MFA Delete-enabled bucket, it will require two forms of authentication: your AWS account credentials and the concatenation of a valid serial number, a space, and the six-digit code displayed on an approved authentication device, like a hardware key fob or a Universal 2nd Factor (U2F) security key.
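As a brief sketch, versioning is enabled with a single API call. The MFA Delete variant is shown commented out because it must be issued by the bucket owner with an MFA device; all names below are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Turn on versioning so every overwrite and delete is recoverable.
s3.put_bucket_versioning(
    Bucket="example-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)

# Enabling MFA Delete additionally requires the MFA argument, formatted as
# "<device serial number> <current 6-digit code>" (placeholder values shown):
# s3.put_bucket_versioning(
#     Bucket="example-bucket",
#     VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"},
#     MFA="arn:aws:iam::111122223333:mfa/root-device 123456",
# )
```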
With S3 Replication, you can replicate objects (and their respective metadata and object tags) to one or more destination buckets into the same or different AWS Regions for reduced latency, compliance, security, disaster recovery, and other use cases. You can configure S3 Cross-Region Replication (CRR) to replicate objects from a source S3 bucket to one or more destination buckets in different AWS Regions. S3 Same-Region Replication (SRR) replicates objects between buckets in the same AWS Region. While live replication like CRR and SRR automatically replicates newly uploaded objects as they are written to your bucket, S3 Batch Replication allows you to replicate existing objects. You can use S3 Batch Replication to backfill a newly created bucket with existing objects, retry objects that were previously unable to replicate, migrate data across accounts, or add new buckets to your data lake. Amazon S3 Replication Time Control (S3 RTC) helps you meet compliance requirements for data replication by providing an SLA and visibility into replication times.
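A minimal boto3 sketch of a Cross-Region Replication configuration follows. It assumes versioning is already enabled on both placeholder buckets and that the placeholder IAM role grants S3 permission to replicate on your behalf:

```python
import boto3

s3 = boto3.client("s3")

# Replicate every new object in the source bucket to a bucket in another Region.
s3.put_bucket_replication(
    Bucket="example-source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::111122223333:role/s3-replication-role",
        "Rules": [{
            "ID": "crr-all-objects",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {"Prefix": ""},  # empty prefix matches the whole bucket
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": "arn:aws:s3:::example-destination-bucket"},
        }],
    },
)
```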
To access replicated data sets in S3 buckets across AWS Regions and accounts, use Amazon S3 Multi-Region Access Points to create a single global endpoint for your applications and clients to use regardless of their location. This global endpoint allows you to build multi-Region applications with the same simple architecture you would use in a single Region, and then to run those applications anywhere in the world. Amazon S3 Multi-Region Access Points can accelerate performance by up to 60% when accessing data sets that are replicated across multiple AWS Regions and accounts. Based on AWS Global Accelerator, S3 Multi-Region Access Points consider factors like network congestion and the location of the requesting application to dynamically route your requests over the AWS network to the lowest latency copy of your data. Using S3 Multi-Region Access Points failover controls, you can fail over between your replicated data sets across AWS Regions, allowing you to shift your S3 data request traffic to an alternate AWS Region within minutes.
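Creating a Multi-Region Access Point is an asynchronous control-plane operation. The sketch below is illustrative; the account ID and bucket names are placeholders, and it reflects the documented requirement that these control-plane requests are served from the us-west-2 Region:

```python
import uuid
import boto3

# Multi-Region Access Point control-plane calls are routed through us-west-2.
s3control = boto3.client("s3control", region_name="us-west-2")

response = s3control.create_multi_region_access_point(
    AccountId="111122223333",
    ClientToken=str(uuid.uuid4()),  # idempotency token
    Details={
        "Name": "example-mrap",
        "Regions": [
            {"Bucket": "example-bucket-us-east-1"},
            {"Bucket": "example-bucket-eu-west-1"},
        ],
    },
)

# The call returns immediately; poll the token to track creation progress.
print(response["RequestTokenARN"])
```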
You can also enforce write-once-read-many (WORM) policies with S3 Object Lock. This S3 management feature blocks object version deletion during a customer-defined retention period so that you can enforce retention policies as an added layer of data protection or to meet compliance obligations. You can migrate workloads from existing WORM systems into Amazon S3, and configure S3 Object Lock at the object and bucket levels to prevent object version deletions prior to a pre-defined Retain Until Date or Legal Hold Date. Objects with S3 Object Lock retain WORM protection even if they are moved to different storage classes with an S3 Lifecycle policy. To track which objects have S3 Object Lock, you can refer to an S3 Inventory report that includes the WORM status of objects. S3 Object Lock can be configured in one of two modes. When deployed in governance mode, AWS accounts with specific IAM permissions are able to remove S3 Object Lock from objects. If you require stronger immutability in order to comply with regulations, you can use compliance mode; in compliance mode, the protection cannot be removed by any user, including the root account.
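The sketch below illustrates the bucket-level configuration path: Object Lock is enabled at bucket creation, then a default compliance-mode retention is applied. Bucket name and retention period are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Object Lock must be enabled when the bucket is created. (In Regions other
# than us-east-1, a CreateBucketConfiguration is also required.)
s3.create_bucket(Bucket="example-worm-bucket", ObjectLockEnabledForBucket=True)

# Apply a default 7-year compliance-mode retention to newly written objects.
s3.put_object_lock_configuration(
    Bucket="example-worm-bucket",
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Years": 7}},
    },
)
```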
Storage monitoring
In addition to these management capabilities, you can use Amazon S3 features and other AWS services to monitor and control your S3 resources. Apply tags to S3 buckets to allocate costs across multiple business dimensions (such as cost centers, application names, or owners), then use AWS Cost Allocation Reports to view usage and costs aggregated by the bucket tags. You can also use Amazon CloudWatch to track the operational health of your AWS resources and configure billing alerts for estimated charges that reach a user-defined threshold. Use AWS CloudTrail to track and report on bucket- and object-level activities, and configure S3 Event Notifications to trigger workflows and alerts or invoke AWS Lambda when a specific change is made to your S3 resources. For example, you can use S3 Event Notifications to automatically transcode media files as they are uploaded to S3, process data files as they become available, or synchronize objects with other data stores. Additionally, the latest AWS SDKs calculate a high-speed CRC-based checksum for all uploads; S3 independently verifies that checksum and accepts the object only after confirming that data integrity was maintained in transit. If an object is uploaded with an SDK version that does not provide a pre-calculated checksum, S3 calculates a CRC-based checksum of the whole object, even for multipart uploads. Checksums are stored in object metadata and are therefore available to verify data integrity at any time. You can choose from five supported algorithms (CRC64NVME, CRC32, CRC32C, SHA-1, and SHA-256) for data integrity checking on upload and download, depending on your application needs.
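As an example of the checksum workflow, the boto3 sketch below uploads an object with a SHA-256 checksum and later reads the stored checksum back; bucket and key names are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Ask the SDK to compute a SHA-256 checksum on upload; S3 verifies it
# server-side and rejects the object if the data was corrupted in transit.
s3.put_object(
    Bucket="example-bucket",
    Key="data/records.bin",
    Body=b"example payload",
    ChecksumAlgorithm="SHA256",
)

# The checksum is stored with the object and can be re-verified at any time.
attrs = s3.get_object_attributes(
    Bucket="example-bucket",
    Key="data/records.bin",
    ObjectAttributes=["Checksum"],
)
print(attrs["Checksum"]["ChecksumSHA256"])
```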
Storage analytics and insights
S3 Storage Lens
S3 Storage Lens delivers organization-wide visibility into object storage usage and activity trends, and makes actionable recommendations to improve cost efficiency and apply data protection best practices. S3 Storage Lens is the first cloud storage analytics solution to provide a single view of object storage usage and activity across hundreds, or even thousands, of accounts in an organization, with drill-downs to generate insights at the account, bucket, or even prefix level. Drawing from more than 16 years of experience helping customers optimize their storage, S3 Storage Lens analyzes organization-wide metrics to deliver contextual recommendations that help you reduce storage costs and apply data protection best practices.
S3 Storage Class Analysis
Amazon S3 Storage Class Analysis observes data access patterns to help you decide when to transition less frequently accessed data to a lower-cost storage class, and you can use the results to improve your S3 Lifecycle policies. You can configure Storage Class Analysis to analyze all the objects in a bucket, or configure filters to group objects for analysis by common prefix, by object tags, or by both prefix and tags. To learn more, visit the storage analytics and insights page.
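A minimal boto3 sketch of a Storage Class Analysis configuration, exporting daily CSV results for a placeholder prefix to a placeholder destination bucket:

```python
import boto3

s3 = boto3.client("s3")

# Analyze access patterns for everything under the "logs/" prefix and
# export the daily results as CSV to another bucket.
s3.put_bucket_analytics_configuration(
    Bucket="example-bucket",
    Id="logs-analysis",
    AnalyticsConfiguration={
        "Id": "logs-analysis",
        "Filter": {"Prefix": "logs/"},
        "StorageClassAnalysis": {
            "DataExport": {
                "OutputSchemaVersion": "V_1",
                "Destination": {
                    "S3BucketDestination": {
                        "Bucket": "arn:aws:s3:::example-analytics-results",
                        "Format": "CSV",
                    }
                },
            }
        },
    },
)
```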
Table storage
Amazon S3 Tables
Amazon S3 Tables deliver the first cloud object store with built-in open table format support, and the easiest way to store tabular data at scale. S3 Tables are specifically optimized for analytics workloads, resulting in up to 3x faster query performance and up to 10x higher transactions per second compared to self-managed tables. S3 Tables support the Apache Iceberg standard, and are easily queried by popular AWS and third-party query engines. Additionally, S3 Tables are designed to perform continual table maintenance to automatically optimize query efficiency and storage cost over time, even as your data lake scales and evolves. S3 Tables integration with AWS Glue Data Catalog is in preview, allowing you to stream, query, and visualize data—including S3 Metadata tables—using AWS analytics services such as Amazon Data Firehose, Amazon Athena, Amazon Redshift, Amazon EMR, and Amazon QuickSight.
S3 Tables use table buckets, a bucket type that is purpose-built to store tabular data. With table buckets, you can easily create tables and set up table-level permissions to manage access to your data lake. You can then load and query data in your tables with standard SQL, and take advantage of Apache Iceberg’s advanced analytics capabilities such as row-level transactions, queryable snapshots, schema evolution, and more. Table buckets also provide policy-driven table maintenance, helping you to automate operational tasks such as compaction, snapshot management, and unreferenced file removal.
Storage classes
With Amazon S3, you can store data across a range of different S3 storage classes purpose-built for specific use cases and access patterns: S3 Intelligent-Tiering, S3 Standard, S3 Express One Zone, S3 Standard-Infrequent Access (S3 Standard-IA), S3 One Zone-Infrequent Access (S3 One Zone-IA), S3 Glacier Instant Retrieval, S3 Glacier Flexible Retrieval, S3 Glacier Deep Archive, and S3 Outposts.
Each S3 storage class supports a specific level of data access at a corresponding cost and, in some cases, a specific geographic placement.
For data with changing, unknown, or unpredictable access patterns, such as data lakes, analytics, or new applications, use S3 Intelligent-Tiering, which automatically optimizes your storage costs. S3 Intelligent-Tiering automatically moves your data between three low-latency access tiers optimized for frequent, infrequent, and rare access. When subsets of objects become archived over time, you can activate the archive access tier designed for asynchronous access.
For more predictable access patterns, you can store mission-critical production data in S3 Standard for frequent access, accelerate performance-critical applications by storing your most frequently accessed data in S3 Express One Zone, save costs by storing infrequently accessed data in S3 Standard-IA or S3 One Zone-IA, and archive data at the lowest costs in the archival storage classes — S3 Glacier Instant Retrieval, S3 Glacier Flexible Retrieval, and S3 Glacier Deep Archive. You can use S3 Storage Class Analysis to monitor access patterns across objects to discover data that should be moved to lower-cost storage classes. Then you can use this information to configure an S3 Lifecycle policy that performs the transition. You can also use S3 Lifecycle policies to expire objects at the end of their lifecycles.
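To make the Lifecycle mechanics concrete, here is a hedged boto3 sketch that tiers objects under a placeholder prefix into lower-cost classes and then expires them; the day counts are arbitrary examples:

```python
import boto3

s3 = boto3.client("s3")

# Tier log objects down over time, then expire them after a year.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tier-then-expire-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},  # S3 Glacier Flexible Retrieval
            ],
            "Expiration": {"Days": 365},
        }],
    },
)
```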
If you have data residency requirements that can’t be met by an existing AWS Region, you can use S3 storage classes for AWS Dedicated Local Zones or S3 on Outposts racks to store your data in a specific data perimeter.
Data residency and isolation
Amazon S3 supports your data residency and data isolation use cases when you need to store data in a specific data perimeter. As noted above, S3 storage classes for AWS Dedicated Local Zones and S3 on Outposts racks address data residency requirements that can’t be met by an existing AWS Region. This expands on the AWS Digital Sovereignty Pledge, our commitment to offering the most advanced set of sovereignty controls and features in the cloud.
Access management and security
Access management
To protect your data in Amazon S3, by default, users only have access to the S3 resources they create. You can grant access to other users by using one or a combination of the following access management features: AWS Identity and Access Management (IAM) to create users and manage their respective access; Access Control Lists (ACLs) to make individual objects accessible to authorized users; bucket policies to configure permissions for all objects within a single S3 bucket; S3 Access Points to simplify managing data access to shared data sets by creating access points with names and permissions specific to each application or sets of applications; S3 Access Grants to manage data permissions at scale by automatically granting S3 access to end-users based on their corporate identity; and Query String Authentication to grant time-limited access to others with temporary URLs. Amazon S3 also supports Audit Logs that list the requests made against your S3 resources for complete visibility into who is accessing what data.
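As a small illustration of Query String Authentication, the boto3 sketch below generates a presigned URL; bucket and key names are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Generate a URL that grants anyone who holds it one hour of read access
# to a single object, without sharing AWS credentials.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "example-bucket", "Key": "reports/2024/q1.csv"},
    ExpiresIn=3600,  # seconds
)
print(url)
```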
Security
Amazon S3 offers flexible security features to block unauthorized users from accessing your data. Use VPC endpoints to connect to S3 resources from your Amazon Virtual Private Cloud (Amazon VPC) and from on-premises. Amazon S3 encrypts all new data uploads to any bucket (as of January 5, 2023). Amazon S3 supports both server-side encryption (with four key management options) and client-side encryption for data uploads (see the Amazon S3 User Guide for more information on data encryption with S3). Use S3 Inventory to check the encryption status of your S3 objects (see storage management for more information on S3 Inventory).
S3 Block Public Access is a set of security controls that ensures S3 buckets and objects do not have public access. Block Public Access is turned on by default for all new buckets. With a few clicks in the Amazon S3 console, you can apply the S3 Block Public Access settings to all buckets within your AWS account or to specific S3 buckets. Once the settings are applied to an AWS account, all existing or new buckets and objects associated with that account inherit the settings that prevent public access. S3 Block Public Access settings override other S3 access permissions, making it easy for the account administrator to enforce a “no public access” policy regardless of how an object is added, how a bucket is created, or if there are existing access permissions. S3 Block Public Access controls are auditable, provide a further layer of control, and use AWS Trusted Advisor bucket permission checks, AWS CloudTrail logs, and Amazon CloudWatch alarms. You should enable Block Public Access for all accounts and buckets that you do not want publicly accessible.
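Applying the four bucket-level settings takes one API call; the following boto3 sketch uses a placeholder bucket (account-wide settings can be applied similarly through the S3 Control API):

```python
import boto3

s3 = boto3.client("s3")

# Turn on all four Block Public Access settings for a single bucket.
s3.put_public_access_block(
    Bucket="example-bucket",
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)
```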
S3 Object Ownership is a feature that disables Access Control Lists (ACLs), changing ownership for all objects to the bucket owner and simplifying access management for data stored in S3. When you configure the S3 Object Ownership Bucket owner enforced setting, ACLs will no longer affect permissions for your bucket and the objects in it. All access control will be defined using resource-based policies, user policies, or some combination of these. Before you disable ACLs, review your bucket and object ACLs. To identify Amazon S3 requests that required ACLs for authorization, you can use the aclRequired field in Amazon S3 server access logs or AWS CloudTrail.
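Enforcing bucket-owner ownership, and thereby disabling ACLs, is likewise a single call; a minimal sketch with a placeholder bucket:

```python
import boto3

s3 = boto3.client("s3")

# Disable ACLs by enforcing bucket-owner ownership of every object.
s3.put_bucket_ownership_controls(
    Bucket="example-bucket",
    OwnershipControls={"Rules": [{"ObjectOwnership": "BucketOwnerEnforced"}]},
)
```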
Using S3 Access Points that are restricted to a Virtual Private Cloud (VPC), you can easily firewall your S3 data within your private network. Additionally, you can use AWS Service Control Policies to require that any new S3 Access Point in your organization is restricted to VPC-only access.
IAM Access Analyzer for S3 is a feature that helps you simplify permissions management as you set, verify, and refine policies for your S3 buckets and access points. Access Analyzer for S3 monitors your existing bucket access policies to verify that they provide only the required access to your S3 resources, and evaluates those policies so that you can swiftly remediate any buckets with access that isn't required. When reviewing results that show potentially shared access to a bucket, you can apply Block Public Access to the bucket with a single click in the S3 console. For auditing purposes, you can download Access Analyzer for S3 findings as a CSV report. Additionally, the S3 console reports security warnings, errors, and suggestions from IAM Access Analyzer as you author your S3 policies. The console automatically runs more than 100 policy checks to validate your policies. These checks save you time, guide you to resolve errors, and help you apply security best practices.
IAM makes it easier for you to analyze access and reduce permissions to achieve least privilege by providing the timestamp when a user or role last used S3 and the associated actions. Use this “last accessed” information to analyze S3 access, identify unused permissions, and remove them confidently. To learn more see Refining Permissions Using Last Accessed Data.
You can use Amazon Macie to discover and protect sensitive data stored in Amazon S3. Macie automatically gathers a complete S3 inventory and continually evaluates every bucket to alert you to any publicly accessible buckets, unencrypted buckets, or buckets shared or replicated with AWS accounts outside of your organization. Then, Macie applies machine learning and pattern matching techniques to the buckets you select to identify and alert you to sensitive data, such as personally identifiable information (PII). As security findings are generated, they are pushed to Amazon CloudWatch Events, making it easy to integrate with existing workflow systems and to trigger automated remediation with services like AWS Step Functions, such as closing a public bucket or adding resource tags.
AWS PrivateLink for S3 provides private connectivity between Amazon S3 and on-premises environments. You can provision interface VPC endpoints for S3 in your VPC to connect your on-premises applications directly with S3 over AWS Direct Connect or AWS VPN. Requests to interface VPC endpoints for S3 are automatically routed to S3 over the Amazon network. You can set security groups and configure VPC endpoint policies for your interface VPC endpoints for additional access controls.
Learn more by visiting S3 access management and security, the S3 security and data protection eBook, and protecting data in Amazon S3.
Data processing
S3 Object Lambda
With S3 Object Lambda you can add your own code to S3 GET, HEAD, and LIST requests to modify and process data as it is returned to an application. You can use custom code to modify the data returned by standard S3 GET requests to filter rows, dynamically resize images, redact confidential data, and much more. You can also use S3 Object Lambda to modify the output of S3 LIST requests to create a custom view of objects in a bucket and S3 HEAD requests to modify object metadata like object name and size. Powered by AWS Lambda functions, your code runs on infrastructure that is fully managed by AWS, eliminating the need to create and store derivative copies of your data or to run expensive proxies, all with no changes required to applications.
S3 Object Lambda uses AWS Lambda functions to automatically process the output of a standard S3 GET, HEAD, or LIST request. AWS Lambda is a serverless compute service that runs customer-defined code without requiring management of underlying compute resources. With just a few clicks in the AWS Management Console, you can configure a Lambda function and attach it to an S3 Object Lambda Access Point. From that point forward, S3 will automatically call your Lambda function to process any data retrieved through the S3 Object Lambda Access Point, returning a transformed result back to the application. You can author and execute your own custom Lambda functions, tailoring S3 Object Lambda’s data transformation to your specific use case.
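For illustration, a skeletal Lambda handler for S3 Object Lambda might look like the following; the uppercase transformation is a stand-in for whatever processing your application needs:

```python
import urllib.request
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # S3 hands the function a presigned URL to the original object plus a
    # route/token pair for returning the transformed result.
    ctx = event["getObjectContext"]
    original = urllib.request.urlopen(ctx["inputS3Url"]).read()

    # Example transformation: uppercase a text object before returning it.
    transformed = original.decode("utf-8").upper().encode("utf-8")

    s3.write_get_object_response(
        RequestRoute=ctx["outputRoute"],
        RequestToken=ctx["outputToken"],
        Body=transformed,
    )
    return {"statusCode": 200}
```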
Query in place
Amazon S3 works with complementary AWS services that query your data in place, without needing to copy and load it into a separate analytics platform or data warehouse. This means you can run analytics directly on your data stored in Amazon S3.
Amazon S3 is compatible with AWS analytics services Amazon Athena and Amazon Redshift Spectrum. Amazon Athena queries your data in Amazon S3 without needing to extract and load it into a separate service or platform. It uses standard SQL expressions to analyze your data, delivers results within seconds, and is commonly used for ad hoc data discovery. Amazon Redshift Spectrum also runs SQL queries directly against data at rest in Amazon S3, and is more appropriate for complex queries and large datasets (up to exabytes). Because Amazon Athena and Amazon Redshift share a common data catalog and data formats, you can use them both against the same datasets in Amazon S3.
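As a sketch of querying in place, the following boto3 call runs a SQL statement with Athena against data already in S3; the database, table, and result bucket names are placeholders:

```python
import boto3

athena = boto3.client("athena")

# Run a standard SQL query against data already sitting in S3; results
# land in the output location below.
execution = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status",
    QueryExecutionContext={"Database": "example_db"},
    ResultConfiguration={"OutputLocation": "s3://example-query-results/"},
)
print(execution["QueryExecutionId"])  # poll get_query_execution for completion
```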
Learn more about querying your data in Amazon S3 by reading the blog post.
Data transfer
AWS provides a portfolio of data transfer services to provide the right solution for any data migration project. The level of connectivity is a major factor in data migration, and AWS has offerings that can address your hybrid cloud storage, online data transfer, and offline data transfer needs.
Hybrid cloud storage: AWS Storage Gateway is a hybrid cloud storage service that lets you seamlessly connect and extend your on-premises applications to AWS Storage. Customers use Storage Gateway to seamlessly replace tape libraries with cloud storage, provide cloud storage-backed file shares, or create a low-latency cache to access data in AWS for on-premises applications.
Online data transfer: AWS DataSync makes it easy and efficient to transfer hundreds of terabytes and millions of files into Amazon S3, up to 10x faster than open-source tools. DataSync automatically handles or eliminates many manual tasks, including scripting copy jobs, scheduling and monitoring transfers, validating data, and optimizing network utilization. Additionally, you can use AWS DataSync to copy objects between a bucket on S3 on Outposts and a bucket stored in an AWS Region. The AWS Transfer Family provides fully managed, simple, and seamless file transfer to Amazon S3 using SFTP, FTPS, and FTP. Amazon S3 Transfer Acceleration enables fast transfers of files over long distances between your client and your Amazon S3 bucket.
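As an illustration of S3 Transfer Acceleration specifically, the boto3 sketch below enables acceleration on a placeholder bucket and then uploads through the accelerate endpoint:

```python
import boto3
from botocore.config import Config

# Enable acceleration on the bucket once...
boto3.client("s3").put_bucket_accelerate_configuration(
    Bucket="example-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# ...then route transfers through the accelerate endpoint.
s3_accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_accel.upload_file("large-video.mp4", "example-bucket", "uploads/large-video.mp4")
```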
Offline data transfer / little or no connectivity: The AWS Snowball service uses ruggedized, portable storage and edge computing devices for data collection, processing, and migration. Customers can ship the physical Snowball device for offline data migration to AWS.
Customers can also work with third-party providers from the AWS Partner Network (APN) to deploy hybrid storage architectures, integrate Amazon S3 into existing applications and workflows, and transfer data to and from AWS.
Learn more by visiting AWS cloud data migration services, AWS Storage Gateway, AWS DataSync, AWS Transfer Family, Amazon S3 Transfer Acceleration, and AWS Snowball.
Data exchange
AWS Data Exchange for Amazon S3 accelerates time to insight with direct access to data providers' Amazon S3 data. AWS Data Exchange for Amazon S3 helps you easily find, subscribe to, and use third-party data files for storage cost optimization, simplified data licensing management, and more. It is intended for subscribers who want to easily use third-party data files for data analysis with AWS services without needing to create or manage data copies. It is also helpful for data providers who want to offer in-place access to data hosted in their Amazon S3 buckets.
Once data subscribers are entitled to an AWS Data Exchange for Amazon S3 dataset, they can start data analysis without having to set up their own S3 buckets, copy data files into those S3 buckets, or pay associated storage fees. Data analysis can be done with AWS services such as Amazon Athena, Amazon SageMaker Feature Store, or Amazon EMR. Subscribers access the same S3 objects that the data provider maintains and are therefore always using the most up-to-date data available, without additional engineering or operational work. Data providers can easily set up AWS Data Exchange for Amazon S3 on top of their existing S3 buckets to share direct access to an entire S3 bucket or specific prefixes and S3 objects. After setup, AWS Data Exchange automatically manages subscriptions, entitlements, billing, and payment.
Performance
Amazon S3 provides industry-leading performance for cloud object storage. Amazon S3 supports parallel requests, which means you can scale your S3 performance by the factor of your compute cluster, without making any customizations to your application. Performance scales per prefix, so you can use as many prefixes as you need in parallel to achieve the required throughput; there are no limits to the number of prefixes. Each S3 prefix supports at least 3,500 requests per second to add data and 5,500 requests per second to retrieve data, making it simple to increase performance significantly.
You do not need to randomize object prefixes to achieve this request rate, so you can use logical or sequential naming patterns for S3 objects without any performance implications. Refer to the Performance Guidelines for Amazon S3 and Performance Design Patterns for Amazon S3 for the most current information about performance optimization.
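To make the parallelism point concrete, here is a hedged sketch: the snippet below fans GET requests out across keys under different placeholder prefixes with a thread pool (boto3 clients are thread-safe):

```python
from concurrent.futures import ThreadPoolExecutor
import boto3

s3 = boto3.client("s3")  # boto3 clients may be shared across threads

def fetch(key):
    return s3.get_object(Bucket="example-bucket", Key=key)["Body"].read()

# Spreading keys across prefixes lets each prefix contribute its own
# request-rate budget; names below are placeholders.
keys = [f"shard-{i:02d}/part-0001.gz" for i in range(16)]
with ThreadPoolExecutor(max_workers=16) as pool:
    payloads = list(pool.map(fetch, keys))
```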
Consistency
Amazon S3 delivers strong read-after-write consistency automatically for all applications, without changes to performance or availability, without sacrificing regional isolation for applications, and at no additional cost. Strong consistency simplifies the migration of on-premises analytics workloads by removing the need to make changes to applications, and reduces costs by removing the need for extra infrastructure to provide strong consistency.
Any request for S3 storage is strongly consistent. After a successful write of a new object or an overwrite of an existing object, any subsequent read request immediately receives the latest version of the object. S3 also provides strong consistency for list operations, so after a write, you can immediately perform a listing of the objects in a bucket with any changes reflected.
Intended usage and restrictions
Your use of this service is subject to the Amazon Web Services Customer Agreement.