AWS Storage Blog
Category: Amazon Simple Storage Service (S3)
Build a managed Apache Iceberg data lake using Starburst and Amazon S3 Tables
Managing large-scale data analytics across diverse data sources has long been a challenge for enterprises. Data teams often struggle with complex data lake configurations, performance bottlenecks, and the need to maintain consistent data governance while enabling broad access to analytics capabilities. Today, Starburst announces a powerful solution to these challenges by extending their Apache Iceberg […]
Build a data lake for streaming data with Amazon S3 Tables and Amazon Data Firehose
UPDATE (7/31/2025): Firehose can directly access S3 Tables in Glue Data Catalog without requiring resource links. Businesses are increasingly adopting real-time data processing to stay ahead of user expectations and market changes. Industries such as retail, finance, manufacturing, and smart cities are using streaming data for everything from optimizing supply chains to detecting fraud and […]
Event-driven framework to integrate AWS Backup service with CSPM tools
Many organizations use third-party Cloud Security Posture Management (CSPM) tools (for example, Wiz.io) to continuously detect and remediate misconfigurations from build time to runtime across hybrid clouds such as AWS. CSPM tools often use AWS resource tags to enhance their security and compliance monitoring capabilities. Tags are key-value pairs that you can assign to AWS resources […]
Optimizing Amazon FSx for Lustre storage consumption using automatic data tiering with Amazon S3
Managing high-performance file storage can be a significant operational and cost challenge for many organizations, especially those running compute-intensive workloads such as high-performance computing (HPC) or data analytics. This is particularly true for organizations with existing data lakes on Amazon S3 who need POSIX-compliant, high-performance file system access. Amazon FSx for Lustre provides a scalable, […]
Access data in Amazon S3 Tables using PyIceberg through the AWS Glue Iceberg REST endpoint
Modern data lakes integrate with multiple engines to meet a wide range of analytics needs, from SQL querying to stream processing. A key enabler of this approach is the adoption of Apache Iceberg as the open table format for building transactional data lakes. However, as the Iceberg ecosystem expands, the growing variety of engines and languages has […]
Integrating custom metadata with Amazon S3 Metadata
Organizations of all sizes face a common challenge: efficiently managing, organizing, and retrieving vast amounts of digital content. From images and videos to documents and application data, businesses are inundated with information that needs to be stored securely, accessed quickly, and analyzed effectively. The ability to extract, manage, and use metadata from this content is […]
Design patterns for multi-tenant access control on Amazon S3
Large organizations and software as a service (SaaS) platforms often share storage resources across multiple users, groups, or tenants. The design pattern chosen to implement this shared storage can significantly impact how access permissions are managed at scale. This decision is key because it directly affects a platform’s security and its ability to scale. A well thought […]
Optimizing data transfers for high throughput life science instruments using AWS DataSync
Healthcare and life sciences (HCLS) customers are generating more data than ever as they integrate the use of omics data with applications in drug discovery, clinical development, molecular diagnostics, and population health. The rate and volume of data that HCLS laboratories generate are a reflection of their lab instrumentation and day-to-day lab operations. Efficiently moving […]
Cost-optimized log aggregation and archival in Amazon S3 using s3tar
According to a study by the International Data Corporation (IDC), the global datasphere is expected to grow from 33 zettabytes (ZB) in 2018 to 175 ZB by 2025, a staggering five-fold increase. Organizations that leverage distributed architectures generate a significant portion of their data footprint from observability data, including application logs, metrics, and traces, which […]
Adapting to change with data patterns on AWS: The “extend” cloud data pattern
As part of my re:Invent 2024 Innovation Talk, I shared three data patterns that many of our largest AWS customers have adopted. This article focuses on “Extend,” which is an emerging data pattern. You can also watch this four-minute video clip on the Extend data pattern. Many companies find great success with the […]