AWS Storage Blog
Tag: Amazon Simple Storage Service (Amazon S3)
Archiving relational databases to Amazon S3 Glacier storage classes for cost optimization
Many customers are growing their data footprints rapidly, with significantly more data stored in their relational database management systems (RDBMS) than ever before. Additionally, organizations subject to data compliance requirements, including the Health Insurance Portability and Accountability Act (HIPAA), the Payment Card Industry Data Security Standard (PCI-DSS), and the General Data Protection Regulation (GDPR), are often required […]
Cost-optimized log aggregation and archival in Amazon S3 using s3tar
According to a study by the International Data Corporation (IDC), the global datasphere is expected to grow from 33 zettabytes (ZB) in 2018 to 175 ZB by 2025, a staggering five-fold increase. Organizations that leverage distributed architectures generate a significant portion of their data footprint from observability data, including application logs, metrics, and traces, which […]
Backing up Oracle databases to Amazon S3 at scale
In today’s data-driven world, safeguarding critical information stored in Oracle databases is crucial for enterprises. Companies struggle to efficiently back up vast amounts of data from hundreds of databases powering enterprise resource planning (ERP) systems and critical applications. These backups must be secure, durable, and easily restorable to ensure business continuity, guard against ransomware, and […]
Adapting to change with data patterns on AWS: The “extend” cloud data pattern
As part of my re:Invent 2024 Innovation Talk, I shared three data patterns that many of our largest AWS customers have adopted. This article focuses on “Extend,” an emerging data pattern. You can also watch this four-minute video clip on the Extend data pattern if interested. Many companies find great success with the […]
Adapting to change with data patterns on AWS: The “aggregate” cloud data pattern
As part of my re:Invent 2024 Innovation Talk, I shared three data patterns that many of our largest AWS customers have adopted. This article focuses on the “Aggregate” cloud data pattern, which is the most commonly adopted among AWS customers. You can also watch this six-minute video clip on the Aggregate data pattern for a […]
Adapting to change with data patterns on AWS: The “curate” cloud data pattern
As part of my re:Invent 2024 Innovation Talk, I shared three data patterns that many of our largest AWS customers have adopted. This article focuses on the “Curate” data pattern, which we have seen more AWS customers adopt in the last 12-18 months as they look to leverage data sets for both analytics and AI […]
Adapting to change with data patterns on AWS: Aggregate, curate, and extend
At AWS re:Invent, I do an Innovation Talk on the emerging data trends that shape the direction of cloud data strategies. Last year, I talked about Putting Your Data to Work with Generative AI, which not only covered how data is used with foundation models, but also how businesses should think about storing and classifying […]
Analyzing Amazon S3 Metadata with Amazon Athena and Amazon QuickSight
UPDATE (1/27/2025): Amazon S3 Metadata is generally available. UPDATE (7/15/2025): Amazon S3 Metadata releases live inventory tables. This blog post has been edited to reflect the current process. Object storage provides virtually unlimited scalability, but managing billions, or even trillions, of objects can pose significant challenges. How do you know what data you have? How can […]
Build a managed transactional data lake with Amazon S3 Tables
UPDATE (12/19/2024): Added guidance for Amazon EMR setup. Customers commonly use Apache Iceberg today to manage ever-growing volumes of data. Apache Iceberg’s relational database transaction capabilities (ACID transactions) help customers deal with frequent updates, deletions, and the need for transactional consistency across datasets. However, getting the most out of Apache Iceberg tables and running it […]
How Amazon S3 Tables use compaction to improve query performance by up to 3 times
Today, businesses managing petabytes of data must optimize storage and processing to drive timely insights while remaining cost-effective. Customers often choose Apache Parquet for improved storage and query performance. Additionally, customers use Apache Iceberg to organize Parquet datasets to take advantage of its database-like features, such as schema evolution, time travel, and ACID transactions. Customers […]