Amazon EMR | AWS Storage Blog

Building an open warehouse architecture: Supabase’s integration with Amazon S3 Tables

As applications scale, developers face a persistent challenge: analytical queries that slow down transactional databases, force developers to copy data across multiple proprietary tools, and create disconnected data silos. For the 5 million developers building on Supabase, an open source Postgres development platform, this tension between operational and analytical workloads has become increasingly critical. The […]

Build intelligent ETL pipelines using AWS Model Context Protocol and Amazon Q

Data scientists and engineers spend hours writing complex data pipelines to extract, transform, and load (ETL) data from various sources into their data lakes, integrating that data and creating unified data models that yield business insights. The process involves understanding the source and target systems, discovering schemas, mapping source and target, writing and testing ETL […]

Optimizing recommendations and analytics using Amazon DynamoDB and Amazon S3

Today, consumers navigate thousands of products on e-commerce sites, hundreds of shows on streaming platforms, and countless options in digital marketplaces. This choice overload creates decision fatigue, yet consumers continue to demand more variety and make more purchases online. As a result, personalization has become essential—consumers reward brands that deliver relevant, tailored online experiences. However, […]

How Zeta Global scales multi-tenant data ingestion with Amazon S3 Tables

Zeta Global is a data-driven marketing technology company that uses consumer insights to empower brands in customer acquisition, growth, and retention. At the core of its operations is the Zeta Marketing Platform, an advanced system that applies sophisticated AI and machine learning (ML) capabilities on proprietary data from over 245 million U.S. consumer profiles. This […]

How to consume tabular data from Amazon S3 Tables for insights and business reporting

When was the last time you found yourself looking at rows and rows of data in a spreadsheet, struggling to interpret them and draw conclusions? Many analysts and engineers experience the same challenge every day. Whether it’s analyzing sales trends, monitoring operational metrics, or understanding customer behavior, the challenge lies not just in interpreting […]

How Pendulum achieves 6x faster processing and 40% cost reduction with Amazon S3 Tables

Pendulum is an AI-powered analytics platform that aggregates and analyzes real-time data from social media, news, and podcasts. Designed to help organizations stay ahead, it enables reputation monitoring, early crisis detection, and influencer activity tracking. Using machine learning (ML), Pendulum surfaces key insights from multiple channels, providing a comprehensive view of the digital […]

Bringing more to the table: How Amazon S3 Tables rapidly delivered new capabilities in the first 5 months

Amazon S3 redefined data storage when it launched as the first generally available AWS service in 2006 to deliver highly reliable, durable, secure, low-latency storage with virtually unlimited scale. While designed to deliver simple storage, S3 has proven to be built to handle the explosive growth of data we have seen in the last 19 […]

Streamlining access to tabular datasets stored in Amazon S3 Tables with DuckDB

As businesses continue to rely on data-driven decision-making, there’s an increasing demand for tools that streamline and accelerate the process of data analysis. Efficiency and simplicity in application architecture can serve as a competitive edge when driving high-stakes decisions. Developers are seeking lightweight, flexible tools that seamlessly integrate with their existing application stack, specifically solutions […]

Build a data lake for streaming data with Amazon S3 Tables and Amazon Data Firehose

UPDATE (7/31/2025): Firehose can directly access S3 Tables in Glue Data Catalog without requiring resource links. Businesses are increasingly adopting real-time data processing to stay ahead of user expectations and market changes. Industries such as retail, finance, manufacturing, and smart cities are using streaming data for everything from optimizing supply chains to detecting fraud and […]

Build a managed transactional data lake with Amazon S3 Tables

UPDATE (12/19/2024): Added guidance for Amazon EMR setup. Customers commonly use Apache Iceberg today to manage ever-growing volumes of data. Apache Iceberg’s relational database transaction capabilities (ACID transactions) help customers deal with frequent updates, deletions, and the need for transactional consistency across datasets. However, getting the most out of Apache Iceberg tables and running it […]

Category: Amazon EMR