AWS Glue | AWS Storage Blog

Build intelligent ETL pipelines using AWS Model Context Protocol and Amazon Q

Data scientists and engineers spend hours writing complex data pipelines to extract, transform, and load (ETL) data from various sources into their data lakes for data integration and creating unified data models to build business insights. The process involves understanding the source and target systems, discovering schemas, mapping source and target, writing and testing ETL […]

Optimizing recommendations and analytics using Amazon DynamoDB and Amazon S3

Today, consumers navigate thousands of products on e-commerce sites, hundreds of shows on streaming platforms, and countless options in digital marketplaces. This choice overload creates decision fatigue, yet consumers continue to demand more variety and make more purchases online. As a result, personalization has become essential—consumers reward brands that deliver relevant, tailored online experiences. However, […]

Faster threat detection at scale: Real-time cybersecurity graph analytics with PuppyGraph and Amazon S3 Tables

Modern cybersecurity teams face unprecedented challenges in data analysis due to the scale, complexity, and velocity of data. Cloud environments continuously generate massive amounts of information in the form of access logs, configuration changes, alerts, and telemetry. Traditional analysis methods that look at these data points in isolation can’t effectively detect threats such as lateral movement and […]

Cloud-powered tick data: revolutionizing financial data storage with Amazon S3 and LSEG

Data has become the lifeblood of modern financial markets, driving everything from investment decisions to regulatory compliance. Nowhere is this more evident than in electronic trading, where the ability to efficiently store, process, and analyze historical market data can make the difference between success and failure. Market participants are witnessing an unprecedented surge in tick […]

From raw to refined: building a data quality pipeline with AWS Glue and Amazon S3 Tables

Organizations often struggle to extract maximum value from their data lakes when running generative AI and analytics workloads due to data quality challenges. Although data lakes excel at storing massive amounts of raw, diverse data, they need robust governance and management practices to prevent common quality issues. Without proper data validation, cleansing processes, and ongoing […]

How to consume tabular data from Amazon S3 Tables for insights and business reporting

When was the last time you found yourself looking at rows and rows of data in a spreadsheet, struggling to interpret them and draw conclusions? Many analysts and engineers experience the same challenge every day. Whether it’s analyzing sales trends, monitoring operational metrics, or understanding customer behavior, the challenge lies not just in interpreting […]

How Pendulum achieves 6x faster processing and 40% cost reduction with Amazon S3 Tables

Pendulum is an AI-powered analytics platform that aggregates and analyzes real-time data from social media, news, and podcasts. Designed to help organizations stay ahead, it enables reputation monitoring, early crisis detection, and influencer activity tracking. Using machine learning (ML) enables Pendulum to surface key insights from multiple channels, providing a comprehensive view of the digital […]

Bringing more to the table: How Amazon S3 Tables rapidly delivered new capabilities in the first 5 months

Amazon S3 redefined data storage when it launched as the first generally available AWS service in 2006 to deliver highly reliable, durable, secure, low-latency storage with virtually unlimited scale. While designed to deliver simple storage, S3 has proven to be built to handle the explosive growth of data we have seen in the last 19 […]

Connect Snowflake to S3 Tables using the SageMaker Lakehouse Iceberg REST endpoint

Organizations today seek data analytics solutions that provide maximum flexibility and accessibility. Customers need their data to be readily available through their preferred query engines, and they need to break down barriers across different computing environments. At the same time, they want a single copy of data to be used across these solutions, to track lineage, be cost […]

Build a data lake for streaming data with Amazon S3 Tables and Amazon Data Firehose

UPDATE (7/31/2025): Firehose can directly access S3 Tables in Glue Data Catalog without requiring resource links. Businesses are increasingly adopting real-time data processing to stay ahead of user expectations and market changes. Industries such as retail, finance, manufacturing, and smart cities are using streaming data for everything from optimizing supply chains to detecting fraud and […]

Category: AWS Glue