AWS Big Data Blog

Demystify data sharing and collaboration patterns on AWS: Choosing the right tool for the job

Adoption of data lakes and the data mesh framework emerges as a powerful approach. By decentralizing data ownership and distribution, enterprises can break down silos and enable seamless data sharing. In this post, we discuss how to choose the right tool for building an enterprise data platform and enabling data sharing, collaboration and access within your organization and with third-party providers. We address three business use cases using AWS Glue, AWS Data Exchange, AWS Clean Rooms, and Amazon DataZone through three different use cases.

A customer’s journey with Amazon OpenSearch Ingestion pipelines

In this post, we share the journey of a multi-national financial credit reporting company, including the hurdles they faced, and why they went with Amazon OpenSearch Ingestion pipelines to make their log management smoother.

Single sign-on SSO for Amazon OpenSearch Service using SAML and Keycloak

In this post, we walk you through how to configure service provider-initiated authentication for OpenSearch Dashboards by using OpenSearch Service and Keycloak. We also discuss how to set up users, groups, and roles in Keycloak and configure their access to OpenSearch Dashboards.

Get started with Amazon DynamoDB zero-ETL integration with Amazon Redshift

We’re excited to announce the general availability (GA) of Amazon DynamoDB zero-ETL integration with Amazon Redshift, which enables you to run high-performance analytics on your DynamoDB data in Amazon Redshift with little to no impact on production workloads running on DynamoDB. As data is written into a DynamoDB table, it’s seamlessly made available in Amazon Redshift, eliminating the need to build and maintain complex data pipelines.

Elevate your search and analytics skills with the new Amazon OpenSearch Service YouTube channel

We’re thrilled to announce the launch of the official Amazon OpenSearch Service YouTube channel—a comprehensive resource for anyone looking to master Amazon OpenSearch Service. Whether you’re just getting started with searches , vectors, analytics, or you’re looking to optimize large-scale implementations, our channel can be your go-to resource to help you unlock the full potential of OpenSearch Service.

Migrate from Amazon Kinesis Data Analytics for SQL to Amazon Managed Service for Apache Flink and Amazon Managed Service for Apache Flink Studio

Amazon Kinesis Data Analytics for SQL is a data stream processing engine that helps you run your own SQL code against streaming sources to perform time series analytics, feed real-time dashboards, and create real-time metrics. AWS has made the decision to discontinue Kinesis Data Analytics for SQL, effective January 27, 2026. In this post, we explain why we plan to end support for Kinesis Data Analytics for SQL, alternative AWS offerings, and how to migrate your SQL queries and workloads.

Enriching metadata for accurate text-to-SQL generation for Amazon Athena

In this post, we demonstrate the critical role of metadata in text-to-SQL generation through an example implemented for Amazon Athena using Amazon Bedrock. We discuss the challenges in maintaining the metadata as well as ways to overcome those challenges and enrich the metadata.

Enhance Amazon EMR scaling capabilities with Application Master Placement

Starting with the Amazon EMR 7.2 release, Amazon EMR on EC2 introduced a new feature called Application Master (AM) label awareness, which allows users to enable YARN node labels to allocate the AM containers within On-Demand nodes only. In this post, we explore the key features and use cases where this new functionality can provide significant benefits, enabling cluster administrators to achieve optimal resource utilization, improved application reliability, and cost-efficiency in your EMR on EC2 clusters.

Take manual snapshots and restore in a different domain spanning across various Regions and accounts in Amazon OpenSearch Service

This post provides a detailed walkthrough about how to efficiently capture and manage manual snapshots in OpenSearch Service. It covers the essential steps for taking snapshots of your data, implementing safe transfer across different AWS Regions and accounts, and restoring them in a new domain. This guide is designed to help you maintain data integrity and continuity while navigating complex multi-Region and multi-account environments in OpenSearch Service.

Unleash deeper insights with Amazon Redshift data sharing for data lake tables

Amazon Redshift now enables the secure sharing of data lake tables—also known as external tables or Amazon Redshift Spectrum tables—that are managed in the AWS Glue Data Catalog, as well as Redshift views referencing those data lake tables. By using granular access controls, data sharing in Amazon Redshift helps data owners maintain tight governance over who can access the shared information. In this post, we explore powerful use cases that demonstrate how you can enhance cross-team and cross-organizational collaboration, reduce overhead, and unlock new insights by using this innovative data sharing functionality.