AWS Big Data Blog
Category: Intermediate (200)
How to consolidate cross-Region S3 data into OpenSearch
We’re happy to announce that Amazon OpenSearch Ingestion pipelines can now read from S3 buckets in different Regions to ingest and consolidate data into a single OpenSearch Service domain or collection. In this post, I’ll show you how to use the new cross-Region support to ingest data from S3 buckets across multiple AWS Regions into a single OpenSearch Service domain or collection.
Enable real-time mainframe analytics with Precisely Connect and Amazon S3
In this post, we discuss how you can use Precisely Connect to enable real-time, direct replication of mainframe data to Amazon Simple Storage Service (Amazon S3), and how your organization can extend this foundation using Amazon S3 Tables for advanced analytics.
Migrating TLS Clients managed by third-party Certificate Authorities from self-managed Apache Kafka to Amazon MSK
In this post, we provide an approach to reuse your existing client certificates without reissuing them through AWS Certificate Manager (ACM) Private Certificate Authority. This solution enables an accelerated migration path by using your current third-party CA infrastructure. This removes the complexity and operational overhead of certificate re-issuance while maintaining the security posture that you’ve established with your existing mTLS implementation.
Analyzing your data catalog: Query SageMaker Catalog metadata with SQL
In this post, we demonstrate how to use the metadata export capability in Amazon SageMaker Catalog and perform analytics such as historical changes, monitor asset growth and track metadata improvements.
Building unified data pipelines with Apache Iceberg and Apache Flink
In this post, you build a unified pipeline using Apache Iceberg and Amazon Managed Service for Apache Flink that replaces the dual-pipeline approach. This walkthrough is for intermediate AWS users who are comfortable with Amazon Simple Storage Service (Amazon S3) and AWS Glue Data Catalog but new to streaming from Apache Iceberg tables.
Enhancing Identity Intelligence with Babel Street Match and Amazon OpenSearch
This post explores how combining Babel Street Match with OpenSearch Service provides a solution that helps your organization to handle large-scale, multilingual data.
Get to insights faster using Notebooks in Amazon SageMaker Unified Studio
In this post, we demonstrate how Notebooks in Amazon SageMaker Unified Studio help you get to insights faster by simplifying infrastructure configuration. You’ll see how to analyze housing price data, create scalable data tables, run distributed profiling, and train machine learning (ML) models within a single notebook environment.
How to use Parquet Column Indexes with Amazon Athena
In this blog post, we use Athena and Amazon SageMaker Unified Studio to explore Parquet Column Indexes and demonstrate how they can improve Iceberg query performance. We explain what Parquet Column Indexes are, demonstrate their performance benefits, and show you how to use them in your applications.
Introducing Amazon MSK Express Broker power for Kiro
In this post, we’ll show you how to use Kiro powers, a new capability that equips Kiro with contextual knowledge and tooling. You can simplify your MSK cluster management, from initial setup to diagnosing common issues, all through natural language conversations.
Proactive monitoring for Amazon Redshift Serverless using AWS Lambda and Slack alerts
In this post, we show you how to build a serverless, low-cost monitoring solution for Amazon Redshift Serverless that proactively detects performance anomalies and sends actionable alerts directly to your selected Slack channels.









