AWS Big Data Blog

Improve DynamoDB analytics with AWS Glue zero-ETL schema and partition controls

In this post, you learn how to replicate Amazon DynamoDB data to Apache Iceberg tables in Amazon S3 through a zero-ETL integration. We walk through the challenges that the DynamoDB nested, schema-flexible data model introduces for analytics workloads, and show you how to configure schema unnesting and data partitioning for a sample product catalog table. We also cover how to query the replicated data in Amazon Athena using standard SQL.

How to build a cross-Region resilience for Amazon OpenSearch Service with Amazon MSK

In this post, we outline the solution that provides cross-Region resiliency without needing to reestablish relationships during a fail-back, using an active-active replication model with Amazon OpenSearch Ingestion (OSI) and Amazon Managed Streaming for Apache Kafka (Amazon MSK). This solution applies to both OpenSearch Service managed clusters and Amazon OpenSearch Serverless collections. We use Amazon OpenSearch Serverless as an example for the configurations in this post.

How to consolidate cross-Region S3 data into OpenSearch

We’re happy to announce that Amazon OpenSearch Ingestion pipelines can now read from S3 buckets in different Regions to ingest and consolidate data into a single OpenSearch Service domain or collection. In this post, I’ll show you how to use the new cross-Region support to ingest data from S3 buckets across multiple AWS Regions into a single OpenSearch Service domain or collection.

Enable real-time mainframe analytics with Precisely Connect and Amazon S3

In this post, we discuss how you can use Precisely Connect to enable real-time, direct replication of mainframe data to Amazon Simple Storage Service (Amazon S3), and how your organization can extend this foundation using Amazon S3 Tables for advanced analytics.

Build streaming applications on Amazon Managed Service for Apache Flink with AI-assisted guidance

In this post, we walk through installing the Power and Skill, using Amazon Kinesis Data Streams to build a Kinesis Data Stream-to-Kinesis Data Stream streaming pipeline, and migrating an existing application to Flink 2.2. You can follow along with this use case to see how the Managed Service for Apache Flink Kiro Power can help you build a resilient, performant application grounded in best practices.

Migrating TLS Clients managed by third-party Certificate Authorities from self-managed Apache Kafka to Amazon MSK

In this post, we provide an approach to reuse your existing client certificates without reissuing them through AWS Certificate Manager (ACM) Private Certificate Authority. This solution enables an accelerated migration path by using your current third-party CA infrastructure. This removes the complexity and operational overhead of certificate re-issuance while maintaining the security posture that you’ve established with your existing mTLS implementation.

A guide to capacity planning for Airflow worker pool in Amazon MWAA

In our previous post, A guide to Airflow worker pool optimization in Amazon MWAA, we explored when adding workers to your Amazon Managed Workflows for Apache Airflow (Amazon MWAA) environment actually solves performance issues, and when it doesn’t. We walked through patterns like high CPU utilization and long queue times where scaling may be appropriate, […]

A guide to Airflow worker pool optimization in Amazon MWAA

Optimizing the Airflow worker pool configuration in Amazon Managed Workflows for Apache Airflow (Amazon MWAA), the AWS fully managed Apache Airflow service, is an important yet often overlooked strategy for scaling workflow operations. Tasks queued for longer periods can create the illusion that additional workers are the solution, when in reality the root cause might […]

Unified observability in Amazon OpenSearch Service: metrics, traces, and AI agent debugging in a single interface

Amazon OpenSearch Service now brings application monitoring, native Amazon Managed Service for Prometheus integration, and AI agent tracing together in OpenSearch UI’s observability workspace. In this post, we walk through two real-world scenarios using the OpenTelemetry sample app: a multi-agent travel planner facing slow processing, and a checkout flow quietly failing on one microservice.