Amazon Managed Streaming for Apache Kafka (Amazon MSK)

Medidata’s journey to a modern lakehouse architecture on AWS

In this post, we show you how Medidata created a unified, scalable, real-time data platform that serves thousands of clinical trials worldwide with AWS services, Apache Iceberg, and a modern lakehouse architecture.

How Yelp modernized its data infrastructure with a streaming lakehouse on AWS

This is a guest post by Umesh Dangat, Senior Principal Engineer for Distributed Services and Systems at Yelp, and Toby Cole, Principle Engineer for Data Processing at Yelp, in partnership with AWS. Yelp processes massive amounts of user data daily—over 300 million business reviews, 100,000 photo uploads, and countless check-ins. Maintaining sub-minute data freshness with […]

Amazon MSK Express brokers now support Intelligent Rebalancing for 180 times faster operation performance

Effective today, all new Amazon Managed Streaming for Apache Kafka (Amazon MSK) Provisioned clusters with Express brokers will support Intelligent Rebalancing at no additional cost. In this post we’ll introduce the Intelligent Rebalancing feature and show an example of how it works to improve operation performance.

Unlock real-time data insights with schema evolution using Amazon MSK Serverless, Iceberg, and AWS Glue streaming

This post showcases a solution that businesses can use to access real-time data insights without the traditional delays between data creation and analysis. By combining Amazon MSK Serverless, Debezium MySQL connector, AWS Glue streaming, and Apache Iceberg tables, the architecture captures database changes instantly and makes them immediately available for analytics through Amazon Athena. A standout feature is the system’s ability to automatically adapt when database structures change—such as adding new columns—without disrupting operations or requiring manual intervention.

Stream mainframe data to AWS in near real time with Precisely and Amazon MSK

In this post, we introduce an alternative architecture to synchronize mainframe data to the cloud using Amazon Managed Streaming for Apache Kafka (Amazon MSK) for greater flexibility and scalability. This event-driven approach provides additional possibilities for mainframe data integration and modernization strategies.

How Laravel Nightwatch handles billions of observability events in real time with Amazon MSK and ClickHouse Cloud

Laravel, one of the world’s most popular web frameworks, launched its first-party observability platform, Laravel Nightwatch, to provide developers with real-time insights into application performance. Built entirely on AWS managed services and ClickHouse Cloud, the service already processes over one billion events per day while maintaining sub-second query latency, giving developers instant visibility into the health of their applications.

Amazon MSK Replicator and MirrorMaker2: Choosing the right replication strategy for Apache Kafka disaster recovery and migrations

In this post, we walk through the different considerations for using Amazon MSK Replicator over Apache Kafka’s MirrorMaker 2, and help you choose the right replication solution for your use case. We also discuss how to make applications using Amazon Managed Streaming for Apache Kafka (Amazon MSK) resilient to disasters using a multi-Region Kafka architecture using MSK Replicator.

Export JMX metrics from Kafka connectors in Amazon Managed Streaming for Apache Kafka Connect with a custom plugin

In this post, we demonstrate how you can export the JMX metrics for Debezium connector when used with Amazon MSK Connect.

Near real-time streaming analytics on protobuf with Amazon Redshift

In this post, we explore an end-to-end analytics workload for streaming protobuf data, by showcasing how to handle these data streams with Amazon Redshift Streaming Ingestion, deserializing and processing them using AWS Lambda functions, so that the incoming streams are immediately available for querying and analytical processing on Amazon Redshift.

Optimize traffic costs of Amazon MSK consumers on Amazon EKS with rack awareness

In this post, we walk you through a solution for implementing rack awareness in consumer applications that are dynamically deployed across multiple Availability Zones using Amazon EKS.

AWS Big Data Blog

Category: Amazon Managed Streaming for Apache Kafka (Amazon MSK)