AWS Big Data Blog

Category: Amazon Managed Streaming for Apache Kafka (Amazon MSK)

Near real-time baggage operational insights for airlines using Amazon Kinesis Data Streams

This post explores a framework developed by IBM to modernize baggage analytics using AWS managed services such as Amazon Kinesis Data Streams and Amazon DynamoDB Streams within a serverless architecture. The solution gives airlines near real-time baggage operational insights, with cost savings, better scalability and performance, and improved security and operational efficiency to meet evolving airline needs.

Overcome your Kafka Connect challenges with Amazon Data Firehose

We’re happy to announce a new feature in the Amazon Data Firehose integration with Amazon MSK. You can now configure the Firehose stream to begin reading from your MSK topic at either the earliest position on the Kafka topic or a custom timestamp. In this post, part of a series, we focus on managed data delivery from Kafka to your data lake.

Building serverless event streaming applications with Amazon MSK and AWS Lambda

In this post, we describe how you can simplify your event-driven application architecture using AWS Lambda with Amazon MSK. We demonstrate how to configure Lambda as a consumer for Kafka topics, including in a cross-account setup, and how to optimize price and performance for these applications.
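
As a rough illustration of what a Lambda consumer for Kafka topics can look like, here is a minimal sketch of a handler. It is not taken from the post itself. It assumes the batch arrives in the Amazon MSK event source mapping format, where `records` maps topic-partition keys to lists of records with base64-encoded values. The handler name and the assumption that payloads are UTF-8 text are ours.

```python
import base64


def handler(event, context):
    """Decode a batch of Kafka records delivered by an MSK event source mapping."""
    decoded = []
    # event["records"] maps "topic-partition" keys to lists of Kafka records.
    for _topic_partition, records in event.get("records", {}).items():
        for record in records:
            # Record values arrive base64-encoded; we assume UTF-8 text payloads.
            payload = base64.b64decode(record["value"]).decode("utf-8")
            decoded.append(
                {
                    "topic": record["topic"],
                    "partition": record["partition"],
                    "offset": record["offset"],
                    "payload": payload,
                }
            )
    # Return a summary so the result is easy to inspect when testing locally.
    return {"processed": len(decoded), "messages": decoded}
```

A real consumer would replace the summary with its own business logic. Tuning the event source mapping's batch size and batching window is one place where price and performance trade-offs are typically made.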

Secure access to a cross-account Amazon MSK cluster from Amazon MSK Connect using IAM authentication

In this post, we demonstrate a use case where the MSK cluster is in one AWS account but MSK Connect is located in a separate account. We show how to implement IAM authentication after establishing network connectivity. IAM authentication helps protect your systems against unauthorized access.

Build a secure serverless streaming pipeline with Amazon MSK Serverless, Amazon EMR Serverless and IAM

This post demonstrates an end-to-end solution for processing data from MSK Serverless using an EMR Serverless Spark Streaming job, secured with IAM authentication. It also shows how to query the processed data using Amazon Athena, so that the latest data processed from MSK Serverless by EMR Serverless can be queried in near real time.

Deploy real-time analytics with StarTree for managed Apache Pinot on AWS

In this post, we introduce StarTree as a managed solution on AWS for teams seeking the advantages of Pinot. We highlight the key distinctions between open-source Pinot and StarTree, and provide valuable insights for organizations considering a more streamlined approach to their real-time analytics infrastructure.

Governing streaming data in Amazon DataZone with the Data Solutions Framework on AWS

In this post, we explore how AWS customers can extend Amazon DataZone to support streaming data such as Amazon Managed Streaming for Apache Kafka (Amazon MSK) topics. Developers and DevOps managers can use Amazon MSK, a popular streaming data service, to run Kafka applications and Kafka Connect connectors on AWS without becoming experts in operating Kafka.

Amazon Web Services named a Leader in the 2024 Gartner Magic Quadrant for Data Integration Tools

Amazon Web Services (AWS) has been recognized as a Leader in the 2024 Gartner Magic Quadrant for Data Integration Tools. We were positioned in the Challengers Quadrant in 2023. This recognition, we feel, reflects our ongoing commitment to innovation and excellence in data integration, demonstrating our continued progress in providing comprehensive data management solutions.