AWS Big Data Blog

Category: Amazon Managed Streaming for Apache Kafka (Amazon MSK)

Create a low-latency source-to-data lake pipeline using Amazon MSK Connect, Apache Flink, and Apache Hudi

August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. In recent years, there has been a shift from monolithic to microservices architecture. The microservices architecture makes applications easier to scale and quicker to develop, […]

Validate streaming data over Amazon MSK using schemas in cross-account AWS Glue Schema Registry

August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. Today’s businesses face unprecedented growth in the volume of data. A growing portion of the data is generated in real time by IoT devices, websites, business […]

Evolve JSON Schemas in Amazon MSK and Amazon Kinesis Data Streams with the AWS Glue Schema Registry

Data is being produced, streamed, and consumed at an immense rate, and that rate is projected to grow exponentially in the future. In particular, JSON is the most widely used data format across streaming technologies and workloads. As applications, websites, and machines increasingly adopt data streaming technologies such as Apache Kafka and Amazon Kinesis Data […]

Now Available: Updated guidance on the Data Analytics Lens for AWS Well-Architected Framework

Nearly all businesses today require some form of data analytics processing, from auditing user access to generating sales reports. For all your analytics needs, the Data Analytics Lens for AWS Well-Architected Framework provides prescriptive guidance to help you assess your workloads and identify best practices aligned to the AWS Well-Architected Pillars: Operational Excellence, Security, Reliability, […]

Query your Amazon MSK topics interactively using Amazon Managed Service for Apache Flink Studio

August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. Amazon Managed Service for Apache Flink Studio makes it easy to analyze streaming data in real time and build stream processing applications powered by Apache Flink using […]

Power your Kafka Streams application with Amazon MSK and AWS Fargate

November 2024: This post was reviewed and updated for accuracy. Today, companies of all sizes across all verticals design and build event-driven architectures centered around real-time streaming and stream processing. Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed service that makes it easy for you to build and run applications that […]

Secure connectivity patterns to access Amazon MSK across AWS Regions

August 2023: Amazon MSK now offers a managed feature called multi-VPC private connectivity to simplify connectivity of your Kafka clients to your brokers. Refer to this blog to learn more. AWS customers often segment their workloads across accounts and Amazon Virtual Private Cloud (Amazon VPC) to streamline access management while being able to expand their footprint. […]

Increase Apache Kafka’s resiliency with a multi-Region deployment and MirrorMaker 2

April 2025: The contents of this post are outdated. Please refer to Introducing Amazon MSK Replicator – Fully Managed Replication across MSK Clusters in Same or Different AWS Regions for the latest solution and code artifacts. Customers create business continuity plans and disaster recovery (DR) strategies to maximize resiliency for their applications, because downtime or data loss […]

Securing Apache Kafka is easy and familiar with IAM Access Control for Amazon MSK

This is a guest blog post by AWS Data Hero Stephane Maarek. AWS launched IAM Access Control for Amazon MSK, which is a security option offered at no additional cost that simplifies cluster authentication and Apache Kafka API authorization using AWS Identity and Access Management (IAM) roles or user policies to control access. This eliminates […]

How Goldman Sachs migrated from their on-premises Apache Kafka cluster to Amazon MSK

This is a guest post by Zachary Whitford, Associate; Richa Prajapati, Vice President; and Aldo Piddiu, Vice President, in the Global Investment Research engineering team at Goldman Sachs. To see how Goldman Sachs is innovating more with AWS, visit the Goldman Sachs Leading Cloud Innovator page. The Global Investment Research (GIR) division at Goldman Sachs delivers […]