AWS Big Data Blog

Category: Amazon Managed Streaming for Apache Kafka (Amazon MSK)

Increase Apache Kafka’s resiliency with a multi-Region deployment and MirrorMaker 2

Customers create business continuity plans and disaster recovery (DR) strategies to maximize resiliency for their applications, because downtime or data loss can result in losing revenue or halting operations. Ultimately, DR planning is all about enabling the business to continue running despite a Regional outage. This post explains how to make Apache Kafka resilient to […]

Read More

Securing Apache Kafka is easy and familiar with IAM Access Control for Amazon MSK

This is a guest blog post by AWS Data Hero Stephane Maarek.  AWS launched IAM Access Control for Amazon MSK, which is a security option offered at no additional cost that simplifies cluster authentication and Apache Kafka API authorization using AWS Identity and Access Management (IAM) roles or user policies to control access. This eliminates […]

Read More

How Goldman Sachs migrated from their on-premises Apache Kafka cluster to Amazon MSK

This is a guest post by Zachary Whitford, Associate, Richa Prajapati, Vice President and Aldo Piddiu, Vice President in the Global Investment Research engineering team at Goldman Sachs. The Global Investment Research (GIR) division at Goldman Sachs delivers client-focused research in the equity, fixed income, currency, and commodities markets. GIR analysts help the firm’s investor […]

Read More

Validate, evolve, and control schemas in Amazon MSK and Amazon Kinesis Data Streams with AWS Glue Schema Registry

Data streaming technologies like Apache Kafka and Amazon Kinesis Data Streams capture and distribute data generated by thousands or millions of applications, websites, or machines. These technologies serve as a highly available transport layer that decouples the data-producing applications from data processors. However, the sheer number of applications producing, processing, routing, and consuming data can […]

Read More

Stream Twitter data into Amazon Redshift using Amazon MSK and AWS Glue streaming ETL

This post demonstrates how customers, system integrator (SI) partners, and developers can use the serverless streaming ETL capabilities of AWS Glue with Amazon Managed Streaming for Kafka (Amazon MSK) to stream data to a data warehouse such as Amazon Redshift. We also show you how to view Twitter streaming data on Amazon QuickSight via Amazon Redshift.

Read More

Vortexa delivers real-time insights on Amazon MSK with Lenses.io

This post discusses how Vortexa harnesses the power of Apache Kafka to improve real-time data accuracy and accelerate time-to-market by using a combination of Lenses.io for greater observability and Amazon Managed Streaming for Apache Kafka (Amazon MSK) to create clusters on demand.

Read More

Best practices from Delhivery on migrating from Apache Kafka to Amazon MSK

This is a guest post by Delhivery. In this post, we describe the steps Delhivery took to migrate from self-managed Apache Kafka running on Amazon Elastic Compute Cloud (Amazon EC2) to Amazon Managed Streaming for Apache Kafka (Amazon MSK). “We’ve been in production for over a year now,” said Akash Deep Verma, Senior Technical Architect, […]

Read More

Streaming web content with a log-based architecture with Amazon MSK

Content, such as breaking news or sports scores, requires updates in near-real-time. To stay up to date, you may be constantly refreshing your browser or mobile app. Building APIs to deliver this content at speed and scale can be challenging. In this post, I present an alternative to an API-based approach. I outline the concept […]

Read More

How Goldman Sachs builds cross-account connectivity to their Amazon MSK clusters with AWS PrivateLink

This guest post presents patterns for accessing an Amazon Managed Streaming for Apache Kafka cluster across your AWS account or Amazon Virtual Private Cloud (Amazon VPC) boundaries using AWS PrivateLink. In addition, the post discusses the pattern that the Transaction Banking team at Goldman Sachs (TxB) chose for their cross-account access, the reasons behind their […]

Read More

Govern how your clients interact with Apache Kafka using API Gateway

In this blog post, we will show you how Amazon API Gateway can answer these questions as a component between your Amazon MSK cluster and your clients. Amazon MSK is a fully managed service for Apache Kafka that makes it easy to provision Kafka clusters with just a few clicks without the need to provision servers, manage storage, or configure Apache Zookeeper manually. Apache Kafka is an open-source platform for building real-time streaming data pipelines and applications.

Read More