AWS Big Data Blog

Category: Amazon Managed Streaming for Apache Kafka (Amazon MSK)

Validate, evolve, and control schemas in Amazon MSK and Amazon Kinesis Data Streams with AWS Glue Schema Registry

August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. Data streaming technologies like Apache Kafka and Amazon Kinesis Data Streams capture and distribute data generated by thousands or millions of applications, websites, or machines. These technologies […]

Stream Twitter data into Amazon Redshift using Amazon MSK and AWS Glue streaming ETL

This post demonstrates how customers, system integrator (SI) partners, and developers can use the serverless streaming ETL capabilities of AWS Glue with Amazon Managed Streaming for Kafka (Amazon MSK) to stream data to a data warehouse such as Amazon Redshift. We also show you how to view Twitter streaming data on Amazon QuickSight via Amazon Redshift.

Vortexa delivers real-time insights on Amazon MSK with Lenses.io

This post discusses how Vortexa harnesses the power of Apache Kafka to improve real-time data accuracy and accelerate time-to-market by using a combination of Lenses.io for greater observability and Amazon Managed Streaming for Apache Kafka (Amazon MSK) to create clusters on demand.

Best practices from Delhivery on migrating from Apache Kafka to Amazon MSK

August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. This is a guest post by Delhivery. In this post, we describe the steps Delhivery took to migrate from self-managed Apache Kafka running on Amazon Elastic Compute […]

Streaming web content with a log-based architecture with Amazon MSK

Content, such as breaking news or sports scores, requires updates in near-real-time. To stay up to date, you may be constantly refreshing your browser or mobile app. Building APIs to deliver this content at speed and scale can be challenging. In this post, I present an alternative to an API-based approach. I outline the concept […]

How Goldman Sachs builds cross-account connectivity to their Amazon MSK clusters with AWS PrivateLink

August 2023: Amazon MSK now offers a managed feature called multi-VPC private connectivity to simplify connectivity of your Kafka clients to your brokers. Refer this blog to learn more. This guest post presents patterns for accessing an Amazon Managed Streaming for Apache Kafka cluster across your AWS account or Amazon Virtual Private Cloud (Amazon VPC) […]

Govern how your clients interact with Apache Kafka using API Gateway

In this blog post, we will show you how Amazon API Gateway can answer these questions as a component between your Amazon MSK cluster and your clients. Amazon MSK is a fully managed service for Apache Kafka that makes it easy to provision Kafka clusters with just a few clicks without the need to provision servers, manage storage, or configure Apache Zookeeper manually. Apache Kafka is an open-source platform for building real-time streaming data pipelines and applications.

Streaming ETL with Apache Flink and Amazon Kinesis Data Analytics

August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more. Most businesses generate data […]

Extract Oracle OLTP data in real time with GoldenGate and query from Amazon Athena

This post describes how you can improve performance and reduce costs by offloading reporting workloads from an online transaction processing (OLTP) database to Amazon Athena and Amazon S3. The architecture described allows you to implement a reporting system and have an understanding of the data that you receive by being able to query it on arrival.