Posted On: Sep 27, 2023

Amazon Kinesis Data Firehose now integrates with Amazon MSK to offer a fully managed solution that simplifies the processing and delivery of streaming data from Amazon MSK Apache Kafka clusters into data lakes stored on Amazon S3. With just a few clicks, Amazon MSK customers can continuously load data from their desired Apache Kafka clusters to their Amazon S3 bucket, eliminating the need to develop or run their own connector applications. 

Amazon MSK is a fully managed service for Apache Kafka that makes it easier for you to build and run applications that use Apache Kafka as a data store. Kinesis Data Firehose is a fully managed service that continuously captures, transforms, and delivers streaming data to data lakes, data stores, and analytics services. Kinesis Data Firehose automatically scales to match the throughput of your Amazon MSK data and requires no ongoing administration. Kinesis Data Firehose also offers easy-to-use features such as JSON to Parquet/ORC format conversion and batch aggregation to optimize the S3 file size. These features simplify analytics and processing workflows on the delivered data.

To get started, you need an AWS account. Once you have an account, you can create a delivery stream in the Amazon Kinesis Console. To learn more, explore the Amazon Kinesis Data Firehose developer guide.
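
For readers who prefer to script the setup rather than use the console, the following is a minimal sketch of the request that could be passed to the `create_delivery_stream` call of the boto3 Firehose client. It is an illustration under stated assumptions, not an official procedure: the helper name `build_msk_to_s3_request`, the stream name, the topic name, all ARNs, and the buffering values are hypothetical placeholders, and the sketch assumes a single IAM role is used for both reading from the cluster and writing to the bucket, with private connectivity to the cluster.

```python
from typing import Any, Dict


def build_msk_to_s3_request(
    stream_name: str,
    msk_cluster_arn: str,
    topic_name: str,
    role_arn: str,
    bucket_arn: str,
    buffer_size_mb: int = 64,
    buffer_interval_s: int = 300,
) -> Dict[str, Any]:
    """Build keyword arguments for a Firehose create_delivery_stream call.

    Hypothetical helper: it only assembles a dictionary and makes no AWS calls.
    """
    return {
        "DeliveryStreamName": stream_name,
        # Selects an Amazon MSK cluster as the source of the stream.
        "DeliveryStreamType": "MSKAsSource",
        "MSKSourceConfiguration": {
            "MSKClusterARN": msk_cluster_arn,
            "TopicName": topic_name,
            "AuthenticationConfiguration": {
                "RoleARN": role_arn,
                # Assumption: the cluster is reached over private connectivity.
                "Connectivity": "PRIVATE",
            },
        },
        "ExtendedS3DestinationConfiguration": {
            # Assumption: the same role is reused for writing to S3.
            "RoleARN": role_arn,
            "BucketARN": bucket_arn,
            # Batch aggregation: records are buffered until either limit is hit.
            "BufferingHints": {
                "SizeInMBs": buffer_size_mb,
                "IntervalInSeconds": buffer_interval_s,
            },
        },
    }


# Placeholder values for illustration only; substitute real resources.
request = build_msk_to_s3_request(
    stream_name="example-msk-to-s3",
    msk_cluster_arn="arn:aws:kafka:us-east-1:111122223333:cluster/example/abcd1234",
    topic_name="example-topic",
    role_arn="arn:aws:iam::111122223333:role/example-firehose-role",
    bucket_arn="arn:aws:s3:::example-data-lake-bucket",
)

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("firehose").create_delivery_stream(**request)
```

Keeping the request construction separate from the AWS call makes it possible to inspect or version the configuration before anything is created. Larger buffer sizes generally produce fewer, larger S3 objects at the cost of higher delivery latency.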

Amazon MSK to Amazon S3 delivery using Amazon Kinesis Data Firehose is available in all commercial and AWS GovCloud (US) Regions where Amazon MSK and Kinesis Data Firehose are available.

As of Feb 09, 2024, Kinesis Data Firehose has been renamed to Amazon Data Firehose.