AWS Glue supports reading from self-managed Apache Kafka

Posted on: Oct 13, 2020

Streaming extract, transform, and load (ETL) jobs in AWS Glue can now ingest data from Apache Kafka clusters that you manage yourself. Previously, AWS Glue could read only from Amazon Managed Streaming for Apache Kafka (Amazon MSK). With this update, you can run streaming ETL on data from Apache Kafka whether it is deployed on-premises or in the cloud.

AWS Glue streaming ETL jobs continuously consume data from streaming sources, clean and transform the data in flight, and make it available for analysis within seconds. When you use self-managed Apache Kafka as a source, you can optionally connect over SSL, and you can reach clusters either inside or outside an Amazon Virtual Private Cloud (Amazon VPC).

Self-managed Apache Kafka support in AWS Glue is available in all AWS Regions where AWS Glue is available.

To learn more, read about adding streaming ETL jobs in our documentation.