Posted On: Mar 7, 2023

AWS Glue now supports streaming ETL in version 4.0, a new version of AWS Glue that accelerates data integration workloads in AWS. AWS Glue 4.0 upgrades the data integration engines, including upgrades to Apache Spark 3.3.0 and Python 3.10.

AWS Glue streaming ETL jobs continuously consume data from streaming sources, clean and transform the data in-flight, and make it available for analysis in seconds. This release includes an optimized state-management store to build efficient streaming solutions across micro-batches. This makes it easier to remove duplicates in a stream and to perform stream-based aggregations. You can also add a new column that indicates when a corresponding record was received by the stream for better data observability. This version also supports IAM authentication for Amazon Managed Streaming for Apache Kafka Serverless.
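
To illustrate what keeping state across micro-batches makes possible, here is a minimal pure-Python sketch. It is a toy model, not the AWS Glue or Apache Spark API: the names `StreamState`, `process_micro_batch`, and `ingest_time`, and the record fields `id` and `type`, are all assumptions made for this example. It shows state carried between batches being used to drop duplicates, maintain a running aggregation, and stamp each record with the time it was received.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class StreamState:
    """State carried from one micro-batch to the next (hypothetical model)."""
    seen_ids: set = field(default_factory=set)   # record ids already emitted
    counts: dict = field(default_factory=dict)   # running count per event type


def process_micro_batch(batch, state, now=None):
    """Drop duplicates, add an ingestion timestamp, and update running counts."""
    received_at = (now or datetime.now(timezone.utc)).isoformat()
    output = []
    for record in batch:
        if record["id"] in state.seen_ids:
            continue  # duplicate of a record from this or an earlier batch
        state.seen_ids.add(record["id"])
        # Copy the record and add a column recording when it was received.
        output.append({**record, "ingest_time": received_at})
        state.counts[record["type"]] = state.counts.get(record["type"], 0) + 1
    return output


state = StreamState()
batch_1 = [{"id": 1, "type": "click"}, {"id": 2, "type": "view"}]
batch_2 = [{"id": 2, "type": "view"}, {"id": 3, "type": "click"}]  # id 2 repeats

out_1 = process_micro_batch(batch_1, state)
out_2 = process_micro_batch(batch_2, state)
```

In this sketch the second batch emits only the record with `id` 3, because `id` 2 was already recorded in the shared state. The state here grows without bound. A production state store would also need to expire old keys, which this sketch leaves out.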

AWS Glue 4.0 streaming ETL is now available in the same AWS Regions as AWS Glue, except for the China and AWS GovCloud (US) Regions.

To learn more, read about Streaming ETL jobs in our documentation.