Posted On: Jun 28, 2022

AWS Glue streaming ETL (extract, transform, and load) can now detect compressed data streaming from Amazon Kinesis, Amazon Managed Streaming for Apache Kafka (Amazon MSK), and self-managed Apache Kafka. It then automatically decompresses this data without customers having to write code, saving them development hours. AWS Glue streaming ETL jobs continuously consume data from streaming sources, clean and transform the data in flight, and make it available for analysis in seconds. Customers compress data before streaming it in order to improve performance and to stay within the throttling limits imposed by Amazon Kinesis and Amazon MSK. Prior to this feature, customers had to write user-defined functions to decompress data from a stream, which is time-consuming.
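
For context, the following is a minimal sketch of the kind of hand-written compression and decompression logic described above, using only the Python standard library. The function names and the sample record are hypothetical, and GZIP-compressed JSON is just one assumed payload shape; an actual job would typically wrap the consumer-side function in a user-defined function of its processing framework.

```python
import gzip
import json


def compress_record(record: dict) -> bytes:
    """Producer side: serialize a record to JSON and gzip it before sending."""
    return gzip.compress(json.dumps(record).encode("utf-8"))


def decompress_record(payload: bytes) -> dict:
    """Consumer side: the manual step that auto-decompression makes unnecessary."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))


# Hypothetical record with repetitive content, which compresses well.
record = {"sensor_id": "s-1", "readings": [20.5] * 200}
payload = compress_record(record)
restored = decompress_record(payload)
```

The compressed payload here is smaller than the raw JSON, which is the reason producers compress before streaming; the cost is that every consumer needs matching decompression code unless the platform handles it.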

With this new feature, AWS Glue streaming ETL automatically detects whether data in a stream is compressed and decompresses it without customers having to write any code. Auto-decompression supports the BZIP, GZIP, SNAPPY, XZ, ZSTD, and DEFLATE compression types and works with AVRO, JSON, CSV, and other file formats. To learn more, visit our documentation.
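
To illustrate what automatic detection can look like in general, here is a hedged sketch that identifies a codec from the leading "magic" bytes of a payload and decompresses it with the Python standard library. This is not AWS Glue's implementation, which the announcement does not describe; the function names are hypothetical, the zlib check is a heuristic, and ZSTD and SNAPPY are left out of the decoding step because they would need third-party packages.

```python
import bz2
import gzip
import lzma
import zlib
from typing import Optional

# Well-known leading bytes for each container format.
GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"
XZ_MAGIC = b"\xfd7zXZ\x00"
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def detect_codec(payload: bytes) -> Optional[str]:
    """Guess the compression codec from the payload's first bytes."""
    if payload.startswith(GZIP_MAGIC):
        return "gzip"
    if payload.startswith(BZIP2_MAGIC):
        return "bzip2"
    if payload.startswith(XZ_MAGIC):
        return "xz"
    if payload.startswith(ZSTD_MAGIC):
        return "zstd"
    # zlib-wrapped DEFLATE has no fixed magic: the method nibble is 8, the
    # window-size nibble is at most 7, and the 2-byte header is a multiple of 31.
    # Plain text can occasionally match, so this is only a heuristic.
    if (
        len(payload) >= 2
        and payload[0] & 0x0F == 8
        and payload[0] >> 4 <= 7
        and ((payload[0] << 8) | payload[1]) % 31 == 0
    ):
        return "zlib"
    return None


def auto_decompress(payload: bytes) -> bytes:
    """Decompress if a supported codec is detected, else return the payload as-is."""
    codec = detect_codec(payload)
    if codec == "gzip":
        return gzip.decompress(payload)
    if codec == "bzip2":
        return bz2.decompress(payload)
    if codec == "xz":
        return lzma.decompress(payload)
    if codec == "zlib":
        try:
            return zlib.decompress(payload)
        except zlib.error:
            # The header check was a false positive; treat as uncompressed.
            return payload
    if codec == "zstd":
        raise NotImplementedError("zstd decoding needs a third-party package")
    return payload
```

Raw DEFLATE and raw SNAPPY streams carry no identifying header, so a detector like this cannot recognize them from leading bytes alone; handling them would require trial decoding or out-of-band metadata.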

This feature is available in all AWS Regions where AWS Glue is available.