AWS Big Data Blog
Ingest streaming data to Apache Hudi tables using AWS Glue and Apache Hudi DeltaStreamer
As technology modernizes, demand for near-real-time streaming use cases has grown rapidly. Many customers continuously consume data from different sources, including databases, applications, IoT devices, and sensors. Organizations may need to ingest that streaming data into data lakes built on Amazon Simple Storage Service (Amazon S3). You may also need […]
Writing to Apache Hudi tables using AWS Glue Custom Connector
December 2022: This post was reviewed for accuracy. In today’s world, most organizations have to tackle the 3 V’s of variety, volume and velocity of big data. In this blog post, we talk about dealing with the variety and volume aspects of big data. The challenge of dealing with the variety involves processing data from […]
Creating a source to Lakehouse data replication pipe using Apache Hudi, AWS Glue, AWS DMS, and Amazon Redshift
February 2021 update – Please refer to the post Writing to Apache Hudi tables using AWS Glue Custom Connector to learn about an easier mechanism to write to Hudi tables using AWS Glue Custom Connector. In this post, we include the modified Apache Hudi JARs as an external dependency. The AWS Glue Custom Connector feature […]
Developing AWS Glue ETL jobs locally using a container
April 2025: The contents of this post are outdated because Glue 1.0 is deprecated. Refer to Develop and test AWS Glue version 3.0 and 4.0 jobs locally using a Docker container for the latest solution. AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy to prepare and load your […]


