Vishal Pathak | AWS Big Data Blog

Ingest streaming data to Apache Hudi tables using AWS Glue and Apache Hudi DeltaStreamer

As technology modernizes, the need for near-real-time streaming use cases has grown rapidly. Many customers are continuously consuming data from different sources, including databases, applications, IoT devices, and sensors. Organizations may need to ingest that streaming data into data lakes built on Amazon Simple Storage Service (Amazon S3). You may also need […]

Writing to Apache Hudi tables using AWS Glue Custom Connector

December 2022: This post was reviewed for accuracy. In today’s world, most organizations have to tackle the 3 V’s of variety, volume and velocity of big data. In this blog post, we talk about dealing with the variety and volume aspects of big data. The challenge of dealing with the variety involves processing data from […]

Creating a source to Lakehouse data replication pipe using Apache Hudi, AWS Glue, AWS DMS, and Amazon Redshift

February 2021 update – Please refer to the post Writing to Apache Hudi tables using AWS Glue Custom Connector to learn about an easier mechanism to write to Hudi tables using AWS Glue Custom Connector. In this post, we include the modified Apache Hudi JARs as an external dependency. The AWS Glue Custom Connector feature […]

Developing AWS Glue ETL jobs locally using a container

April 2025: The contents of this post are outdated since Glue 1.0 is deprecated. Refer to Develop and test AWS Glue version 3.0 and 4.0 jobs locally using a Docker container for the latest solution. AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy to prepare and load your […]
