Amazon Athena adds support for querying Apache Hudi datasets in Amazon S3-based data lakes

Posted on: Jul 14, 2020

Amazon Athena now supports querying the read-optimized view of an Apache Hudi dataset in your Amazon S3-based data lake.

Apache Hudi is an open-source data management framework that simplifies incremental data processing and data pipeline development. Hudi enables Amazon S3-based data lakes to comply with data privacy laws, consume real-time streams and change data capture logs, reinstate late-arriving data, and track change history and roll back changes. Hudi stores data on Amazon S3 in open-source formats such as Apache Parquet and Apache Avro.

Data engineers use Apache Hudi support in Amazon EMR to develop data pipelines and to simplify incremental data management and data privacy use cases that require record-level insert, update, and delete operations. With this release, customers can now run Athena queries to read the read-optimized view of a Hudi dataset.

For information and examples about how to create a Hudi table and run queries, see the documentation.
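As a rough illustration of what creating such a table can look like, the sketch below builds a `CREATE EXTERNAL TABLE` statement that could be pasted into the Athena query editor. It is a minimal sketch under stated assumptions, not the documented procedure. The helper name `hudi_table_ddl`, the table name, the columns, and the S3 location are all hypothetical. The SerDe and input/output format class names are the ones commonly used to register the read-optimized view of a Hudi dataset with Hive-compatible catalogs, and should be checked against the current Athena documentation before use.

```python
from typing import Dict


def hudi_table_ddl(table: str, columns: Dict[str, str], location: str) -> str:
    """Build a CREATE EXTERNAL TABLE statement for a Hudi dataset on S3.

    `table`, `columns`, and `location` are supplied by the caller; the
    values used in the example below are purely illustrative.
    """
    # One "`name` type" line per column, in the order given.
    cols = ",\n".join(f"  `{name}` {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE EXTERNAL TABLE `{table}` (\n{cols}\n)\n"
        # Assumption: Parquet SerDe plus the Hudi input format, which exposes
        # the read-optimized view of the dataset.
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'\n"
        "STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat'\n"
        "OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'\n"
        f"LOCATION '{location}'"
    )


# Hypothetical table, columns, and bucket, for illustration only.
ddl = hudi_table_ddl(
    table="orders_hudi",
    columns={
        "_hoodie_commit_time": "string",  # Hudi metadata column
        "_hoodie_record_key": "string",   # Hudi metadata column
        "order_id": "string",
        "amount": "double",
    },
    location="s3://example-bucket/hudi/orders/",
)
print(ddl)
```

Once a table like this is registered, an ordinary query such as `SELECT order_id, amount FROM orders_hudi` would read the read-optimized view of the dataset.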
