Posted On: Jul 16, 2021

Amazon Athena has updated its integration with Apache Hudi to support new features and the latest community release, 0.8.0. Hudi is an open-source data management framework that simplifies incremental data processing in Amazon S3 data lakes. With the updated integration, you can use Athena to query Hudi 0.8.0 tables managed via Amazon EMR, Apache Spark, Apache Hive, or other compatible services. It also adds support for snapshot queries and for reading bootstrapped tables.

Apache Hudi provides record-level data processing that can help you simplify development of Change Data Capture (CDC) pipelines, comply with GDPR-driven updates and deletes, and better manage streaming data from sensors or devices that require data insertion and event updates. The 0.8.0 release makes it easier to migrate large Parquet tables to Hudi without copying data, so you can query and analyze them via Athena. In addition, Athena's new support for snapshot queries gives you near real-time views of your streaming table updates.
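As a rough illustration of how a snapshot query differs from a read-optimized one, the sketch below builds the kind of table definition you might register in Athena for a Hudi dataset. It assumes the query type is selected by the Hive input format class named in the table definition, using the class names from Hudi's Hive integration. The helper `hudi_table_ddl`, the table name, the non-metadata columns, and the S3 location are hypothetical and exist only for this example; check the exact DDL against the Athena documentation before using it.

```python
# Hive input format classes from Hudi's Hive integration (assumed here to be
# what selects the query type for a table registered in Athena).
READ_OPTIMIZED_INPUT_FORMAT = "org.apache.hudi.hadoop.HoodieParquetInputFormat"
SNAPSHOT_INPUT_FORMAT = (
    "org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat"
)


def hudi_table_ddl(table, columns, location, snapshot=False):
    """Build a CREATE EXTERNAL TABLE statement for a Hudi dataset.

    Hypothetical helper for illustration only.
    columns: list of (name, type) pairs.
    snapshot: if True, use the real-time input format (snapshot queries);
              otherwise use the read-optimized input format.
    """
    input_format = (
        SNAPSHOT_INPUT_FORMAT if snapshot else READ_OPTIMIZED_INPUT_FORMAT
    )
    column_sql = ",\n".join(f"  `{name}` {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE `{table}` (\n{column_sql}\n)\n"
        "ROW FORMAT SERDE "
        "'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'\n"
        f"STORED AS INPUTFORMAT '{input_format}'\n"
        "OUTPUTFORMAT "
        "'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'\n"
        f"LOCATION '{location}'"
    )


# Hypothetical table, columns, and bucket; replace with your own.
ddl = hudi_table_ddl(
    table="sensor_events_rt",
    columns=[
        ("_hoodie_commit_time", "string"),
        ("_hoodie_record_key", "string"),
        ("device_id", "string"),
        ("reading", "double"),
    ],
    location="s3://example-bucket/hudi/sensor_events/",
    snapshot=True,
)
print(ddl)
```

The generated statement is plain text, so it could be pasted into the Athena query editor or submitted through whichever client you already use. Passing `snapshot=False` produces the read-optimized variant of the same table definition.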

To learn more about Athena's integration with Hudi, see "Using Athena to Query Apache Hudi Dataset" and the "Querying an Apache Hudi Dataset with Amazon Athena" blog series.