Posted On: Oct 31, 2022

Amazon EMR release 6.8 now supports Apache Hudi 0.11.1 and Apache Iceberg 0.14.0. You can use these frameworks on Amazon EMR on EC2, Amazon EMR on EKS, and Amazon EMR Serverless.

Apache Hudi 0.11.1 on Amazon EMR 6.8 includes support for Spark 3.3.0. It adds Multi-Modal Index support and Data Skipping with the Metadata Table, which let you add bloom filter and column stats indexes to tables and can significantly improve query performance. It adds an Async Indexer service, which lets users create different kinds of indexes (for example, files, bloom filters, and column stats) in the metadata table without blocking ingestion. Spark SQL improvements add support for updating or deleting records in Hudi tables using non-primary-key fields, and for time travel queries via the TIMESTAMP AS OF syntax. Flink integration improvements add support for both Flink 1.13.x and 1.14.x and for complex data types such as Map and Array. In addition, Hudi 0.11.1 includes bug fixes over Hudi 0.11.0, which is available in Amazon EMR release 6.7. For more details, refer to the OSS Hudi release docs.
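As a rough illustration of the Spark SQL additions, the Python sketch below only builds the statement text for a time travel query and for an update filtered on a non-primary-key column. The table name, column names, and timestamp are hypothetical, values are not escaped, and running the statements assumes a Spark session already configured for Hudi (for example, by passing each string to spark.sql).

```python
# Hypothetical table name, used for illustration only.
TABLE = "hudi_trips"


def time_travel_query(table: str, as_of: str) -> str:
    """Build a query that reads the table as it was at the given instant."""
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{as_of}'"


def update_by_non_key(table: str, set_col: str, set_val: str,
                      filter_col: str, filter_val: str) -> str:
    """Build an UPDATE whose WHERE clause uses a non-primary-key column."""
    return (
        f"UPDATE {table} SET {set_col} = '{set_val}' "
        f"WHERE {filter_col} = '{filter_val}'"
    )


if __name__ == "__main__":
    # 'status' and 'rider' are assumed column names, not part of any real schema.
    print(time_travel_query(TABLE, "2022-10-01 00:00:00"))
    print(update_by_non_key(TABLE, "status", "cancelled", "rider", "rider-A"))
```

The helpers are plain string builders so that the statement shapes can be checked without a cluster; production code should validate or parameterize any user-supplied values.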

Apache Iceberg 0.14.0 on Amazon EMR 6.8 includes support for Spark 3.3.0. It adds merge-on-read support for MERGE and UPDATE statements. It adds support for rewriting partitions using Z-order, which reorganizes partitions so that they are efficient for query predicates on multiple columns and keeps similar data together. It also includes several performance improvements for scan planning in Spark queries and adds support for row group skipping using Parquet bloom filters. For more details, refer to the OSS Iceberg release docs.
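The merge-on-read and Z-order features can be sketched as Spark SQL statements. The Python helpers below only build the statement text; the catalog, table, and column names are hypothetical, and the property and procedure names (write.update.mode, write.merge.mode, rewrite_data_files with a zorder sort order) follow the Iceberg Spark documentation as best understood here, so verify them against the docs for your version before use.

```python
from typing import List


def enable_merge_on_read(table: str) -> str:
    """Build an ALTER TABLE that switches UPDATE and MERGE to merge-on-read."""
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        "'write.update.mode'='merge-on-read', "
        "'write.merge.mode'='merge-on-read')"
    )


def zorder_rewrite(catalog: str, table: str, columns: List[str]) -> str:
    """Build a rewrite_data_files call that sorts data files by Z-order."""
    cols = ",".join(columns)
    return (
        f"CALL {catalog}.system.rewrite_data_files("
        f"table => '{table}', strategy => 'sort', "
        f"sort_order => 'zorder({cols})')"
    )


if __name__ == "__main__":
    # 'my_catalog', 'db.events', and the column names are assumptions.
    print(enable_merge_on_read("db.events"))
    print(zorder_rewrite("my_catalog", "db.events", ["user_id", "event_ts"]))
```

Z-ordering is most useful when queries filter on several of the listed columns at once; for a single filter column, a regular sort order is usually sufficient.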

Amazon EMR release 6.8 is generally available in all Regions where Amazon EMR is available. See Regional Availability of Amazon EMR and our release notes for more details.