Posted On: Oct 31, 2022
Apache Hudi 0.11.1 on Amazon EMR 6.8 includes support for Spark 3.3.0, adds Multi-Modal Index support and Data Skipping with Metadata Table that allows adding bloom filter and column stats indexes to tables which can significantly improve query performance, adds an Async Indexer service which allows users to create different kinds of indices (e.g., files, bloom filters, and column stats) in the metadata table without blocking ingestion, includes Spark SQL improvements adding support for update or delete records in Hudi tables using non-primary-key fields and Time travel query via timestamp as of syntax, includes Flink integration improvements with support for both Flink 1.13.x and 1.14.x and support for complex data types such as Map and Array etc. In addition, Hudi 0.11.1 includes bug fixes over Hudi 0.11.0 available in Amazon EMR release 6.7. For more details, refer to the OSS Hudi release docs.
Apache Iceberg 0.14.0 on Amazon EMR 6.8 includes support for Spark 3.3.0, adds Merge-on-read support for MERGE and UPDATE statements, adds support to rewrite partitions using Z-order that allows to re-organize partitions to be efficient with query predicates on multiple columns and also to keep similar data together, includes several performance improvements for scan planning in Spark queries, add support for row group skipping using Parquet bloom filters, etc. For more details, refer to the OSS Iceberg release docs.