Posted On: Jun 8, 2023

Amazon Athena for Apache Spark now supports the open-source data lake storage frameworks Apache Hudi 0.13, Apache Iceberg 1.2.1, and Linux Foundation Delta Lake 2.0.2. These frameworks use ACID (atomicity, consistency, isolation, durability) transactions to simplify incremental data processing, making it easier to store and process large data sets in your data lakes.

Amazon Athena for Apache Spark is a feature of Amazon Athena that lets you start interactive analytics on Apache Spark in under a second and analyze petabytes of data. As data lakes grow, it becomes harder to add incremental data and keep that data transactionally consistent for everyone who uses it. The newly supported frameworks address this in S3 data lakes by using ACID transactions, upserts, and deletes to produce transactionally consistent files. With today's launch, data engineers can create and manage data lake tables efficiently with features such as schema evolution. Schema evolution lets you change the structure of existing tables as business needs change, without rewriting the existing data to conform to the new structure.
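
As a rough illustration of the upsert, delete, and schema evolution operations described above, the following Python sketch builds the Spark SQL statements one might run against an Apache Iceberg table. It is a minimal sketch under stated assumptions, not an official Athena example: the table name, S3 location, column names, and the `updates` source view are hypothetical, and `MERGE INTO` assumes the Spark session has the Iceberg SQL extensions enabled.

```python
from typing import List


def iceberg_statements(table: str, location: str) -> List[str]:
    """Build Spark SQL for a hypothetical Iceberg table lifecycle.

    All names here (columns, the `updates` view) are illustrative
    assumptions, not taken from the announcement.
    """
    return [
        # 1. Create the table in the Iceberg format at an S3 location.
        f"CREATE TABLE IF NOT EXISTS {table} (id BIGINT, status STRING) "
        f"USING iceberg LOCATION '{location}'",
        # 2. Upsert: update matching rows, insert the rest, in one transaction.
        #    `updates` is assumed to be a view with the same columns.
        f"MERGE INTO {table} AS t "
        "USING updates AS s ON t.id = s.id "
        "WHEN MATCHED THEN UPDATE SET t.status = s.status "
        "WHEN NOT MATCHED THEN INSERT *",
        # 3. Row-level delete.
        f"DELETE FROM {table} WHERE status = 'cancelled'",
        # 4. Schema evolution: add a column without rewriting existing data.
        f"ALTER TABLE {table} ADD COLUMNS (region STRING)",
    ]


def run_all(spark, statements: List[str]) -> None:
    """Execute each statement through a Spark session's sql() method."""
    for statement in statements:
        spark.sql(statement)
```

In a notebook session, this could be used as `run_all(spark, iceberg_statements("demo_db.orders", "s3://example-bucket/orders/"))`, where `spark` is the session object and the database, table, and bucket names are placeholders. Apache Hudi and Delta Lake offer comparable operations with their own syntax and configuration.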

Support for Apache Iceberg, Apache Hudi, and Delta Lake is available in nine AWS Regions where Amazon Athena for Apache Spark is available: US East (Ohio), US East (N. Virginia), US West (Oregon), Europe (Ireland), Europe (Frankfurt), Asia Pacific (Tokyo), Asia Pacific (Singapore), Asia Pacific (Sydney), and Asia Pacific (Mumbai). To learn more and get started, visit the Amazon Athena for Apache Spark webpage.