Posted On: Mar 14, 2023

Amazon EMR is excited to announce a new capability that lets you apply AWS Lake Formation-based table- and column-level permissions to Amazon S3 data lakes for write operations (i.e., INSERT INTO, INSERT OVERWRITE) in Apache Hive jobs submitted through the Amazon EMR Steps API. With this feature, data administrators can define and enforce fine-grained table- and column-level security for customers accessing data via Apache Hive running on Amazon EMR.
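As a rough illustration of submitting a Hive write job through the Steps API, the sketch below uses boto3's `add_job_flow_steps`. The cluster ID, script location, and role ARN are hypothetical placeholders. Passing a runtime role via `ExecutionRoleArn` is an assumption about how the cluster is configured, not something this announcement specifies.

```python
from typing import Any, Dict, Optional


def build_hive_step(script_s3_uri: str, name: str = "hive-write") -> Dict[str, Any]:
    """Build an EMR step definition that runs a Hive script via command-runner.jar."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            # The script is expected to contain the INSERT INTO / INSERT OVERWRITE statements.
            "Args": ["hive-script", "--run-hive-script", "--args", "-f", script_s3_uri],
        },
    }


def submit_hive_step(
    cluster_id: str,
    script_s3_uri: str,
    execution_role_arn: Optional[str] = None,
) -> str:
    """Submit the step and return its step ID. Requires AWS credentials."""
    import boto3  # imported lazily so build_hive_step works without AWS access

    kwargs: Dict[str, Any] = {
        "JobFlowId": cluster_id,
        "Steps": [build_hive_step(script_s3_uri)],
    }
    if execution_role_arn is not None:
        # Assumption: the step runs under a runtime role that Lake Formation evaluates.
        kwargs["ExecutionRoleArn"] = execution_role_arn

    response = boto3.client("emr").add_job_flow_steps(**kwargs)
    return response["StepIds"][0]
```

A call might look like `submit_hive_step("j-EXAMPLE", "s3://example-bucket/scripts/insert.q")`, where both values are made up for illustration.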

Amazon EMR integration with AWS Lake Formation allows you to define and enforce database-, table-, and column-level permissions with open source data processing engines such as Apache Spark and Apache Hive running on Amazon EMR. Prior to this release, data administrators could define and enforce Lake Formation-based permissions on databases, tables, and columns only for read-only workloads with Apache Hive on EMR. With the current release, you can now use Hive to write to or alter Lake Formation-enabled tables. This means you can enforce Lake Formation-based database-, table-, and column-level permissions when your customers run INSERT INTO, INSERT OVERWRITE, and ALTER TABLE queries. To use Lake Formation-based permissions, customers must use the AWS Glue Data Catalog as the metastore.
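To make the permission model concrete, here is a minimal sketch of granting write-related table permissions with the Lake Formation `grant_permissions` API in boto3. The principal ARN, database name, and table name are hypothetical. The choice of `INSERT` and `ALTER` is an assumption about what a writing principal would need, so check the EMR documentation for the exact set.

```python
from typing import Any, Dict, Sequence


def build_table_grant(
    principal_arn: str,
    database: str,
    table: str,
    permissions: Sequence[str] = ("INSERT", "ALTER"),
) -> Dict[str, Any]:
    """Build the request for a table-level Lake Formation grant."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": list(permissions),
    }


def grant_table_permissions(principal_arn: str, database: str, table: str) -> None:
    """Apply the grant. Requires AWS credentials with Lake Formation admin rights."""
    import boto3  # imported lazily so build_table_grant works without AWS access

    request = build_table_grant(principal_arn, database, table)
    boto3.client("lakeformation").grant_permissions(**request)
```

The table must be registered in the AWS Glue Data Catalog, in line with the metastore requirement stated above.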

This feature is available with Amazon EMR release 6.10 for Amazon EMR on EC2 clusters in all AWS Regions where Amazon EMR is available. To get started, refer to the Integrate Amazon EMR with AWS Lake Formation section in the Amazon EMR documentation.