Posted On: Oct 6, 2021
We are announcing support for using Apache Spark SQL to update Apache Hive metadata tables when you use Amazon EMR integration with Apache Ranger.
In January 2021, we launched Amazon EMR integration with Apache Ranger, a feature that allows you to define and enforce database, table, and column-level permissions when Apache Spark users access data in Amazon S3 through the Hive Metastore. Previously, with Apache Ranger enabled, you could only read data using Spark SQL statements such as SHOW DATABASES and DESCRIBE TABLE. Now, you can also insert data into, or update, the Apache Hive metadata tables with these statements: INSERT INTO, INSERT OVERWRITE, and ALTER TABLE.
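As a rough illustration of the three newly supported statements, the Python sketch below builds one example of each and runs them through a SparkSession. The database, table, and staging-table names (sales_db, orders, orders_staging) and the helper functions build_statements and run_statements are hypothetical and not part of the announcement. Whether a given statement succeeds depends on the Apache Ranger policies defined for the user.

```python
# Minimal sketch. All object names below are hypothetical placeholders.
def build_statements(db="sales_db", table="orders", staging="orders_staging"):
    """Return one example of each newly supported Spark SQL write statement."""
    target = f"{db}.{table}"
    source = f"{db}.{staging}"
    return [
        # Append rows from the staging table to the target table.
        f"INSERT INTO {target} SELECT * FROM {source}",
        # Replace the target table's current contents with the staging rows.
        f"INSERT OVERWRITE TABLE {target} SELECT * FROM {source}",
        # Update table metadata stored in the Hive Metastore.
        f"ALTER TABLE {target} SET TBLPROPERTIES ('comment' = 'refreshed from staging')",
    ]


def run_statements(spark, statements):
    """Execute each statement in order through an existing SparkSession."""
    for sql in statements:
        spark.sql(sql)
```

On a cluster, you would pass in a Hive-enabled session, for example one created with SparkSession.builder.enableHiveSupport().getOrCreate() from pyspark.sql, and call run_statements(spark, build_statements()).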
This feature is enabled on Amazon EMR 6.4 in the following AWS Regions: US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Europe (Frankfurt), Europe (Ireland), Europe (London), Europe (Paris), Europe (Milan), Europe (Stockholm), Canada (Central), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Hong Kong), Asia Pacific (Tokyo), Asia Pacific (Sydney), South America (São Paulo), Middle East (Bahrain), and Africa (Cape Town).
To get started, see the following list of resources:
AWS Big Data Blog posts:
- Authorize SparkSQL data manipulation on Amazon EMR using Apache Ranger
- Introducing Amazon EMR integration with Apache Ranger
Amazon EMR Management Guide: