AWS Big Data Blog
Enable federated governance using Trino and Apache Ranger on Amazon EMR
Managing data through a central data platform simplifies staffing and training and reduces costs. However, it can create scaling, ownership, and accountability challenges, because central teams may not understand the specific needs of a data domain, whether it’s because of data types and storage, security, data catalog requirements, or specific technologies needed for […]
Authorize SparkSQL data manipulation on Amazon EMR using Apache Ranger
This post was last updated July 2022. With Amazon EMR 6.7, all Apache Spark DDL statements are now supported, except for CREATE VIEW. For details, see the section under “limitations”. NOTE: You will need to redeploy the Spark service definition (link) on your Apache Ranger server. Instructions on how to redeploy can be found here. With Amazon […]
Introducing Amazon EMR integration with Apache Ranger
This post was last updated July 2022. Data security is an important pillar in data governance. It includes authentication, authorization, encryption, and audit. Amazon EMR enables you to set up and run clusters of Amazon Elastic Compute Cloud (Amazon EC2) instances with open-source big data applications like Apache Spark, Apache Hive, Apache Flink, and Presto. You may […]
Implement perimeter security in Amazon EMR using Apache Knox
Perimeter security helps secure Apache Hadoop cluster resources for users who access them from outside the cluster. It provides a single access point for all REST and HTTP interactions with Apache Hadoop clusters and simplifies client interaction with the cluster. For example, client applications must acquire Kerberos tickets using kinit or SPNEGO before interacting with services on Kerberos-enabled clusters. In this post, we walk through the setup of Apache Knox to enable perimeter security for EMR clusters.
Implementing Authorization and Auditing using Apache Ranger on Amazon EMR
Updated 3/30/2022: Amazon EMR has announced official support of Apache Ranger (link). Open-source plugin support will not be maintained moving forward, and compatibility with the latest versions will not be tested. We recommend that customers move to the Amazon EMR support for Apache Ranger. Ranger Presto plugin support on EMR has been deprecated. Updated 12/03/2020: Support for […]




