Posted On: Jul 21, 2022
We are excited to launch two new features that help enforce access controls with Amazon EMR on EC2 clusters (EMR Clusters). Both features apply to jobs submitted to the cluster through the EMR Steps API. The first is Runtime Role with EMR Steps. A Runtime Role is an AWS Identity and Access Management (IAM) role that you associate with an EMR Step, and the Step uses this role to access AWS resources. The second is integration with AWS Lake Formation to apply table- and column-level access controls to Apache Spark and Apache Hive jobs submitted as EMR Steps.
Previously, all jobs running on an EMR cluster used the IAM role associated with the cluster’s EC2 instances to access resources. This role is called the EMR EC2 Instance Profile. For example, if a Spark job and a Hive job running on the same cluster needed to access different S3 buckets, the Instance Profile had to allow access to both buckets. With Runtime Role for EMR Steps, you can specify a different IAM role for the Spark job and the Hive job, scoping access down to the job level. This simplifies access controls on a single EMR cluster that is shared between multiple tenants, where each tenant is isolated using IAM roles.
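As a rough illustration of per-job scoping, the sketch below builds two step submissions for the same cluster, each with its own runtime role. It assumes the boto3 EMR client, whose add_job_flow_steps call accepts an ExecutionRoleArn parameter for the runtime role. The cluster ID, account number, role names, S3 paths, and step arguments are hypothetical placeholders, and the cluster itself is assumed to be configured to allow runtime roles.

```python
def build_step_request(cluster_id, step_name, args, runtime_role_arn):
    """Build keyword arguments for an EMR AddJobFlowSteps call with a runtime role."""
    return {
        "JobFlowId": cluster_id,
        # The runtime role this step uses to access AWS resources.
        "ExecutionRoleArn": runtime_role_arn,
        "Steps": [
            {
                "Name": step_name,
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {"Jar": "command-runner.jar", "Args": list(args)},
            }
        ],
    }


def submit_step(request):
    """Send the request to EMR. Needs boto3 and AWS credentials; not called here."""
    import boto3  # imported lazily so the builders work without boto3 installed

    emr = boto3.client("emr")
    return emr.add_job_flow_steps(**request)["StepIds"]


# Hypothetical identifiers, for illustration only.
CLUSTER_ID = "j-EXAMPLECLUSTER"
SPARK_ROLE = "arn:aws:iam::111122223333:role/ExampleSparkJobRole"
HIVE_ROLE = "arn:aws:iam::111122223333:role/ExampleHiveJobRole"

# Each job gets a role that is allowed to read only its own bucket.
spark_request = build_step_request(
    CLUSTER_ID,
    "spark-report",
    ["spark-submit", "s3://example-bucket-a/jobs/report.py"],
    SPARK_ROLE,
)
hive_request = build_step_request(
    CLUSTER_ID,
    "hive-report",
    ["hive-script", "--run-hive-script", "--args", "-f", "s3://example-bucket-b/jobs/report.q"],
    HIVE_ROLE,
)
```

Passing either request to submit_step would submit that step under its own role, so the Instance Profile no longer needs access to both buckets.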
In addition, you can use AWS Lake Formation to apply table- and column-level permissions to Apache Spark and Apache Hive jobs submitted as EMR Steps. AWS Lake Formation is a fully managed service that makes it easy to build, secure, and manage data lakes. It enables you to apply fine-grained access control to data stored in data lakes through a simple grant or revoke mechanism, much like a relational database management system (RDBMS). With this feature, table- and column-level permissions defined in AWS Lake Formation for an IAM role are enforced for Apache Hive and Apache Spark jobs that run as EMR Steps with that role. This allows you to further simplify access controls and give each job access to specific databases, tables, and columns.
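To make the grant mechanism concrete, here is a minimal sketch of a column-level grant for a runtime role, assuming the boto3 Lake Formation client and its grant_permissions call. The role ARN, database, table, and column names are hypothetical, and a real setup also requires the data location to be registered with Lake Formation.

```python
def build_column_grant(role_arn, database, table, columns):
    """Build keyword arguments for a Lake Formation grant of SELECT on specific columns."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


def apply_grant(request):
    """Send the grant to Lake Formation. Needs boto3 and AWS credentials; not called here."""
    import boto3  # imported lazily so the builder works without boto3 installed

    boto3.client("lakeformation").grant_permissions(**request)


# Hypothetical names, for illustration only.
grant_request = build_column_grant(
    "arn:aws:iam::111122223333:role/ExampleSparkJobRole",
    "example_sales_db",
    "orders",
    ["order_id", "order_date"],
)
```

An EMR Step that runs with this role would then be able to read only the listed columns of the table; any other column stays inaccessible unless it is granted separately.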