Posted On: Aug 8, 2023

Amazon EMR Studio is an integrated development environment (IDE) that makes it easy for data scientists and data engineers to develop, visualize, and debug big data and analytics applications written in PySpark, Python, Scala, and R. EMR Studio provides fully managed JupyterLab notebooks and tools such as Spark UI and YARN Timeline Service to simplify debugging. Today, we are excited to announce that EMR Studio workspaces now support fine-grained data access control with AWS Lake Formation when accessing data through EMR on EC2 clusters.

When you connect to EMR clusters from EMR Studio workspaces, you can now choose the IAM role (called the runtime IAM role) that you want to connect with. Apache Spark interactive notebooks will access only the data and resources permitted by policies attached to this runtime role. When data is accessed from data lakes managed with AWS Lake Formation, you can enforce table-level and column-level access using policies attached to this runtime role. With this new capability, multiple users can connect to the same EMR cluster from their EMR Studio workspaces, each using a runtime role scoped with customized data access permissions. User sessions are completely isolated from one another on the shared cluster. Because users can share a cluster instead of each needing their own, this can also simplify provisioning of EMR clusters for interactive use cases, reducing operational overhead and saving costs.
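
As a rough illustration of column-level access for a runtime role, the Python sketch below builds the request for a Lake Formation GrantPermissions call that gives a role SELECT on a subset of a table's columns. The role ARN, database, table, and column names are hypothetical, and the helper `build_column_grant` is not part of any AWS SDK; it only assembles the request parameters, which you could then pass to the boto3 Lake Formation client.

```python
from typing import Any, Dict, List


def build_column_grant(role_arn: str, database: str, table: str,
                       columns: List[str]) -> Dict[str, Any]:
    """Assemble keyword arguments for Lake Formation's GrantPermissions call.

    Hypothetical helper for illustration; it makes no AWS calls itself.
    """
    if not columns:
        raise ValueError("list at least one column to grant")
    return {
        # The runtime role that the notebook session connects with.
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        # Restrict the grant to the named columns of one table.
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


# Hypothetical runtime role and table names, for illustration only.
analyst_grant = build_column_grant(
    role_arn="arn:aws:iam::111122223333:role/analyst-runtime-role",
    database="sales_db",
    table="orders",
    columns=["order_id", "order_date", "total"],
)

# With boto3 installed and Lake Formation administrator credentials:
#   import boto3
#   boto3.client("lakeformation").grant_permissions(**analyst_grant)
```

A second runtime role could be granted a different column list on the same table, so two users sharing one cluster would see different views of the data depending on the role they connect with.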

This feature is generally available when connecting to Amazon EMR on EC2 clusters running release 6.11 or later, in all AWS Regions where EMR Studio is supported. To learn more, see the EMR documentation.