Posted On: Dec 22, 2023

Amazon EMR Serverless now supports AWS Lake Formation for fine-grained data access control with Apache Spark. This enables you to enforce database, table, column, row, and cell-level policies for data stored in Amazon S3 from your EMR Serverless Spark jobs. Policies that you define in Lake Formation take effect when you run Spark applications using EMR Studio, the AWS CLI, or job orchestrators such as Amazon Managed Workflows for Apache Airflow and AWS Step Functions.

Lake Formation makes it simple to build, secure, and manage data lakes. It allows you to define fine-grained access controls through grant and revoke statements, similar to those used with relational database management systems (RDBMS), and automatically enforce those policies via compatible engines like Athena, EMR on EC2, and Redshift Spectrum. With today's launch, the same Lake Formation rules that you set up for use with other services like Athena now apply to your Spark jobs on EMR Serverless, further simplifying security and governance of your data lakes.
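
As an illustration of the grant model described above, here is a minimal Python sketch of a column-level SELECT grant. It assumes the boto3 SDK and the Lake Formation GrantPermissions API; the role ARN, database, table, and column names are hypothetical placeholders, and the helper functions are not part of any AWS library.

```python
from typing import Any, Dict, List


def build_column_grant(principal_arn: str, database: str, table: str,
                       columns: List[str]) -> Dict[str, Any]:
    """Build keyword arguments for Lake Formation's GrantPermissions API."""
    if not columns:
        raise ValueError("at least one column is required for a column-level grant")
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        # SELECT limited to the listed columns; other columns stay hidden.
        "Permissions": ["SELECT"],
    }


def apply_grant(request: Dict[str, Any]) -> None:
    """Send the grant to Lake Formation (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without the SDK

    boto3.client("lakeformation").grant_permissions(**request)


# Hypothetical principal, database, table, and columns for illustration only.
request = build_column_grant(
    "arn:aws:iam::111122223333:role/emr-serverless-job-role",
    "sales_db",
    "orders",
    ["order_id", "order_date"],
)
```

Calling apply_grant(request) with valid credentials would submit the grant. Keeping the request builder separate from the API call makes the policy easy to review or log before it is applied.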

Fine-grained access control with Apache Spark on EMR Serverless is in preview and is available with the EMR 6.15 release in the following AWS Regions: Asia Pacific (Mumbai, Seoul, Singapore, Sydney, Tokyo), Canada (Central), Europe (Frankfurt, Ireland, London, Paris, Stockholm), South America (São Paulo), US East (N. Virginia, Ohio), and US West (N. California, Oregon). To get started, see Using AWS Lake Formation with Amazon EMR Serverless.