Posted On: Aug 9, 2019
Amazon EMR now supports enforcing fine-grained access control policies defined in AWS Lake Formation for Apache Spark. You can enforce database-, table-, and column-level policies for data stored in Amazon S3. Lake Formation policies are enforced when Spark applications are submitted through Apache Zeppelin or EMR Notebooks. This release also adds SAML-based single sign-on (SSO) to EMR Notebooks and Apache Zeppelin, which simplifies authentication for organizations that use Active Directory Federation Services (ADFS), Okta, or Auth0. By combining SAML-based SSO with Lake Formation policies, customers can securely run Spark applications on shared, multi-tenant clusters with column-level access control over data stored in Amazon S3.
AWS Lake Formation is a fully managed service that makes it easier for customers to build, secure, and manage data lakes. Lake Formation simplifies and automates many of the complex manual steps required to create a data lake, including collecting, cleaning, and cataloging data, and securely making that data available for analytics. Before Lake Formation, customers had to set up data access roles and enforce security policies across their storage and each of their analytics engines, then update those policies whenever permissions changed or new end users were added. With Lake Formation, you can define policies once and enforce them consistently across services including Amazon EMR, Amazon Redshift Spectrum, AWS Glue, and Amazon Athena.
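To make "define policies once" concrete, here is a minimal sketch of a column-level grant, assuming you use the Lake Formation GrantPermissions API through boto3. The function builds the keyword arguments for `grant_permissions` so that a principal can SELECT only the listed columns of one table. The role ARN, database, table, and column names are hypothetical placeholders, and the EMR-specific setup for this beta (security configuration, SAML identity provider) is not shown.

```python
from typing import Dict, List


def column_select_grant(principal_arn: str, database: str, table: str,
                        columns: List[str]) -> Dict:
    """Build kwargs for boto3's lakeformation.grant_permissions that allow
    SELECT on only the listed columns of a single table (a sketch; the
    values passed in are assumed/hypothetical)."""
    if not columns:
        # A column-level grant needs at least one column name.
        raise ValueError("at least one column is required for a column-level grant")
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


if __name__ == "__main__":
    request = column_select_grant(
        "arn:aws:iam::111122223333:role/analyst",  # hypothetical IAM role
        "sales_db",                                # hypothetical database
        "orders",                                  # hypothetical table
        ["order_id", "order_date", "region"],      # hypothetical columns
    )
    print(request)
    # Applying the grant requires boto3 and Lake Formation admin credentials:
    # import boto3
    # boto3.client("lakeformation").grant_permissions(**request)
```

Building the request separately from the API call keeps the policy definition easy to review before it is applied once in Lake Formation.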
The integration between AWS Lake Formation and Amazon EMR is in beta and is available with the EMR 5.26.0 release in the US East (N. Virginia) and US West (Oregon) Regions.
To get started, see Integrating Amazon EMR with AWS Lake Formation (Beta).
You can stay up to date on EMR releases by subscribing to the EMR release notes feed. Use the icon at the top of the EMR Release Guide to link the feed URL directly to your favorite feed reader.