AWS Lake Formation Features

AWS Lake Formation makes it easier to centrally govern, secure, and globally share data for analytics and machine learning (ML).

With Lake Formation, you can centralize data security and governance using the AWS Glue Data Catalog, letting you manage metadata and data permissions in one place with familiar database-style features. It also delivers fine-grained data access control, so you can help ensure users have access to the right data down to the row and column level. You can then scale permissions across your users.

Lake Formation also makes it easier to share data internally across your organization and externally by using AWS Data Exchange, which lets you create a data mesh or meet other data sharing needs with no data movement.

Additionally, because Lake Formation tracks data interactions by role and user, it provides comprehensive data access auditing to verify the right data was accessed by the right users at the right time.

Centralize data permissions

Lake Formation centralizes permission management on your data resources in the AWS Glue Data Catalog, including databases and tables. You can define and manage access by role for your users and applications using familiar database-like grants, bringing the simplicity of data warehouses and databases to your data lake.

Lake Formation provides a single place to manage access controls for data in your data lake. You can define security policies that restrict access to data at the database, table, column, row, and cell levels with fine-grained access control (FGAC). These policies apply to AWS IAM users and roles and to users and groups when federating through an external identity provider. You can use FGAC to access data secured by Lake Formation within Amazon Redshift Spectrum, Amazon Athena, AWS Glue ETL, and Amazon EMR for Apache Spark.
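As a rough illustration of column-level and row-level control, the sketch below builds the request arguments for the Lake Formation GrantPermissions and CreateDataCellsFilter APIs. The function names, the principal ARN, and the database, table, and column names are hypothetical; the helpers only assemble dictionaries and do not call AWS.

```python
def column_grant_request(principal_arn, database, table, columns):
    """Build keyword arguments for GrantPermissions limited to some columns."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


def row_filter_request(catalog_id, database, table, filter_name, expression, columns):
    """Build keyword arguments for CreateDataCellsFilter (row and column filter)."""
    return {
        "TableData": {
            "TableCatalogId": catalog_id,
            "DatabaseName": database,
            "TableName": table,
            "Name": filter_name,
            # Only rows matching this expression are visible through the filter.
            "RowFilter": {"FilterExpression": expression},
            "ColumnNames": list(columns),
        }
    }
```

Assuming boto3 is installed and credentials are configured, the results could be passed on as boto3.client("lakeformation").grant_permissions(**request) and create_data_cells_filter(**request) respectively.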

AWS Lake Formation helps you consistently enforce permissions across AWS analytics services with native integrations for Amazon Athena, Amazon SageMaker, and Amazon Redshift, as well as AWS Glue for data integration and Amazon EMR for big data processing. Integration with AWS Identity and Access Management (IAM) authenticates users and roles, enforcing permissions across AWS analytics and ML services.

Lake Formation is integrated with third-party partners so you can extend your permissions management to the engines you prefer, such as Starburst and Dremio. Lake Formation also integrates with Privacera and Collibra, so you can pull permissions from or push permissions to Lake Formation and use the permissions management capabilities of both products. See the documentation for more information on Lake Formation partnerships.

Simplify security management and governance at scale

Lake Formation makes it easier to scale permissions across users with tag-based access controls. With tag-based access controls, you set attributes on data and grant permissions on those attributes instead of on individual resources. Lake Formation tag-based access control (LF-TBAC) dynamically uses the data attributes in the tags, so permissions scale as data changes.

Lake Formation tags can be quickly populated with your own business rules and ontologies such as departments, product lines, data ownership, data sensitivity (for example, public or private), and data classification (for example, Social Security numbers or phone numbers). You can dynamically manage your tag values by using integrated AWS services, including AWS Glue Sensitive Data Detection. AWS Glue Sensitive Data Detection can identify a variety of personally identifiable information (PII) and other sensitive data like credit card numbers, helping you tag sensitive information and support data audits.
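The tag-based flow can be sketched in three calls: define a tag, attach a value to a table, and grant on the tag expression instead of on the table. This is a minimal sketch, assuming a boto3 Lake Formation client is passed in as lf_client; the function name and every argument value are hypothetical.

```python
def grant_select_by_tag(lf_client, principal_arn, tag_key, allowed_values,
                        assigned_value, database, table):
    """Define an LF-Tag, tag one table with it, and grant SELECT on the tag."""
    # 1. Define the tag key and its allowed values.
    #    (This call fails if the tag key already exists in the account.)
    lf_client.create_lf_tag(TagKey=tag_key, TagValues=list(allowed_values))

    # 2. Attach one value of the tag to a table in the Data Catalog.
    lf_client.add_lf_tags_to_resource(
        Resource={"Table": {"DatabaseName": database, "Name": table}},
        LFTags=[{"TagKey": tag_key, "TagValues": [assigned_value]}],
    )

    # 3. Grant SELECT on every table whose tags match the expression,
    #    including tables tagged later.
    lf_client.grant_permissions(
        Principal={"DataLakePrincipalIdentifier": principal_arn},
        Resource={
            "LFTagPolicy": {
                "ResourceType": "TABLE",
                "Expression": [{"TagKey": tag_key, "TagValues": [assigned_value]}],
            }
        },
        Permissions=["SELECT"],
    )
```

Because the grant targets the tag expression, tagging a new table with the same value is enough to extend access to it; no further grant is needed.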

Understand and share your data

Lake Formation lets you build permissions on databases and tables within the AWS Glue Data Catalog. This allows you to use the AWS Glue Data Catalog as a hub for managing and sharing your data. With AWS Glue Data Catalog federation features, you can extend permissions to data cataloged by your own Hive metastore or with Amazon Redshift data sharing. You can set up and enforce permissions on datasets presented through the AWS Glue Data Catalog, making it easier to control access to your data no matter where it lives.

AWS Lake Formation allows for data sharing with zero ETL, making it easier to maintain control of your data while still ensuring users have access. Lake Formation simplifies data sharing, letting you create a data mesh or meet other data-sharing needs. Lake Formation cross-account and cross-Region data-linking capabilities allow users to securely share distributed data lakes across multiple AWS accounts, AWS Organizations, and AWS Regions. Lastly, with Lake Formation data sharing, you can directly control who you share data with, such as by selecting the exact IAM principals in other accounts, which helps ensure the owner keeps control of the data after it is shared.
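A cross-account share typically has two sides: the owner grants permissions to a principal in another account, and the consumer creates a resource link that points at the shared table. The sketch below only builds the request arguments for the Lake Formation GrantPermissions API and the AWS Glue CreateTable API; the function names, account IDs, and table names are hypothetical.

```python
def share_table_request(owner_account_id, consumer_principal_arn, database, table):
    """Owner side: arguments for GrantPermissions to an external principal."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": consumer_principal_arn},
        "Resource": {
            "Table": {
                "CatalogId": owner_account_id,
                "DatabaseName": database,
                "Name": table,
            }
        },
        # No PermissionsWithGrantOption is set, so the recipient
        # cannot pass the grant on to other principals.
        "Permissions": ["SELECT", "DESCRIBE"],
    }


def resource_link_request(local_database, link_name, owner_account_id,
                          shared_database, shared_table):
    """Consumer side: arguments for Glue CreateTable that make a resource link."""
    return {
        "DatabaseName": local_database,
        "TableInput": {
            "Name": link_name,
            # TargetTable makes this catalog entry a pointer to the shared
            # table; the underlying data is not copied.
            "TargetTable": {
                "CatalogId": owner_account_id,
                "DatabaseName": shared_database,
                "Name": shared_table,
            },
        },
    }
```

Assuming boto3 is available, the first result would go to a "lakeformation" client's grant_permissions call in the owner account and the second to a "glue" client's create_table call in the consumer account.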

AWS Lake Formation allows business-to-business data sharing external to your organization for licensing or other uses. Lake Formation integrates with AWS Data Exchange — an AWS service that lets you find, subscribe to, and use third-party data in the cloud — so you can share data with external businesses without moving or copying the data.

With Lake Formation permissions on the AWS Glue Data Catalog, users can use online, text-based search to better understand the data within the AWS Glue Data Catalog. You can search for relevant data by name, content, sensitivity, or any other custom labels you have defined.
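One way to run such a search programmatically is the AWS Glue SearchTables API. The sketch below assumes a boto3 Glue client is passed in as glue_client; the function name and page size are hypothetical choices.

```python
def search_catalog(glue_client, text, page_size=50):
    """Return (database, table) pairs whose catalog metadata matches the text."""
    found = []
    kwargs = {"SearchText": text, "MaxResults": page_size}
    while True:
        page = glue_client.search_tables(**kwargs)
        for table in page.get("TableList", []):
            found.append((table["DatabaseName"], table["Name"]))
        # Keep requesting pages until the service stops returning a token.
        token = page.get("NextToken")
        if not token:
            return found
        kwargs["NextToken"] = token
```

For example, search_catalog(boto3.client("glue"), "orders") would list the matching tables, assuming boto3 and credentials are set up.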

Monitor data access and help ensure compliance

Lake Formation provides comprehensive audit logs with AWS CloudTrail to monitor access and compliance with centrally defined policies. You can audit data access history across analytics and machine learning (ML) services that read the data using Lake Formation. This lets you see which users or roles have attempted to access what data and when. You can access audit logs in the same way you access other CloudTrail logs using the CloudTrail APIs and console.
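As a sketch of reading these logs through the CloudTrail APIs, the helpers below build the arguments for the CloudTrail LookupEvents API, filtered to the Lake Formation event source, and reduce the returned events to who did what and when. The function names and the 24-hour default window are hypothetical, and which Lake Formation events appear in your trail depends on your CloudTrail configuration.

```python
import json
from datetime import datetime, timedelta, timezone


def lake_formation_lookup_request(hours=24, now=None):
    """Build keyword arguments for CloudTrail LookupEvents."""
    end = now or datetime.now(timezone.utc)
    return {
        "LookupAttributes": [
            {"AttributeKey": "EventSource",
             "AttributeValue": "lakeformation.amazonaws.com"}
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
    }


def summarize_events(events):
    """Reduce LookupEvents results to time, event name, and caller ARN."""
    rows = []
    for event in events:
        # CloudTrailEvent holds the full record as a JSON string.
        detail = json.loads(event.get("CloudTrailEvent", "{}"))
        rows.append({
            "time": event.get("EventTime"),
            "event": event.get("EventName"),
            "principal": detail.get("userIdentity", {}).get("arn"),
        })
    return rows
```

Assuming boto3 is available, the request would be passed as boto3.client("cloudtrail").lookup_events(**request), and the "Events" list in the response handed to summarize_events.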