AWS Big Data Blog

Decentralize LF-tag management with AWS Lake Formation

In today’s data-driven world, organizations face unprecedented challenges in managing and extracting valuable insights from their ever-expanding data ecosystems. As the number of data assets and users grow, the traditional approaches to data management and governance are no longer sufficient. Customers are now building more advanced architectures to decentralize permissions management to allow for individual groups of users to build and manage their own data products, without being slowed down by a central governance team. One of the core features of AWS Lake Formation is the delegation of permissions on a subset of resources such as databases, tables, and columns in AWS Glue Data Catalog to data stewards, empowering them make decisions regarding who should get access to their resources and helping you decentralize the permissions management of your data lakes. Lake Formation has added a new capability that further allows data stewards to create and manage their own Lake Formation tags (LF-tags). Lake Formation tag-based access control (LF-TBAC) is an authorization strategy that defines permissions based on attributes. In Lake Formation, these attributes are called LF-Tags. LF-TBAC is the recommended method to use to grant Lake Formation permissions when there is a large number of Data Catalog resources. LF-TBAC is more scalable than the named resource method and requires less permission management overhead.

In this post, we go through the process of delegating the LF-tag creation, management, and granting of permissions to a data steward.

Lake Formation serves as the foundation for these advanced architectures by simplifying security management and governance for users at scale across AWS analytics. Lake Formation is designed to address these challenges by providing secure sharing between AWS accounts and tag-based access control to be able scale permissions. By assigning tags to data assets based on their characteristics and properties, organizations can implement access control policies tailored to specific data attributes. This ensures that only authorized individuals or teams can access and work with the data relevant to their domain. For example, it allows customers to tag data assets as “Confidential” and grant access to that LF-Tag to only those users who should have access to confidential data. Tag-based access control not only enhances data security and privacy, but also promotes efficient collaboration and knowledge sharing.

The need for producer autonomy and decentralized tag creation and delegation in data governance is paramount, regardless of the architecture chosen, whether it be a single account, hub and spoke, or data mesh with central governance. Relying solely on centralized tag creation and governance can create bottlenecks, hinder agility, and stifle innovation. By granting producers and data stewards the autonomy to create and manage tags relevant to their specific domains, organizations can foster a sense of ownership and accountability among producer teams. This decentralized approach allows you to adapt and respond quickly to changing requirements. This methodology helps organizations strike a balance between central governance and producer ownership, leading to improved governance, enhanced data quality, and data democratization.

Lake Formation announced the tag delegation feature to address this. With this feature, a Lake Formation admin can now provide permission to AWS Identity and Access Management (IAM) users and roles to create tags, associate them, and manage the tag expressions.

Solution overview

In this post, we examine an example organization that has a central data lake that is being used by multiple groups. We have two personas: the Lake Formation administrator LFAdmin, who manages the data lake and onboards different groups, and the data steward LFDataSteward-Sales, who owns and manages resources for the Sales group within the organization. The goal is to grant permission to the data steward to be able to use LF-Tags to perform permission grants for the resources that they own. In addition, the organization has a set of common LF-Tags called Confidentiality and Department, which the data steward will be able to use.

The following diagram illustrates the workflow to implement the solution.

The following are the high-level steps:

  1. Grant permissions to create LF-Tags to a user who is not a Lake Formation administrator (the LFDataSteward-Sales IAM role).
  2. Grant permissions to associate an organization’s common LF-Tags to the LFDataSteward-Sales role.
  3. Create new LF-Tags using the LFDataSteward-Sales role.
  4. Associate the new and common LF-Tags to resources using the LFDataSteward-Sales role.
  5. Grant permissions to other users using the LFDataSteward-Sales role.

Prerequisites

For this walkthrough, you should have the following:

  • An AWS account.
  • Knowledge of using Lake Formation and enabling Lake Formation to manage permissions to a set of tables.
  • An IAM role that is a Lake Formation administrator. For this post, we name ours LFAdmin.
  • Two LF-Tags created by the LFAdmin:
    • Key Confidentiality with values PII and Public.
    • Key Department with values Sales and Marketing.
  • An IAM role that is a data steward within an organization. For this post, we name ours LFDataSteward-Sales.
  • The data steward should have ‘Super’ access to at least one database. In this post, the data steward has access to three databases: sales-ml-data, sales-processed-data, and sales-raw-data.
  • An IAM role to serve as a user that the data steward will grant permissions to using LF-Tags. For this post, we name ours LFAnalysts-MLScientist.

Grant permission to the data steward to be able to create LF-Tags

Complete the following steps to grant LFDataSteward-Sales the ability to create LF-Tags:

  1. As the LFAdmin role, open the Lake Formation console.
  2. In the navigation pane, choose LF-Tags and permissions under Permissions.

Under LF-Tags, because you are logged in as LFAdmin, you can see all the tags that have been created within the account. You can see the Confidentiality LF-Tag as well as the Department LF-Tag and the possible values for each tag.

  1. On the LF-Tag creators tab, choose Add LF-Tag creators.

  1. For IAM users and roles, enter the LFDataSteward-Sales IAM role.
  2. For Permission, select Create LF-Tag.
  3. If you want this data steward to be able to grant Create LF-Tag permissions to other users, select Create LF-Tag under Grantable permission.
  4. Choose Add.

The LFDataSteward-Sales IAM role now has permissions to create their own LF-Tags.

Grant permission to the data steward to use common LF-Tags

We now want to give permission to the data steward to tag using the Confidentiality and Department tags. Complete the following steps:

  1. As the LFAdmin role, open the Lake Formation console.
  2. In the navigation pane, choose LF-Tags and permissions under Permissions.
  3. On the LF-Tag permissions tab, choose Grant permissions.

  1. Select LF-Tag key-value permission for Permission type.

The LF-Tag permission option grants the ability to modify or drop an LF-Tag, which doesn’t apply in this use case.

  1. Select IAM users and roles and enter the LFDataSteward-Sales IAM role.

  1. Provide the Confidentiality LF-Tag and all its values, and the Department LF-Tag with only the Sales value.
  2. Select Describe, Associate, and Grant with LF-Tag expression under Permissions.
  3. Choose Grant permissions.

This gave the LFDataSteward-Sales role the ability to tag resources using the Confidentiality tag and all its values as well as the Department tag with only the Sales value.

Create new LF-Tags using the data steward role

This step demonstrates how the LFDataSteward-Sales role can now create their own LF-Tags.

  1. As the LFDataSteward-Sales role, open the Lake Formation console.
  2. In the navigation pane, choose LF-Tags and permissions under Permissions.

The LF-Tags section only shows the Confidentiality tag and Department tag with only the Sales value. As the data steward, we want to create our own LF-Tags to make permissioning easier.

  1. Choose Add LF-Tag.

  1. For Key, enter Sales-Subgroups.
  2. For Values¸ enter DataScientists, DataEngineers, and MachineLearningEngineers.
  3. Choose Add LF-Tag.

As the LF-Tag creator, the data steward has full permissions on the tags that they created. You will be able to see all the tags that the data steward has access to.

Associate LF-Tags to resources as the data steward

We now associate resources to the LF-Tags that we just created so that Machine Learning Engineers can have access to the sales-ml-data resource.

  1. As the LFDataSteward-Sales role, open the Lake Formation console.
  2. In the navigation pane, choose Databases.
  3. Select sales-ml-data and on the Actions menu, choose Edit LF-Tags.

  1. Add the following LF-Tags and values:
    1. Key Sales-Subgroups with value MachineLearningEngineers.
    2. Key Department with value analytics.
    3. Key Confidentiality with value Public.
  2. Choose Save.

Grant permissions using LF-Tags as the data steward

To grant permissions using LF-Tags, complete the following steps:

  1. As the LFDataSteward-Sales role, open the Lake Formation console.
  2. In the navigation pane, choose Data lake permissions under Permissions.
  3. Choose Grant.
  4. Select IAM users and roles and enter the IAM principal to grant permission to (for this example, the Sales-MLScientist role).

  1. In the LF-Tags or catalog resources section, select Resources matched by LF-Tags.
  2. Enter the following tag expressions:
    1. For the Department LF-Tag, set the Sales value.
    2. For the Sales-Subgroups LF-Tag, set the MachineLearningEngineers value.
    3. For the Confidentiality LF-Tag, set the Public value.

Because this is a machine learning (ML) and data science user, we want to give full permissions so that they can manage databases and create tables.

  1. For Database permissions, select Super, and for Table permissions, select Super.

  1. Choose Grant.

We now see the permissions granted to the LF-Tag expression.

Verify permissions granted to the user

To verify permissions using Amazon Athena, navigate to the Athena console as the Sales-MLScientist role. We can observe that the Sales-MLScientist role now has access to the sales-ml-data database and all the tables. In this case, there is only one table, sales-report.

Clean up

To clean up your resources, delete the following:

  • IAM roles that you may have created for the purposes of this post
  • Any LF-Tags that you created

Conclusion

In this post, we discussed the benefits of decentralized tag management and how the new Lake Formation feature helps implement this. By granting permission to producer teams’ data stewards to manage tags, organizations empower them to use their domain knowledge and capture the nuances of their data effectively. Furthermore, granting permission to data stewards enables them to take ownership of the tagging process, ensuring accuracy and relevance.

The post illustrated the various steps involved in decentralized Lake Formation tag management, such as granting permission to data stewards to create LF-Tags and use common LF-Tags. We also demonstrated how the data steward can create their own LF-Tags, associate the tags to resources, and grant permissions using tags.

We encourage you to explore the new decentralized Lake Formation tag management feature. For more details, see Lake Formation tag-based access control.


About the Authors

Ramkumar Nottath is a Principal Solutions Architect at AWS focusing on Analytics services. He enjoys working with various customers to help them build scalable, reliable big data and analytics solutions. His interests extend to various technologies such as analytics, data warehousing, streaming, data governance, and machine learning. He loves spending time with his family and friends.

Mert Hocanin is a Principal Big Data Architect at AWS within the AWS Lake Formation Product team. He has been with Amazon for over 10 years, and enjoys helping customers build their data lakes with a focus on governance on a wide variety of services. When he isn’t helping customers build data lakes, he spends his time with his family and traveling.