AWS Big Data Blog

Enhance data security with fine-grained access controls in Amazon DataZone

Fine-grained access control is a crucial aspect of data security for modern data lakes and data warehouses. As organizations handle vast amounts of data across multiple data sources, the need to manage sensitive information has become increasingly important. Making sure the right people have access to the right data, without exposing sensitive information to unauthorized individuals, is essential for maintaining data privacy, compliance, and security.

Today, Amazon DataZone has introduced fine-grained access control, giving you granular control over your data assets in the Amazon DataZone business data catalog across data lakes and data warehouses. With the new capability, data owners can restrict access to specific records at the row and column levels, instead of granting access to the entire data asset. For example, if your data contains columns with sensitive information such as personally identifiable information (PII), you can restrict access to only the necessary columns, so sensitive information stays protected while non-sensitive data remains accessible. Similarly, you can control access at the row level, allowing users to see only the records that are relevant to their role or task.

In this post, we discuss how to implement fine-grained access control with row and column asset filters using this new feature in Amazon DataZone.

Row and column filters

Row filters enable you to restrict access to specific rows based on criteria you define. For instance, if your table contains data for two regions (America and Europe) and you want to make sure that employees in Europe only access data relevant to their region, you can create a row filter that includes only rows where the region is Europe (for example, region = 'Europe'). This way, employees in Europe won’t have access to America’s data.

Column filters allow you to limit access to specific columns within your data assets. For example, if your table includes sensitive information such as PII, you can create a column filter to exclude PII columns. This makes sure subscribers can only access non-sensitive data.

The row and column asset filters in Amazon DataZone enable you to control who can access what using a consistent, business user-friendly mechanism for all of your data across AWS data lakes and data warehouses. To use fine-grained access control in Amazon DataZone, you can create row and column filters on top of your data assets in the Amazon DataZone business data catalog. When a user requests a subscription to your data asset, you can approve the subscription by applying the appropriate row and column filters. Amazon DataZone enforces these filters using AWS Lake Formation and Amazon Redshift, making sure the subscriber can only access the rows and columns that they are authorized to use.
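The filter semantics described in this section can be illustrated with a small in-memory sketch. This is not how Amazon DataZone enforces filters (enforcement happens in AWS Lake Formation and Amazon Redshift); the table, column names, and the apply_filters helper are hypothetical and only show what a subscriber would see once a row filter and a column filter are combined.

```python
from typing import Callable, Iterable

Row = dict[str, object]


def apply_filters(
    rows: Iterable[Row],
    row_predicate: Callable[[Row], bool],
    included_columns: list[str],
) -> list[Row]:
    """Keep rows that match the predicate, and only the included columns."""
    return [
        {col: row[col] for col in included_columns}
        for row in rows
        if row_predicate(row)
    ]


# Hypothetical sample table; names and values are illustrative only.
orders = [
    {"order_id": 1, "region": "Europe", "email": "a@example.com", "amount": 120},
    {"order_id": 2, "region": "America", "email": "b@example.com", "amount": 80},
    {"order_id": 3, "region": "Europe", "email": "c@example.com", "amount": 45},
]

europe_view = apply_filters(
    orders,
    row_predicate=lambda row: row["region"] == "Europe",  # row filter
    included_columns=["order_id", "region", "amount"],    # column filter, PII left out
)
```

Here europe_view holds only the Europe rows, and the email column is absent from every row.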

Solution overview

To demonstrate the new capability, we consider a sample customer use case where an electronics ecommerce platform wants to implement fine-grained access controls using Amazon DataZone. The customer has multiple product categories, each operated by a different division of the company. The platform governance team wants to make sure each division can see only the data belonging to its own categories. Additionally, the platform governance team must meet the finance team’s requirement that pricing information be visible only to the finance team.

The sales team, acting as the data producer, has published an AWS Glue table called Product Sales that contains data for both the Laptops and Servers categories to the Amazon DataZone business data catalog using the project Product-Sales. The analytics teams in both the laptop and server divisions need to access this data for their respective analytics projects. The data owner’s objective is to grant data access to consumers based on the division they belong to. This means giving the laptop sales analytics team access only to rows with laptop sales, and the server sales analytics team access only to rows with server sales. Additionally, the data owner wants to restrict both teams from accessing the pricing data. This post demonstrates the implementation steps to achieve this use case in Amazon DataZone.

The steps to configure this solution are as follows:

  1. The publisher creates asset filters for limiting access:
    1. We create two row filters: a Laptop Only row filter that limits access to only the rows of data with laptop sales, and a Server Only row filter that limits access to the rows of data with server sales.
    2. We also create a column filter called exclude-price-columns that excludes the price-related columns from the Product Sales asset.
  2. Consumers discover and request subscriptions:
    1. The analyst from the laptops division requests a subscription to the Product Sales data asset.
    2. The analyst from the servers division also requests a subscription to the Product Sales data asset.
    3. Both subscription requests are sent to the publisher for approval.
  3. The publisher approves the subscriptions and applies the appropriate filters:
    1. The publisher approves the request from the analyst in the laptops division, applying the Laptop Only row filter and the exclude-price-columns column filter.
    2. The publisher approves the request from the analyst in the servers division, applying the Server Only row filter and the exclude-price-columns column filter.
  4. Consumers access the authorized data in Amazon Athena:
    1. After the subscription is approved, we query the data in Athena to make sure that the analyst from the laptops division can now access only the product sales data for the Laptops category.
    2. Similarly, the analyst from the servers division can access only the product sales data for the Servers category.
    3. Both consumers can see all columns except the price-related columns, as per the applied column filter.

The following diagram illustrates the solution architecture and process flow.

Prerequisites

To follow along with this post, the publisher of the product sales data asset must have published a sales dataset in Amazon DataZone.

Publisher creates asset filters for limiting access

In this section, we detail the steps the publisher takes to create asset filters.

Create row filters

This dataset contains the product categories Laptops and Servers. We want each subscriber to access only the rows for the product category they are authorized to see. We use the row filter feature in Amazon DataZone to achieve this.

Amazon DataZone allows you to create row filters that can be used when approving subscriptions to make sure that the subscriber can only access rows of data as defined in the row filters. To create a row filter, complete the following steps:

  1. On the Amazon DataZone console, navigate to the Product-Sales project (the project to which the asset belongs).
  2. Navigate to the Data tab for the project.
  3. Choose Inventory data in the navigation pane, then choose the Product Sales asset, on which you want to create the row filter.

You can add row filters to assets of type AWS Glue table or Amazon Redshift table.

  1. On the asset detail page, on the Asset filters tab, choose Add asset filter.

We create two row filters, one each for the Laptops and Servers categories.

  1. Complete the following steps to create the Laptop Only row filter:
    1. Enter a name for this filter (Laptop Only).
    2. Enter a description of the filter (Allow rows with product category as Laptop Only).
    3. For the filter type, select Row filter.
    4. For the row filter expression, enter one or more expressions:
      1. Choose the column Product Category from the column dropdown menu.
      2. Choose the operator = from the operator dropdown menu.
      3. Enter the value Laptops in the Value field.
    5. If you need to add another condition to the filter expression, choose Add condition. For this post, we create a filter with one condition.
    6. When using multiple conditions in the row filter expression, choose And or Or to link the conditions.
    7. You can also define the subscriber visibility. For this post, we kept the default value (No, show values to subscriber).
    8. Choose Create asset filter.
  2. Repeat the same steps to create a row filter called Server Only, except this time enter the value Servers in the Value field.
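The console steps above can also be expressed as request payloads, which helps when filters need to be created repeatably. The following is a minimal sketch that only builds the payloads as plain dictionaries. The field names (rowConfiguration, rowFilter, equalTo, sensitive, and so on) reflect our understanding of the Amazon DataZone CreateAssetFilter API and should be verified against the current API reference. The domain and asset identifiers are placeholders, the helper functions are hypothetical, and mapping the default subscriber visibility to sensitive=False is an assumption.

```python
from typing import Any


def equal_to(column: str, value: str) -> dict[str, Any]:
    """One row-filter condition of the form: column = value."""
    return {"expression": {"equalTo": {"columnName": column, "value": value}}}


def all_of(*conditions: dict[str, Any]) -> dict[str, Any]:
    """Link several conditions with And."""
    return {"and": list(conditions)}


def any_of(*conditions: dict[str, Any]) -> dict[str, Any]:
    """Link several conditions with Or."""
    return {"or": list(conditions)}


def row_filter_request(
    domain_id: str,
    asset_id: str,
    name: str,
    description: str,
    row_filter: dict[str, Any],
    sensitive: bool = False,  # assumed to match the default subscriber visibility
) -> dict[str, Any]:
    """Build keyword arguments for a row-type asset filter (assumed API shape)."""
    return {
        "domainIdentifier": domain_id,
        "assetIdentifier": asset_id,
        "name": name,
        "description": description,
        "configuration": {
            "rowConfiguration": {"rowFilter": row_filter, "sensitive": sensitive}
        },
    }


DOMAIN_ID = "dzd_example"   # placeholder, not a real domain ID
ASSET_ID = "asset_example"  # placeholder, not a real asset ID

laptop_only = row_filter_request(
    DOMAIN_ID, ASSET_ID, "Laptop Only",
    "Allow rows with product category as Laptop Only",
    equal_to("product_category", "Laptops"),
)
server_only = row_filter_request(
    DOMAIN_ID, ASSET_ID, "Server Only",
    "Allow rows with product category as Server Only",
    equal_to("product_category", "Servers"),
)

# With AWS credentials configured, a payload could be submitted along these lines:
#   boto3.client("datazone").create_asset_filter(**laptop_only)
```

The all_of and any_of helpers correspond to linking multiple conditions with And or Or in the console; this post uses a single condition per filter.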

Create column filters

Next, we create column filters to restrict access to columns with price-related data. Complete the following steps:

  1. In the same asset, add another asset filter of type column filter.
  2. On the Asset filters tab, choose Add asset filter.
  3. For Name, enter a name for the filter (for this post, exclude-price-columns).
  4. For Description, enter a description of the filter (for this post, exclude price data columns).
  5. For the filter type, select Column to create the column filter. This will display all the available columns in the data asset’s schema.
  6. Select all columns except the price-related ones.
  7. Choose Create asset filter.
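Because a column filter is defined by the columns you keep rather than the ones you drop, it can help to derive the included list from the schema. The sketch below does that and builds a payload in the shape we understand the CreateAssetFilter API to expect (columnConfiguration with includedColumnNames); verify the field names against the current API reference. The schema, the price column names, and the identifiers are hypothetical, because this post does not list the table’s columns.

```python
from typing import Any


def included_columns(schema: list[str], excluded: set[str]) -> list[str]:
    """Return the schema columns in order, minus the excluded ones."""
    return [column for column in schema if column not in excluded]


def column_filter_request(
    domain_id: str,
    asset_id: str,
    name: str,
    description: str,
    columns: list[str],
) -> dict[str, Any]:
    """Build keyword arguments for a column-type asset filter (assumed API shape)."""
    return {
        "domainIdentifier": domain_id,
        "assetIdentifier": asset_id,
        "name": name,
        "description": description,
        "configuration": {"columnConfiguration": {"includedColumnNames": columns}},
    }


# Hypothetical schema for the Product Sales table; only product_category
# is named in this post, the other column names are assumptions.
schema = ["order_id", "product_category", "product_name", "quantity",
          "unit_price", "total_price"]
price_columns = {"unit_price", "total_price"}

exclude_price_columns = column_filter_request(
    "dzd_example",    # placeholder domain ID
    "asset_example",  # placeholder asset ID
    "exclude-price-columns",
    "exclude price data columns",
    included_columns(schema, price_columns),
)
```

Note that with an include list, a column added to the table later is not exposed to subscribers until the filter is updated.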

Consumers discover and request subscriptions

In this section, we switch to the role of an analyst from the laptop division who is working within the project Sales Analytics - Laptops. As the data consumer, we search the catalog to find the Product Sales data asset and request access by subscribing to it.

  1. Log in to your project as a consumer and search for the Product Sales data asset.
  2. On the Product Sales data asset details page, choose Subscribe.
  3. For Project, choose Sales Analytics - Laptops.
  4. For Reason for request, enter the reason for the subscription request.
  5. Choose Subscribe to submit the subscription request.

Publisher approves subscriptions with filters

After the subscription request is submitted, the publisher receives the request and can approve it by following these steps:

  1. As the publisher, open the project Product-Sales.
  2. On the Data tab, choose Incoming requests in the left navigation pane.
  3. Locate the request and choose View request. You can filter by Pending to see only requests that are still open.

This opens the request details, where you can see who requested access, for which project, and the reason for the request.

  1. To approve the request, there are two options:
    1. Full access – If you approve the subscription with the full access option, the subscriber gets access to all the rows and columns in the data asset.
    2. Approve with row and column filters – To limit access to specific rows and columns of data, you can choose the option to approve with row and column filters. For this post, we use both types of filters that we created earlier.
  2. Select Choose filter, then on the dropdown menu, choose the Laptop Only and exclude-price-columns filters.
  3. Choose Approve to approve the request.

After access is granted and fulfilled, the subscription looks as shown in the following screenshot.

  1. Now let’s log in as a consumer from the server division.
  2. Repeat the same steps, but this time, while approving the subscription, the publisher of the sales data approves with the Server Only row filter and the exclude-price-columns column filter. The other steps remain the same.
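The approval decisions in this section amount to a mapping from the subscribing project to the filters applied. The following sketch captures that mapping and builds an approval payload. The assetScopes and filterIds fields reflect our understanding of the Amazon DataZone AcceptSubscriptionRequest API and should be checked against the current API reference. All identifiers are placeholders, and the project name Sales Analytics - Servers is an assumption, because this post names only the laptop project.

```python
from typing import Any

# Which filters the publisher applies for each consumer project.
# "Sales Analytics - Servers" is an assumed project name.
FILTERS_BY_PROJECT: dict[str, list[str]] = {
    "Sales Analytics - Laptops": ["Laptop Only", "exclude-price-columns"],
    "Sales Analytics - Servers": ["Server Only", "exclude-price-columns"],
}


def accept_request_params(
    domain_id: str,
    request_id: str,
    asset_id: str,
    project_name: str,
    filter_ids_by_name: dict[str, str],
) -> dict[str, Any]:
    """Build keyword arguments for approving a request with filters (assumed API shape)."""
    filter_names = FILTERS_BY_PROJECT[project_name]
    return {
        "domainIdentifier": domain_id,
        "identifier": request_id,
        "assetScopes": [
            {
                "assetId": asset_id,
                "filterIds": [filter_ids_by_name[name] for name in filter_names],
            }
        ],
    }


# Placeholder filter IDs, as they might be returned when the filters were created.
filter_ids = {
    "Laptop Only": "filter-laptop",
    "Server Only": "filter-server",
    "exclude-price-columns": "filter-no-price",
}

laptop_approval = accept_request_params(
    "dzd_example", "request-1", "asset_example",
    "Sales Analytics - Laptops", filter_ids,
)

# With AWS credentials configured, this could be submitted along these lines:
#   boto3.client("datazone").accept_subscription_request(**laptop_approval)
```

Keeping the mapping in one place makes it easy to review that both divisions receive the same column filter and differ only in the row filter.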

Consumers access authorized data in Athena

Now that we have successfully published an asset to the Amazon DataZone catalog and subscribed to it, we can analyze it. Let’s log in as a consumer from the laptop division.

  1. In the Amazon DataZone data portal, choose the consumer project Sales Analytics - Laptops.
  2. On the Schema tab, we can view the subscribed assets.
  3. Choose the project Sales Analytics - Laptops and choose the Overview tab.
  4. In the right pane, open the Athena environment.

We can now run queries on the subscribed table.

  1. Choose the table under Tables and views, then choose Preview to view the SELECT statement in the query editor.
  2. Run a query as the consumer of Sales Analytics - Laptops. The results contain only rows with the product category Laptops.

Under Tables and views, you can expand the table product_sales. The price-related columns are not visible in the Athena environment for querying.

  1. Next, you can switch to the role of analyst from the server division and analyze the dataset in a similar way.
  2. We run the same query and see that under product_category, the analyst can see Servers only.
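If you export query results for each consumer, the checks above can be automated. The following is a minimal sketch of such a check on rows represented as dictionaries. The helper name, the sample rows, and the price column names are hypothetical; only the product_category column and its Laptops and Servers values come from this post.

```python
def check_filtered_result(
    rows: list[dict[str, object]],
    allowed_category: str,
    hidden_columns: set[str],
) -> list[str]:
    """Return a list of problems; an empty list means the filters look enforced."""
    problems = []
    for index, row in enumerate(rows):
        category = row.get("product_category")
        if category != allowed_category:
            problems.append(f"row {index}: unexpected category {category!r}")
        leaked = sorted(hidden_columns.intersection(row))
        if leaked:
            problems.append(f"row {index}: hidden columns present {leaked}")
    return problems


# Hypothetical rows as the laptop analyst might see them.
laptop_rows = [
    {"order_id": 1, "product_category": "Laptops", "quantity": 2},
    {"order_id": 4, "product_category": "Laptops", "quantity": 1},
]
hidden = {"unit_price", "total_price"}  # assumed price column names

laptop_problems = check_filtered_result(laptop_rows, "Laptops", hidden)

# A deliberately leaky result, to show what the check reports.
leaky_rows = [{"order_id": 2, "product_category": "Servers", "unit_price": 999}]
leaky_problems = check_filtered_result(leaky_rows, "Laptops", hidden)
```

For the laptop rows the problem list is empty, while the leaky row is reported both for its category and for the exposed price column.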

Conclusion

Amazon DataZone offers a straightforward way to implement fine-grained access controls on top of your data assets. This feature allows you to define column-level and row-level filters to enforce data privacy before the data is available to data consumers. Amazon DataZone fine-grained access control is generally available in all AWS Regions that support Amazon DataZone.

Try out the fine-grained access control feature in your own use case, and let us know your feedback in the comments section.


About the Authors

Deepmala Agarwal works as an AWS Data Specialist Solutions Architect. She is passionate about helping customers build out scalable, distributed, and data-driven solutions on AWS. When not at work, Deepmala likes spending time with family, walking, listening to music, watching movies, and cooking!

Leonardo Gomez is a Principal Analytics Specialist Solutions Architect at AWS. He has over a decade of experience in data management, helping customers around the globe address their business and technical needs. Connect with him on LinkedIn.

Utkarsh Mittal is a Senior Technical Product Manager for Amazon DataZone at AWS. He is passionate about building innovative products that simplify customers’ end-to-end analytics journeys. Outside of the tech world, Utkarsh loves to play music, with drums being his latest endeavor.