Protecting Sensitive Data with Collibra Protect and AWS Lake Formation
By Wouter Mertens, Sr. Director, Product Management – Collibra
By Nishant Agarwal, Director, Technical Partnerships – Collibra
By Marc Campabadal, Technical Marketing – Collibra
By Venkatesh Aravamudan and Leon Stigter – AWS
What’s the point of data if you can’t get your hands on (or mind around) it?
In today’s data-driven world, ensuring the security and proper management of sensitive information is paramount. Collibra Protect and AWS Lake Formation offer a powerful combination to address the growing challenges of enterprise data access governance.
Collibra Protect, part of the Collibra Data Intelligence Cloud, protects sensitive data and makes it available, or partially available, to specified groups of users. AWS Lake Formation is a fully managed serverless service that allows you to build clean and secure data lakes in days.
In this post, we’ll show you how to start building data access policies at scale. Collibra is an AWS Partner and AWS Marketplace Seller that provides data governance and catalog solutions giving teams tools that make it easy to consume data across the enterprise.
Challenge in Enterprises
A common enterprise challenge is that different groups of people need varying access levels to the same data. Data producers require a different level of access to data than data consumers, and financial analysts use company data differently than HR data analysts.
With Collibra Protect, you get intelligent controls for better results with less risk. You grant access to individuals and protect sensitive information based on access rules and data protection standards.
All of your rules and standards with different data access levels are managed through the Collibra platform and pushed to the data source. The aim is to promote a safe data-open culture in organizations.
Simplified Access Governance
The goal of Collibra Protect is to centralize and simplify access governance and remove the need for repetitive actions and approvals. Sound data access and privacy management supports an ethical company standard by giving permission to view information only to those who need it, and Collibra Protect lets you enforce that principle.
As an example of how Collibra Protect is used, consider a data steward giving everyone access to a dataset. Based on data categories in Collibra Protect, the steward can allow or deny access to parts of that dataset to groups within the organization. This is known as differential access. We suggest grouping rules and standards together (by business process, for example) so you don’t have to create a rule or standard for every dataset.
Why AWS Lake Formation?
AWS Lake Formation provides a single place to manage access controls for data in your data lake. You can define security policies that restrict access to data at the database, table, column, row, and cell levels. These policies apply to AWS Identity and Access Management (IAM) users and roles, and to users and groups when using SAML-based identity providers (IdPs).
Data filters in AWS Lake Formation can be used to govern access at row, column, and cell levels. Tag-based access control can be achieved by defining LF-tags and attaching them to databases, tables, or columns. This allows you to scale data governance, manage hundreds or even thousands of data permissions, and share controlled access across analytic, machine learning (ML), and extract, transform, and load (ETL) services for consumption.
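To make tag-based access control more concrete, here is a minimal sketch that builds the request parameters for defining an LF-tag and attaching it to a column. It only assembles plain dictionaries shaped for the boto3 `create_lf_tag` and `add_lf_tags_to_resource` calls. The tag key, tag value, database, table, and column names are hypothetical and chosen for illustration.

```python
from typing import Any


def lf_tag_requests(tag_key: str, tag_value: str, database: str,
                    table: str, columns: list[str]) -> dict[str, dict[str, Any]]:
    """Build keyword arguments for defining an LF-tag and attaching it
    to specific columns of one table."""
    create = {"TagKey": tag_key, "TagValues": [tag_value]}
    attach = {
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": columns,
            }
        },
        "LFTags": [{"TagKey": tag_key, "TagValues": [tag_value]}],
    }
    return {"create_lf_tag": create, "add_lf_tags_to_resource": attach}


# Hypothetical names, for illustration only.
lf_requests = lf_tag_requests("classification", "personal_email",
                              "sales_db", "customer", ["email"])

# With AWS credentials configured, the payloads could be sent like this:
#   import boto3
#   lf = boto3.client("lakeformation")
#   lf.create_lf_tag(**lf_requests["create_lf_tag"])
#   lf.add_lf_tags_to_resource(**lf_requests["add_lf_tags_to_resource"])
```

Keeping payload construction separate from the API call makes the request easy to inspect before anything changes in the data lake.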
Collibra Protect + AWS Lake Formation Benefits
With the combination of both products, organizations can:
- Allow data stewards to control access to their datasets or data categories without needing technical expertise or support from IT departments.
- Leverage Collibra’s capabilities to identify, classify, and tag sensitive data within the organization’s data landscape and control the access from that structure.
- Audit and evaluate the rules and standards associated with data.
- Leverage the integrations and capabilities of Lake Formation to control access at a granular level for AWS products that support it.
- Have a single pane of glass to view and control access in the AWS environment.
The architecture diagram below shows how Collibra Protect, residing in Collibra’s cloud platform, integrates with AWS Lake Formation and enforces data protection policies in various underlying services.
Figure 1 – Collibra and AWS Lake Formation integration.
How it Works
Collibra Protect relies on the creation of protection standards and access rules. Protection standards apply data protection to the source data based on how the data is classified or categorized within the Collibra platform. Access rules grant access to a less restrictive view of the data that overrides the restrictions from protection standards.
Given a table with a column for personal emails, for example, we can create a protection standard that hides that column from all users, and then create an access rule that shows that column to users in the marketing group so they can launch an email campaign.
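The interplay between the two can be sketched in a few lines of Python. This is a simplified mental model, not Collibra’s implementation: the class names, the `visible_columns` helper, and the column names are all hypothetical, and we assume that an access rule for a group fully replaces the protection standards for that group on the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionStandard:
    data_class: str        # classification to hide, e.g. "personal email"
    groups: frozenset      # groups the standard applies to


@dataclass(frozen=True)
class AccessRule:
    group: str                             # group that receives access
    hidden_classes: frozenset = frozenset()  # classes the rule still hides


def visible_columns(group, columns, standards, rule=None):
    """Return the column names a group can see.

    `columns` maps column name -> data class (or None if unclassified).
    Assumption: an access rule for the group overrides the standards.
    """
    if rule is not None and rule.group == group:
        hidden = set(rule.hidden_classes)
    else:
        hidden = {s.data_class for s in standards
                  if "Everyone" in s.groups or group in s.groups}
    return [name for name, data_class in columns.items()
            if data_class not in hidden]


# Hypothetical table layout, for illustration only.
columns = {"first_name": "first name", "last_name": "last name",
           "email": "personal email", "country": None}
standards = [ProtectionStandard("personal email", frozenset({"Everyone"}))]
rule = AccessRule("marketing", frozenset({"last name"}))

marketing_view = visible_columns("marketing", columns, standards, rule)
finance_view = visible_columns("finance", columns, standards, rule)
```

Here `marketing_view` includes the email column because the access rule overrides the standard, while `finance_view` omits it because only the standard applies.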
The key benefit of using Collibra Protect is that with a few clicks you can make sure your business-critical data is accessible by the right users and your sensitive data is protected.
Collibra Protect makes use of AWS Lake Formation’s Data Filter feature to protect data. Whenever a protection standard or access rule is set up, it’s pushed to AWS Lake Formation and a data filter is created automatically.
Each data filter belongs to a specific table and includes the following information:
- Filter name (this will be prefixed with collibra/assetid).
- Table name.
- Name of the database that contains the table.
- Column specification – list of columns to include or exclude in query results.
- Row filter expression – expression that specifies the rows to include in query results. With some restrictions, the expression has the syntax of a WHERE clause in the PartiQL language. To specify all rows, enter true in the console or use AllRowsWildcard in API calls.
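The fields listed above map closely onto the `TableData` structure accepted by the Lake Formation `CreateDataCellsFilter` API. The following is a minimal sketch that assembles that structure as a plain dictionary. The account ID, database, table, column, and filter names are placeholders, and the exact filter name Collibra generates is not reproduced here.

```python
def data_filter(catalog_id, name, database, table,
                excluded_columns, row_expression=None):
    """Build a TableData dictionary for a Lake Formation data filter.

    Without a row expression, all rows are included via AllRowsWildcard.
    """
    if row_expression:
        row_filter = {"FilterExpression": row_expression}
    else:
        row_filter = {"AllRowsWildcard": {}}
    return {
        "TableCatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Name": name,
        "ColumnWildcard": {"ExcludedColumnNames": list(excluded_columns)},
        "RowFilter": row_filter,
    }


# Placeholder values, for illustration only.
email_filter = data_filter(
    catalog_id="111122223333",
    name="hide-personal-email",
    database="sales_db",
    table="customer",
    excluded_columns=["email"],
)

# With AWS credentials configured, the filter could be created like this:
#   import boto3
#   boto3.client("lakeformation").create_data_cells_filter(TableData=email_filter)
```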
In this section, we are going to create a data protection standard to hide all of the columns that contain personal emails across the databases. Then, we will allow the marketing team to access a dataset that contains personal information like first name, last name, and personal emails by creating a data access rule. They’ll be able to see the first name and email to inform customers about a promotion, but we’ll hide the last name for compliance.
Data Protection Standard
After you click Create a Data Protection Standard, the setup menu appears. We’ll start by assigning a name and a description.
Next, select the group Everyone in the drop-down menu. Then select data classification and choose personal email. Data classes are a form of tag assigned to columns in the Collibra Data Catalog to provide context for the data itself.
Figure 2 – Data protection standard setup.
Saving the standard triggers the following actions in AWS Lake Formation:
- Create an LF-tag
- Assign the tag to all columns identified as personal email
- For each of the tables:
  - Create a data filter to exclude columns tagged as personal email
  - Assign the data filters to all groups
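The per-table step above can be sketched as a small planning function: given which columns carry which data class, work out which tables need a filter and which columns that filter excludes. The `catalog` dictionary is a hypothetical stand-in for classification metadata from the Collibra Data Catalog, and the function illustrates the logic only, not Collibra’s actual code.

```python
def plan_exclusion_filters(catalog, data_class):
    """Map each table that contains the data class to the columns to exclude.

    `catalog` has the shape {table: {column: data class or None}}.
    Tables without a matching column get no filter.
    """
    plan = {}
    for table, columns in catalog.items():
        tagged = sorted(col for col, cls in columns.items()
                        if cls == data_class)
        if tagged:
            plan[table] = tagged
    return plan


# Hypothetical classification metadata, for illustration only.
catalog = {
    "customer": {"first_name": "first name", "email": "personal email"},
    "orders": {"order_id": None, "amount": None},
    "employee": {"private_email": "personal email", "badge_id": None},
}

filter_plan = plan_exclusion_filters(catalog, "personal email")
```

Each entry in `filter_plan` corresponds to one data filter that would then be assigned to all groups.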
Now, all of the columns identified as “personal email” have been hidden in AWS. Let’s proceed by creating an access rule.
After you click Create a Data Access Rule, the setup menu appears. As in the standard setup, we’ll give it a name and a description.
Now, we want the marketing team to be able to access the email information and the names of the customers to send out a promotion. We’ll select the group marketing and the asset customer by country, which is a table that contains the information the team needs.
As an optional step, we’ll hide the “last name” information, since the team doesn’t need it and hiding it keeps that sensitive information protected. We’ll do so by selecting data classification and choosing last name in the drop-down menu.
Figure 3 – Data access rule setup.
Collibra Protect offers advanced filtering controls. For example, we could show only the customers from a specific country or region. For simplicity, though, we’ll leave it empty and click the Save button.
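For reference, a country restriction like the one mentioned above would plausibly surface in Lake Formation as a row filter expression. The sketch below builds such an expression. The column name `country`, the helper name, and the quoting approach (doubling single quotes, as in standard SQL string literals) are assumptions, and the expression Collibra actually generates may differ.

```python
def country_row_filter(column: str, country: str) -> dict:
    """Build a RowFilter dictionary that keeps rows for a single country."""
    if not column.isidentifier():
        # Reject anything that is not a simple column identifier.
        raise ValueError(f"unexpected column name: {column!r}")
    # Assumption: single quotes in the value are escaped by doubling them.
    literal = "'" + country.replace("'", "''") + "'"
    return {"FilterExpression": f"{column} = {literal}"}


# Hypothetical column and value, for illustration only.
belgium_only = country_row_filter("country", "Belgium")
```

The resulting dictionary has the same shape as the `RowFilter` field of a Lake Formation data filter.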
Saving the rule automatically creates the following filter and data access in AWS Lake Formation:
- For each of the tables with the column identified as “last name”:
  - Create a data filter that excludes those columns
  - Assign the data filters to Marketing in AWS Lake Formation; note that the Collibra dataset “Customer by country” targets the table “customer”
Figure 4 – Resulting data filter in AWS Lake Formation.
- For users with the Marketing role:
  - Grant access to table “Customer”
  - Apply the previously created data filter
Figure 5 – Assigned data filter to Marketing in AWS Lake Formation.
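The grant step shown above corresponds to a Lake Formation `GrantPermissions` request that targets a data filter rather than the whole table. As a rough sketch, the request parameters could be assembled as follows. The role ARN, account ID, database, and filter name are placeholders, and this illustrates the API shape rather than what Collibra Protect sends.

```python
def grant_filter_request(principal_arn, catalog_id, database,
                         table, filter_name):
    """Build keyword arguments for granting SELECT through a data filter."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "DataCellsFilter": {
                "TableCatalogId": catalog_id,
                "DatabaseName": database,
                "TableName": table,
                "Name": filter_name,
            }
        },
        "Permissions": ["SELECT"],
    }


# Placeholder values, for illustration only.
marketing_grant = grant_filter_request(
    principal_arn="arn:aws:iam::111122223333:role/Marketing",
    catalog_id="111122223333",
    database="sales_db",
    table="customer",
    filter_name="hide-last-name",
)

# With AWS credentials configured, the grant could be applied like this:
#   import boto3
#   boto3.client("lakeformation").grant_permissions(**marketing_grant)
```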
Collibra Protect along with AWS Lake Formation is a powerful combination that offers a robust, comprehensive solution to address the growing challenges of enterprise data access governance.
As businesses continue to rely on vast amounts of data to make informed decisions, it becomes increasingly important to manage and protect sensitive information while providing the necessary access to relevant stakeholders.
By leveraging Collibra Protect’s centralized access governance and data protection capabilities alongside AWS Lake Formation’s serverless service for building clean and secure data lakes, organizations can realize these benefits:
- Effectively strike a balance between data openness and security.
- Simplify access governance.
- Promote ethical data access and privacy management.
- Scale data governance across your data sources and services.
By harnessing the power of Collibra Protect and AWS Lake Formation, organizations can confidently navigate the complex data landscape and create a secure data-sharing environment that can drive business growth.
Collibra – AWS Partner Spotlight
Collibra is an AWS Partner whose data governance and catalog solutions give teams powerful tools that make it easy to consume data across the enterprise.