Automate Data Sharing with Collibra and AWS Lake Formation
By Nishant Agarwal, Director, Technical Partnerships – Collibra
By Naveen JD, Principal Data Architect – AWS
By Praveen Kanumarlapudi, Data Architect – AWS
By Venkatesh Aravamudan, Partner Solutions Architect – AWS
Enterprise data consumers often have difficulty discovering usable and trusted data. Collibra and Amazon Web Services (AWS) are collaborating to develop products and capabilities that facilitate data intelligence for customers.
The innovations include new capabilities that make it easier to migrate data and workloads to the cloud, as well as enhancements to search, collaboration, business process automation, and analytics. There are also new integrations that assist customers with data access, data governance, data quality, and observability in the cloud.
In this post, we show how Collibra and AWS Lake Formation can be combined to automate data governance in a data marketplace. AWS Lake Formation is a fully managed service that makes it easier for you to build, secure, and manage data lakes.
Collibra Data Intelligence Cloud
Collibra provides an end-to-end, integrated data intelligence platform designed to automate data workflows and provide users with trustworthy data insights.
With a data catalog, flexible governance, continuous quality, built-in privacy, and an active partner community, the Collibra Data Intelligence Cloud is a unified system of engagement for data that enables you to increase the productivity of every data consumer and realize your full business potential.
Collibra Data Intelligence Cloud can support a wide range of analytics use cases, such as:
- Easily connect to data sources, business applications, and business intelligence (BI) tools and gain deep visibility into your data ecosystem. Provide all users with rich content and context using Collibra’s robust metadata graph.
- Bring data, business, and IT teams together to discover, trust, access, and share data assets with an easy-to-use, enterprise-wide platform. Create a common data language, automate processes, and streamline collaboration.
- Operationalize data privacy by centralizing and automating policies and workflows. Meet regulatory standards and minimize risks by governing sensitive data and ensuring secure and compliant access across the business.
- Enable business users to quickly, easily, and securely locate, comprehend, and access data through a centralized platform. Build data trust to confidently drive your business by integrating data governance, data quality, and data privacy.
Figure 1 – Collibra Data Intelligence Cloud overview.
AWS Lake Formation
AWS Lake Formation provides its own permissions model that augments the AWS Identity and Access Management (IAM) permissions model. This centrally defined permissions model enables fine-grained access to data stored in data lakes through a simple grant/revoke mechanism.
Lake Formation enforces permissions at the table, column, and tag level. The configured Lake Formation security policies help ensure users can access only the data they are authorized to access.
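To make the grant/revoke mechanism concrete, here is a minimal sketch in Python of a table-level and a column-level permission request. The dictionary keys follow the request shape of the Lake Formation GrantPermissions and RevokePermissions APIs; the helper name, account IDs, role ARN, and table names are illustrative assumptions, not part of the solution described in this post.

```python
from typing import Optional, Sequence


def build_permission_request(
    principal_arn: str,
    catalog_id: str,
    database: str,
    table: str,
    columns: Optional[Sequence[str]] = None,
    permissions: Sequence[str] = ("SELECT",),
) -> dict:
    """Return keyword arguments shaped like a Lake Formation grant/revoke request.

    Hypothetical helper: it only builds the request and does not call AWS.
    """
    table_ref = {"CatalogId": catalog_id, "DatabaseName": database, "Name": table}
    if columns:
        # Column-level access: list only the columns the principal may read.
        resource = {"TableWithColumns": {**table_ref, "ColumnNames": list(columns)}}
    else:
        # Table-level access: the whole table.
        resource = {"Table": table_ref}
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": resource,
        "Permissions": list(permissions),
    }


# Placeholder values for illustration only.
whole_table = build_permission_request(
    "arn:aws:iam::111122223333:role/analyst", "444455556666", "sales", "orders"
)
two_columns = build_permission_request(
    "arn:aws:iam::111122223333:role/analyst",
    "444455556666",
    "sales",
    "orders",
    columns=["order_id", "order_total"],
)
```

With boto3 installed and credentials configured, the same dictionary could be passed to `boto3.client("lakeformation").grant_permissions(**whole_table)` to grant access, or to `revoke_permissions(**whole_table)` to remove it.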
A data product marketplace is a collaboration platform that facilitates the management of vast quantities of data, enabling the delivery of additional value to users throughout the entire organization.
A well-designed platform provides a broader selection and more competitive offering, enhancing the customer experience and leading to an increase in the number of customers. These consumers attract more producers to your platform, which improves the customer experience, creating a virtuous cycle of growth at a faster rate and on a larger scale.
Figure 2 – Data marketplace value proposition for data providers and consumers.
This data marketplace integration refines and extends the original intent of the data lake, enabling businesses to know where all their data resides and track its usage. It was developed in response to the growing need to enable different business domains to define, access, and control their own data and data products.
By understanding what data is consumed where, businesses can make informed decisions regarding which data to improve and how to organize data across the enterprise.
Data Sharing with Collibra + AWS Lake Formation
This post provides Collibra and AWS users with an overview of automating end-to-end data sharing. Producers can easily create, package, and deliver data products in a unified environment, and their metadata will be added to the Collibra Data Intelligence Cloud. Users can discover, evaluate, and gain access to data using the Collibra Data Intelligence Cloud.
Collibra and AWS Lake Formation enable data platform teams to provide secure data access at scale, meet enterprise governance requirements, and enable self-service data analytics. This improves enterprise adoption of the data lake. The integration also enforces data access policies at runtime, ensuring each user views only the data they are entitled to.
Access restrictions can be applied at the table or column level. Data stewards can manage user access to datasets.
Solution Overview and Architecture
As shown in the high-level architecture below, data consumers search Collibra’s data catalog for the data products or datasets they need and request access to them.
Once the access request is approved, Collibra will send the payload to an API endpoint configured in the Access Control Automation Framework. This framework shares resources (databases, tables) with the consumption applications.
Figure 3 – Solution overview of Access Control Automation Framework.
- A consumer browses the data marketplace in Collibra and requests access to the data of interest. The data request is approved through a Collibra workflow, which then invokes the automation framework.
- Access Control Automation Framework receives the payload from Collibra, translates the payload received into AWS Lake Formation API requests, and invokes the Lake Formation API.
- AWS Lake Formation grants the necessary cross-account shares to the required consumers.
- Consumers access the data objects.
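The translation step in this flow can be sketched as follows. The post does not publish the payload schema, so the field names (`database`, `table`, `columns`, `consumer_role_arn`) and both function names are assumptions made for illustration; only the output keys mirror the Lake Formation GrantPermissions request shape.

```python
import json
from dataclasses import dataclass, field
from typing import List

# Assumed payload fields; the real Collibra payload may differ.
REQUIRED_FIELDS = ("database", "table", "consumer_role_arn")


@dataclass
class AccessRequest:
    database: str
    table: str
    consumer_role_arn: str
    columns: List[str] = field(default_factory=list)


def parse_access_request(body: str) -> AccessRequest:
    """Validate the JSON body of an approved access request."""
    payload = json.loads(body)
    missing = [name for name in REQUIRED_FIELDS if not payload.get(name)]
    if missing:
        raise ValueError(f"payload is missing required fields: {missing}")
    return AccessRequest(
        database=payload["database"],
        table=payload["table"],
        consumer_role_arn=payload["consumer_role_arn"],
        columns=list(payload.get("columns") or []),
    )


def to_grant_request(request: AccessRequest, central_catalog_id: str) -> dict:
    """Translate a validated request into GrantPermissions keyword arguments."""
    table_ref = {
        "CatalogId": central_catalog_id,
        "DatabaseName": request.database,
        "Name": request.table,
    }
    if request.columns:
        resource = {"TableWithColumns": {**table_ref, "ColumnNames": request.columns}}
    else:
        resource = {"Table": table_ref}
    return {
        "Principal": {"DataLakePrincipalIdentifier": request.consumer_role_arn},
        "Resource": resource,
        "Permissions": ["SELECT"],
    }
```

Keeping validation separate from translation lets the framework reject a malformed request before any Lake Formation call is attempted.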
Access Control Automation Framework
The architecture diagram below shows how AWS and Collibra integrate to orchestrate cross-account data sharing.
Figure 4 – Solution architecture of Access Control Automation Framework.
- A data consumer searches for the data products and datasets they need and requests access in Collibra. The data access request triggers a Collibra workflow, which notifies the data owner or data steward to approve or decline the request.
- Once the data request is approved, Collibra sends a payload to an Amazon API Gateway endpoint in the central AWS account. The payload identifies the database, table, and columns requested, and the AWS IAM role that needs access.
- Amazon API Gateway triggers an AWS Lambda function that orchestrates the cross-account data share.
- The orchestrating Lambda function invokes a second Lambda function, which translates the payload from Amazon API Gateway into AWS Lake Formation API requests that grant the necessary cross-account data shares.
- AWS Lake Formation from the central account enables the cross-account data share in the consumption account.
- In the consumption account, a Lambda function creates resource links in AWS Lake Formation for the resources shared from the central account, and grants the consumption role permissions on those resource links.
- Once the cross-account shares are enabled, data consumers can query the data objects from the consumption account using services such as Amazon Athena and AWS Glue.
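The resource link step in the consumption account can be sketched as below. A resource link is created in the AWS Glue Data Catalog as a table whose `TargetTable` points at the shared table in the central account. The function name, the `_link` naming convention, and all identifiers are hypothetical; the code only builds the request dictionaries and does not call AWS.

```python
from typing import Dict, Optional


def build_resource_link_requests(
    central_account_id: str,
    shared_database: str,
    shared_table: str,
    local_database: str,
    consumer_role_arn: str,
    link_name: Optional[str] = None,
) -> Dict[str, dict]:
    """Build the requests a consumption-account function might send.

    Hypothetical helper; the naming convention for links is an assumption.
    """
    link_name = link_name or f"{shared_table}_link"
    target = {
        "CatalogId": central_account_id,
        "DatabaseName": shared_database,
        "Name": shared_table,
    }
    principal = {"DataLakePrincipalIdentifier": consumer_role_arn}
    return {
        # Shape of a Glue CreateTable request that creates a resource link.
        "create_link": {
            "DatabaseName": local_database,
            "TableInput": {"Name": link_name, "TargetTable": target},
        },
        # Lake Formation grant so the role can see the link itself.
        "grant_on_link": {
            "Principal": principal,
            "Resource": {"Table": {"DatabaseName": local_database, "Name": link_name}},
            "Permissions": ["DESCRIBE"],
        },
        # Lake Formation grant on the shared table that the link points to.
        "grant_on_target": {
            "Principal": principal,
            "Resource": {"Table": target},
            "Permissions": ["SELECT"],
        },
    }
```

With boto3, `create_link` would be passed to the Glue client's `create_table` call and the two grants to the Lake Formation client's `grant_permissions` call. The separate grant on the target table is included because permissions on a resource link do not by themselves confer access to the underlying data; whether the framework in this post issues that grant in the same function is not stated.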
Data sharing via a data marketplace makes it easy for business users to find, understand, trust, and get compliant access to data assets and products in a self-service collaborative environment enabled by the Collibra Data Intelligence Cloud and AWS Lake Formation.
By following the approach in this post, businesses looking to accelerate insights, drive strategic initiatives, open new growth avenues, and optimize operations can effectively leverage existing data assets and domain expertise across the enterprise while ensuring compliant data usage and access.
Collibra – AWS Partner Spotlight
Collibra is an AWS Partner whose data governance and catalog solutions give teams powerful tools that make it easy to consume data across the enterprise.