AWS Partner Network (APN) Blog
Simplifying Data Management with CloudCover Data Pipes on AWS
By Arun Chandrasekaran, Data Solution Consultant – CloudCover
By Evangeline Andal, Public Sector Solutions Architect, Education – AWS
Many companies aspire to become more data-driven to enhance their business decision-making. Because of this, data is often regarded as one of an organization's most valuable assets.
To get the most out of your data, you must first collect it and then transform it into a reusable format. However, managing and utilizing data held in numerous organizational silos can be challenging.
The key to extracting value from data is to foster cross-organizational collaboration in order to unite teams and reduce time spent on data discovery, preparation, and governance.
Amazon Web Services (AWS) offers a broad range of analytics services to help businesses become more data-driven, but most companies don't have a team to stitch these AWS services together and maintain best practices.
CloudCover is an AWS Select Tier Services Partner and AWS Marketplace Seller with the DevOps Competency. It has developed a unified solution that removes the complexity of cloud and data ecosystems, allowing organizations to focus on delivering value from their data.
In this post, we’ll show you how CloudCover Data Pipes leverages AWS services to provide a cloud-native data management platform allowing you to gain insights from your trusted data.
Data Pipes, a CloudCover Product
Data Pipes is a cloud-native data management platform conceived through CloudCover’s success in consulting, designing, building, and managing cloud infrastructure. It simplifies data management by removing the need to manage any underlying AWS services, and allows companies to focus on extracting value from their data and collaborating across organizations.
Data Pipes is deployed in the customer’s own AWS account, ensuring customers retain data ownership and control. As an advocate for security, CloudCover is ISO 27001 and SOC 2 certified, and HITRUST accredited.
Data Pipes has three guiding principles:
- End-to-end workflow: Data Pipes provides a unified platform for users to ingest, prepare, discover, govern, and analyze data. It helps teams reduce the time needed to build an end-to-end workflow from months to days, and users can easily discover data, identify its origin, perform analysis, and share it across the organization.
Figure 1 – Data Pipes console.
- Future proof: Data Pipes leverages AWS cloud-native services to meet the demand for scalability, availability, and agility. This enables the constant and quick delivery of new features.
- Secure by design: Data Pipes is deployed in the customer’s AWS environment, giving them full control and ownership of their data. It also inherits the security and compliance controls of AWS managed services and infrastructure, and promotes data ethics and security across the business.
Solution Details
To begin, identify your data sources. Data Pipes Ingestion collects and imports data from these sources into the data lake, and each table ingested is automatically registered in the catalog.
Data Pipes Catalog provides a centralized, searchable inventory of data sources to help users understand the context, structure, and relationships between datasets. This empowers end users to directly access the data for exploration and consumption using business intelligence or machine learning tools, without intervention from IT or data professionals.
Data Pipes structures datasets into domains, which are areas of the organization that produce, own, and consume data. Each domain is responsible for governing its own data, providing a more flexible and scalable model to safeguard sensitive data and protect it against unauthorized access.
Figure 2 – Key features of Data Pipes.
Data Ingestion
Data Pipes comes with an extensive list of pre-built data source connectors for on-premises and cloud databases, software-as-a-service (SaaS) applications, documents, NoSQL sources, and more to quickly load data into your cloud data environment.
Data Catalog
At the core of Data Pipes is the data catalog, which provides users a platform for data discovery. It allows you to manage, organize, and search for data to accelerate data-driven business decisions. Users can consume the data using Amazon QuickSight and Amazon SageMaker, or any JDBC-compliant tool of their choice.
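As a rough illustration of SDK-based consumption, the sketch below shows how a cataloged table might be queried through the Amazon Athena API with boto3. The database, table, and results-bucket names are hypothetical placeholders, not part of Data Pipes.

```python
# Hypothetical sketch: previewing a cataloged table via the Athena API.
# DATABASE and OUTPUT are illustrative placeholders, not Data Pipes values.
DATABASE = "sales_domain"                # hypothetical catalog database
OUTPUT = "s3://my-athena-results/"       # hypothetical query-results bucket

def build_query(table: str, limit: int = 10) -> str:
    """Build a simple preview query for a cataloged table."""
    return f'SELECT * FROM "{DATABASE}"."{table}" LIMIT {limit}'

def run_query(table: str) -> str:
    """Submit the query to Athena and return its execution ID.

    Requires AWS credentials; boto3 is imported here so the pure
    helper above can run without the AWS SDK installed.
    """
    import boto3
    athena = boto3.client("athena")
    resp = athena.start_query_execution(
        QueryString=build_query(table),
        QueryExecutionContext={"Database": DATABASE},
        ResultConfiguration={"OutputLocation": OUTPUT},
    )
    return resp["QueryExecutionId"]
```

The same query could equally be issued from any JDBC-compliant tool pointed at Athena, which is how the BI integrations mentioned above would typically connect.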
Data Governance
Data Pipes provides a powerful suite of tools to control user access to data, leveraging AWS Identity and Access Management (IAM) and AWS Lake Formation. Authorized users can create rules that grant access to data based on a user's role and department. These rules provide granular access control over specific tables, as well as individual columns and rows.
Individual columns in Data Pipes can be marked as classified, which allows rules to be easily created at scale to deny unauthorized users access to classified data. Tables containing classified data can also be completely removed from the catalog view.
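Since Data Pipes builds on AWS Lake Formation for governance, a column-level rule like this could plausibly translate into a Lake Formation grant that exposes a table while excluding its classified columns. The sketch below is a minimal illustration under that assumption; the role, database, and column names are invented.

```python
# Hypothetical sketch: a column-level grant with AWS Lake Formation,
# excluding columns marked as classified. All names are illustrative.

def column_grant(principal_arn, database, table, classified_columns):
    """Build kwargs for lakeformation.grant_permissions that expose a
    table's columns while excluding the classified ones."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                # Grant all columns except the classified ones.
                "ColumnWildcard": {"ExcludedColumnNames": classified_columns},
            }
        },
        "Permissions": ["SELECT"],
    }

# To apply the grant (requires AWS credentials and Lake Formation setup):
# import boto3
# boto3.client("lakeformation").grant_permissions(**column_grant(
#     "arn:aws:iam::123456789012:role/AnalystRole",
#     "sales_domain", "orders", ["ssn", "credit_card"]))
```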
In addition, Data Pipes automatically keeps a real-time audit trail of data access and permission changes made by users. These trails can be easily exported for audit purposes.
Solution Diagram
Data Pipes is a containerized application deployed inside the customer’s virtual private cloud (VPC). It runs on Amazon Elastic Kubernetes Service (Amazon EKS) to automatically manage the availability, scalability, and management of the containers.
Figure 3 – Solution diagram.
Data Pipes leverages Amazon Macie to automatically identify sensitive information, such as personally identifiable information (PII) or any custom data type you define. Users can then choose to protect sensitive information by masking, tokenizing, or tagging the data.
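For a sense of what this looks like at the API level, the sketch below builds a one-time Amazon Macie classification job covering a data lake bucket. The account ID, bucket name, and custom identifier list are placeholders; Data Pipes' actual internals may differ.

```python
# Hypothetical sketch: asking Amazon Macie to scan a data lake bucket
# for PII. Account and bucket values are placeholders.

def pii_scan_job(account_id, bucket, custom_identifier_ids=None):
    """Build kwargs for macie2.create_classification_job on one bucket."""
    return {
        "jobType": "ONE_TIME",
        "name": f"pii-scan-{bucket}",
        "s3JobDefinition": {
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": [bucket]}
            ]
        },
        # Optional custom data identifiers, e.g. org-specific ID formats.
        "customDataIdentifierIds": custom_identifier_ids or [],
    }

# To start the scan (requires AWS credentials and Macie enabled):
# import boto3
# boto3.client("macie2").create_classification_job(**pii_scan_job(
#     "123456789012", "my-data-lake-bucket"))
```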
Data Pipes fully integrates with AWS Glue DataBrew, a visual data preparation tool, to perform transformations such as cleaning and normalizing data without writing a single line of code. Transformed data is stored in the data lake on Amazon Simple Storage Service (Amazon S3), where it can be queried interactively using Amazon Athena.
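Under the hood, running such a transformation amounts to executing a DataBrew recipe job that writes its output back to S3. The sketch below shows one plausible shape of that call; the dataset, recipe, role, and bucket names are all hypothetical.

```python
# Hypothetical sketch: a Glue DataBrew recipe job that writes cleaned
# data back to the S3 data lake. All names are illustrative.

def databrew_job(dataset, recipe, role_arn, out_bucket):
    """Build kwargs for databrew.create_recipe_job writing to S3."""
    return {
        "Name": f"{dataset}-clean",
        "DatasetName": dataset,
        "RecipeReference": {
            "Name": recipe,
            "RecipeVersion": "LATEST_PUBLISHED",
        },
        "RoleArn": role_arn,
        "Outputs": [{
            "Location": {"Bucket": out_bucket, "Key": f"curated/{dataset}/"},
            # Columnar output queries efficiently from Amazon Athena.
            "Format": "PARQUET",
        }],
    }

# To create the job (requires AWS credentials and a published recipe):
# import boto3
# boto3.client("databrew").create_recipe_job(**databrew_job(
#     "orders", "orders-recipe",
#     "arn:aws:iam::123456789012:role/DataBrewRole", "my-data-lake-bucket"))
```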
AWS Lake Formation is used to ensure data governance and access control. This simplifies security management and governance at scale, and provides fine-grained permissions across your data lake.
Data can be consumed with a data visualization tool such as Amazon QuickSight, or used to train machine learning models with Amazon SageMaker. Data Pipes also supports JDBC-compliant applications.
How it Works
As shown in Figure 4 below, Data Pipes ingestion simplifies loading data into AWS in three steps. From a no-code interface, users can create data ingestion pipelines that load data from different sources into the data lake, where it can be queried with Amazon Athena, and perform data preparation using AWS Glue DataBrew.
Any sensitive data is automatically identified, and data owners can choose how to secure it before it's stored in the data lake. Data Pipes supports both column- and row-level security.
Figure 4 – Data Pipes ingestion.
At the core of Data Pipes is the catalog (Figure 5), where users can discover and search for existing data through a built-in search engine. Data owners can create and edit the metadata of their datasets so consumers can better understand what they're searching for.
Figure 5 – Data Pipes catalog.
Clicking a table in the catalog brings the user to a table details page with more information about the table, such as metadata, governance, description, origin, usage, and its relationship to other datasets via data lineage. Users can also send an access request to the data owner to get full access to the table.
Data Pipes enables data consumption directly from the catalog using AWS analytics and machine learning products such as Amazon QuickSight, Amazon Athena, and Amazon SageMaker.
Next, Figure 6 shows the Data Pipes business glossary, which allows domain owners to create common definitions and business terms and map them to data elements searchable via the Catalog.
Figure 6 – Business glossary.
Below, you can see how Data Pipes allows data owners to define security classifications for tables and columns, which are then used for access control management. Data owners can grant or deny access to specific tables and columns based on a user's role and department.
Figure 7 – Access control.
Lastly, Figure 8 shows data lineage, which tracks the flow of data to help users understand where it originated and how it's being used, assuring them that they're working with the correct dataset.
Figure 8 – Data lineage.
Conclusion
The key to becoming a data-driven organization is to foster collaboration between all parts of the organization, allowing teams to discover, consume, and share data and analytics seamlessly.
Data Pipes, a CloudCover product, empowers users to easily consume data from an easy-to-use interface, while giving data owners peace of mind with control over who has access to what data. Taking a cloud-native approach future-proofs the platform as cloud services continuously advance, allowing users to easily access the latest technologies.
To learn more about Data Pipes, check out datapipes.io. You can also explore Data Pipes in AWS Marketplace.
CloudCover – AWS Partner Spotlight
CloudCover is an AWS Competency Partner that developed a unified solution to remove the complexity of cloud and data ecosystems, allowing organizations to focus on delivering value from their data.