Guidance for Connecting CDPs to Data Lakes with AWS Clean Rooms
Overview
How it works
This architecture diagram shows how marketers using customer data platforms (CDPs) can set up AWS Clean Rooms collaborations with publishing partners to combine first- and third-party customer data directly.
Well-Architected Pillars
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
Amazon CloudWatch continuously monitors operations and gives you access to log files, and you can configure it to monitor the reliability, availability, and performance of AWS Clean Rooms. AWS CloudTrail automatically records event history, so you can see who made a request to AWS Clean Rooms, the IP address the request came from, when it was made, and additional details. You can also create a trail to keep an ongoing, more detailed record of API requests.
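As a rough illustration of the CloudTrail event history described above, the sketch below builds the parameters for a CloudTrail `LookupEvents` request filtered to AWS Clean Rooms. The function name `build_lookup_request`, the 24-hour window, and the event source value `cleanrooms.amazonaws.com` are assumptions for this example, not part of the Guidance; confirm the event source against events in your own account. The code only builds the request and does not call AWS.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_lookup_request(hours_back: int, now: Optional[datetime] = None) -> dict:
    """Build keyword arguments for a CloudTrail LookupEvents call.

    Hypothetical helper: the event source string is an assumption.
    """
    end_time = now or datetime.now(timezone.utc)
    return {
        # Filter event history to a single service's API calls.
        "LookupAttributes": [
            {
                "AttributeKey": "EventSource",
                "AttributeValue": "cleanrooms.amazonaws.com",  # assumed value
            }
        ],
        "StartTime": end_time - timedelta(hours=hours_back),
        "EndTime": end_time,
        "MaxResults": 50,  # LookupEvents returns at most 50 events per page
    }


request = build_lookup_request(hours_back=24)
print(request["StartTime"], "to", request["EndTime"])
```

With boto3 installed and credentials configured, a dictionary like this could be passed as `boto3.client("cloudtrail").lookup_events(**request)`; each returned event carries fields such as the user name, event time, and the raw event record that includes the source IP address.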
Read the Operational Excellence whitepaper
Security
This Guidance lets you use scoped-down AWS Identity and Access Management (IAM) policies to provide specific users and roles access. Using IAM, you can apply the principle of least privilege to restrict who can access and run queries on AWS Clean Rooms.
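To make the idea of a scoped-down policy concrete, here is a minimal sketch that builds an IAM policy document allowing protected queries in a single AWS Clean Rooms membership only. The helper name `build_query_policy`, the account ID, and the membership ID are placeholders invented for this example, and the action list and ARN format should be checked against the Service Authorization Reference for AWS Clean Rooms before use.

```python
import json


def build_query_policy(region: str, account_id: str, membership_id: str) -> dict:
    """Return a least-privilege policy for running queries in one membership.

    Hypothetical helper; all identifiers passed in are placeholders.
    """
    membership_arn = (
        f"arn:aws:cleanrooms:{region}:{account_id}:membership/{membership_id}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "RunQueriesInOneMembership",
                "Effect": "Allow",
                # Only the actions needed to start and inspect queries.
                "Action": [
                    "cleanrooms:StartProtectedQuery",
                    "cleanrooms:GetProtectedQuery",
                    "cleanrooms:ListProtectedQueries",
                ],
                # Scope to a single membership instead of "*".
                "Resource": membership_arn,
            }
        ],
    }


policy = build_query_policy("us-east-1", "111122223333", "example-membership-id")
print(json.dumps(policy, indent=2))
```

Limiting `Resource` to one membership ARN, rather than a wildcard, is what keeps the policy consistent with least privilege: a user with this policy cannot run queries in other collaborations in the same account.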
Read the Security whitepaper
Reliability
Amazon S3 stores multiple copies of data across Availability Zones, providing 99.999999999 percent durability of the data stored within S3 buckets. Additionally, AWS Glue and AWS Clean Rooms are serverless and fully managed by AWS, so the overall infrastructure is elastic, highly available, and fault tolerant, with built-in reliability and resiliency.
Read the Reliability whitepaper
Performance Efficiency
AWS Glue crawlers let you quickly scan your data, infer its schemas, and register those schemas in your Data Catalog. You can configure crawlers to run on a schedule or invoke them on demand to crawl source data. You can also configure AWS Glue to scale up or down within a specified range of AWS Glue job workers so that it uses only as much compute capacity as needed. Additionally, AWS Clean Rooms enables you to share subsets of your data quickly and securely, and it provisions only the capacity needed to run a query.
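The scheduled-versus-on-demand choice for crawlers can be sketched as follows. This example only builds the request parameters for the AWS Glue `CreateCrawler` API; the helper name `build_crawler_request`, the crawler name, role ARN, database name, and S3 path are all hypothetical placeholders, not values from this Guidance.

```python
from typing import Optional


def build_crawler_request(
    name: str,
    role_arn: str,
    database: str,
    s3_path: str,
    schedule: Optional[str] = None,
) -> dict:
    """Build keyword arguments for a Glue CreateCrawler call.

    Hypothetical helper. Omitting `schedule` yields an on-demand crawler
    that runs only when it is started explicitly.
    """
    request = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,  # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }
    if schedule is not None:
        request["Schedule"] = schedule  # cron expression, evaluated in UTC
    return request


# Placeholder values; run every day at 02:00 UTC.
scheduled = build_crawler_request(
    name="cdp-profiles-crawler",
    role_arn="arn:aws:iam::111122223333:role/ExampleGlueCrawlerRole",
    database="cdp_data_lake",
    s3_path="s3://example-cdp-bucket/profiles/",
    schedule="cron(0 2 * * ? *)",
)
on_demand = build_crawler_request(
    name="cdp-profiles-crawler-adhoc",
    role_arn="arn:aws:iam::111122223333:role/ExampleGlueCrawlerRole",
    database="cdp_data_lake",
    s3_path="s3://example-cdp-bucket/profiles/",
)
print(scheduled["Schedule"], "Schedule" in on_demand)
```

With boto3, a dictionary like `scheduled` could be passed as `boto3.client("glue").create_crawler(**scheduled)`, and an on-demand crawler could then be started with `start_crawler(Name=...)`.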
Read the Performance Efficiency whitepaper
Cost Optimization
Amazon S3 provides low-cost storage for building data lakes and storing data. It also provides different storage tiers and lifecycle policies to optimize storage. For example, you can use Amazon S3 Intelligent-Tiering to provide automated data archiving based on usage or implement lifecycle policies to move data between storage tiers, helping you optimize costs. Additionally, this Guidance uses pay-as-you-go services, so you pay only for what you consume.
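As one possible shape for such a lifecycle policy, the sketch below builds an S3 lifecycle configuration that transitions objects under a prefix to S3 Intelligent-Tiering after a set number of days. The helper name `build_lifecycle_configuration`, the rule ID, the prefix, and the 30-day threshold are illustrative assumptions; choose values that match your own access patterns.

```python
def build_lifecycle_configuration(prefix: str, days_until_tiering: int) -> dict:
    """Build an S3 lifecycle configuration with one transition rule.

    Hypothetical helper; the rule ID and threshold are example values.
    """
    if days_until_tiering < 0:
        raise ValueError("days_until_tiering must be zero or greater")
    return {
        "Rules": [
            {
                "ID": "move-to-intelligent-tiering",
                "Status": "Enabled",
                # Apply the rule only to objects under this key prefix.
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {
                        "Days": days_until_tiering,
                        "StorageClass": "INTELLIGENT_TIERING",
                    }
                ],
            }
        ]
    }


config = build_lifecycle_configuration(prefix="profiles/", days_until_tiering=30)
print(config["Rules"][0]["Transitions"])
```

With boto3, this dictionary could be applied with `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket="example-cdp-bucket", LifecycleConfiguration=config)`, where the bucket name is again a placeholder.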
Read the Cost Optimization whitepaper
Sustainability
AWS Clean Rooms enables you to share only subsets of your data, reducing the need to duplicate data across multiple platforms. Additionally, this Guidance reduces the need for CDPs to create custom solutions that might require additional compute resources. AWS Glue and AWS Clean Rooms are both serverless services, which means they scale to meet compute needs, for example by provisioning only the compute resources required to run a query. This helps you avoid idle compute and wasted resources, which keeps the carbon footprint of the workload as small as possible.
Read the Sustainability whitepaper
Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.