This Guidance demonstrates how to import data from Adobe Experience Platform (AEP) into AWS Clean Rooms. Using AWS services, customers can import their profile information from AEP into their AWS account, then process, normalize, and prepare it for marketing campaigns.
The AEP admin schedules a daily export job in AEP to push the profile data to the customer’s Amazon Simple Storage Service (Amazon S3) bucket under a predefined prefix.
Create a rule in Amazon EventBridge to schedule the data processing in AWS Step Functions once a day.
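The daily EventBridge rule can be sketched as boto3 request parameters; the rule name, state machine ARN, and role ARN below are hypothetical placeholders, not values from this Guidance.

```python
import json

# Hypothetical ARNs -- substitute your own state machine and IAM role.
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:111122223333:stateMachine:aep-import"
EVENTS_ROLE_ARN = "arn:aws:iam::111122223333:role/eventbridge-stepfunctions-role"

def build_daily_rule(rule_name: str) -> dict:
    """Parameters for events.put_rule: fire once a day."""
    return {
        "Name": rule_name,
        "ScheduleExpression": "rate(1 day)",
        "State": "ENABLED",
    }

def build_rule_target(rule_name: str) -> dict:
    """Parameters for events.put_targets: start the Step Functions state machine."""
    return {
        "Rule": rule_name,
        "Targets": [{
            "Id": "start-aep-import",
            "Arn": STATE_MACHINE_ARN,
            "RoleArn": EVENTS_ROLE_ARN,
            "Input": json.dumps({"source": "aep-daily-export"}),
        }],
    }

# With boto3, the calls would be roughly:
#   events = boto3.client("events")
#   events.put_rule(**build_daily_rule("aep-daily-import"))
#   events.put_targets(**build_rule_target("aep-daily-import"))
```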
An AWS Lambda function decrypts the files from the source Amazon S3 bucket using AWS Key Management Service (AWS KMS) and places them under a different prefix for AWS Glue DataBrew to pick up and process.
An AWS Glue DataBrew recipe runs to ingest the data from the decrypted source Amazon S3 bucket and prefix location. The data is normalized, and Personally Identifiable Information (PII) is hashed using SHA-256.
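In plain Python, the SHA-256 hashing applied to PII fields looks like the following sketch. The trim/lowercase normalization and the salt parameter are assumptions for illustration; in this Guidance the hashing key comes from AWS Secrets Manager.

```python
import hashlib

def normalize(value: str) -> str:
    """Assumed normalization before hashing: trim whitespace and lowercase."""
    return value.strip().lower()

def hash_pii(value: str, salt: str = "") -> str:
    """SHA-256 hex digest of a normalized PII value. The optional salt stands
    in for the hashing key this Guidance stores in AWS Secrets Manager."""
    return hashlib.sha256((salt + normalize(value)).encode("utf-8")).hexdigest()

# Hash only the PII columns; leave non-PII columns readable.
profile = {"email": " User@Example.COM ", "city": "Seattle"}
out = {k: (hash_pii(v) if k == "email" else v) for k, v in profile.items()}
```

Normalizing before hashing matters because two collaborators can only match hashed identifiers in AWS Clean Rooms if both sides hashed byte-identical input.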
The output of the AWS Glue DataBrew recipe is written to the target Amazon S3 bucket and prefix location in Parquet format. The output file setting is "overwrite," as the profile data is a full refresh. An AWS Glue crawler job is then triggered to refresh the table definition and its associated metadata.
An AWS Lambda function starts after the AWS Glue crawler completes its run. The Lambda function moves the source data files to an "archive" prefix location as part of cleanup.
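A minimal sketch of the archive step follows; the date-partitioned archive layout is an assumption, and since Amazon S3 has no native "move," the Lambda function performs a copy followed by a delete.

```python
from datetime import datetime, timezone

def archive_key(source_key: str, archive_prefix: str = "archive") -> str:
    """Map a processed source key to a date-partitioned archive location
    (the prefix layout is an assumption, not prescribed by the Guidance)."""
    date = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    filename = source_key.rsplit("/", 1)[-1]
    return f"{archive_prefix}/{date}/{filename}"

# In the Lambda handler, the "move" would be roughly:
#   s3 = boto3.client("s3")
#   s3.copy_object(Bucket=bucket, Key=archive_key(key),
#                  CopySource={"Bucket": bucket, "Key": key})
#   s3.delete_object(Bucket=bucket, Key=key)
```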
An event is published to Amazon Simple Notification Service (Amazon SNS) to notify the user that the new data files are available for consumption within AWS Clean Rooms.
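The notification can be sketched as sns.publish parameters; the topic ARN and message schema below are hypothetical, not defined by this Guidance.

```python
import json

# Hypothetical topic ARN -- substitute your own.
TOPIC_ARN = "arn:aws:sns:us-east-1:111122223333:clean-rooms-data-ready"

def build_notification(table: str, s3_uri: str) -> dict:
    """Parameters for sns.publish announcing a refreshed table.
    The message schema here is an illustrative assumption."""
    return {
        "TopicArn": TOPIC_ARN,
        "Subject": "New data available in AWS Clean Rooms",
        "Message": json.dumps({"table": table, "location": s3_uri}),
    }

# With boto3: boto3.client("sns").publish(**build_notification("profiles", "s3://bucket/out/"))
```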
The user then collaborates with other data producers using the latest data within AWS Clean Rooms.
Security, Logging and Audit
The solution uses the following AWS services to promote security and access control:
- AWS Identity and Access Management (IAM) – least-privilege access to specific resources and operations
- AWS KMS – provides encryption for data at rest and data in transit (using PGP encryption of data files)
- AWS Secrets Manager – provides hashing keys for PII data
- Amazon CloudWatch – monitors logs and metrics across all services used in this solution
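As an illustration of the least-privilege principle above, a policy scoping a processing role to read-only access on a single prefix might look like the following; the bucket and prefix names are hypothetical.

```python
import json

# Illustrative least-privilege policy: read-only access to one prefix
# of the source bucket. Bucket and prefix names are hypothetical.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::aep-import-bucket/decrypted/*",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::aep-import-bucket",
            "Condition": {"StringLike": {"s3:prefix": ["decrypted/*"]}},
        },
    ],
}

print(json.dumps(POLICY, indent=2))
```

Note that `s3:GetObject` applies to object ARNs while `s3:ListBucket` applies to the bucket ARN, which is why they need separate statements.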
The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
This Guidance uses a multi-tier architecture in which every tier is independently scalable, deployable, and testable. The facets of this multi-tier architecture are compute, storage, data management (catalog), and orchestration, which are decoupled from each other.
Observability is built-in, with every service publishing metrics to CloudWatch where dashboards and alarms can be configured.
Resources are protected using an Amazon S3 bucket to block public access. The data at rest in Amazon S3 is encrypted using Amazon S3-managed keys (SSE-S3). The data in transit from the external system into Amazon S3 is encrypted (with AWS KMS) and transferred over HTTPS.
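Requesting SSE-S3 on a write amounts to setting the `AES256` server-side encryption header on `s3.put_object`; the helper below just assembles those request parameters as a sketch.

```python
def put_object_params(bucket: str, key: str, body: bytes) -> dict:
    """Parameters for s3.put_object that request SSE-S3 (AES-256) at rest.
    With KMS-managed keys you would instead pass
    ServerSideEncryption="aws:kms" plus an SSEKMSKeyId."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "ServerSideEncryption": "AES256",
    }

# With boto3: boto3.client("s3").put_object(**put_object_params(...))
```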
Every service or technology chosen for each architecture layer is serverless and fully managed by AWS, making the overall architecture elastic, highly available, and fault-tolerant. Step Functions include error handling and notifications/alarms in case of failures.
CloudWatch logs and metrics are used to track events. CloudWatch alarms are configured to send notifications when thresholds are crossed.
The AWS managed services selected for this architecture are purpose-built for Extract, Transform, and Load (ETL) applications (AWS Glue and AWS Step Functions). A detailed implementation guide is provided for the user to experiment with and use this Guidance within their AWS account. The serverless architecture reduces the amount of underlying infrastructure you need to manage, allowing you to focus on solving your business needs. You can use automated deployments to deploy isolated customer data platform (CDP) tenants into any Region quickly, providing data residency and reduced latency. In addition, you can experiment with and test each CDP layer, enabling you to perform comparative testing against varying load levels, configurations, and services.
Using serverless technologies, you only pay for the resources you consume. As the data ingestion velocity increases and decreases, the costs will align with usage. When AWS Glue is performing data transformations, you only pay for the infrastructure while the processing is occurring. In addition, through a tenant isolation model and resource tagging, you can automate cost usage alerts and measure costs specific to each tenant, application module, and service.
IAM policies are created using least-privilege access, so that every policy is restricted to specific resources and operations.
By using serverless services extensively, you get the most out of your resources. Compute is only used when needed.
A detailed guide is provided so you can experiment with and use this Guidance within your AWS account. Each stage, including deployment, usage, and cleanup, is covered to prepare the Guidance for use.
Data Connectors for AWS Clean Rooms
Deploy this solution to simplify the process of selecting application sources and preparing data for collaborating in AWS Clean Rooms.
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
Adobe, the Adobe logo, Acrobat, the Adobe PDF logo, Adobe Premiere, Creative Cloud, InDesign, and Photoshop are either registered trademarks or trademarks of Adobe in the United States.