This guidance shows best practices for building a customer data platform (CDP) on AWS that brings together customer data from a broad range of sources, including contact centers, email, web and mobile entries, point of sale (POS) systems, customer relationship management (CRM) systems, and social media. It explores each stage of building the platform: data ingestion, identity resolution, segmentation, analysis, and activation.
Disclaimer: Not for production use
Source systems of customer data include customer interactions, clickstreams and call center logs.
Data from customer touchpoints is ingested into the marketing customer data platform (CDP) data lake using Amazon Kinesis, Amazon AppFlow, Amazon EKS, and Amazon API Gateway.
Ingested data is sent – in its original, immutable format – to an Amazon Simple Storage Service (Amazon S3) Raw Zone bucket.
Raw data is then transformed into efficient data formats – such as Parquet or Avro – and moved to a Clean Zone Amazon S3 bucket.
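To make the Raw Zone to Clean Zone step more concrete, here is a minimal sketch of how an object key in the Raw Zone might be mapped to a date-partitioned key in the Clean Zone. The key layout (source system as the first path segment, Hive-style year/month/day partitions) and the function name clean_zone_key are assumptions made for illustration; the guidance does not prescribe a naming scheme.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def clean_zone_key(raw_key: str, event_time: datetime, fmt: str = "parquet") -> str:
    """Map a Raw Zone object key to a date-partitioned Clean Zone key.

    Assumes (hypothetically) that the first path segment of the raw key
    names the source system, e.g. "clickstream/<file>.json".
    """
    path = PurePosixPath(raw_key)
    source = path.parts[0]   # source system, e.g. "clickstream"
    stem = path.stem         # file name without its extension
    return (
        f"{source}/year={event_time:%Y}/month={event_time:%m}/"
        f"day={event_time:%d}/{stem}.{fmt}"
    )


example_key = clean_zone_key(
    "clickstream/2024-03-05T10-15-00.json",
    datetime(2024, 3, 5, 10, 15, tzinfo=timezone.utc),
)
print(example_key)
```

The raw object itself stays untouched in the Raw Zone; only the converted copy is written under the new key.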
CDP processing and pipeline orchestration are performed with purpose-built data processing components and transformation libraries. AWS Step Functions orchestrates the pipeline, which in turn uses Amazon Personalize, AWS Lambda, and AWS Glue.
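As an illustration of the orchestration step, the sketch below builds a two-state AWS Step Functions definition in Amazon States Language: an AWS Glue job run followed by an AWS Lambda invocation. The state names, the job and function names, and the retry settings are hypothetical; the guidance does not specify the actual pipeline steps.

```python
import json


def build_cdp_pipeline_definition(glue_job_name: str, lambda_function_name: str) -> dict:
    """Return an Amazon States Language definition as a Python dict."""
    return {
        "Comment": "Hypothetical CDP pipeline: Glue transform, then Lambda post-processing",
        "StartAt": "TransformRawToClean",
        "States": {
            "TransformRawToClean": {
                "Type": "Task",
                # The ".sync" suffix makes the state wait for the Glue job to finish.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                "Retry": [
                    {
                        "ErrorEquals": ["States.ALL"],
                        "IntervalSeconds": 30,
                        "MaxAttempts": 2,
                        "BackoffRate": 2.0,
                    }
                ],
                "Next": "BuildProfiles",
            },
            "BuildProfiles": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                # Pass the whole state input through to the function.
                "Parameters": {"FunctionName": lambda_function_name, "Payload.$": "$"},
                "End": True,
            },
        },
    }


definition = build_cdp_pipeline_definition("cdp-raw-to-clean", "cdp-build-profiles")
definition_json = json.dumps(definition, indent=2)
print(definition_json)
```

The resulting JSON string is what you would supply as the state machine definition, for example through infrastructure as code.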
Data in the Amazon S3 Curated Zone is now ready for post-CDP-processing consumption and is organized by subject areas, segments, and profiles.
The analytics layer uses Amazon Redshift, Amazon QuickSight, Amazon SageMaker and Amazon Athena to natively integrate with the Curated Zone for analytics, dashboards, ad hoc reporting, and ML purposes.
Customer data is then aggregated across platforms and published for consumption through customer APIs built on Amazon DynamoDB and Amazon API Gateway.
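One way such a customer API could look is an AWS Lambda function behind an Amazon API Gateway proxy integration that returns a profile by ID. The sketch below is an assumption-laden example, not the guidance's implementation: the route parameter customerId, the profile shape, and the make_handler helper are hypothetical, and the DynamoDB lookup is replaced by an in-memory dictionary so the example runs without AWS access.

```python
import json
from typing import Callable, Optional

ProfileLookup = Callable[[str], Optional[dict]]


def _response(status: int, body: dict) -> dict:
    """Build a response in the shape API Gateway proxy integrations expect."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


def make_handler(get_profile: ProfileLookup):
    """Create a Lambda handler around any profile lookup function."""

    def handler(event: dict, context: object) -> dict:
        customer_id = (event.get("pathParameters") or {}).get("customerId")
        if not customer_id:
            return _response(400, {"message": "customerId is required"})
        profile = get_profile(customer_id)
        if profile is None:
            return _response(404, {"message": "profile not found"})
        return _response(200, profile)

    return handler


# In-memory stand-in for the profile store. With boto3, the lookup could be
# something like: table.get_item(Key={"customerId": cid}).get("Item")
_FAKE_PROFILES = {"c-123": {"customerId": "c-123", "segments": ["loyalty"]}}
handler = make_handler(_FAKE_PROFILES.get)

found = handler({"pathParameters": {"customerId": "c-123"}}, None)
missing = handler({"pathParameters": {"customerId": "nope"}}, None)
print(found["statusCode"], missing["statusCode"])
```

Injecting the lookup function keeps the handler testable locally. Note that DynamoDB returns numeric attributes as Decimal values, which json.dumps does not serialize by default, so a real lookup would need a conversion step.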
Amazon Pinpoint and Amazon Connect are used to activate multiple customer channels such as mobile push, voice, and email for targeted marketing communications.
Using AWS Lake Formation, fine-grained access controls can be enforced on catalog tables, columns, and rows on the data lake.
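Column- and row-level controls in AWS Lake Formation are expressed as data cells filters. The sketch below only builds the request for the Lake Formation CreateDataCellsFilter operation as a Python dictionary; the account ID, database, table, column names, and filter expression are all hypothetical placeholders.

```python
def build_data_cells_filter(
    catalog_id: str,
    database: str,
    table: str,
    filter_name: str,
    row_filter: str,
    columns: list,
) -> dict:
    """Return keyword arguments for Lake Formation's create_data_cells_filter."""
    return {
        "TableData": {
            "TableCatalogId": catalog_id,
            "DatabaseName": database,
            "TableName": table,
            "Name": filter_name,
            # Only rows matching this expression are visible.
            "RowFilter": {"FilterExpression": row_filter},
            # Only these columns are visible.
            "ColumnNames": list(columns),
        }
    }


request = build_data_cells_filter(
    catalog_id="123456789012",           # placeholder account ID
    database="cdp_curated",              # hypothetical database
    table="customer_profiles",           # hypothetical table
    filter_name="emea_marketing_view",
    row_filter="region = 'EMEA'",
    columns=["customer_id", "segment", "region"],
)
print(request)
```

With boto3 installed and credentials configured, the request could be submitted with boto3.client("lakeformation").create_data_cells_filter(**request), and the filter then granted to a principal.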
The resulting catalog in AWS Glue helps you manage both business and technical metadata, with versioning, at scale.
The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Multiple approaches are often required for optimal performance across a workload. Well-Architected systems use multiple solutions and features to improve performance. The layered, component-oriented architecture shown in this guidance allows you to build each layer independently using infrastructure as code. Separating ingestion, processing, storage, unified governance, cataloging, and consumption makes each module easier to test and deploy. In addition, each layer can have defined response procedures and organized game days focused on practicing runbooks and applying lessons learned. Observability is built in with process-level metrics, logs, and dashboards. Customize these mechanisms to your needs, and create alarms in Amazon CloudWatch to notify your on-call team of any issues.
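As a sketch of per-layer alarming, the example below builds the arguments for the Amazon CloudWatch PutMetricAlarm operation, one alarm per layer. The namespace CDP/Pipeline, the metric FailedRecords, the Layer dimension, and the SNS topic ARN are hypothetical; substitute the metrics your layers actually emit.

```python
def build_layer_alarm(layer: str, sns_topic_arn: str, threshold: float = 0.0) -> dict:
    """Return keyword arguments for CloudWatch put_metric_alarm for one CDP layer."""
    return {
        "AlarmName": f"cdp-{layer}-failed-records",
        "Namespace": "CDP/Pipeline",      # hypothetical custom namespace
        "MetricName": "FailedRecords",    # hypothetical custom metric
        "Dimensions": [{"Name": "Layer", "Value": layer}],
        "Statistic": "Sum",
        "Period": 300,                    # evaluate over 5-minute windows
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],  # notifies the on-call topic
    }


ON_CALL_TOPIC = "arn:aws:sns:us-east-1:123456789012:cdp-on-call"  # placeholder ARN
alarms = [
    build_layer_alarm(layer, ON_CALL_TOPIC)
    for layer in ("ingestion", "processing", "consumption")
]
for alarm in alarms:
    print(alarm["AlarmName"])
```

With boto3 available, each dictionary could be passed as boto3.client("cloudwatch").put_metric_alarm(**alarm).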
The Security and Governance layer is responsible for providing mechanisms for access control, encryption, auditing, and data privacy. Using AWS Key Management Service (AWS KMS), data is persisted in an encrypted format to protect it from unauthorized access. AWS Lake Formation applies central, audited governance, fine-grained access controls, and data classification tagging, enabling you to secure data at the object, database, table, column, and row levels.
Each architecture layer can be independently monitored using key performance indicators (KPIs) in Amazon CloudWatch, with automated resolution through services such as Amazon EventBridge. In addition, serverless services such as AWS Glue and Amazon DynamoDB scale horizontally, automatically responding to the velocity of data ingestion and processing. Finally, with a siloed tenant isolation architecture, you can also deploy tenant-specific resources to reduce the impact of a single failure.
Using serverless technologies, you provision only the resources you use. The serverless architecture reduces the amount of underlying infrastructure you need to manage, allowing you to focus on onboarding new customers and building new product features. You can use automated deployments to deploy the isolated CDP tenants into any AWS Region quickly, providing data residency and reduced latency. In addition, you can experiment with and test each CDP layer, enabling you to perform comparative testing against varying load levels, configurations, and services.
Using serverless technologies, you pay only for the resources you consume. As the data ingestion velocity increases and decreases, costs align with usage. When AWS Glue performs data transformations, you pay for infrastructure only while the processing is running. In addition, through a tenant isolation model and resource tagging, you can automate cost and usage alerts and measure costs specific to each tenant, application module, and service.
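The per-tenant cost measurement described above comes down to grouping billing line items by a tenant tag. The following sketch assumes a simplified, hypothetical line-item shape (a cost string plus a tags dictionary with a tenant key); real billing exports have a different and much richer schema.

```python
from collections import defaultdict
from decimal import Decimal


def cost_by_tenant(line_items: list, tag_key: str = "tenant") -> dict:
    """Sum costs per tenant tag; items without the tag go to 'untagged'."""
    totals = defaultdict(Decimal)
    for item in line_items:
        tenant = item.get("tags", {}).get(tag_key, "untagged")
        totals[tenant] += Decimal(item["cost"])
    return dict(totals)


def tenants_over_budget(totals: dict, budget: Decimal) -> list:
    """Return the tenants whose total cost exceeds the budget, sorted by name."""
    return sorted(tenant for tenant, cost in totals.items() if cost > budget)


# Hypothetical line items for illustration only.
items = [
    {"service": "AWS Glue", "cost": "12.50", "tags": {"tenant": "acme"}},
    {"service": "Amazon DynamoDB", "cost": "3.25", "tags": {"tenant": "acme"}},
    {"service": "AWS Glue", "cost": "4.00", "tags": {"tenant": "globex"}},
    {"service": "Amazon S3", "cost": "1.10", "tags": {}},
]
totals = cost_by_tenant(items)
alerts = tenants_over_budget(totals, Decimal("10"))
print(totals, alerts)
```

Decimal is used instead of float so that currency sums are exact. The 'untagged' bucket makes missing tags visible rather than silently dropping their cost.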
By extensively using serverless services, you maximize overall resource utilization, because compute is used only as needed. The efficient use of serverless resources reduces the overall energy required to operate the workload. You can also use the AWS Customer Carbon Footprint Tool, available in the AWS Billing console, to calculate and track the environmental impact of the workload over time at an account, Region, and service level.
A modern approach to implementing the serverless Customer Data Platform
An overview and architecture of building a Customer Data Platform on AWS
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.