This Guidance shows how you can build a well-architected customer data platform with data from a broad range of sources, including contact centers, email, web and mobile entries, point of sale (POS) transactions, and customer relationship management (CRM) systems. It explores each stage of building the platform, starting with the extraction of batched and real-time data streams. Next, this Guidance shows how to cleanse, enrich, and process the data to create a unified customer record across all data sources. Finally, the processed data is ready for analysis and collaboration, all in a restricted, secure environment where you set the controls. The data can be used to build more personalized customer experiences and to enhance the monetization of your marketing campaigns. 

Architecture Diagram



Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

  • This Guidance has built-in observability: every AWS service publishes metrics to Amazon CloudWatch, where you can configure dashboards and alarms to support operational excellence. With CloudWatch alarms or Amazon Simple Notification Service (Amazon SNS) notifications, you are alerted to incidents and can respond appropriately.

    Read the Operational Excellence whitepaper 
  • AWS Identity and Access Management (IAM) policies are created using the principle of least privilege and restrict access to specific resources and operations, supporting secure access for both people and machines. To further protect resources in this Guidance, secrets and configuration items are centrally managed and secured using AWS Key Management Service (AWS KMS). To protect data at rest, the Amazon S3 bucket is encrypted with AWS KMS keys. Data in transit is encrypted and transferred over HTTPS.

    Additionally, all of the Amazon S3 buckets block public access, and access to Amazon DynamoDB is required only from within a virtual private cloud (VPC). A VPC endpoint therefore limits access to only the required VPC, which keeps that traffic from traversing the open internet and being exposed to that environment.

    Read the Security whitepaper 
  • By deploying this Guidance, you also implement a highly available network topology in multiple ways. First, every service and technology chosen for each architecture layer is serverless and fully managed by AWS, making the overall architecture elastic, highly available, and fault-tolerant. Second, DynamoDB has a point-in-time recovery feature that provides continuous backups of your tables and lets you restore your table data to any point in time during the preceding 35 days. Third, Amazon S3 offers industry-leading durability, availability, performance, security, and virtually unlimited scalability at very low costs. Finally, AWS serverless services, including AWS Lambda, are fault-tolerant and designed to handle failures. If a service invokes a Lambda function and there is a service disruption, Lambda invokes the function in a different Availability Zone.

    Read the Reliability whitepaper 
  • The services selected for this Guidance are designed to enhance your workload performance. For example, by using serverless technologies, you provision only the resources you use. The serverless architecture reduces the amount of underlying infrastructure you need to manage, allowing you to focus on solving your business needs. You can also use automated deployments to deploy the different components of this Guidance into any AWS Region quickly, which supports data residency requirements and reduces latency.

    Also, all components of this Guidance are collocated in a specific Region and use a serverless stack, which avoids the need for you to make location decisions about your infrastructure apart from the Region choice. 

    Read the Performance Efficiency whitepaper 
  • By using serverless technologies and managed services, you only pay for the resources you consume, helping you control costs. Another way this Guidance can help optimize costs is by helping you plan for data transfer charges. To do this, we recommend you identify data egress points and evaluate the use of network services like AWS PrivateLink and AWS Direct Connect to reduce data transfer costs. 

    To further optimize compute costs for this Guidance, scope your near real-time data ingestion so that you can use Amazon Kinesis Data Streams in provisioned capacity mode. Provisioned capacity mode is best suited for predictable application traffic, for applications where the traffic is consistent or increases gradually, or where you can forecast capacity requirements to control costs. Similarly, for DynamoDB, use provisioned capacity mode for predictable workloads to rein in costs. Also, when AWS Glue performs data transformations, you pay for the infrastructure only while the processing is occurring. In addition, through a tenant isolation model and resource tagging, you can automate cost usage alerts and measure costs specific to each tenant, application module, and service.

    Read the Cost Optimization whitepaper 
  • Through the extensive use of serverless services, this Guidance scales to continually match the needs of your workloads with only the minimum resources required. The efficient use of these resources also reduces the overall energy required to operate your workloads. This Guidance also uses purpose-built data stores for specific workloads, which minimizes the resources provisioned. For example, Amazon S3 is used for data lake storage, and DynamoDB is used to support low-latency queries.

    Finally, all of the services used in this Guidance are managed services that allocate hardware according to the workload demand. We recommend using the provisioned capacity options (as mentioned previously) in the services when available, and when the workload is predictable, to reduce cost.

    Read the Sustainability whitepaper 
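
The least-privilege and VPC endpoint restrictions described in the security item above can be illustrated with a small policy document. The following is a minimal sketch, not the policy shipped with this Guidance: the table name, account ID, Region, VPC endpoint ID, and the helper `build_table_policy` are all hypothetical placeholders. It assumes an identity policy that allows only a few item-level DynamoDB actions on one table, and only when the request arrives through one VPC endpoint (using the `aws:SourceVpce` condition key).

```python
import json


def build_table_policy(table_arn: str, vpc_endpoint_id: str) -> dict:
    """Build an IAM policy document scoped to one table and one VPC endpoint.

    Hypothetical helper for illustration; adjust actions to what the
    workload actually needs.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CustomerProfileTableAccess",
                "Effect": "Allow",
                # Only the specific operations required, never "dynamodb:*".
                "Action": [
                    "dynamodb:GetItem",
                    "dynamodb:PutItem",
                    "dynamodb:Query",
                ],
                # Only the specific resource required.
                "Resource": table_arn,
                # Only requests that come through the named VPC endpoint.
                "Condition": {
                    "StringEquals": {"aws:SourceVpce": vpc_endpoint_id}
                },
            }
        ],
    }


# Placeholder identifiers (assumptions, not values from this Guidance).
policy = build_table_policy(
    "arn:aws:dynamodb:us-east-1:111122223333:table/customer-profiles",
    "vpce-0123456789abcdef0",
)
print(json.dumps(policy, indent=2))
```

Building the document as plain data keeps it easy to review and to validate in tests before it is attached to a role.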

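The provisioned capacity recommendation in the cost optimization item above depends on being able to forecast traffic. As a rough sketch of that sizing step for Amazon Kinesis Data Streams, the function below estimates a shard count from forecast peak throughput. It assumes the standard per-shard limits for provisioned mode (1 MB/s or 1,000 records/s for writes, 2 MB/s for shared-throughput reads); the function name `estimate_shards` and the sample traffic figures are hypothetical.

```python
import math

# Per-shard limits for provisioned mode (shared-throughput consumers).
SHARD_WRITE_MB_PER_SEC = 1.0
SHARD_WRITE_RECORDS_PER_SEC = 1000
SHARD_READ_MB_PER_SEC = 2.0


def estimate_shards(
    peak_write_mb_per_sec: float,
    peak_records_per_sec: float,
    peak_read_mb_per_sec: float,
) -> int:
    """Return the smallest shard count that covers all three peak rates."""
    by_write_bytes = math.ceil(peak_write_mb_per_sec / SHARD_WRITE_MB_PER_SEC)
    by_write_records = math.ceil(peak_records_per_sec / SHARD_WRITE_RECORDS_PER_SEC)
    by_read_bytes = math.ceil(peak_read_mb_per_sec / SHARD_READ_MB_PER_SEC)
    # A stream always has at least one shard.
    return max(1, by_write_bytes, by_write_records, by_read_bytes)


# Hypothetical forecast: 3.5 MB/s in, 2,500 records/s in, 7 MB/s out.
print(estimate_shards(3.5, 2500, 7.0))  # 4
```

If the forecast is unreliable or traffic is spiky, this kind of estimate is less useful, which is the case where provisioned mode is a weaker fit.
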
Implementation Resources

A detailed guide is provided for you to experiment with and use within your AWS account. It examines each stage of building the Guidance, including deployment, usage, and cleanup, to prepare it for deployment.

The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.


The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
