Implement customer churn prediction by performing advanced analytics on customer data
This Guidance uses machine learning (ML) to help you build a churn prediction model from structured and unstructured data. Customer churn, or customer attrition, measures the number of customers who stop using one of your products or services. A model that forecasts churn identifies behaviors and patterns that indicate churn probability for a set of customers, so you can take preventative action before those customers leave. This Guidance can help business-to-business (B2B) organizations that use customer feedback and relationships to better understand customer satisfaction.
Architecture Diagram
Step 1
A large language model (LLM) running on Amazon SageMaker processes unstructured data from case management, e-commerce, and file systems, extracting sentiment and recognizing entities and topics.
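As a concrete illustration, the sketch below shows how client code might call an LLM hosted on a SageMaker real-time endpoint to extract sentiment, entities, and topics from a support case. The endpoint name, prompt, and payload schema (which follows the Hugging Face text-generation container convention) are assumptions for illustration, not part of this Guidance's sample code.

```python
import json

import boto3

# Hypothetical endpoint name; assumes an LLM is already deployed on SageMaker.
ENDPOINT_NAME = "churn-llm-endpoint"

runtime = boto3.client("sagemaker-runtime")

def extract_insights(case_text: str) -> dict:
    """Ask the hosted LLM to return sentiment, entities, and topics as JSON."""
    prompt = (
        "Analyze the following customer support case. Return JSON with keys "
        "'sentiment', 'entities', and 'topics'.\n\nCase: " + case_text
    )
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        # Payload shape assumes a Hugging Face text-generation container.
        Body=json.dumps({"inputs": prompt, "parameters": {"max_new_tokens": 256}}),
    )
    return json.loads(response["Body"].read())
```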
Step 2
An AWS Glue job consolidates structured customer data extracted from e-commerce, customer relationship management (CRM), enterprise resource planning (ERP), and master data management (MDM) systems, together with the unstructured data insights produced by the LLM, into a single Amazon Simple Storage Service (Amazon S3) bucket.
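The consolidation job could look roughly like the following PySpark sketch for AWS Glue. All bucket names, prefixes, file formats, and the customer_id join key are hypothetical placeholders for your own sources.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Structured sources (CRM, ERP, MDM, e-commerce) previously landed in S3.
crm = spark.read.parquet("s3://my-raw-bucket/crm/")
erp = spark.read.parquet("s3://my-raw-bucket/erp/")
# Unstructured-data insights produced by the LLM in Step 1.
llm_insights = spark.read.json("s3://my-raw-bucket/llm-insights/")

# Join on a shared customer identifier and write one consolidated dataset.
consolidated = (
    crm.join(erp, "customer_id", "left")
       .join(llm_insights, "customer_id", "left")
)
consolidated.write.mode("overwrite").parquet(
    "s3://my-consolidated-bucket/churn-features/"
)

job.commit()
```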
Step 3
A SageMaker pipeline pre-processes the consolidated data and trains the churn model according to a specified metric by using an AutoML job. The AutoML job tries several ML models, such as neural networks or decision trees, and tunes their hyperparameters to obtain the best model for your data and use case.
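Expressed with the SageMaker Python SDK, the AutoML step might look like the sketch below. The label column, objective metric, candidate count, and S3 paths are illustrative assumptions.

```python
import sagemaker
from sagemaker.automl.automl import AutoML

session = sagemaker.Session()
role = sagemaker.get_execution_role()  # assumes a SageMaker execution context

# Hypothetical label column, metric, and S3 locations.
automl = AutoML(
    role=role,
    target_attribute_name="churned",
    output_path="s3://my-consolidated-bucket/automl-output/",
    problem_type="BinaryClassification",
    job_objective={"MetricName": "F1"},
    max_candidates=10,
    sagemaker_session=session,
)

# Train on the consolidated dataset produced by the Glue job in Step 2.
automl.fit(inputs="s3://my-consolidated-bucket/churn-features/train.csv", wait=True)
```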
Step 4
The pipeline creates and registers the churn model in the SageMaker Model Registry, which manages versioning and deployment.
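With boto3, registration might look roughly like this; the model package group name, container image URI, and artifact location are placeholders to be filled in from the best AutoML candidate.

```python
import boto3

sm = boto3.client("sagemaker")

# Create the group once; subsequent registrations add new versions to it.
sm.create_model_package_group(
    ModelPackageGroupName="churn-models",
    ModelPackageGroupDescription="Versioned churn prediction models",
)

# Register the best AutoML candidate as a new, versioned model package.
sm.create_model_package(
    ModelPackageGroupName="churn-models",
    ModelPackageDescription="Best candidate from the AutoML job",
    InferenceSpecification={
        "Containers": [{
            "Image": "<inference-image-uri>",  # from the winning candidate
            "ModelDataUrl": "s3://my-consolidated-bucket/automl-output/model.tar.gz",
        }],
        "SupportedContentTypes": ["text/csv"],
        "SupportedResponseMIMETypes": ["text/csv"],
    },
    ModelApprovalStatus="PendingManualApproval",
)
```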
Step 5
An AWS Lambda function, invoked on a schedule by Amazon EventBridge, starts a SageMaker batch inference job to estimate the churn probability of a batch of customers. The inference results are stored in an S3 bucket.
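The scheduled Lambda handler might look like this minimal sketch; the model name and S3 URIs come from hypothetical environment variables set on the function.

```python
import os
import time

import boto3

sm = boto3.client("sagemaker")

def handler(event, context):
    """Invoked on an EventBridge schedule; scores a batch of customers."""
    job_name = f"churn-batch-{int(time.time())}"
    sm.create_transform_job(
        TransformJobName=job_name,
        ModelName=os.environ["MODEL_NAME"],  # deployed from the model registry
        TransformInput={
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": os.environ["INPUT_S3_URI"],  # batch of customer records
            }},
            "ContentType": "text/csv",
            "SplitType": "Line",
        },
        TransformOutput={"S3OutputPath": os.environ["OUTPUT_S3_URI"]},
        TransformResources={"InstanceType": "ml.m5.xlarge", "InstanceCount": 1},
    )
    return {"transform_job": job_name}
```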
Step 6
Amazon SageMaker Clarify helps you understand the model’s reasoning behind the churn prediction. It generates a report identifying the features that most influenced the churn score and stores the report in Amazon S3.
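An explainability job of this kind might be configured with the SageMaker Python SDK’s clarify module along the following lines; the SHAP settings, label column, model name, and S3 paths are illustrative assumptions.

```python
import sagemaker
from sagemaker import clarify

session = sagemaker.Session()
role = sagemaker.get_execution_role()  # assumes a SageMaker execution context

processor = clarify.SageMakerClarifyProcessor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=session,
)

data_config = clarify.DataConfig(
    s3_data_input_path="s3://my-consolidated-bucket/churn-features/train.csv",
    s3_output_path="s3://my-consolidated-bucket/clarify-report/",
    label="churned",                  # hypothetical label column
    dataset_type="text/csv",
)
model_config = clarify.ModelConfig(
    model_name="churn-model",         # hypothetical deployed model
    instance_type="ml.m5.xlarge",
    instance_count=1,
    accept_type="text/csv",
)
shap_config = clarify.SHAPConfig(num_samples=100, agg_method="mean_abs")

# Produces a feature-attribution report in the S3 output path above.
processor.run_explainability(
    data_config=data_config,
    model_config=model_config,
    explainability_config=shap_config,
)
```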
Step 7
A Lambda function generates a summary of churn results, which Amazon Simple Notification Service (Amazon SNS) sends to decision-makers by email.
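A minimal sketch of such a function follows; the topic ARN and summary text are placeholders, and a real implementation would first read and aggregate the inference results from S3.

```python
import boto3

sns = boto3.client("sns")

def handler(event, context):
    """Summarize the latest churn results and notify decision-makers."""
    # Placeholder summary; aggregate the batch inference output from S3 here.
    summary = "Churn run complete: 42 of 500 customers flagged as high risk."
    sns.publish(
        TopicArn="arn:aws:sns:us-east-1:123456789012:churn-alerts",  # example ARN
        Subject="Customer churn summary",
        Message=summary,
    )
```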
Step 8
You can query and analyze the churn results stored in Amazon S3 through Amazon Athena and AWS Glue. Additionally, you can visualize and further analyze the churn results using Amazon QuickSight.
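For example, assuming an AWS Glue crawler has cataloged the results as a hypothetical churn_results table, a query could be started from Python as follows.

```python
import boto3

athena = boto3.client("athena")

# Database, table, and output location are hypothetical placeholders.
response = athena.start_query_execution(
    QueryString=(
        "SELECT customer_id, churn_probability "
        "FROM churn_results "
        "WHERE churn_probability > 0.8 "
        "ORDER BY churn_probability DESC"
    ),
    QueryExecutionContext={"Database": "churn_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print("Started query:", response["QueryExecutionId"])
```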
Well-Architected Pillars
The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
This Guidance incorporates text data to enrich the dataset used to create a SageMaker model to predict a customer’s risk of churn. Amazon S3 stores the churn results as CSV files, and Athena queries the results without requiring additional operational overhead. Amazon SNS sends automated analysis reports to decision-makers so they can quickly act and reduce the likelihood of customer churn.
Security
AWS Identity and Access Management (IAM) controls access to data, ML models, and churn insights through granular permissions based on roles. Additionally, SageMaker can only access data through Amazon Virtual Private Cloud (Amazon VPC) endpoints. This means that data does not travel across the public internet, limiting potential points of data exposure.
Reliability
SageMaker uses distributed training libraries to reduce training time and optimize model scaling. SageMaker also initiates batch transformation tasks across multiple Availability Zones to reduce the risk of failure during training. If one Availability Zone fails, training can continue in another Availability Zone. Additionally, Athena, QuickSight, and AWS Glue are serverless services, so data queries and visualizations scale without you having to provision additional infrastructure.
Performance Efficiency
SageMaker batch inference allows you to process batches of data so you can run churn analysis on a set of customers at a time, rather than requiring you to have an endpoint up and running at all times. To support spikes in batch inference workloads, Lambda provides serverless compute that automatically scales based on demand.
Cost Optimization
To help reduce costs, AWS Glue jobs perform extract, transform, and load (ETL) on batches of user data rather than on individual records. Additionally, Lambda processes events to start batch transformation analysis, so you spin up compute capacity only as needed rather than keeping a server running at all times. AWS Glue, Athena, and QuickSight together consume the churn insights, providing a cost-effective way to read batched data stored in Amazon S3.
Sustainability
By extensively using serverless services, such as Lambda, AWS Glue, Athena, and QuickSight, you maximize overall resource utilization because compute is used only as needed. These serverless services scale to meet demand, reducing the overall energy required to operate the workload. You can also use the customer carbon footprint tool in the AWS Billing console to calculate and track the environmental impact of the workload over time at an account, Region, and service level.
Implementation Resources
A detailed guide is provided for you to experiment with and use within your AWS account. It walks through each stage of building the Guidance, including deployment, usage, and cleanup, to prepare it for deployment.
The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.
Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.