This Guidance demonstrates how to use machine learning (ML) to combat fraudulent financial transactions. With AWS services, financial institutions can create an application that simulates credit card transactions, then use it to train and develop an ML model capable of generating near-real-time inferences. They can then identify fraudulent transactions and use ML to predict fraud before it strikes.
Architecture Diagram
Step 1
The Amazon Elastic Compute Cloud (Amazon EC2) instance simulates a credit card transaction application. It uses Python packages that generate synthetic data, and inserts the simulated credit card transactions into Amazon Kinesis Data Streams.
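The Guidance does not publish the simulator's code, but the step above can be sketched as follows. The stream name, field names, and record schema here are illustrative assumptions, and the Kinesis client is passed in (for example, `boto3.client("kinesis")`) so the generator can be exercised locally with a stub:

```python
import json
import random
import uuid
from datetime import datetime, timezone

# Hypothetical merchant categories; the Guidance does not prescribe a schema.
MERCHANTS = ["grocery", "electronics", "travel", "fuel", "dining"]

def make_transaction(rng: random.Random) -> dict:
    """Generate one synthetic credit card transaction."""
    return {
        "transaction_id": str(uuid.UUID(int=rng.getrandbits(128))),
        "card_number": f"4000-{rng.randint(0, 9999):04d}",
        "merchant_category": rng.choice(MERCHANTS),
        "amount": round(rng.uniform(1.0, 2000.0), 2),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

def send_transactions(kinesis_client, stream_name: str, count: int, seed: int = 0) -> int:
    """Put `count` synthetic transactions onto a Kinesis data stream.

    `kinesis_client` is expected to expose the boto3 `put_record` API;
    a stub can be substituted for local testing.
    """
    rng = random.Random(seed)
    for _ in range(count):
        txn = make_transaction(rng)
        kinesis_client.put_record(
            StreamName=stream_name,
            Data=json.dumps(txn).encode("utf-8"),
            PartitionKey=txn["card_number"],  # spreads records across shards
        )
    return count
```

In the deployed Guidance, this loop would run continuously on the EC2 instance against the real stream.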
Step 2
Kinesis Data Streams stores the incoming transaction data.
Step 3
An Amazon Redshift streaming ingestion materialized view is created on top of the data stream, which automatically ingests the streaming data into Amazon Redshift.
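The Guidance does not publish the exact DDL, but this step follows Amazon Redshift's documented streaming ingestion pattern. The schema name, view name, stream name, and IAM role ARN below are illustrative:

```sql
-- Map the Kinesis stream into Redshift as an external schema
-- (names and the role ARN are placeholders).
CREATE EXTERNAL SCHEMA kinesis_schema
FROM KINESIS
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-streaming-role';

-- Materialized view over the stream; auto refresh ingests new records.
CREATE MATERIALIZED VIEW cc_transactions_stream AUTO REFRESH YES AS
SELECT approximate_arrival_timestamp,
       JSON_PARSE(kinesis_data) AS payload
FROM kinesis_schema."cc-transactions";
```

`JSON_PARSE` stores each record as a SUPER value, so individual fields can be extracted with dot notation in later queries.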
Step 4
Build, train, and deploy an ML model using Amazon Redshift ML. The model is trained on historical transaction data.
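In Redshift ML, a model is created with a single SQL statement. A minimal sketch, assuming a hypothetical `historical_transactions` table with illustrative feature columns, a binary `is_fraud` label, and placeholder IAM role and S3 bucket names:

```sql
-- Train a binary classification model on historical data
-- (table, columns, role ARN, and bucket are placeholders).
CREATE MODEL cc_fraud_model
FROM (
    SELECT amount, merchant_category, is_fraud
    FROM historical_transactions
)
TARGET is_fraud
FUNCTION predict_fraud
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-ml-role'
SETTINGS (S3_BUCKET 'my-redshift-ml-bucket');
```

Redshift ML exports the training data to Amazon S3 and runs SageMaker Autopilot behind the scenes; once training completes, `predict_fraud` is available as a SQL function.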
Step 5
Generate ML predictions for the credit card transactions in the streaming data using Amazon SageMaker.
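Once trained, the model's prediction function can be called directly in SQL against the streaming data. This sketch assumes a prediction function `predict_fraud` and a streaming materialized view `cc_transactions_stream` created in the earlier steps (both names are illustrative), with fields extracted from the SUPER `payload` column:

```sql
-- Score streaming transactions and surface suspected fraud
-- (function, view, and field names are placeholders).
SELECT payload.transaction_id::VARCHAR AS transaction_id,
       payload.amount::DECIMAL(10,2)   AS amount,
       predict_fraud(payload.amount::DECIMAL(10,2),
                     payload.merchant_category::VARCHAR) AS fraud_flag
FROM cc_transactions_stream
WHERE predict_fraud(payload.amount::DECIMAL(10,2),
                    payload.merchant_category::VARCHAR) = 1;
```

Because inference runs inside the warehouse, flagged transactions can be identified within the refresh interval of the materialized view.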
Step 6
You can alert customers using Amazon Simple Notification Service (Amazon SNS). You can also notify the application on the Amazon EC2 instance when you want to mitigate risk, such as by blocking the financial transaction.
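A minimal sketch of the alerting step. The topic ARN, message fields, and `card_blocked` action are illustrative assumptions, and the SNS client is injected (for example, `boto3.client("sns")`) so the function can be tested with a stub:

```python
import json

# Hypothetical topic ARN; substitute your own SNS topic.
FRAUD_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:fraud-alerts"

def alert_on_fraud(sns_client, transaction: dict, topic_arn: str = FRAUD_TOPIC_ARN) -> str:
    """Publish a fraud alert for a flagged transaction.

    `sns_client` is expected to expose the boto3 `publish` API.
    Returns the SNS message ID.
    """
    message = {
        "transaction_id": transaction["transaction_id"],
        "amount": transaction["amount"],
        "action": "card_blocked",  # example mitigation signal for the EC2 app
    }
    response = sns_client.publish(
        TopicArn=topic_arn,
        Subject="Possible fraudulent transaction",
        Message=json.dumps(message),
    )
    return response["MessageId"]
```

Subscribing both the customer-notification channel and the EC2 application to the topic lets one publish serve both the alert and the mitigation path.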
Well-Architected Pillars
The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
You can monitor your organization's operational health and notify operators of faults using Amazon CloudWatch. With this service, you can customize metrics, alarms, and dashboards. For more on how to gain insights into your operations, see the Amazon Redshift Streaming Ingestion developer guide.
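As one example of the alarms mentioned above, you could watch the Kinesis `GetRecords.IteratorAgeMilliseconds` metric, which rises when consumers fall behind the stream. The alarm name and threshold below are illustrative assumptions, and the CloudWatch client is injected (for example, `boto3.client("cloudwatch")`) so the function can be tested with a stub:

```python
def create_iterator_age_alarm(cloudwatch_client, stream_name: str, sns_topic_arn: str) -> None:
    """Alarm when stream consumers fall more than 60 s behind the stream head.

    `cloudwatch_client` is expected to expose the boto3 `put_metric_alarm` API.
    """
    cloudwatch_client.put_metric_alarm(
        AlarmName=f"{stream_name}-iterator-age",   # illustrative naming scheme
        Namespace="AWS/Kinesis",
        MetricName="GetRecords.IteratorAgeMilliseconds",
        Dimensions=[{"Name": "StreamName", "Value": stream_name}],
        Statistic="Maximum",
        Period=300,                       # evaluate over 5-minute windows
        EvaluationPeriods=1,
        Threshold=60_000,                 # 60 seconds, in milliseconds
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=[sns_topic_arn],     # notify operators via SNS
    )
```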
Security
AWS offers a variety of services for secure authentication and authorization that are compatible with this Guidance. These include:
- AWS Identity and Access Management (IAM)
- AWS IAM Identity Center (Successor to AWS Single Sign-On)
- AWS Certificate Manager (ACM)
- AWS Key Management Service (AWS KMS)
- Amazon Redshift role-based access control (RBAC)
These services are designed to provide secure access control and encryption of data for both people and machine access.
This Guidance recommends several AWS security services, such as IAM. It emphasizes the use of network security best practices, such as implementing network segmentation and controlling access with security groups and network access control lists (ACLs).
To protect data in this Guidance, AWS services such as Amazon Simple Storage Service (Amazon S3), AWS KMS, and AWS CloudTrail are used. Data is encrypted both in transit and at rest, and access to data is controlled by using IAM. CloudTrail logs all API activity to provide visibility into any unauthorized access attempts.
Furthermore, Amazon Redshift offers column-level and row-level access controls, as well as dynamic data masking to protect the data.
Reliability
This Guidance implements a reliable application-level architecture by using AWS services such as Amazon Redshift Streaming Ingestion, which keeps ingested data durably available in Redshift Managed Storage (RMS). A Kinesis data stream retains data for 24 hours by default, and retention can be extended up to 365 days.
Amazon Redshift offers several levels of fault tolerance within the service, including automatic backups. It can also be deployed across multiple Availability Zones, keeping the service available in the rare but possible event of an Availability Zone failure.
Performance Efficiency
This Guidance uses several services to meet the workload requirements of various scaling, traffic, and data access patterns. Kinesis Data Streams supports near-real-time data ingestion. AWS Lambda and Amazon SNS are used for event-driven compute and messaging. Amazon Redshift is a service built for data warehousing and analytics, SageMaker is used to build, train, and deploy ML models, and Amazon Redshift ML supports predictive analytics. Together, these AWS services provide scalable and efficient solutions for processing, analyzing, and storing data. Additionally, Amazon Redshift can scale automatically, allowing this Guidance to dynamically adjust resources to meet changing demand.
Cost Optimization
This Guidance primarily follows a serverless architecture, which automatically scales to match the demand and ensures only the required resources are used. Services like Lambda, Amazon API Gateway, and Amazon S3 provide the required infrastructure for the serverless architecture, and AWS Auto Scaling adjusts the capacity based on the workload. Additionally, services like CloudWatch help in optimizing the application and monitoring the resources.
Sustainability
This Guidance uses several AWS services to support data access and storage patterns. Kinesis Data Streams is used for near-real-time data ingestion, while Lambda functions process the data and send notifications through Amazon SNS. Amazon Redshift is used for data warehousing and analytics, and SageMaker provides an environment for machine learning. Amazon Redshift ML is used for predictive analytics on the data stored in Amazon Redshift, allowing models to be created and run directly in the data warehouse.
Implementation Resources
A detailed implementation guide is provided for you to experiment with and use within your own AWS account. It walks through each stage of the Guidance, including deployment, usage, and cleanup.
The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.
Related Content
Near-real-time fraud detection using Amazon Redshift Streaming Ingestion with Amazon Kinesis Data Streams and Amazon Redshift ML
Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.