This Guidance demonstrates how to use machine learning (ML) to combat fraudulent financial transactions. With AWS services, financial institutions can create an application that simulates credit card transactions, then use it to train and develop an ML model capable of generating near-real-time inferences. They can then identify fraudulent transactions and use ML to predict fraud before it strikes.
Architecture Diagram
Step 1
The Amazon Elastic Compute Cloud (Amazon EC2) instance simulates a credit card transaction application. It uses Python packages that generate synthetic data, and inserts the simulated credit card transactions into Amazon Kinesis Data Streams.
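The Guidance does not publish the simulator's code, but the step above can be sketched as follows. The stream name, field names, and record schema here are illustrative assumptions, and the Kinesis client is passed in (for example, `boto3.client("kinesis")`) so the generator can be exercised locally with a stub:

```python
import json
import random
import uuid
from datetime import datetime, timezone

# Hypothetical merchant categories; the Guidance does not prescribe a schema.
MERCHANTS = ["grocery", "electronics", "travel", "fuel", "dining"]

def make_transaction(rng: random.Random) -> dict:
    """Generate one synthetic credit card transaction."""
    return {
        "transaction_id": str(uuid.UUID(int=rng.getrandbits(128))),
        "card_number": f"4000-{rng.randint(0, 9999):04d}",
        "merchant_category": rng.choice(MERCHANTS),
        "amount": round(rng.uniform(1.0, 2000.0), 2),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

def send_transactions(kinesis_client, stream_name: str, count: int, seed: int = 0) -> int:
    """Put `count` synthetic transactions onto a Kinesis data stream.

    `kinesis_client` is expected to expose the boto3 `put_record` API;
    a stub can be substituted for local testing.
    """
    rng = random.Random(seed)
    for _ in range(count):
        txn = make_transaction(rng)
        kinesis_client.put_record(
            StreamName=stream_name,
            Data=json.dumps(txn).encode("utf-8"),
            PartitionKey=txn["card_number"],  # spreads records across shards
        )
    return count
```

In the deployed Guidance, this loop would run continuously on the EC2 instance against the real stream.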
Step 2
Kinesis Data Streams stores the incoming transaction data.
Step 3
An Amazon Redshift streaming ingestion materialized view is created on top of the data stream, which automatically ingests the streaming data into Amazon Redshift.
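The Guidance does not publish the exact DDL, but this step follows Amazon Redshift's documented streaming ingestion pattern. The schema name, view name, stream name, and IAM role ARN below are illustrative:

```sql
-- Map the Kinesis stream into Redshift as an external schema
-- (names and the role ARN are placeholders).
CREATE EXTERNAL SCHEMA kinesis_schema
FROM KINESIS
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-streaming-role';

-- Materialized view over the stream; auto refresh ingests new records.
CREATE MATERIALIZED VIEW cc_transactions_stream AUTO REFRESH YES AS
SELECT approximate_arrival_timestamp,
       JSON_PARSE(kinesis_data) AS payload
FROM kinesis_schema."cc-transactions";
```

`JSON_PARSE` stores each record as a SUPER value, so individual fields can be extracted with dot notation in later queries.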
Step 4
Build, train, and deploy an ML model using Amazon Redshift ML. The model is trained on historical transaction data.
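In Redshift ML, a model is created with a single SQL statement. A minimal sketch, assuming a hypothetical `historical_transactions` table with illustrative feature columns, a binary `is_fraud` label, and placeholder IAM role and S3 bucket names:

```sql
-- Train a binary classification model on historical data
-- (table, columns, role ARN, and bucket are placeholders).
CREATE MODEL cc_fraud_model
FROM (
    SELECT amount, merchant_category, is_fraud
    FROM historical_transactions
)
TARGET is_fraud
FUNCTION predict_fraud
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-ml-role'
SETTINGS (S3_BUCKET 'my-redshift-ml-bucket');
```

Redshift ML exports the training data to Amazon S3 and runs SageMaker Autopilot behind the scenes; once training completes, `predict_fraud` is available as a SQL function.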
Step 5
Generate ML predictions for the credit card transactions in the streaming data using Amazon SageMaker.
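Once trained, the model's prediction function can be called directly in SQL against the streaming data. This sketch assumes a prediction function `predict_fraud` and a streaming materialized view `cc_transactions_stream` created in the earlier steps (both names are illustrative), with fields extracted from the SUPER `payload` column:

```sql
-- Score streaming transactions and surface suspected fraud
-- (function, view, and field names are placeholders).
SELECT payload.transaction_id::VARCHAR AS transaction_id,
       payload.amount::DECIMAL(10,2)   AS amount,
       predict_fraud(payload.amount::DECIMAL(10,2),
                     payload.merchant_category::VARCHAR) AS fraud_flag
FROM cc_transactions_stream
WHERE predict_fraud(payload.amount::DECIMAL(10,2),
                    payload.merchant_category::VARCHAR) = 1;
```

Because inference runs inside the warehouse, flagged transactions can be identified within the refresh interval of the materialized view.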
Step 6
You can alert customers using Amazon Simple Notification Service (Amazon SNS). You can also notify the application on the Amazon EC2 instance when you want to mitigate risk, such as by blocking the financial transaction.
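A minimal sketch of the alerting step. The topic ARN, message fields, and `card_blocked` action are illustrative assumptions, and the SNS client is injected (for example, `boto3.client("sns")`) so the function can be tested with a stub:

```python
import json

# Hypothetical topic ARN; substitute your own SNS topic.
FRAUD_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:fraud-alerts"

def alert_on_fraud(sns_client, transaction: dict, topic_arn: str = FRAUD_TOPIC_ARN) -> str:
    """Publish a fraud alert for a flagged transaction.

    `sns_client` is expected to expose the boto3 `publish` API.
    Returns the SNS message ID.
    """
    message = {
        "transaction_id": transaction["transaction_id"],
        "amount": transaction["amount"],
        "action": "card_blocked",  # example mitigation signal for the EC2 app
    }
    response = sns_client.publish(
        TopicArn=topic_arn,
        Subject="Possible fraudulent transaction",
        Message=json.dumps(message),
    )
    return response["MessageId"]
```

Subscribing both the customer-notification channel and the EC2 application to the topic lets one publish serve both the alert and the mitigation path.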
Well-Architected Pillars
The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
You can monitor your organization's operational health and notify operators of faults using Amazon CloudWatch. With this service, you can customize metrics, alarms, and dashboards. For more on how to gain insights into your operations, see the Amazon Redshift Streaming Ingestion developer guide.
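As one example of the alarms mentioned above, you could watch the Kinesis `GetRecords.IteratorAgeMilliseconds` metric, which rises when consumers fall behind the stream. The alarm name and threshold below are illustrative assumptions, and the CloudWatch client is injected (for example, `boto3.client("cloudwatch")`) so the function can be tested with a stub:

```python
def create_iterator_age_alarm(cloudwatch_client, stream_name: str, sns_topic_arn: str) -> None:
    """Alarm when stream consumers fall more than 60 s behind the stream head.

    `cloudwatch_client` is expected to expose the boto3 `put_metric_alarm` API.
    """
    cloudwatch_client.put_metric_alarm(
        AlarmName=f"{stream_name}-iterator-age",   # illustrative naming scheme
        Namespace="AWS/Kinesis",
        MetricName="GetRecords.IteratorAgeMilliseconds",
        Dimensions=[{"Name": "StreamName", "Value": stream_name}],
        Statistic="Maximum",
        Period=300,                       # evaluate over 5-minute windows
        EvaluationPeriods=1,
        Threshold=60_000,                 # 60 seconds, in milliseconds
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=[sns_topic_arn],     # notify operators via SNS
    )
```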
Security
AWS offers a variety of services for secure authentication and authorization that are compatible with this Guidance. These include:
- AWS Identity and Access Management (IAM)
- AWS IAM Identity Center (Successor to AWS Single Sign-On)
- AWS Certificate Manager (ACM)
- AWS Key Management Service (AWS KMS)
- Amazon Redshift role-based access control (RBAC)
These services are designed to provide secure access control and encryption of data for both people and machine access.
This Guidance recommends several AWS security services, such as IAM. It emphasizes the use of network security best practices, such as implementing network segmentation and controlling access with security groups and network access control lists (ACLs).
To protect data in this Guidance, AWS services such as Amazon Simple Storage Service (Amazon S3), AWS KMS, and AWS CloudTrail are used. Data is encrypted both in transit and at rest, and access to data is controlled by using IAM. CloudTrail logs all API activity to provide visibility into any unauthorized access attempts.
Furthermore, Amazon Redshift offers column-level and row-level access controls, as well as dynamic data masking to protect the data.
Reliability
This Guidance implements a reliable application-level architecture by using AWS services such as Amazon Redshift Streaming Ingestion, which keeps ingested data durably available in Redshift Managed Storage (RMS). A Kinesis data stream retains data for 24 hours by default, and retention can be extended up to 365 days.
Amazon Redshift offers several levels of fault tolerance within the service, including automatic backups. It can also be deployed across multiple Availability Zones, keeping the service available in the rare but possible event of an Availability Zone failure.
Performance Efficiency
This Guidance uses several services to meet the workload requirements of various scaling, traffic, and data access patterns. Kinesis Data Streams supports near-real-time data ingestion. AWS Lambda and Amazon SNS are used for event-driven compute and messaging. Amazon Redshift is a service built for data warehousing and analytics, SageMaker is used to build, train, and deploy ML models, and Amazon Redshift ML supports predictive analytics. Together, these AWS services provide scalable and efficient solutions for processing, analyzing, and storing data. Additionally, Amazon Redshift can scale automatically, allowing this Guidance to dynamically adjust resources to meet changing demand.
Cost Optimization
This Guidance primarily follows a serverless architecture, which automatically scales to match the demand and ensures only the required resources are used. Services like Lambda, Amazon API Gateway, and Amazon S3 provide the required infrastructure for the serverless architecture, and AWS Auto Scaling adjusts the capacity based on the workload. Additionally, services like CloudWatch help in optimizing the application and monitoring the resources.
Sustainability
This Guidance uses several AWS services to support data access and storage patterns. Kinesis Data Streams is used for near-real-time data ingestion, while Lambda functions process the data and send notifications through Amazon SNS. Amazon Redshift is used for data warehousing and analytics, and SageMaker provides an environment for machine learning. Amazon Redshift ML is used for predictive analytics on the data stored in Amazon Redshift, allowing models to be created and run directly in the data warehouse.
Implementation Resources
A detailed implementation guide is provided for you to experiment with and use within your own AWS account. It walks through each stage of the Guidance, including deployment, usage, and cleanup.
The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.
Related Content
Near-real-time fraud detection using Amazon Redshift Streaming Ingestion with Amazon Kinesis Data Streams and Amazon Redshift ML
Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.