AWS Machine Learning Blog

Catching fraud faster by building a proof of concept in Amazon Fraud Detector

Amazon Fraud Detector is a fully managed service that makes it easy to identify potentially fraudulent online activities, such as the creation of fake accounts or online payment fraud. Unlike general-purpose machine learning (ML) packages, Amazon Fraud Detector is designed specifically to detect fraud. Amazon Fraud Detector combines your data, the latest in ML science, and more than 20 years of fraud detection experience from Amazon.com and AWS to build ML models tailor-made to detect fraud in your business.

This post helps you develop and run a robust proof of concept (POC) for the Online Fraud Insights model, so that you can evaluate the value of Amazon Fraud Detector to your business. The Online Fraud Insights model is a supervised ML model that you can adapt to solve a variety of fraud types, such as new account fraud, online transaction fraud, or fake reviews. Depending on data availability, you can usually complete Amazon Fraud Detector POCs in as little as 1–2 weeks.

The fraud lifecycle

Fraud occurs in all shapes and sizes, but generally follows the same three-step lifecycle:

  1. A bad actor joins your platform
  2. The bad actor commits fraud (often multiple times on the same account)
  3. Eventually, you discover the fraud and block the bad actor

The following diagram visualizes this lifecycle.

The bad actor first registers the account, which the diagram denotes as Time Zero (T0). The subsequent fraudulent events occur at times T1, T2, and T3, and go undetected by the system. At some later point in time (TN), which could be hours, days, or even weeks later, you discover the fraud (for example, in the form of a chargeback) and block the bad actor.

Discovering fraud late is costly. There are financial costs related to the cost of goods sold, the cost of human reviews, and fees associated with chargebacks. In addition, there may be reputational or regulatory risks for the company. In the perfect scenario, you could predict fraud and stop it before any damage occurs. One way to do this is to catch bad actors right as they register the account (T0) and never let them into the platform in the first place.

This is where Amazon Fraud Detector can help. You can build a custom model that predicts who is likely to commit fraud and use the information to help prevent (or minimize) fraud before it ever occurs. In some cases, Amazon Fraud Detector has helped detect fraud up to 95% faster.

When designing a POC with the Online Fraud Insights model, you should build a model that detects bad actors as close to T0 as possible (ideally at account registration). As part of an extended POC, you can also build additional models that detect bad actors at subsequent high-risk events (like transactions).

Planning your POC for Online Fraud Insights

To get started on a POC with the Online Fraud Insights model, consider the following:

  • Your specific use case – The Online Fraud Insights model works well for detecting a multitude of online fraud and abuse types, such as new account fraud, transaction fraud, or fake review abuse. As of this writing, it can’t detect account takeover (ATO).
  • When you want to detect fraud in the lifecycle – This is a key component of designing the right POC. Detecting fraud earlier is best.

You should break your POC down into three phases:

  1. Gathering training data – Gather the data with which to train the model.
  2. Training the model and configuring the detector – Build the model and train it to detect fraud in your business. Then build a detector, which is a combination of the trained model and rules.
  3. Evaluating performance – Determine whether Amazon Fraud Detector is catching fraud faster and reducing your fraud losses.

You can complete the POC within the free trial period, and it generally takes 1–2 weeks. If you need help with your use case, contact the Amazon Fraud Detector team.

Gathering training data

The first step of the POC is gathering the relevant data to train the model.

Except in rare cases, more data is better than less. To train the model, Amazon Fraud Detector requires at least 10,000 records, with at least 500 of those records identified as fraudulent; however, the more records and the greater the variety of fraudulent examples you provide, the better. That doesn’t mean you should artificially create examples; instead, add more historical events. You should gather at least 3–6 months of data.

When training a fraud detection model, you want to use data that is mature, meaning the fraud lifecycle is complete. Depending on the use case, this can take 30–45 days or more. For example, in the case of detecting card-not-present transaction (chargeback) fraud, identifying a fraudulent charge generally takes a statement cycle. The most recent data in your training dataset should be at least 4–6 weeks old, but use your best judgment in deciding which data is old enough to have completed the fraud lifecycle.

In summary, to train the model, adhere to the following best practices:

  • Use at least 10,000 records, with at least 500 of those being fraudulent examples (but more is always better).
  • Use training data that spans 3–6 months (although less can work), with the oldest data no more than a year old.
  • Use a mature dataset whose most recent events are at least 4 weeks old, although every business is different. (A quick sanity check for these minimums follows this list.)
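
Before uploading your data, a quick sanity check against these minimums can save a wasted training run. The following is a minimal sketch using pandas; the file name is a placeholder, and it assumes your CSV uses the EVENT_TIMESTAMP and EVENT_LABEL column names that Amazon Fraud Detector expects for training data, with "fraud" as the fraud label value:

```python
import pandas as pd

# Placeholder file name; assumes Amazon Fraud Detector's expected
# EVENT_TIMESTAMP and EVENT_LABEL training columns.
df = pd.read_csv("registrations.csv", parse_dates=["EVENT_TIMESTAMP"])

total = len(df)
fraud = (df["EVENT_LABEL"] == "fraud").sum()  # assumes "fraud" as the label value
span_days = (df["EVENT_TIMESTAMP"].max() - df["EVENT_TIMESTAMP"].min()).days

newest = df["EVENT_TIMESTAMP"].max()
now = pd.Timestamp.now(tz=newest.tz)  # match the tz-awareness of the data
newest_age_days = (now - newest).days

print(f"Total records:       {total:>8} (need >= 10,000)")
print(f"Fraud records:       {fraud:>8} (need >= 500)")
print(f"History span (days): {span_days:>8} (aim for 90-180)")
print(f"Newest record age:   {newest_age_days:>8} days (aim for >= 28)")
```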

Training the model and configuring the detector

Now that you’ve gathered your training data, it’s time to train a model and create a detector. For more information, see the Amazon Fraud Detector documentation.

For more information about creating these resources programmatically using the AWS SDK or AWS Command Line Interface (AWS CLI), see the GitHub repo.
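
As a rough illustration of the SDK route, the following boto3 sketch kicks off training of an Online Fraud Insights model version from a CSV file in Amazon S3. Every name in it (model ID, variables, labels, S3 path, IAM role ARN) is a placeholder, and it assumes you have already created the corresponding event type, variables, and labels:

```python
import boto3

client = boto3.client("frauddetector")

# Start training a new model version; all identifiers are placeholders.
response = client.create_model_version(
    modelId="registration_fraud_model",
    modelType="ONLINE_FRAUD_INSIGHTS",
    trainingDataSource="EXTERNAL_EVENTS",
    trainingDataSchema={
        "modelVariables": ["ip_address", "email_address"],
        "labelSchema": {
            "labelMapper": {"FRAUD": ["fraud"], "LEGIT": ["legit"]},
        },
    },
    externalEventsDetail={
        "dataLocation": "s3://your-bucket/training/registrations.csv",
        "dataAccessRoleArn": "arn:aws:iam::111122223333:role/FraudDetectorDataAccessRole",
    },
)
print(response["modelVersionNumber"], response["status"])
```

Training runs asynchronously; you can poll get_model_version until training completes, then activate the version and attach it to a detector.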

Picking a model score threshold

Your model produces scores between 0 (least risky) and 1,000 (most risky). Use the score threshold to identify an acceptable fraud capture rate while balancing false positives. For example, assume you want to identify fraudulent account registrations. The following chart shows that a score threshold of 500 (meaning any score of 500 or above is labeled as fraud) catches 53.2% of fraudulent account registrations. It also shows that 0.4% of legitimate events are incorrectly classified as fraud.

Amazon Fraud Detector also provides a tabular view of the same information. The following image shows that a score threshold of 150 (meaning any score of 150 or above is labeled as fraud) catches 74% of fraudulent account registrations. It also shows that 3% of legitimate events are incorrectly classified as fraud.
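
If you export a scored, labeled sample of events, you can reproduce these two numbers for any candidate threshold. The following is a minimal sketch, assuming a pandas DataFrame with hypothetical score and is_fraud columns:

```python
import pandas as pd

def rates_at_threshold(df: pd.DataFrame, threshold: int):
    """Return (capture rate, false positive rate) at a score cutoff.

    Assumes a 'score' column (0-1,000) and a boolean 'is_fraud' column.
    """
    flagged = df["score"] >= threshold
    fraud = df["is_fraud"]

    capture_rate = (flagged & fraud).sum() / fraud.sum()             # share of fraud caught
    false_positive_rate = (flagged & ~fraud).sum() / (~fraud).sum()  # share of legit flagged
    return capture_rate, false_positive_rate

# Example: capture, fpr = rates_at_threshold(scored_events, 500)
```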

Evaluating performance

After you complete the preceding steps, you have an active model and a detector built for your fraud use case, as shown in the following images.

You can now use this model and detector to evaluate how much value Amazon Fraud Detector provides your business. To do so, you need to generate fraud predictions using the AWS SDK and the GetEventPrediction API. The GetEventPrediction API takes in a new event (such as account registration) and outputs a fraud score based on the model and the outcome based on the detector rules. You should test the model using the most recent data or data up to 30 days old. This makes sure the test dataset doesn’t overlap with your training dataset. At a minimum, test the model on 1,000–5,000 records over a 2-week period. You want enough data to adequately evaluate the model’s effectiveness in identifying fraud.
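
The following is a minimal sketch of a single prediction call with boto3. The detector ID, event type, entity type, and event variables are placeholders; use the resources and variables you defined for your own model:

```python
import uuid
from datetime import datetime, timezone

import boto3

client = boto3.client("frauddetector")

# All identifiers below are placeholders for your own resources.
response = client.get_event_prediction(
    detectorId="registration_fraud_detector",
    eventId=str(uuid.uuid4()),
    eventTypeName="account_registration",
    entities=[{"entityType": "customer", "entityId": "unknown"}],
    eventTimestamp=datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    eventVariables={
        "ip_address": "203.0.113.24",
        "email_address": "user@example.com",
    },
)

print(response["modelScores"])  # fraud score(s) from the model
print(response["ruleResults"])  # outcomes from your detector rules
```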

For more information about running the GetEventPrediction API in a batch fashion, see Fraud Detector Predict API on GitHub. You should replace the synthetic dataset with your test dataset. You want to test the model on the latest data you have available, so data from the last day (or week) is most valuable for evaluating model performance.

You might consider one of the following evaluation strategies, depending on your use case and data availability:

  • Time to detect fraud
  • Dollars saved
  • Investigation yield

Time to detect fraud

One way to determine whether Amazon Fraud Detector is adding value is to benchmark using the time to detect fraud (TTDF) metric. You measure TTDF from the moment the bad actor begins their fraud lifecycle. The following diagram (also shown earlier) illustrates the fraud detection timeline.

TTDF starts at account registration (T0) and is defined as the time elapsed between T0 and TN. The goal is to minimize TTDF. As part of the POC, you can determine whether your average TTDF has decreased.

For this use case, you want to build a model in Amazon Fraud Detector to detect fraud at registration (T0). To calculate TTDF, you need to do the following:

  1. Use the model performance metrics to determine the right model score threshold that balances true positive rate vs. false positive rate for your business. A 1% false positive rate is a good starting point when picking the model threshold, although the ideal cutoff depends on your risk tolerance and what action you take (such as block the account or send for manual review).
  2. Run a sample of new account registrations through Amazon Fraud Detector to generate fraud scores and outcomes. For each event, save the timestamp of when the event occurred (T0), risk score, and outcome. If that event gets marked as fraudulent, store the timestamp of when the fraud decision was made (TN).
  3. For those events marked as fraudulent, subtract T0 from TN. This is the TTDF (see the sketch after this list).
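
The following is a minimal sketch of that calculation, assuming you’ve stored the (T0, TN) timestamp pairs for each event that was eventually marked as fraudulent:

```python
from datetime import datetime, timedelta

# Hypothetical (T0 = registration time, TN = fraud decision time) pairs.
flagged_events = [
    (datetime(2020, 6, 1, 9, 0), datetime(2020, 6, 1, 9, 0, 2)),   # caught at registration
    (datetime(2020, 6, 2, 14, 30), datetime(2020, 6, 5, 11, 15)),  # caught days later
]

ttdfs = [tn - t0 for t0, tn in flagged_events]  # TTDF = TN - T0
average_ttdf = sum(ttdfs, timedelta()) / len(ttdfs)
print(f"Average TTDF: {average_ttdf}")
```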

In an ideal scenario, you see that Amazon Fraud Detector identified a significant portion of the total fraud and the TTDF is shorter than with your current system.

Dollars saved

As an extension of the TTDF strategy, you can also estimate the dollars saved compared to your current approach. The Online Fraud Insights model is designed to stop bad actors early in the lifecycle (prevention), rather than letting everyone in and identifying fraud after the fact (mitigation). If you stop bad actors from registering an account, you can dramatically limit the impact of their subsequent activity within your business. For this use case, assume you want to detect bad actors at registration (T0).

To benchmark your dollars saved, you need to complete the following:

  1. Use the model performance metrics to determine the right model score threshold that balances true positive rate vs. false positive rate for your business. A 1% false positive rate is a good starting point when picking the model threshold, although the ideal cutoff depends on your risk tolerance and what action you take.
  2. Run your test sample of registration events through Amazon Fraud Detector to generate fraud scores and outcomes. For each event, save the timestamp of when the event occurred (T0), risk score, and outcome. If that event gets marked as fraudulent, store the timestamp of when the fraud decision was made (TN) and the fraud losses in dollars.
  3. Count the following:
    1. Fraud events prevented based on the score threshold. For each fraud event, remember to store the losses in dollars. If this data is unavailable, use an estimate.
    2. Legitimate events that were identified incorrectly as fraudulent based on the score threshold. For each legitimate event that was identified incorrectly, estimate the lost potential revenue or the cost associated with investigating the false positive.
  4. Take the counts for each bucket and multiply by the per-event cost. For example, out of 1,000 events of which 20 are fraud and 980 are legitimate, you might find the following:
    1. The model successfully identified 10 of the 20 fraud events at registration, thereby preventing the fraud from ever occurring. For each fraud event, assume $150 in fraud losses, so the total mitigated fraud losses are 10 x $150 = $1,500.
    2. The model incorrectly flagged 1% of legitimate population as potentially fraudulent, thereby incorrectly flagging 10 customers as potentially fraudulent. These customers were manually investigated, which costs $5 per investigation. The total cost of false positives is 10 x $5 = $50.
  5. Calculate the total dollars saved by taking the mitigated fraud losses and subtracting the cost of false positives and the cost of running Amazon Fraud Detector. For this use case, if you assume a cost of $0.03 per evaluation, your total dollars saved is $1,500 (fraud losses mitigated) – $50 (cost of false positives) – $30 (cost of running Amazon Fraud Detector) = $1,420 in savings (see the sketch after this list).
  6. Compare the dollars, counts, and percentages from Amazon Fraud Detector at your chosen score threshold to your current fraud system.
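
The following sketch reproduces the arithmetic from steps 4 and 5; the per-event figures are the illustrative numbers used above, not actual prices:

```python
# Illustrative figures from the example above -- substitute your own.
fraud_caught = 10              # fraud events stopped at registration
loss_per_fraud = 150.00        # estimated loss per fraud event ($)
false_positives = 10           # legitimate customers flagged incorrectly
cost_per_investigation = 5.00  # manual review cost per false positive ($)
events_evaluated = 1_000
cost_per_evaluation = 0.03     # assumed cost per evaluation ($)

mitigated_losses = fraud_caught * loss_per_fraud                # $1,500
false_positive_cost = false_positives * cost_per_investigation  # $50
service_cost = events_evaluated * cost_per_evaluation           # $30

total_saved = mitigated_losses - false_positive_cost - service_cost
print(f"Total dollars saved: ${total_saved:,.2f}")  # $1,420.00
```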

Ideally, Amazon Fraud Detector shows a significant savings compared to your current solution.

Investigation yield

Another way to benchmark Amazon Fraud Detector is to compare its manual investigation yield with that of your current fraud system. Yield is defined as the number of investigations that turn out to be fraudulent divided by the total number of investigations. Ideally, you want your yield rate to be high, so that your human investigators don’t waste time investigating false positives.

To benchmark your investigation yield, complete the following:

  1. Use the model performance metrics to determine the right model score threshold that balances true positive rate vs. false positive rate for your business. A 5% false positive rate is a good starting point when picking the model threshold, although the ideal cutoff depends on your risk tolerance.
  2. Run a sample of events through Amazon Fraud Detector to generate fraud scores and outcomes.
  3. Record how many events you send for investigation (total investigation count, or TIC) and how many of the investigated events were marked as fraudulent (fraud investigation count, or FIC).
  4. To calculate investigation yield, divide FIC by TIC (see the sketch after this list).
  5. Compare the yield from Amazon Fraud Detector to the yield of your current fraud system.
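
The following is a minimal sketch of the yield calculation over a scored, labeled sample, again assuming hypothetical score and is_fraud columns and the threshold you chose in step 1:

```python
import pandas as pd

def investigation_yield(df: pd.DataFrame, threshold: int) -> float:
    """FIC / TIC: the fraction of investigated events that were fraud.

    Assumes a 'score' column (0-1,000) and a boolean 'is_fraud' column,
    and that every event at or above the threshold is investigated.
    """
    investigated = df[df["score"] >= threshold]  # TIC = len(investigated)
    if investigated.empty:
        return 0.0
    return investigated["is_fraud"].mean()       # FIC / TIC

# Example: print(f"Yield: {investigation_yield(scored_events, 150):.1%}")
```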

Ideally, your yield is higher with Amazon Fraud Detector while it still catches a sufficient amount of fraud.

Conclusion

After you have a successful POC, you can transition to a live system by either running your detector in a shadow mode or routing production traffic to the detector. Transitioning to a shadow mode or production is outside the scope of this post, but AWS Solutions Architects are ready to help you work through the next stage of implementation.


About the Authors

Chris Porter is a Sr. Product Manager working on Amazon Fraud Detector. He is passionate about helping AWS customers stop fraud by leveraging machine learning. In his spare time, he enjoys hiking, skiing, and exploring the mountains of the Pacific Northwest.

Mike Ames is a Research Science Manager working on Amazon Fraud Detector. He helps companies use machine learning to combat fraud, waste and abuse. In his spare time, you can find him jamming to 90s metal with an electric mandolin.