Build, train, and deploy a fraud detection model
In this tutorial, you learn how to use Amazon Fraud Detector to build, train, and deploy a fraud detection model. Amazon Fraud Detector is a fully managed service that makes it easy to identify potentially fraudulent online activities such as online payment fraud and the creation of fake accounts.
Each year, organizations worldwide lose tens of billions of dollars to online fraud. Amazon Fraud Detector uses machine learning (ML) and more than 20 years of fraud detection expertise from Amazon to identify potentially fraudulent activity so customers can catch more online fraud faster. It automates the time-consuming and expensive steps to build, train, and deploy an ML model for fraud detection, making the technology easier for customers to use. It customizes each model it creates to a customer's own dataset, making the models more accurate than current one-size-fits-all ML solutions. Because you pay only for what you use, you avoid large upfront expenses.
In this tutorial, you assume the role of a fraud analyst working at an ecommerce website. You have been asked to create a machine learning model to predict whether a new account registration is fraudulent or not. The model will be trained on an account registration dataset that contains information on customer email, event timestamp, IP address, and fraud label.
The data has been labeled for your convenience and a column in the dataset identifies whether the account registration is fraudulent or legitimate. A version of this dataset is publicly available for download from the Amazon Fraud Detector User Guide.
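Before you upload the dataset, it can help to confirm that the header row and label values look as expected. The following is a minimal sketch, assuming the CSV uses the column names ip_address, email_address, EVENT_TIMESTAMP, and EVENT_LABEL. The helper name and the two sample rows are hypothetical and are not taken from the real dataset.

```python
import csv
import io

# Assumed column names for this tutorial's dataset. EVENT_TIMESTAMP and
# EVENT_LABEL are the header names Amazon Fraud Detector looks for.
REQUIRED_COLUMNS = {"EVENT_TIMESTAMP", "EVENT_LABEL", "ip_address", "email_address"}


def summarize_training_csv(text):
    """Return (missing_columns, label_counts) for CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    counts = {}
    if not missing:
        # Count how many rows carry each label value.
        for row in reader:
            label = row["EVENT_LABEL"]
            counts[label] = counts.get(label, 0) + 1
    return sorted(missing), counts


# Hypothetical two-row sample in the same shape as the tutorial dataset.
SAMPLE = (
    "ip_address,email_address,EVENT_TIMESTAMP,EVENT_LABEL\n"
    "192.0.2.10,a@example.com,2019-08-10T20:44:00Z,legit\n"
    "192.0.2.11,b@example.com,2019-08-11T09:15:00Z,fraud\n"
)

print(summarize_training_csv(SAMPLE))
```

To check the real file, read it with open(path, newline="") and pass its contents to the same function.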
In this tutorial, you complete the following steps:
- Learn about Amazon Fraud Detector components
- Create an Amazon S3 bucket with sample training data
- Create an Event in Amazon Fraud Detector
- Train and deploy the fraud detection model
- Create and publish the detector
- Clean up the resources used in this tutorial
About this Tutorial | |
---|---|
Time | 90 minutes |
Cost | Less than $3 |
Use Case | Machine Learning |
Products | Amazon Fraud Detector |
Audience | Data analysts, Developers |
Level | Intermediate |
Last Updated | November 24, 2020 |
Step 1: Learn about Amazon Fraud Detector components
This step provides an overview of the two main Amazon Fraud Detector components: detectors and models.
For more information, see Amazon Fraud Detector concepts.
A detector is a rules-based categorization engine that predicts predefined outcomes based on user configuration. For this tutorial, you define the model score thresholds as rules for the detector.
Models can either be trained within Amazon Fraud Detector using your own data or be accessed from existing Amazon SageMaker endpoints.
The high-level configuration flow is depicted in the following diagram.
To create models within Amazon Fraud Detector, you must provide data for training. This data has input features (defined by variables) and output labels (defined by labels in the Amazon Fraud Detector service). Additionally, you define events based on the type of entities sending the data for predictions. The following diagram shows the sequence of component creation followed in this tutorial.
Step 2: Create the Amazon S3 bucket for training data
In this step, you create an Amazon S3 bucket and upload the training dataset to the S3 bucket.
Note: First, download and save the dataset. Then, extract the files using an archive utility. You will upload these files to the S3 bucket later in this step.
2.1 — Navigate to Amazon S3 in the AWS Management Console and choose Create bucket.
2.2 – On the Create bucket page, provide a globally unique name for the bucket. For this tutorial, the S3 bucket is named fraud-detector-getting-started-1. For Region, note the Region in which you create the bucket, because you must use the same Region for Amazon Fraud Detector. For this tutorial, the Region is us-east-1.
2.3 – Keep the default settings for all remaining options and choose Create bucket.
2.4 – In the Buckets list, choose your newly created bucket to open the details view.
2.5 – Choose the Objects tab, then choose Upload.
2.6 – On the Upload page, choose Add files.
2.7 – Navigate to the training_data folder, select the registration_data_20K_minimum.csv file and choose Open.
2.8 – On the Upload screen, select the check box to acknowledge that bucket versioning is disabled. Keep all remaining options as the default selections and choose Upload.
The status of the file changes to Succeeded once the upload has completed.
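If you prefer to script this step, the console actions above map roughly to two Boto3 S3 calls, create_bucket and upload_file. The following is a minimal sketch that only builds the request arguments; the function name and the local file path are assumptions, and the commented lines show how you might send the requests with real AWS credentials.

```python
def s3_upload_plan(bucket, region, local_path, key):
    """Return (operation, kwargs) pairs for creating a bucket and uploading a file."""
    create_args = {"Bucket": bucket}
    # us-east-1 is the default location; S3 rejects it as a LocationConstraint,
    # so the configuration is added only for other Regions.
    if region != "us-east-1":
        create_args["CreateBucketConfiguration"] = {"LocationConstraint": region}
    upload_args = {"Filename": local_path, "Bucket": bucket, "Key": key}
    return [("create_bucket", create_args), ("upload_file", upload_args)]


PLAN = s3_upload_plan(
    bucket="fraud-detector-getting-started-1",  # bucket names must be globally unique
    region="us-east-1",
    local_path="training_data/registration_data_20K_minimum.csv",  # assumed local path
    key="registration_data_20K_minimum.csv",
)

for operation, kwargs in PLAN:
    print(operation, kwargs)

# To run the plan for real (requires boto3 and AWS credentials):
#   import boto3
#   s3 = boto3.client("s3", region_name="us-east-1")
#   for operation, kwargs in PLAN:
#       getattr(s3, operation)(**kwargs)
```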
Step 3: Create an Event for the Amazon Fraud Detector model
In this step, you create an Event with historical data for model training.
To create an event, you first define the entities participating in the event, then define the labels and variables in the event data.
For more information, see Create event types.
3.1 — Navigate to the Amazon Fraud Detector console and in the left navigation bar, choose Entities.
3.2 – On the Entity types page, choose Create.
3.3 – On the Create entity type page, for Entity type name, type customer and add an optional Entity description. Then, choose Create entity.
The new customer entity type now appears in your list of Entity types.
3.4 – In the left navigation pane, choose Labels, then choose Create.
You need to create two labels based on the training dataset: one for fraudulent registrations and one for legitimate registrations. These labels must match the label values in your training data. In the registration_data_20K_minimum.csv sample dataset used in this tutorial, the values are fraud and legit.
3.5 – On the Create label page, for Label name, type fraud and add an optional Label description. Then, choose Create label.
Next, create the second label for legitimate transactions.
3.6 – Choose Labels and choose Create. On the Create label page, for Label name, type legit and add an optional Label description. Then, choose Create label.
Your Labels list now includes your newly created fraud and legit labels.
Next, you create an event with the labels and entities information, and then define the variables using the training data.
3.7 – In the left navigation bar, choose Events and then choose Create.
3.8 – On the Create event type page, in the Event type details box, for Name, type registration and add an optional event description. For entity, choose the customer entity you created in Step 3.3.
3.9 – In the Event variables box, make the following selections:
- For Choose how to define this event's variables, choose Select variables from a training dataset.
- For IAM role, choose Create IAM role. On the Create IAM role prompt, for S3 location, type the name of the S3 bucket you created in Step 2 and choose Create role. This role provides access to the S3 bucket with training data.
Note: Amazon Fraud Detector creates an IAM role named AmazonFraudDetector-DataAccessRole-*** for you. Make note of this role as you need it later in the tutorial.
- For Data location, provide the S3 location of the bucket you created in Step 2 and choose Upload.
This step makes the training data accessible to Amazon Fraud Detector for automatic variable recognition. Notice the variables that Amazon Fraud Detector reads from the training dataset. Next, you map these variables to the predefined variable types.
3.10 – In the Variable mapping section, for Variable types, make the following selections:
- For ip_address, choose IP Address.
- For email_address, choose Email Address.
3.11 – In the Labels section, choose the Labels drop-down, and select the fraud label and the legit label you created previously. Then, choose Create event type.
Choose Cancel to dismiss the Build a model prompt that appears.
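The entity type, labels, variables, and event type created in this step can also be defined through the Amazon Fraud Detector API. The following is a minimal sketch of the request arguments for put_entity_type, put_label, put_variable, and put_event_type, in the order the resources depend on each other. The helper names and the default value "<unknown>" are assumptions chosen for illustration.

```python
def variable_request(name, variable_type):
    """Build put_variable arguments for a string variable sent with the event."""
    return {
        "name": name,
        "dataType": "STRING",
        "dataSource": "EVENT",
        "defaultValue": "<unknown>",  # assumed placeholder for missing values
        "variableType": variable_type,
    }


def event_type_plan(event_type, entity_type, labels, variables):
    """Return (operation, kwargs) pairs; the event type comes last because it
    refers to the entity type, labels, and variables by name."""
    plan = [("put_entity_type", {"name": entity_type})]
    plan += [("put_label", {"name": label}) for label in labels]
    plan += [
        ("put_variable", variable_request(name, variable_type))
        for name, variable_type in variables.items()
    ]
    plan.append((
        "put_event_type",
        {
            "name": event_type,
            "eventVariables": list(variables),
            "labels": list(labels),
            "entityTypes": [entity_type],
        },
    ))
    return plan


PLAN = event_type_plan(
    event_type="registration",
    entity_type="customer",
    labels=["fraud", "legit"],
    variables={"ip_address": "IP_ADDRESS", "email_address": "EMAIL_ADDRESS"},
)

for operation, kwargs in PLAN:
    print(operation, kwargs)

# To run for real: client = boto3.client("frauddetector", region_name="us-east-1"),
# then call getattr(client, operation)(**kwargs) for each pair in PLAN.
```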
Step 4: Train and deploy the fraud detection model
In this step, you create a fraud detection machine learning model using the training dataset you uploaded to Amazon S3 and the event you created in Amazon Fraud Detector.
4.1 — In the left navigation pane of the Amazon Fraud Detector console, choose Models. Then, choose Add model, Create model.
4.2 — On the Define model details page, make the following selections:
- For Model name, give the model an easily identifiable name and add an optional description.
- For Model type, choose Online Fraud Insights.
- For Event type, choose the registration event you created in Step 3.
4.3 — In the Historical event data section, for IAM role, choose the IAM role you created in Step 3. For Training data location, provide the S3 location of the training data file. Then, choose Next.
4.4 — On the Configure training page, notice the Model inputs section is prepopulated with model input variables based on the registration event you selected. In the Label classification section, for Fraud labels, choose fraud. For Legitimate labels, choose legit. Choose Next.
4.5 — On the Review and Create page, review all the information and choose Create and train model. A confirmation message appears for the model creation and the model starts training.
The training process takes approximately 45 minutes for this dataset.
When the model is ready, the Status changes from Training to Ready to Deploy.
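The console selections in Steps 4.1 through 4.5 correspond roughly to the create_model and create_model_version API operations. The following is a minimal sketch of the request arguments, assuming a hypothetical model ID of registration_model and placeholder values for the S3 location and IAM role ARN, which you would replace with your own.

```python
MODEL_TYPE = "ONLINE_FRAUD_INSIGHTS"


def model_training_plan(model_id, event_type, variables, data_location, role_arn):
    """Return (operation, kwargs) pairs that create a model and start training."""
    model = {"modelId": model_id, "modelType": MODEL_TYPE, "eventTypeName": event_type}
    version = {
        "modelId": model_id,
        "modelType": MODEL_TYPE,
        "trainingDataSource": "EXTERNAL_EVENTS",  # training data is read from S3
        "trainingDataSchema": {
            "modelVariables": list(variables),
            # Map the dataset's label values to the fraud and legitimate classes.
            "labelSchema": {"labelMapper": {"FRAUD": ["fraud"], "LEGIT": ["legit"]}},
        },
        "externalEventsDetail": {
            "dataLocation": data_location,
            "dataAccessRoleArn": role_arn,
        },
    }
    return [("create_model", model), ("create_model_version", version)]


PLAN = model_training_plan(
    model_id="registration_model",  # hypothetical name
    event_type="registration",
    variables=["ip_address", "email_address"],
    data_location="s3://fraud-detector-getting-started-1/registration_data_20K_minimum.csv",
    role_arn="arn:aws:iam::123456789012:role/EXAMPLE-DATA-ACCESS-ROLE",  # placeholder
)

for operation, kwargs in PLAN:
    print(operation, kwargs)
```

As in the console, training starts when the model version is created, so the second request is the one that begins the roughly 45-minute training run.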
4.6 — When the Model version status is Ready to deploy, under Version, choose 1.0. The Version details view opens.
4.7 — In the Model performance section, explore the various metrics generated by the training, including the Score distribution and the Confusion matrix. This data is useful in defining the appropriate thresholds for prediction using the detector.
Choose the Table tab to explore further.
You can choose your thresholds for the predictions based on the False positive rate (FPR), True positive rate (TPR), and Precision values in the table.
For the purposes of this tutorial, you use an upper threshold of 900 and a lower threshold of 700. Make a note of these thresholds as you use them in Step 5 for the creation of outcomes and rules in the detector.
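The rates in the table follow from the confusion matrix at a given score threshold. The following sketch shows the standard definitions of the true positive rate, false positive rate, and precision. The counts in the example are hypothetical and are not results from this tutorial's model.

```python
def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def threshold_metrics(tp, fp, tn, fn):
    """Compute TPR, FPR, and precision from confusion-matrix counts.

    tp: fraud events scored at or above the threshold
    fp: legitimate events scored at or above the threshold
    tn: legitimate events scored below the threshold
    fn: fraud events scored below the threshold
    """
    return {
        "tpr": _ratio(tp, tp + fn),        # share of fraud that is caught
        "fpr": _ratio(fp, fp + tn),        # share of legitimate events flagged
        "precision": _ratio(tp, tp + fp),  # share of flagged events that are fraud
    }


# Hypothetical counts at one threshold.
print(threshold_metrics(tp=80, fp=10, tn=990, fn=20))
```

Raising the threshold generally lowers both the false positive rate and the true positive rate, which is the trade-off you weigh when choosing the upper and lower thresholds.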
4.8 — Scroll to the top of the Version details page and choose Actions, Deploy model version.
4.9 — On the Deploy model version prompt, choose Deploy version.
The Version details shows a Status of Deploying. When the model is ready, the Status changes to Active.
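Because training and deployment both take time, a script would typically poll the model version status instead of watching the console. The following is a minimal polling sketch. The helper name is hypothetical, the status getter is passed in as a function so the example runs without AWS access, and the status strings in the simulated run are assumed examples.

```python
import time


def wait_for_status(get_status, target, poll_seconds=30.0, max_polls=120, sleep=time.sleep):
    """Call get_status() until it returns target, sleeping between attempts."""
    for _ in range(max_polls):
        status = get_status()
        if status == target:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"status did not reach {target!r} after {max_polls} polls")


# Simulated run: the status sequence stands in for repeated API responses.
simulated = iter(["TRAINING_IN_PROGRESS", "ACTIVE"])
print(wait_for_status(lambda: next(simulated), "ACTIVE", sleep=lambda _seconds: None))

# With Boto3, the getter might wrap get_model_version, for example:
#   lambda: client.get_model_version(
#       modelId="registration_model",
#       modelType="ONLINE_FRAUD_INSIGHTS",
#       modelVersionNumber="1.0",
#   )["status"]
```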
You are finished with model training and deployment. Next, create the Fraud Detector.
Step 5: Create and publish the fraud detector
In this step, you create the fraud detector by defining the outcomes, fraud detection model, and rules using the thresholds selected in Step 4 (model training). After you create the fraud detector, you test the outcomes with sample training data, then finally publish the detector.
5.1 — In the left navigation pane of the Amazon Fraud Detector console, choose Outcomes and then choose Create.
For this tutorial, you create three outcomes:
- high_risk: indicates an outcome that is fraudulent.
- medium_risk: indicates an outcome that requires manual review.
- low_risk: indicates an outcome that is legitimate.
Next, create the first outcome.
5.2 — On the New outcome page, for Outcome name, type high_risk and add an optional outcome description. Then, choose Save outcome.
5.3 — Repeat Steps 5.1 and 5.2 to create outcomes for medium_risk and low_risk. Your list of Outcomes now includes three outcomes.
Next, create the detector.
5.4 — In the left navigation pane, choose Detectors, then choose Create detector.
5.5 — On the Define detector details page, for Detector name, type detector-getting-started and add an optional description. For Event type, choose registration. Then, choose Next.
5.6 — On the Add model page, choose Add model.
Note: You can also connect models deployed on Amazon SageMaker endpoints to your detectors. For this tutorial, you use the model you trained in the previous steps.
5.7 — On the Add model prompt, choose the fraud detector model you created in Step 4. For version, choose 1.0. Then, choose Add model.
5.8 — On the Add model page, choose Next.
5.9 — On the Add rules page, create the first rule named auto-fraud-rule:
- For Name, type auto-fraud-rule and add an optional description.
- For Expression, type $<modelname>_insightscore >= 900
- For Outcomes, choose high_risk.
Then, choose Add rule.
5.10 — On the Add rules page, choose Add another rule. Repeat Step 5.9 to create the remaining two rules:
To create the review-rule:
- For Name, type review-rule and add an optional description.
- For Expression, type $<modelname>_insightscore < 900 and $<modelname>_insightscore > 700
- For Outcomes, choose medium_risk.
To create the auto-legit-rule:
- For Name, type auto-legit-rule and add an optional description.
- For Expression, type $<modelname>_insightscore <= 700
- For Outcomes, choose low_risk.
You should now have three rules. On the Add rules page, choose Next.
Note: The scores and rules used in this tutorial are for demonstration purposes only. Amazon strongly recommends reviewing your thresholds with business and legal teams to create appropriate rules.
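To see how the three rules partition the score range, the following sketch evaluates them locally in order and returns the first match. It is an illustration of the thresholds only, not the service's rule engine, and it assumes rules are evaluated in the order listed with the first matching rule deciding the outcome.

```python
# Each entry: (rule name, predicate on the model score, outcome).
RULES = [
    ("auto-fraud-rule", lambda score: score >= 900, "high_risk"),
    ("review-rule", lambda score: 700 < score < 900, "medium_risk"),
    ("auto-legit-rule", lambda score: score <= 700, "low_risk"),
]


def first_matched_outcome(score):
    """Return (rule name, outcome) for the first rule whose predicate matches."""
    for name, matches, outcome in RULES:
        if matches(score):
            return name, outcome
    return None  # not reachable with these three rules, which cover every score


for score in (950, 900, 899.5, 700.5, 700, 120):
    print(score, first_matched_outcome(score))
```

The boundary values matter: a score of exactly 900 is high_risk and a score of exactly 700 is low_risk, so no score falls between rules or matches two of them.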
5.11 — On the Configure rule execution page, keep the default settings and choose Next.
5.12 — On the Review and create page, review the selections and choose Create detector.
Your detector is created and shows a Draft status. To make the detector available for use, you must publish it. Before you publish the detector, run a few tests using the training dataset.
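The outcomes, detector, rules, and detector version from Steps 5.1 through 5.12 can also be created with the put_outcome, put_detector, create_rule, and create_detector_version API operations. The following is a minimal sketch of the request arguments. The model ID registration_model is hypothetical, and the rule version "1" and the FIRST_MATCHED execution mode are assumptions about the defaults.

```python
def detector_plan(detector_id, event_type, model_id, upper=900, lower=700):
    """Return (operation, kwargs) pairs that build a draft detector version."""
    score = f"${model_id}_insightscore"  # variable name the rules refer to
    rules = [
        ("auto-fraud-rule", f"{score} >= {upper}", "high_risk"),
        ("review-rule", f"{score} < {upper} and {score} > {lower}", "medium_risk"),
        ("auto-legit-rule", f"{score} <= {lower}", "low_risk"),
    ]
    # Outcomes and the detector must exist before rules can reference them.
    plan = [("put_outcome", {"name": outcome}) for _, _, outcome in rules]
    plan.append(("put_detector", {"detectorId": detector_id, "eventTypeName": event_type}))
    for rule_id, expression, outcome in rules:
        plan.append((
            "create_rule",
            {
                "ruleId": rule_id,
                "detectorId": detector_id,
                "expression": expression,
                "language": "DETECTORPL",
                "outcomes": [outcome],
            },
        ))
    plan.append((
        "create_detector_version",
        {
            "detectorId": detector_id,
            "rules": [
                {"detectorId": detector_id, "ruleId": rule_id, "ruleVersion": "1"}
                for rule_id, _, _ in rules
            ],
            "modelVersions": [{
                "modelId": model_id,
                "modelType": "ONLINE_FRAUD_INSIGHTS",
                "modelVersionNumber": "1.0",
            }],
            "ruleExecutionMode": "FIRST_MATCHED",
        },
    ))
    return plan


PLAN = detector_plan("detector-getting-started", "registration", "registration_model")
for operation, kwargs in PLAN:
    print(operation, kwargs)
```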
5.13 — Open the registration_data_20K_minimum.csv file and locate a row with an EVENT_LABEL showing fraud. On the detector details page, scroll down to the Run test section and enter the data from this row. Choose Run test.
The detector version returns a high risk outcome for the data. Next, run a test for a low risk outcome.
5.14 — In the registration_data_20K_minimum.csv file, locate a row with an EVENT_LABEL showing legit. In the Run test section, enter the data from this row. Choose Run test.
The detector version returns a low risk outcome for the data. Continue to run tests to check the outcomes.
Note: You may see different results than shown in the example images as the model is not memorizing the data but making predictions. As with any prediction algorithm, the results may not be 100% accurate.
5.15 — After you've run a few tests, on the detector details page, choose Actions, then choose Publish.
5.16 — On the Publish version prompt, choose Publish version.
A confirmation message appears and the detector status changes to Active.
Now that the detector is published, you can invoke it using the Amazon Fraud Detector API. See the following sample Boto3 request. For more information, see the Amazon Fraud Detector User Guide.
    import json
    import uuid

    import boto3

    # Use the Region where you created the detector (us-east-1 in this tutorial).
    client = boto3.client('frauddetector', region_name='us-east-1')

    response = client.get_event_prediction(
        detectorId="detector-getting-started",
        eventId=str(uuid.uuid4()),  # unique ID for this event
        eventTypeName="registration",
        eventTimestamp="2019-08-10T20:44:00Z",
        entities=[{"entityType": "customer", "entityId": str(uuid.uuid4())}],
        eventVariables={
            "email_address": "fake_acostsusan@example.org",
            "ip_address": "46.41.252.160",
        },
    )
    # Print the outcomes of the first rule result in the response.
    print('The predicted outcome is: ' + json.dumps(response['ruleResults'][0]['outcomes']))
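A prediction response carries both the model scores and the rule results, so an application would usually read both. The following is a minimal sketch of a response parser. The helper name is hypothetical, and the sample response is a hand-written illustration with made-up values, not output from a real call.

```python
def summarize_prediction(response):
    """Collect all model scores and all rule outcomes from a prediction response."""
    scores = {}
    for entry in response.get("modelScores", []):
        scores.update(entry.get("scores", {}))
    outcomes = []
    for result in response.get("ruleResults", []):
        outcomes.extend(result.get("outcomes", []))
    return scores, outcomes


# Hypothetical response in the shape returned by get_event_prediction.
SAMPLE_RESPONSE = {
    "modelScores": [{
        "modelVersion": {
            "modelId": "registration_model",
            "modelType": "ONLINE_FRAUD_INSIGHTS",
            "modelVersionNumber": "1.0",
        },
        "scores": {"registration_model_insightscore": 945.0},
    }],
    "ruleResults": [{"ruleId": "auto-fraud-rule", "outcomes": ["high_risk"]}],
}

scores, outcomes = summarize_prediction(SAMPLE_RESPONSE)
print(scores, outcomes)
```

Using .get with defaults keeps the parser from raising a KeyError or IndexError when no rule matches or no model is attached, which the one-line print in the sample request does not guard against.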
Step 6: Clean up
In the following steps, you clean up the resources you created in this tutorial.
It is a best practice to delete instances and resources that you are no longer using so that you are not continually charged for them.
Delete Amazon Fraud Detector detector, rules, and model
6.1 — Navigate to the Amazon Fraud Detector console, and in the left navigation pane, choose Detectors.
6.2 — Choose the detector you created for this tutorial and then choose version 1.0.
6.3 — Choose Actions, then choose Deactivate. Choose Deactivate this detector version without replacing it with a different version and choose Deactivate detector version. The detector status changes to Inactive.
6.4 — Choose Actions, then choose Delete. Type the name of your detector version and choose Delete detector version.
6.5 — Choose the Associated rules tab and choose one of the rules you created for this tutorial. On the Rules version page, choose Actions, then choose Delete rule version. Type the name of your rule to confirm and choose Delete version. Repeat this step to delete the remaining two rules.
6.6 — On the Detectors page, choose Actions, then choose Delete detector. Type the name of your detector to confirm and choose Delete detector.
6.7 — In the left navigation pane, choose Models, then choose the model you created for this tutorial. Choose version 1.0.
6.8 — Choose Actions, then choose Undeploy model version. Type Undeploy and choose Undeploy model version. Wait for the model version status to change to Ready to deploy before continuing.
6.9 — On the Models page, choose Actions, then choose Delete. Type the version name of your model to confirm and choose Delete model version.
6.10 — On the Models page, choose Actions, then choose Delete model. Type the name of your model to confirm and choose Delete model.
Delete S3 bucket
6.11 — Navigate to the S3 console.
6.12 — In the left navigation pane, choose Buckets and select the bucket that you created for this tutorial. Choose Empty.
6.13 — In the confirmation box, type permanently delete and choose Empty. Then, choose Exit.
6.14 — Choose the bucket again, and choose Delete.
6.15 — In the confirmation box, type the name of the bucket to confirm and choose Delete bucket.
Delete IAM roles
6.16 — Navigate to the IAM console, and in the navigation pane, choose Roles.
6.17 — Search for fraud and then select the check box next to the role you created for this tutorial.
6.18 — At the top of the page, choose Delete role.
6.19 — In the confirmation dialog box, choose Yes, Delete.
6.20 — In the navigation pane, choose Policies.
6.21 — Search for fraud and then select the check box next to the policy created for this tutorial.
6.22 — At the top of the page, choose Policy actions, then Delete.
6.23 — In the confirmation dialog box, choose Delete.
Delete other resources
Optionally, delete the events, entities, labels, outcomes, and variables created for this tutorial. These resources do not incur any charges and may be reusable for other detectors. For information on cleaning up these resources, see Delete resources in the Amazon Fraud Detector User Guide.
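The detector, rule, and model deletions in this step have to happen in dependency order, which is easy to get wrong in a script. The following is a minimal sketch of that order as API request arguments. The model ID is hypothetical, and the detector version ID "1" and rule version "1" are assumptions about how the API identifies the versions created in this tutorial.

```python
def cleanup_plan(detector_id, rule_ids, model_id, model_type="ONLINE_FRAUD_INSIGHTS"):
    """Return (operation, kwargs) pairs that remove the detector, rules, and model."""
    version = {"modelId": model_id, "modelType": model_type, "modelVersionNumber": "1.0"}
    plan = [
        # A published detector version must be deactivated before deletion.
        ("update_detector_version_status",
         {"detectorId": detector_id, "detectorVersionId": "1", "status": "INACTIVE"}),
        ("delete_detector_version",
         {"detectorId": detector_id, "detectorVersionId": "1"}),
    ]
    # Rules can be deleted once no detector version uses them.
    plan += [
        ("delete_rule",
         {"rule": {"detectorId": detector_id, "ruleId": rule_id, "ruleVersion": "1"}})
        for rule_id in rule_ids
    ]
    plan.append(("delete_detector", {"detectorId": detector_id}))
    # Undeploy the model version, then wait for the status to change
    # before sending the delete requests that follow.
    plan.append(("update_model_version_status", {**version, "status": "INACTIVE"}))
    plan.append(("delete_model_version", dict(version)))
    plan.append(("delete_model", {"modelId": model_id, "modelType": model_type}))
    return plan


PLAN = cleanup_plan(
    detector_id="detector-getting-started",
    rule_ids=["auto-fraud-rule", "review-rule", "auto-legit-rule"],
    model_id="registration_model",  # hypothetical name
)

for operation, kwargs in PLAN:
    print(operation, kwargs)
```

Undeploying is asynchronous, so a real script would pause after update_model_version_status until the model version is no longer active before it deletes the version and the model.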
Congratulations
You built, trained, and deployed a fraud detection model using Amazon Fraud Detector.
Recommended next steps
Learn more about Amazon Fraud Detector features
Find out more about the features of Amazon Fraud Detector.
Read about more Amazon Fraud Detector applications
Read the blog post Catching fraud faster by building a proof of concept in Amazon Fraud Detector.
Find more Amazon Fraud Detector resources
See Amazon Fraud Detector Notebooks for code samples and more.