Performing batch fraud predictions using Amazon Fraud Detector, Amazon S3, and AWS Lambda

Amazon Fraud Detector is a fully managed service that makes it easy to identify potentially fraudulent online activities, such as the creation of fake accounts or online payment fraud. Unlike general-purpose machine learning (ML) packages, Amazon Fraud Detector is designed specifically to detect fraud. Amazon Fraud Detector combines your data, the latest in ML science, and more than 20 years of fraud detection experience from Amazon.com and AWS to build ML models tailor-made to detect fraud in your business.

This post walks you through how to use Amazon Fraud Detector with Amazon Simple Storage Service (Amazon S3) and AWS Lambda to perform a batch of fraud predictions on event records (such as account registrations and transactions) in a CSV file. This architecture enables you to trigger a batch of predictions automatically upon uploading your CSV file to Amazon S3 and retrieve the fraud prediction results in a newly generated CSV also stored in Amazon S3.

Solution overview

Amazon Fraud Detector can perform low-latency fraud predictions, enabling your company to dynamically adjust the customer experience in your applications based on real-time fraud risk detection. But suppose you want to generate fraud predictions for a batch of events after the fact; perhaps you don’t need a low-latency response and want to evaluate events on an hourly or daily schedule. How do you accomplish this using Amazon Fraud Detector? One approach is to use an Amazon S3 event notification to trigger a Lambda function that processes a CSV file of events stored in Amazon S3 when the file is uploaded to an input S3 bucket. The function runs each event through Amazon Fraud Detector to generate predictions using a detector (ML model and rules) and uploads the prediction results to an S3 output bucket. The following diagram illustrates this architecture.

To create this Lambda-based batch prediction system, you complete the following high-level steps:

Create and publish a detector version containing a fraud detection model and rules, or simply a ruleset.
Create two S3 buckets. The first bucket is used to land your CSV file, and the second bucket is where your Lambda function writes the prediction results to.
Create an AWS Identity and Access Management (IAM) role to use as the execution role in the Lambda function.
Create a Lambda function that reads in a CSV file from Amazon S3, calls the Amazon Fraud Detector get_event_prediction function for each record in the CSV file, and writes a CSV file to Amazon S3.
Add an Amazon S3 event trigger to invoke your Lambda function whenever a new CSV file is uploaded to the S3 bucket.
Create a sample CSV file of event records to test the batch prediction process.
Test the end-to-end process by uploading your sample CSV file to your input S3 bucket and reviewing prediction results in the newly generated CSV file in your output S3 bucket.

Creating and publishing a detector

You can create and publish a detector version using the Amazon Fraud Detector console or via the APIs. For console instructions, see Get started (console) or Amazon Fraud Detector is now Generally Available. After you complete this step, note the following items, which you need in later steps:

AWS Region you created the detector in
Detector name and version
Name of the entity type and event type used by your detector
List of variables for the entity type used in your detector

The following screenshot shows the detail view of a detector version.

The following screenshot shows the detail view of an event type.

Creating the input and output S3 buckets

Create the following S3 buckets on the Amazon S3 console:

fraud-detector-input – Where you upload the CSV file containing events for batch predictions
fraud-detector-output – Where the Lambda function writes the prediction results file

Make sure you create your buckets in the same Region as your detector. For more information, see How do I create an S3 Bucket?

Creating the IAM role

To create the execution role in IAM that gives your Lambda function permission to access the AWS resources required for this solution, complete the following steps:

On the IAM console, choose Roles.
Choose Create role.
Select Lambda.
Choose Next.
Attach the following policies:
- AWSLambdaBasicExecutionRole – Provides the Lambda function with write permissions to Amazon CloudWatch Logs.
- AWSXRayDaemonWriteAccess – Allows the AWS X-Ray daemon to relay raw trace data and retrieve sampling data to be used by X-Ray.
- AmazonFraudDetectorFullAccessPolicy – Provides permissions to create resources and generate fraud predictions in Amazon Fraud Detector.
- AmazonS3FullAccess – Provides the Lambda function permissions to read and write objects in Amazon S3. This policy provides broad Amazon S3 access; as a best practice, consider reducing the scope of this policy to the S3 buckets required for this example, or use an inline policy such as the following:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::fraud-detector-input/*",
                "arn:aws:s3:::fraud-detector-output/*"
            ]
        }
    ]
}

Choose Next.
Enter a name for your role (for example, lambda-s3-role).
Choose Create role.

Creating the Lambda function

Now let’s create our Lambda function on the Lambda console.

On the Lambda console, choose Create function.
For Function name, enter a name (for example, afd-batch-function).
For Runtime, choose Python 3.8.
For Execution role, select Use an existing role.
For Existing role, choose the role you created.

Choose Create function

Next, we walk through sections of the code used in the Lambda function. This code goes into the Function code section of your Lambda function. The full Lambda function code is available in the next section.

Packages

import json
import csv
import boto3

Defaults

# -- make a connection to fraud detector -- 
client = boto3.client("frauddetector")
# -- S3 bucket to write scored data to -- 
S3_BUCKET_OUT = "fraud-detector-output"
# -- specify event, entity, and detector  -- 
ENTITY_TYPE    = "customer"
EVENT_TYPE     = "new_account_registration_full_details"
DETECTOR_NAME  = "new_account_detector"
DETECTOR_VER   = "3"

We have entered the values from the detector we created and the output S3 bucket. Replace these default values with the values you used when creating your output S3 bucket and Amazon Fraud Detector resources.

Functions

We use a few helper functions along with the main lambda_handler() function:

get_event_variables(EVENT_TYPE) – Returns a list of the variables for the event type. We map these to the input file positions.
prep_record(record_map, event_map, line) – Returns a record containing just the data required by the detector.
get_score(event, record) – Returns the fraud prediction risk scores and rule outcomes from the Amazon Fraud Detector get_event_predictionfunction. The get_score function uses two extra helper functions to format model scores (prep_scores) and rule outcomes (prep_outcomes).

Finally, the lambda_handler(event, context) drives the whole process. See the following example code:

get_event_variables(EVENT_TYPE)
def get_event_variables(EVENT_TYPE):
    """ return list of event variables 
    """
    response = client.get_event_types(name=EVENT_TYPE)
    event_variables = []

    for v in response['eventTypes'][0]['eventVariables']:
        event_variables.append(v)
    return event_variables
prep_record(record_map, event_map, line)
def prep_record(record_map, event_map, line):
    """ structure the record for scoring 
    """
    record = {}
    for key in record_map.keys():
        record[key] = line[record_map[key]]
        
    event = {}
    for key in event_map.keys():
        event[key] = line[event_map[key]]
    return record, event

prep_scores(model_scores)
def prep_scores(model_scores):
    """ return list of models and scores
    """
    detector_models = []
    for m in model_scores:
        detector_models.append(m['scores'])
    return detector_models

prep_outcomes(rule_results)
def prep_outcomes(rule_results):
    """ return list of rules and outcomes 
    """
    detector_outcomes = []
    for rule in rule_results:
        rule_outcomes ={}
        rule_outcomes[rule['ruleId']] = rule['outcomes']
        detector_outcomes.append(rule_outcomes)
    return detector_outcomes 

def get_score(event, record):
def get_score(event, record):
    """ return the score to the function
    """
    pred_rec = {}
    
    try:
        pred = client.get_event_prediction(detectorId=DETECTOR_NAME, 
                                       detectorVersionId=DETECTOR_VER,
                                       eventId = event['EVENT_ID'],
                                       eventTypeName = EVENT_TYPE,
                                       eventTimestamp = event['EVENT_TIMESTAMP'], 
                                       entities = [{'entityType': ENTITY_TYPE, 'entityId':event['ENTITY_ID']}],
                                       eventVariables=  record) 
                                       
        pred_rec["score"]   = prep_scores(pred['modelScores'])
        pred_rec["outcomes"]= prep_outcomes(pred['ruleResults'])

    except: 
        pred_rec["score"]   = [-999]
        pred_rec["outcomes"]= ["error"]
    
    return pred_rec

The following is the full code for the Lambda function:

import boto3 
import csv
import json

# -- make a connection to fraud detector -- 
client = boto3.client("frauddetector")

# -- S3 bucket to write batch predictions out to -- 
S3_BUCKET_OUT = "fraud-detector-output"

# -- specify event, entity, and detector  -- 
ENTITY_TYPE    = "customer"
EVENT_TYPE     = "new_account_registration_full_details"
DETECTOR_NAME  = "new_account_detector"
DETECTOR_VER   = "3"

def get_event_variables(EVENT_TYPE):
    """ return list of event variables 
    """
    response = client.get_event_types(name=EVENT_TYPE)
    event_variables = []

    for v in response['eventTypes'][0]['eventVariables']:
        event_variables.append(v)
    return event_variables

def prep_record(record_map, event_map, line):
    """ structure the record for scoring 
    """
    record = {}
    for key in record_map.keys():
        record[key] = line[record_map[key]]
        
    event = {}
    for key in event_map.keys():
        event[key] = line[event_map[key]]
    return record, event

def prep_scores(model_scores):
    """ return list of models and scores
    """
    detector_models = []
    for m in model_scores:
        detector_models.append(m['scores'])
    return detector_models

def prep_outcomes(rule_results):
    """return list of rules and outcomes
    """
    detector_outcomes = []
    for rule in rule_results:
        rule_outcomes = {}
        rule_outcomes[rule['ruleId']] = rule['outcomes']
        detector_outcomes.append(rule_outcomes)
    return detector_outcomes

def get_score(event, record):
    """ return the score to the function
    """
    pred_rec = {}
    
    try:
        pred = client.get_event_prediction(detectorId=DETECTOR_NAME, 
                                       detectorVersionId=DETECTOR_VER,
                                       eventId = event['EVENT_ID'],
                                       eventTypeName = EVENT_TYPE,
                                       eventTimestamp = event['EVENT_TIMESTAMP'], 
                                       entities = [{'entityType': ENTITY_TYPE, 'entityId':event['ENTITY_ID']}],
                                       eventVariables=  record) 
                                       
        pred_rec["score"]   = prep_scores(pred['modelScores'])
        pred_rec["outcomes"]= prep_outcomes(pred['ruleResults'])

    except: 
        pred_rec["score"]   = [-999]
        pred_rec["outcomes"]= ["error"]
    
    return pred_rec

def lambda_handler(event, context):
    """ the lambda event handler triggers the process. 
    """
    S3_BUCKET_IN = event['Records'][0]['s3']['bucket']['name']
    S3_FILE      = event['Records'][0]['s3']['object']['key']
    S3_OUT_FILE  = "batch_{0}".format(S3_FILE)
    
    
    # -- open a temp file to write predictions to. 
    f = open("/tmp/csv_file.csv", "w+")
    temp_csv_file = csv.writer(f) 
    
    # -- get the input file -- 
    s3    = boto3.resource('s3')
    obj   = s3.Object(S3_BUCKET_IN, S3_FILE)
    data  = obj.get()['Body'].read().decode('utf-8').splitlines()
    lines = csv.reader(data)
    
    # -- get the file header -- 
    file_variables = next(lines)
    
    # -- write the file header to temporary file -- 
    temp_csv_file.writerow(file_variables + ["MODEL_SCORES", "DETECTOR_OUTCOMES"])
    
    # -- get list of event variables -- 
    event_variables = get_event_variables(EVENT_TYPE)
    
    # -- map event variables to file structure -- 
    record_map = {}
    for var in event_variables:
        record_map[var] = file_variables.index(var)
    
    # -- map event fields to file structure --
    event_map = {}
    for var in ['ENTITY_ID', 'EVENT_ID', 'EVENT_TIMESTAMP']:
        event_map[var] = file_variables.index(var)
    
   # -- for each record in the file, prep it, score it, write it to temp. 
    for i,line in enumerate(lines):
        record, event       = prep_record(record_map, event_map, line)
        record_pred         = get_score(event, record)
        #print(list(record_pred.values()))
        temp_csv_file.writerow(line + list(record_pred.values()))
    
    
    # -- close the temp file and upload it to your OUTPUT bucket    
    f.close()
    s3_client = boto3.client('s3')
    s3_client.upload_file('/tmp/csv_file.csv', S3_BUCKET_OUT, "batch_pred_results_" + S3_FILE  )
    
    return {
        'statusCode': 200,
        'body': json.dumps('Batch Complete!')
    }

After you add the code to your Lambda function, choose Deploy to save.

Configuring your Lambda settings and creating the Amazon S3 trigger

The batch prediction processes require memory and time to process, so we need to change the Lambda function’s default memory allocation and maximum run time.

On the Lambda console, locate your function.
On the function detail page, under Basic settings, choose Edit.
For Memory, choose 2048 MB.
For Timeout, enter 15 min.
Choose Save.

A 15-minute timeout allows the function to process up to roughly 4,000 predictions per batch, so you should keep this in mind as you consider your CSV file creation and upload strategy.

You can now make it so that this Lambda function triggers when a CSV file is uploaded to your input S3 bucket.

At the top of the Lambda function detail page, in the Designer box, choose Add trigger.
Choose S3.
For Bucket, choose your input S3 bucket.
For Suffix, enter .csv.

A warning about recursive invocation appears. You don’t want to trigger a read and write to the same bucket, which is why you created a second S3 bucket for the output.

Select the check-box to acknowledge the recursive invocation warning.
Choose Add.

Creating a sample CSV file of event records

We need to create a sample CSV file of event records to test the batch prediction process. In this CSV file, include a column for each variable in your event type schema. In addition, include columns for:

EVENT_ID – An identifier for the event, such as a transaction number. The field values must satisfy the following regular expression pattern: ^[0-9a-z_-]+$.
ENTITY_ID – An identifier for the entity performing the event, such as an account number. The field values must also satisfy the following regular expression pattern: ^[0-9a-z_-]+$.
EVENT_TIMESTAMP – A timestamp, in ISO 8601 format, for when the event occurred.

Column header names must match their corresponding Amazon Fraud Detector variable names exactly.

In your CSV file, each row corresponds to one event that you want to generate a prediction for. The following screenshot shows an example of a test CSV file.

For more information about Amazon Fraud Detector variable data types and formatting, see Create a variable.

Performing a test batch prediction

To test our Lambda function, we simply upload our test file to the fraud-detector-input S3 bucket via the Amazon S3 console. This triggers the Lambda function. We can then check the fraud-detector-output S3 bucket for the results file.

The following screenshot shows that the test CSV file 20_event_test.csv is uploaded to the fraud-detector-input S3 bucket.

When batch prediction is complete, the results CSV file batch_pred_results_20_event_test.csv is uploaded to the fraud-detector-output S3 bucket (see the following screenshot).

The following screenshots show our results CSV file. The new file has two new columns: MODEL_SCORES and DETECTOR_OUTCOMES. MODEL_SCORES contains model names, model details, and prediction scores for any models used in the detector. DETECTOR_OUTCOMES contains all rule results, including any matched rules and their corresponding outcomes.

If the results file doesn’t appear in the output S3 bucket, you can check the CloudWatch log stream to see if the Lambda function ran into any issues. To do this, go to your Lambda function on the Lambda console and choose the Monitoring tab, then choose View logs in CloudWatch. In CloudWatch, choose the log stream covering the time period you uploaded your CSV file.

Conclusion

Congrats! You have successfully performed a batch of fraud predictions. Depending on your use case, you may want to use your prediction results in other AWS services. For example, you can analyze the prediction results in Amazon QuickSight or send results that are high risk to Amazon Augmented AI (Amazon A2I) for a human review of the prediction.

Amazon Fraud Detector has a 2-month free trial that includes 30,000 predictions per month. After that, pricing starts at $0.005 per prediction for rules-only predictions and $0.03 for ML-based predictions. For more information, see Amazon Fraud Detector pricing. For more information about Amazon Fraud Detector, including links to additional blog posts, sample notebooks, user guide, and API documentation, see Amazon Fraud Detector.

The next step is to start dropping files into your S3 bucket! Good luck!

About the Authors

Nick Tostenrude is a Senior Manager of Product in AWS, where he leads the Amazon Fraud Detector service team. Nick joined Amazon nine years ago. He has spent the past four years as part of the AWS Fraud Prevention organization. Prior to AWS, Nick spent five years in Amazon’s Kindle and Devices organizations, leading product teams focused on the Kindle reading experience, accessibility, and K-12 Education.

Mike Ames is a Research Science Manager working on Amazon Fraud Detector. He helps companies use machine learning to combat fraud, waste and abuse. In his spare time, you can find him jamming to 90s metal with an electric mandolin.

Artificial Intelligence