Detecting fraud in games using machine learning
As video games rise in popularity and more games move toward free-to-play models, there’s more opportunity for fraudulent behavior among players. Fraud is problematic for studios because it devalues in-game currency that’s bought with real money and the digital goods that can be purchased with it. Fraud also causes players to lose trust in the community and the developer. It may cause players to stop playing your game, impacting your game’s lifetime value.
Video games have unique challenges when it comes to detecting fraud. For example, purchases made in games are typically smaller in value and more frequent than purchases made through online retailers. This means there are many possible instances of fraud that need to be investigated by relatively small teams. Another unique challenge is that the payer is not always the player. Account sharing is common in games, with children often using a parent’s payment information, sometimes without the parent’s knowledge. This can lead to erratic and unpredictable spending that looks like fraud. And with games now integrating more social features, opportunities for social engineering and phishing are more common than ever. This can lead to compromised accounts and unauthorized purchases.
With all of these factors combined, hardcoded or heuristic-based approaches to detecting fraud are unreliable. As malicious players become craftier, game developers need to do more than just understand and predict player behavior. Developers must be able to create an automated system to identify these fraudulent events. Detecting and managing fraud can often be difficult. However, models trained using machine learning algorithms help make use of the data game developers already have, automating escalation and taking action against cases of suspicious behavior.
Game developers don’t always have the resources or expertise to build and maintain such a system. The Fraud Detection Using Machine Learning solution helps developers get up and running—so they can train and run their own machine learning models that help detect in-game fraud. The following diagram reviews the architecture of this solution:
This solution enables you to run automated transaction processing. The machine learning (ML) model detects potentially fraudulent activity and flags it for review. The solution provides a dataset of credit card transactions contained in an Amazon S3 bucket, but you can modify the solution to use datasets for other types of fraudulent behavior.
The crux of the solution is an Amazon SageMaker notebook instance with two ML models: random cut forest (RCF) and XGBoost. RCF is an unsupervised algorithm for detecting anomalous data points within a dataset. RCF works well for detecting fraud because it looks for unusual patterns in all of the data points you give it. For example, a model trained with RCF might recognize that power-up purchases from City A are normal but a hundred of them from City B are not. However, games can have events that look fraudulent but, in reality, are not. This is where you can use the supervised XGBoost algorithm to train a model based on your own data. You can teach your model the unique spending behavior in your game by identifying fraudulent events versus normal purchases. For more information, read our RCF documentation and XGBoost documentation.
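To make the City A versus City B example concrete, here is a toy illustration of unsupervised anomaly scoring. It is not RCF: it swaps in a much simpler technique, measuring how many standard deviations an observation sits from a city's own history, and the purchase counts are invented for illustration. The point is only that no labels are needed to notice that one value is far outside the usual pattern.

```python
from statistics import mean, pstdev

# Invented daily power-up purchase counts per city (illustrative data only).
daily_power_up_purchases = {
    "City A": [90, 110, 100, 95, 105],
    "City B": [2, 0, 1, 3, 1],
}


def anomaly_score(history, observed):
    """Distance of `observed` from the historical mean, in standard deviations.

    A simple stand-in for an anomaly score; RCF computes its scores differently.
    """
    spread = pstdev(history) or 1.0  # avoid dividing by zero for a constant history
    return abs(observed - mean(history)) / spread


# The same count of 100 purchases is ordinary in one city and extreme in the other.
score_a = anomaly_score(daily_power_up_purchases["City A"], 100)
score_b = anomaly_score(daily_power_up_purchases["City B"], 100)
```

A score like this says only that a data point is unusual, not that it is fraudulent, which is why the solution pairs the unsupervised model with a supervised one trained on labeled events.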
The solution also deploys an AWS Lambda function that processes transactions from the example dataset and invokes the two Amazon SageMaker endpoints that assign anomaly and classification scores to incoming data points. A REST API in Amazon API Gateway triggers predictions using signed HTTP requests. An Amazon Kinesis Data Firehose delivery stream then loads the processed transactions into another Amazon S3 bucket for storage.
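The flow described above could look roughly like the following sketch of a Lambda handler. It is not the code shipped with the solution. The endpoint names, the delivery stream name, the shape of the incoming event (an API Gateway proxy event whose JSON body has a `data` list), and the response formats (JSON scores from the RCF endpoint, a single CSV value from the XGBoost endpoint) are all assumptions made for illustration. The AWS clients are passed in as arguments so the scoring logic can be exercised without an AWS account.

```python
import json

# Hypothetical resource names; the deployed solution defines its own.
RCF_ENDPOINT = "fraud-detection-rcf"
XGB_ENDPOINT = "fraud-detection-xgb"
STREAM_NAME = "fraud-detection-stream"


def score_transaction(features, runtime, firehose):
    """Score one transaction with both endpoints and archive the result."""
    payload = ",".join(str(value) for value in features)

    # Unsupervised anomaly score (assumed JSON response: {"scores": [{"score": ...}]}).
    rcf = runtime.invoke_endpoint(
        EndpointName=RCF_ENDPOINT,
        ContentType="text/csv",
        Accept="application/json",
        Body=payload,
    )
    anomaly_score = json.loads(rcf["Body"].read())["scores"][0]["score"]

    # Supervised classification score (assumed response: one value as CSV text).
    xgb = runtime.invoke_endpoint(
        EndpointName=XGB_ENDPOINT,
        ContentType="text/csv",
        Body=payload,
    )
    fraud_probability = float(xgb["Body"].read().decode("utf-8"))

    result = {
        "features": list(features),
        "anomaly_score": anomaly_score,
        "fraud_probability": fraud_probability,
    }
    # Kinesis Data Firehose batches these records and writes them to Amazon S3.
    firehose.put_record(
        DeliveryStreamName=STREAM_NAME,
        Record={"Data": (json.dumps(result) + "\n").encode("utf-8")},
    )
    return result


def lambda_handler(event, context):
    import boto3  # imported here so score_transaction can be tested without AWS

    body = json.loads(event["body"])  # assumes an API Gateway proxy event
    result = score_transaction(
        body["data"],
        boto3.client("sagemaker-runtime"),
        boto3.client("firehose"),
    )
    return {"statusCode": 200, "body": json.dumps(result)}
```

Writing one JSON object per line to the delivery stream keeps the stored objects easy to query later with analytics tools.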
This reference architecture also provides an example of how to invoke the prediction REST API as part of the Amazon SageMaker notebook.
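For readers curious what a signed HTTP request involves, the sketch below builds AWS Signature Version 4 headers for a POST to an API Gateway endpoint using only the Python standard library. It is a simplified illustration rather than the notebook's own code: it assumes long-term credentials (temporary credentials also need an `X-Amz-Security-Token` header, which is left out here), a URL with no query string, and a hypothetical endpoint URL. In practice an AWS SDK or a signing library would normally do this for you.

```python
import datetime
import hashlib
import hmac
from urllib.parse import urlparse


def _hmac(key, message):
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def sign_request(url, body, access_key, secret_key, region, now=None):
    """Return SigV4 headers for a POST to an API Gateway endpoint."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    parsed = urlparse(url)
    payload_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()

    # Step 1: canonical request (the empty string is the absent query string).
    canonical_headers = f"host:{parsed.netloc}\nx-amz-date:{amz_date}\n"
    signed_headers = "host;x-amz-date"
    canonical_request = "\n".join(
        ["POST", parsed.path or "/", "", canonical_headers, signed_headers, payload_hash]
    )

    # Step 2: string to sign; "execute-api" is the API Gateway service name.
    scope = f"{date_stamp}/{region}/execute-api/aws4_request"
    request_hash = hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()
    string_to_sign = "\n".join(["AWS4-HMAC-SHA256", amz_date, scope, request_hash])

    # Step 3: derive the signing key from the secret key, then sign.
    key = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, "execute-api", "aws4_request"):
        key = _hmac(key, part)
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    return {
        "X-Amz-Date": amz_date,
        "Content-Type": "application/json",
        "Authorization": (
            f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
            f"SignedHeaders={signed_headers}, Signature={signature}"
        ),
    }


# Hypothetical usage; the URL and credentials are placeholders:
# headers = sign_request(
#     "https://abc123.execute-api.us-east-1.amazonaws.com/prod/invocations",
#     '{"data": [1.0, 2.0]}', "ACCESS_KEY", "SECRET_KEY", "us-east-1")
```

The returned headers would be attached to the POST request, with the body sent exactly as it was signed, because the signature covers a hash of the payload.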
Once the transactions have been loaded into Amazon S3, you can use analytics tools and services, such as Amazon QuickSight, for visualization, reporting, one-time queries, and more detailed analysis.
There are a few key points to remember if you want to use your own dataset in the solution. Before running the notebook to train your model, make sure to change the location of your training dataset in the Amazon SageMaker notebook to the Amazon S3 bucket and path where it’s stored. And make sure to change the code inside the AWS Lambda function, so it correctly handles the format of your inference requests.
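As a rough sketch of those two changes, the snippet below shows a training data location expressed as an S3 URI and a helper that turns a game purchase event into the CSV row a model endpoint would receive. The bucket, path, and feature names are hypothetical, and the solution's notebook and Lambda function use their own variable names. What matters is that the Lambda function emits features in the same column order the model was trained on.

```python
from urllib.parse import urlparse

# Hypothetical location of your own training data; replace with your bucket and path.
TRAINING_DATA_URI = "s3://my-game-analytics/fraud/train/purchases.csv"

# Hypothetical feature names, in the exact column order used for training.
FEATURE_ORDER = ["amount_usd", "items_in_cart", "account_age_days", "purchases_last_hour"]


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def to_csv_row(purchase):
    """Convert one purchase event (a dict) into a CSV feature row."""
    missing = [name for name in FEATURE_ORDER if name not in purchase]
    if missing:
        raise ValueError(f"purchase is missing fields: {missing}")
    return ",".join(str(float(purchase[name])) for name in FEATURE_ORDER)


bucket, key = split_s3_uri(TRAINING_DATA_URI)
row = to_csv_row(
    {"amount_usd": 4.99, "items_in_cart": 3, "account_age_days": 12, "purchases_last_hour": 40}
)
```

Rejecting events with missing fields, rather than filling in defaults, keeps malformed requests from being scored as if they were ordinary purchases.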
Before deploying your own fraud detection solution, keep in mind that Amazon SageMaker and Amazon Kinesis Data Firehose are currently only available in specific AWS Regions. Be sure to launch this solution in a Region where these services are available. For the most current service availability by Region, see the AWS Region Table.
Follow the Fraud Detection Using Machine Learning deployment guide to learn how to deploy your own fraud detection solution using an AWS CloudFormation template. And add even more intelligence to your games by attending one of our virtual workshops: