AppsFlyer Prevents Mobile Ad Fraud through Cost-Effective Processing of 100 Billion Events per Day on AWS
2020
In 2019, nearly a quarter of all nonorganic mobile app installations were fraudulent. In other words, the marketing budget for roughly one in four app installs was stolen, for example by bots, rather than actual people, clicking on and installing those apps. Advertising fraud is expensive for businesses and makes marketing and advertising data far less reliable.
To combat mobile app fraud, mobile attribution company AppsFlyer manages, measures, identifies, and blocks fraudulent installs with its Protect360 antifraud solution, powered by Amazon Web Services (AWS). Running primarily on the cost-effective compute power of Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances, combined with Savings Plans or Reserved Instances and On-Demand Instances, AppsFlyer’s machine learning pipeline processes 100 billion events per day and saves its customers $8.1 million per day by preventing fraudulent activity.
“If we reach a certain threshold, we can provision more Amazon EC2 Spot Instances immediately, and we’re able to handle the load.”
Ido Berkovitch
Research and Development Director, AppsFlyer
From Detection to Prevention
App advertisers, along with their publishers and media partners, rely on knowing who their customers are, what they’re looking for, and how they’re finding products. “There’s a lot of money involved in the user-acquisition industry,” says Elena Levi, senior product manager for Protect360 at AppsFlyer. “And fraudsters try to take advantage of it. So it’s crucial to filter out fraudulent engagements.”
When Protect360 launched in 2017, the product initially only identified fraudulent behavior. Eventually AppsFlyer used its own machine learning models to expand Protect360’s capabilities to include fraud prevention as well. “We apply several protection layers when dealing with fraud,” says Levi. “Our first protection layer identifies a single installation when it arrives and decides in real time if it’s fraudulent or not. If it is, we block it or correct its attribution to the rightful owner. Beyond examining a single install, we also examine clusters: we take a group of installs, analyze the group’s characteristics and behavioral pattern, and decide whether it is statistically probable. If a cluster deviates from normal behavior patterns in a manner that is statistically significant, we’ll label these installs as fraudulent and block anything coming from that cluster.”
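The cluster-level check Levi describes can be illustrated with a simple significance test. The following is a minimal sketch, not AppsFlyer’s implementation: the signal (share of installs with a suspiciously short click-to-install time), the baseline rate, and the threshold of 3 standard deviations are all hypothetical choices made for illustration.

```python
import math


def cluster_z_score(flagged: int, total: int, baseline_rate: float) -> float:
    """z-score of a cluster's observed count against a binomial baseline.

    flagged       -- installs in the cluster showing the suspicious signal
    total         -- all installs in the cluster
    baseline_rate -- share of installs showing the signal in normal traffic
                     (hypothetical value, assumed known from history)
    """
    expected = total * baseline_rate
    std_dev = math.sqrt(total * baseline_rate * (1.0 - baseline_rate))
    return (flagged - expected) / std_dev


def is_anomalous(flagged: int, total: int, baseline_rate: float,
                 z_threshold: float = 3.0) -> bool:
    """Flag the cluster when it deviates from the baseline by more than
    z_threshold standard deviations (threshold is an assumption)."""
    return abs(cluster_z_score(flagged, total, baseline_rate)) > z_threshold


# Made-up clusters of 1,000 installs each, against a 5 percent baseline.
suspicious_cluster = is_anomalous(flagged=120, total=1000, baseline_rate=0.05)
ordinary_cluster = is_anomalous(flagged=55, total=1000, baseline_rate=0.05)
```

In this toy setup the first cluster sits roughly ten standard deviations above the baseline and is flagged, while the second is well within normal variation. A production system would combine many signals rather than a single proportion.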
AppsFlyer also lets customers add their own business logic on top of this system so that fraud detection fits each advertiser’s unique requirements; for example, an advertiser could identify cases where a specific app version was hacked. Protect360 also offers advertisers tools for further fraud investigation, giving them access to their activity data alongside indicators and insights that could highlight fraud based on the advertiser’s own business logic. Because fraudsters constantly test AppsFlyer’s fraud protection with new methods and patterns, not all fraud can be caught in real time. For these cases, AppsFlyer applies its postattribution fraud detection layer, which identifies fraudulent clusters as they materialize and blocks their installs both from the moment of identification and retrospectively, keeping undetected fraud to a minimum.
Machine Learning Algorithms and Compute
Protect360 does all this for 100 billion daily events, writing 30 TB of data per day to its data lake in Amazon Simple Storage Service (Amazon S3), an object storage service that offers industry-leading scalability, data availability, security, and performance. For machine learning in this use case, Protect360 uses two statistical techniques: maximum likelihood estimation, a method of estimating the parameters of a probability distribution, and Bayesian networks, a type of probabilistic graphical model used here for decisioning. Maximum likelihood estimation checks review clusters and patterns to detect abnormal behavior, while Bayesian network algorithms identify fraudulent activity at the single-install level, estimating in real time the probability that each install is fraudulent. Protect360 then blocks installs that show very strong indications of being fraudulent, which account for the majority of fraud detected. In doing so, it maintains a false positive rate of 0.5 percent or lower across both its probabilistic and its deterministic algorithms. The product blocks millions of installs per day in real time, making each decision in 0.5 ms. “We also block in-app events after the install occurs,” says Levi. “By blocking fraud both for the installed app and its in-app events, we let customers make their decisions based on clean data and look at the real return on investment to both optimize their user acquisition funnels and optimize their in-app event activity. This is the main value of the product.”
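To make the two techniques concrete, here is a toy sketch under stated assumptions. It models a single hypothetical signal, click-to-install time, as exponentially distributed, fits the rate for legitimate and fraudulent traffic by maximum likelihood, and then applies Bayes’ rule (the smallest possible Bayesian network, Fraud → signal) to score one install. The data, the exponential model, the 0.25 prior, and the blocking threshold are illustrative assumptions; the article does not describe Protect360’s actual features, models, or how the two techniques are combined.

```python
import math


def fit_exponential_mle(samples):
    """Maximum likelihood estimate of an exponential rate: n / sum(x)."""
    return len(samples) / sum(samples)


def exponential_pdf(x, rate):
    """Density of an exponential distribution with the given rate."""
    return rate * math.exp(-rate * x)


def posterior_fraud(x, prior_fraud, rate_fraud, rate_legit):
    """P(fraud | x) by Bayes' rule for a two-node network Fraud -> x."""
    fraud_term = prior_fraud * exponential_pdf(x, rate_fraud)
    legit_term = (1.0 - prior_fraud) * exponential_pdf(x, rate_legit)
    return fraud_term / (fraud_term + legit_term)


# Made-up click-to-install times in seconds (not real data).
legit_times = [40.0, 60.0, 80.0, 100.0]
fraud_times = [0.5, 1.0, 1.5]

rate_legit = fit_exponential_mle(legit_times)  # 4 / 280
rate_fraud = fit_exponential_mle(fraud_times)  # 3 / 3

BLOCK_THRESHOLD = 0.9  # hypothetical cutoff for "very strong indications"


def should_block(x, prior_fraud=0.25):
    """Block only when the posterior fraud probability is very high."""
    posterior = posterior_fraud(x, prior_fraud, rate_fraud, rate_legit)
    return posterior >= BLOCK_THRESHOLD
```

With these made-up numbers, an install arriving half a second after the click is blocked, while one arriving 90 seconds after the click is not. Blocking only above a high posterior threshold is one way to keep false positives low, at the cost of letting borderline cases through to later layers.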
Processing 100 billion events per day with its machine learning pipeline requires substantial compute capacity. For its batch-processing workloads, Protect360 relies wholly on Amazon EC2 Spot Instances, which enable AppsFlyer to run hyperscale workloads for 75–80 percent less than the On-Demand price. “If we reach a certain threshold, we can provision Amazon EC2 Spot Instances immediately, and we’re able to handle the load,” adds Ido Berkovitch, research and development director at AppsFlyer. The company manages and automatically scales Spot Instances using Amazon EC2 Auto Scaling groups, which let it choose between the lowest-price and capacity-optimized allocation strategies.
Amazon EC2 Auto Scaling groups also enable AppsFlyer to configure multiple instance types to increase diversification and thus reduce the Spot Instance interruption rate. In Protect360, Amazon EC2 Spot Instances account for 70–90 percent of the real-time workloads, while Reserved Instances and Savings Plans account for 10–30 percent to reduce risk. AppsFlyer sometimes uses On-Demand Instances, which enable it to pay for compute capacity by the second. “About 1 percent of our real-time usage is On-Demand,” says Berkovitch.
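A configuration along these lines might look like the sketch below, which only builds the mixed instances policy as a plain Python dictionary and makes no AWS calls. The key names follow the `MixedInstancesPolicy` structure of the EC2 Auto Scaling API; the launch template name, the instance types, and the 80/20 split are hypothetical values chosen to fall within the ranges quoted above, not AppsFlyer’s actual settings. Reserved Instances and Savings Plans are billing discounts applied to On-Demand usage, so they appear here as the non-Spot share.

```python
# Only the two strategies named in the text are accepted in this sketch.
VALID_SPOT_STRATEGIES = {"lowest-price", "capacity-optimized"}


def build_mixed_instances_policy(launch_template_name, instance_types,
                                 spot_strategy="capacity-optimized",
                                 on_demand_percentage=20):
    """Build a MixedInstancesPolicy dict for an Auto Scaling group.

    on_demand_percentage is the share of capacity launched as On-Demand
    (which Reserved Instances or Savings Plans can then discount);
    the rest is launched as Spot.
    """
    if spot_strategy not in VALID_SPOT_STRATEGIES:
        raise ValueError(f"unsupported Spot strategy: {spot_strategy}")
    if not 0 <= on_demand_percentage <= 100:
        raise ValueError("on_demand_percentage must be between 0 and 100")
    if len(instance_types) < 2:
        raise ValueError("list at least two instance types to diversify")
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            # Several instance types widen the set of Spot capacity pools.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": 0,
            "OnDemandPercentageAboveBaseCapacity": on_demand_percentage,
            "SpotAllocationStrategy": spot_strategy,
        },
    }


# Hypothetical template name and instance types.
policy = build_mixed_instances_policy(
    "protect360-realtime",
    ["c5.xlarge", "c5.2xlarge", "c5d.xlarge", "c5n.xlarge"],
)
```

A dictionary like this could be passed as the `MixedInstancesPolicy` argument to boto3’s `create_auto_scaling_group`, together with the group name, size limits, and subnets that the call also requires.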
With the amount and variety of compute capacity it requires, AppsFlyer has found plenty of opportunities to experiment on AWS, and the company has discovered it can save money by opting for a newer generation of compute-optimized instances. “For example, we moved from M4 instances to C5 instances, which are a more expensive type,” says Berkovitch, noting that the move covered all purchasing options, including Reserved Instances and Amazon EC2 Spot Instances. “But eventually we were able to reduce the number of instances and actually cut service costs by 25 percent.”
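The arithmetic behind this trade-off is simple: total cost is the instance count times the per-instance price, so a pricier instance still saves money if the fleet shrinks enough. The sketch below works this through. The 25 percent saving comes from the quote above; the 20 percent price premium is a hypothetical figure, since the article gives no prices or instance counts.

```python
def required_count_ratio(price_ratio: float, cost_ratio: float) -> float:
    """Return new_count / old_count implied by a price and cost change.

    price_ratio -- new per-instance price divided by the old price
    cost_ratio  -- new total cost divided by the old total cost
    Since cost = count * price, count_ratio = cost_ratio / price_ratio.
    """
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    return cost_ratio / price_ratio


# Assumption: each new instance costs 20 percent more than the old one.
# From the article: total service cost fell by 25 percent.
count_ratio = required_count_ratio(price_ratio=1.2, cost_ratio=0.75)
fleet_reduction_percent = (1.0 - count_ratio) * 100.0
```

Under that assumed premium, the fleet would need to shrink to about 62.5 percent of its original size, a reduction of roughly 37.5 percent, to produce the reported saving.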
Delivering the Best-Possible Value to Customers
AppsFlyer isn’t the only one saving on costs: Protect360, powered by Amazon EC2 instances, saves AppsFlyer customers an average of $8.1 million every day in fraudulent install funds that would otherwise go to fraudsters. And “the actual savings are a lot higher because the direct savings are not necessarily the entire savings for the customer,” says Michel Hayet, product marketing manager for Protect360. “You have to take into account the alternative costs of how our customers could have used the budgets lost to fraud. We help our customers distinguish between fraudulent sources and legitimate sources to help them make better decisions moving forward and take fraud out of their return on investment equation.”
AppsFlyer is always looking to add new services, features, and functionality using other AWS services. “We are in charge of another product in AppsFlyer called Validation Rules, which enables advertisers to define rules about how to handle their own traffic,” says Berkovitch. As it continues building Validation Rules, AppsFlyer expects to use Amazon DynamoDB, a key-value and document database that delivers single-digit millisecond performance at any scale.
AppsFlyer has found great success so far in staying ahead of fraud and will continue to do so on AWS. “Fraud evolves all the time,” says Levi. “It’s an arms race. That’s why our fraud-analysis team keeps its finger on the pulse and why we do everything we can to stay ahead.”
To learn more, visit aws.amazon.com/advertising-marketing
About AppsFlyer
Founded in 2011, AppsFlyer is a software-as-a-service mobile-marketing analytics and attribution platform. Operating out of 18 global offices, AppsFlyer helps more than 12,000 customers track how end users interact with brands through various platforms, channels, and devices.
Benefits of AWS
- Saves customers $8.1 million per day through fraud prevention
- Runs batch processing 100% on Amazon EC2 Spot Instances
- Saves 75–80% using Amazon EC2 Spot vs. On-Demand Instances
- Cut service costs by 25% by switching from Amazon EC2 M4 to C5 instances
- Processes 100 billion events every day
- Stores 30 TB of data every day
- Maintains a 0.5% false positive rate or lower in probabilistic and deterministic algorithms
- Blocks millions of installs per day in real time, making decisions in 0.5 ms
AppsFlyer Reference Architecture
AWS Services Used
Amazon Elastic Compute Cloud (Amazon EC2)
Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides secure, resizable compute capacity in the cloud.
Amazon EC2 Spot Instances
Amazon EC2 Spot Instances let you take advantage of unused EC2 capacity in the AWS Cloud.
Amazon Simple Storage Service (Amazon S3)
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance.
Amazon DynamoDB
Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale.