Mobvista Builds Data Platform on AWS to Process 100 Billion Ad Requests per Day
2020
Mobvista is a leading technology platform dedicated to driving global business growth in the digital age. The company was listed on the Hong Kong Stock Exchange in December 2018 as "the first stock of mobile intelligent marketing in the new economy," with 700 employees and offices in 18 cities around the world.
Mobvista has three major brands: the AI-driven programmatic and interactive advertising platform Mintegral, the performance-based marketing agency Nativex, and the game analysis platform GameAnalytics. Together they form two business lines, mobile advertising, and data analysis and technology, serving developers and customers worldwide. In the AppsFlyer H2 2019 Performance Index, Mintegral rose to second place on the Global Growth Index and sixth on the Global Performance Index. In addition, GameAnalytics has served a total of 80,000 game developers, with an average of 1.75 billion monthly active users, covering more than one-third of the world's gaming user base.
"Our machine learning platform built on AWS has significantly boosted our ad revenue performance. It has also allowed us to easily handle ad data requests from 1 billion independent mobile devices and efficiently process over 100 billion online predictions daily, dramatically increasing our revenue and net profit."
Chao Cai
Group Vice President, Group Chief Architect, Mobvista
Challenges
An Average of 100 Billion Large-Scale Ad Requests per Day
As a mobile marketing platform, Mobvista provides 24/7 mobile marketing solutions and insights in more than 200 countries and regions across the world. Mobvista handles 100 billion ad requests per day, using data processing and machine learning to buy, sell, and serve ad opportunities at ultra-low latency.
The company processes over 20 PB of data per day and adds billions of events into its machine learning training model every hour, which means high storage throughput and elasticity are key requirements. Mobvista’s primary challenge is how to meet these two requirements while optimizing costs at the same time.
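To put these daily totals in perspective, the short calculation below converts them into average per-second rates. It is only back-of-the-envelope arithmetic on the figures quoted above: it assumes 1 PB = 10^15 bytes and a perfectly even load across the day, so real peak rates would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

requests_per_day = 100e9  # 100 billion ad requests per day (from the text)
bytes_per_day = 20e15     # 20 PB per day, assuming 1 PB = 10**15 bytes


def per_second(daily_total: float) -> float:
    """Average per-second rate implied by a daily total."""
    return daily_total / SECONDS_PER_DAY


avg_requests_per_second = per_second(requests_per_day)
avg_gigabytes_per_second = per_second(bytes_per_day) / 1e9

print(f"{avg_requests_per_second:,.0f} requests/s on average")  # about 1.16 million
print(f"{avg_gigabytes_per_second:,.1f} GB/s on average")       # about 231 GB/s
```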
Low End-to-End Latency Is a Key Industry Requirement
Another challenge comes from the latency requirements of the advertising industry. For each ad request, Mobvista’s end-to-end data latency must be less than 50 milliseconds. Mobvista then writes the updated user interest data to the low-latency data stores in its real-time bidding servers, which can act on the latest user information to buy and deliver more effective, personalized ads for customers in real time. Ramon Zhu, group vice president for Mobvista, says, "When it comes to data latency, we must capture changes in users' interests in real time so that we can meet the demands for traffic monetization or maximize user acquisition efficiency."
Dual Challenge: Data Sparsity and Complexity
Mobvista’s machine learning platform also faces a twofold issue: data sparsity and data complexity. Mobvista’s Deep Neural Network (DNN) model has a sparse embedding layer that contains over 10 billion dimensional features. In addition, before entering the analysis phase, datasets including user ad click-through logs and training samples require complex preprocessing.
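One common way to make an embedding layer of this size tractable is to hash feature strings into a very large index space and allocate rows only for the features that actually occur. The sketch below illustrates that general idea in plain Python. It is not Mobvista's implementation: the class `HashedEmbedding`, the feature strings, and the embedding width of 8 are all hypothetical.

```python
import hashlib
import random


class HashedEmbedding:
    """Lazily allocated embedding table addressed by hashed feature strings."""

    def __init__(self, num_buckets: int, dim: int, seed: int = 0):
        self.num_buckets = num_buckets
        self.dim = dim
        self._rng = random.Random(seed)
        self._rows = {}  # bucket index -> vector; only touched rows use memory

    def bucket(self, feature: str) -> int:
        # Use a stable hash; Python's built-in hash() is salted per process.
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_buckets

    def lookup(self, feature: str) -> list:
        idx = self.bucket(feature)
        if idx not in self._rows:
            # Initialize a row the first time its bucket is seen.
            self._rows[idx] = [self._rng.uniform(-0.01, 0.01) for _ in range(self.dim)]
        return self._rows[idx]

    def embed(self, features: list) -> list:
        """Sum-pool the embeddings of the active features of one sample."""
        pooled = [0.0] * self.dim
        for feature in features:
            for i, value in enumerate(self.lookup(feature)):
                pooled[i] += value
        return pooled


# Hypothetical sample: the index space has 10 billion buckets,
# but only the three rows this sample touches are materialized.
table = HashedEmbedding(num_buckets=10_000_000_000, dim=8)
sample = ["country=DE", "os=android", "app_category=puzzle"]
vector = table.embed(sample)
print(len(vector), "dimensions,", len(table._rows), "rows allocated")
```

Because rows are created on first use, memory grows with the number of distinct features observed rather than with the nominal size of the index space.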
Why Amazon Web Services
Comprehensive and Powerful Cloud Services System
Since its inception, Mobvista has adopted a cloud-native philosophy and gradually established a full-stack mobile ad platform based on Amazon Web Services (AWS). The primary requirements for Mobvista’s machine learning pipeline are high storage throughput and elastic cluster resources. After preliminary research, Mobvista began testing Amazon EC2 M5d Instances. Each M5d instance is equipped with a local NVMe disk, which delivers high performance when large-scale data shuffles drive traffic surges in Apache Spark, the open source big data processing framework. Subsequently, Mobvista decided to migrate its whole computing cluster to M5d instances and deployed thousands of instances with local NVMe disks handling over 10 TB of shuffled data, tripling performance without increasing costs.
Mobvista also deployed all-online and partial-offline data clusters with Amazon EC2 Spot Instances in conjunction with Amazon EC2 Auto Scaling groups. These AWS services enabled Mobvista to easily scale up during peak hours and reduce costs for batch and real-time analysis resources by 50 percent.
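The effect of elastic scaling on instance hours can be illustrated with a small calculation. The sketch below is a simplified model and not the algorithm Amazon EC2 Auto Scaling uses: the load profile, the per-instance capacity, and the group size limits are hypothetical, and Spot pricing is left out because the case study does not give prices.

```python
import math


def desired_instances(load_rps: float, rps_per_instance: float,
                      min_size: int, max_size: int) -> int:
    """Instances needed to serve load_rps, clamped to the group's size limits."""
    needed = math.ceil(load_rps / rps_per_instance)
    return max(min_size, min(max_size, needed))


# Hypothetical hourly load profile in requests per second (not Mobvista data):
# 8 quiet hours, 8 normal hours, 8 peak hours.
hourly_load = [400_000] * 8 + [1_200_000] * 8 + [2_000_000] * 8
RPS_PER_INSTANCE = 5_000  # hypothetical capacity of one instance

fleet = [desired_instances(load, RPS_PER_INSTANCE, min_size=50, max_size=500)
         for load in hourly_load]

# A fixed fleet must be sized for the peak hour all day long.
static_instance_hours = max(fleet) * len(fleet)
# An elastic fleet only pays for what each hour needs.
elastic_instance_hours = sum(fleet)

saving = 1 - elastic_instance_hours / static_instance_hours
print(f"static: {static_instance_hours}, elastic: {elastic_instance_hours}, "
      f"saving: {saving:.0%}")
```

Under these made-up numbers, following the load uses 40 percent fewer instance hours than provisioning for the peak. Discounted Spot capacity would reduce the bill further, but the size of that discount is not stated in the source.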
"AWS provides us with a wide range of solutions, whether it is the diversified Amazon EC2 selection, complete elastic scalability, or strong data lake solutions, all of which are critical drivers for our platform’s successful growth," Zhu says.
Low-Latency, Cross-Regional Data Processing and Model Training Across the World
Mobvista uses PySpark scripts for data processing on its machine learning platform, Apache MXNet and its Gluon API for model training, and the MXNet C++ SDK for online neural network model predictions. MXNet’s parameter server architecture supports deep neural network model training with a large-scale sparse embedding layer and dynamic batch processing, which runs petabyte-scale parameter analysis with less than 10-millisecond latency.
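The reason a parameter server suits sparse models is that each training batch touches only a small set of embedding rows, so a worker needs to pull and push just those rows. The toy class below sketches that pattern in plain Python with a simple SGD update. It is not the MXNet API and not Mobvista's code: `ToyParameterServer`, its methods, and the learning rate are hypothetical and run in a single process.

```python
from collections import defaultdict


class ToyParameterServer:
    """Single-process stand-in for a parameter server holding sparse rows."""

    def __init__(self, dim: int, lr: float = 0.1):
        self.dim = dim
        self.lr = lr
        # Rows are created on first access, so unused keys cost nothing.
        self._weights = defaultdict(lambda: [0.0] * dim)

    def pull(self, keys):
        """Return copies of only the rows a worker asks for."""
        return {key: list(self._weights[key]) for key in keys}

    def push(self, grads):
        """Apply an SGD step to only the rows that received gradients."""
        for key, grad in grads.items():
            row = self._weights[key]
            for i in range(self.dim):
                row[i] -= self.lr * grad[i]


server = ToyParameterServer(dim=2)

# A worker's batch touches two rows out of a huge key space.
batch_keys = [3, 9_999_999_999]
weights = server.pull(batch_keys)

# Hypothetical gradient for one of the rows, computed by the worker.
server.push({3: [1.0, -1.0]})

print(server.pull([3]))  # {3: [-0.1, 0.1]}
```

In a real deployment the rows would be sharded across many server nodes and the pulls and pushes would travel over the network, but the access pattern is the same.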
Using Apache MXNet Open Source Framework for Sparse Data
Mobvista chose Apache MXNet for its machine learning framework. The framework, combined with the company’s own set of customizations, enabled Mobvista to develop accurate predictions despite a large and sparse dataset. For its offline models, Mobvista used MXNet with Apache Spark running on AWS, reading data from Amazon Simple Storage Service (Amazon S3) through Spark, and then transmitting the data to an MXNet framework for distributed model training.
Mobvista’s Senior Algorithm Architect Xu Chen says, "MXNet has offered strong architectural advantages based on the features of large-scale ad products and data sparsity. By adapting and redeveloping Spark, we were able to develop special features for our own platform and we aim to provide related services to other customers."
Results
Mobvista successfully launched a full-stack machine learning platform called MindAlpha in 2018 and released a version on the AWS Marketplace in October 2018.
Machine Learning Applied Across Workloads
The MindAlpha machine learning platform enables Mobvista to apply machine learning across its business units and workloads and to drive low-latency, cost-efficient ad predictions at scale. The platform is used in workloads that include ad recalls, ad bidding, and ad ranking matching. "The one-stop, big data machine learning platform we built on AWS has improved our online monetization capability several times over," Zhu says. "It has allowed us to easily handle data requests from 1 billion independent mobile devices per day and process over 100 billion online predictions per day, increasing our revenue and net profit dramatically."
Increased Productivity and Efficiency at Reduced Costs
The machine learning platform also provides Mobvista developers with an end-to-end automated machine learning model development experience. The platform can complete data analysis and model training operations with an average of more than 20,000 events per day. Xu Chen, Mobvista’s Senior Algorithm Architect, says, "After launching the one-stop big data machine learning platform, the artificial intelligence capabilities of Mobvista's product operations have seen significant growth. Through large-scale adaptive online learning, more than 50 percent of related manpower can be invested in different, more valuable work." In terms of cost optimization, through Mobvista’s machine learning models and services like Amazon EC2 Spot Instances, Amazon EC2 Auto Scaling groups, and Amazon EC2 M5d Instances, the platform can easily support 100 billion ad requests and 10 petabytes of computation per day, saving the company $200,000 per month in costs.
Mobvista will continue to regularly experiment with new types of AWS compute instances, including AWS Graviton instances, as well as GPU and Arm-based chips. "As we continue to develop our deep learning model, we will integrate AWS services like Amazon SageMaker to improve our distributed model training and streamline our machine learning efforts," Zhu says.
Learn more about AWS for Advertising and Marketing: aws.amazon.com/advertising-marketing
Figure: Mobvista's Data Platform on AWS (Mobvista Reference Architecture)
Benefits of AWS
- Increased revenue and net profits by processing over 100 billion online predictions per day
- Saved $200,000 per month in costs with Amazon EC2 Spot Instances, Amazon EC2 Auto Scaling groups, and Amazon EC2 M5d Instances
- Reduced costs for batch and real-time analysis by 50 percent
- Increased staff productivity
AWS Services Used
Amazon EC2
Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides secure, resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers.
Amazon EC2 Spot Instances
Amazon EC2 Spot Instances let you take advantage of unused EC2 capacity in the AWS Cloud.
Amazon EC2 M5d Instances
Amazon EC2 M5 Instances are the next generation of the Amazon EC2 General Purpose compute instances. M5d instances add local NVMe-based SSD storage.
Amazon S3
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance.
Get Started
Companies of all sizes across all industries are transforming their businesses every day using AWS. Contact our experts and start your own AWS Cloud journey today.