2023

Cheetah Mobile Builds a Near-Real-Time Data Lake and Reduces Costs Using AWS

Cheetah Mobile Inc. was founded in November 2010 by Fu Sheng. It is committed to making life better with technology in a world where man and machine coexist. It was listed on the New York Stock Exchange on May 8, 2014. Cheetah Mobile is building an AI-driven business matrix covering three major sectors: software services, enterprise offshore services, and AI retail services. It empowers the industry with AI and strives to become the world’s leading internet company in the AI industry.

30% cost savings

Near-real-time data analysis

3 days to minutes: reduced time to data availability

Overview

The data managed by businesses of all kinds is growing explosively. IDC research shows that the amount of data created from 2022 to 2024 will exceed all the data created in the past 30 years, and the emergence of generative AI will further accelerate the development of data analysis. For companies that have accumulated large amounts of data, the top priority is to mine and analyze the aggregated data in depth and to use it to drive business, support decision-making, and optimize business processes to reduce costs and improve efficiency.

Cheetah Mobile Inc. is a front-runner among Chinese mobile internet companies going global. Using Amazon Web Services (AWS), Cheetah Mobile built a near-real-time data warehouse and reduced clickstream analytics costs by 30 percent.

Opportunity | Using Amazon Redshift Serverless to Grab Opportunities from Data for Cheetah Mobile

Cheetah Mobile was founded in November 2010 and is committed to making life better with technology. On May 8, 2014, it was officially listed on the New York Stock Exchange. As of October 2023, Cheetah Mobile is strategically upgrading from mobile internet to an AI-driven industrial internet, with security tools plus AI scenarios at the core, and it aims to form a Cheetah Mobile system covering software applications, mobile entertainment, AI, and other industry enterprises. Cheetah Mobile has been using AWS for several years: as early as 2012, it ran its global mobile app Clean Master on AWS.

At the beginning of 2023, Cheetah Mobile released a brand-new application globally and sent its related data to Cheetah Mobile’s analysis system. The system was based on another cloud service provider’s own database products and analysis tools, and Cheetah Mobile’s operations team could query the original database directly for business analysis.

However, as the amount and complexity of the data increased, the original data analysis architecture posed some challenges. First, under the original database’s pricing model, Cheetah Mobile’s operations team and business intelligence personnel ran queries directly against the data warehouse. An incorrectly written query could waste a large amount of computation and scanning, which increased costs for the client teams.

Second, the original database could not guarantee that all logs would be ingested within a day, and near-real-time ingestion was not possible. For example, when the data scale was large, a given day’s data could be queried only after 3 days of ingestion. In addition, when ingestion into the original database exceeded 200 million rows per day, logs went missing, which presented a challenge for Cheetah Mobile’s continued growth.

Based on its long-time use of AWS, Cheetah Mobile decided to use AWS to migrate its user behavior data analysis workload. For its solution, the company chose Amazon Redshift Serverless, which businesses can use to get insights from data in seconds without having to manage data warehouse infrastructure.

“After migrating to the near-real-time data warehouse using Amazon Redshift Serverless as the core, the cost of the application team in the clickstream analytics load was reduced by 30 percent.”

Han Feng
Technical Director, Cheetah Mobile Inc.

Solution | Building a Near-Real-Time Data Lake Using Amazon Redshift Serverless and Amazon S3

Cheetah Mobile built a data analysis solution based on Amazon Redshift Serverless and ran it through proof-of-concept testing. The test results showed that the solution could resolve the company’s issues with cost and data access, which led Cheetah Mobile to migrate its entire user behavior analysis workload to AWS. (See Figure 1: System Architecture Diagram of Cheetah Mobile on AWS below.)

In conjunction with Amazon Redshift Serverless, Cheetah Mobile uses Amazon Redshift Query Editor v2, a web-based analyst workbench for exploring, sharing, and collaborating on data with teams in SQL through a common interface. The company also uses Amazon Kinesis Data Streams, a serverless streaming data service that makes it easy to capture, process, and store data streams at any scale, and Amazon Simple Storage Service (Amazon S3), an object storage service offering cutting-edge scalability, data availability, security, and performance. Amazon Redshift Serverless can analyze data in its own internal tables and can also query data in Amazon S3, so Amazon Redshift and Amazon S3 can be combined seamlessly to form part of an intelligent data lake architecture. To communicate between services, Cheetah Mobile uses AWS Lambda, a serverless, event-driven compute service.

[Placeholder for Figure 1. System Architecture Diagram of Cheetah Mobile on AWS]

The solution pushes Nginx logs to Vector, which sends them to Amazon Kinesis Data Streams. An AWS Lambda function then ingests the stream data and runs extract, transform, load (ETL) processes. The processed data is stored in Amazon S3, and Amazon S3 triggers a second AWS Lambda function that copies the data to the Amazon Redshift data warehouse. For near-real-time data querying, Cheetah Mobile uses Amazon Redshift Query Editor v2, which makes it easy to query data using SQL and to gain insights by visualizing results in charts and graphs with a few clicks. With this solution, the possibility of data loss is greatly reduced, and data becomes available within minutes instead of after the previous 3-day wait.
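The two Lambda stages described above can be sketched roughly as follows. This is a minimal illustration, not Cheetah Mobile's actual code: the Nginx "combined" log format, the function names, the table name, and the IAM role are all assumptions, and the calls that would write to Amazon S3 and submit SQL to Amazon Redshift are left out so the sketch runs without AWS credentials. It relies only on the documented shape of Kinesis and S3 event payloads delivered to Lambda.

```python
import base64
import json
import re
from urllib.parse import unquote_plus

# Assumed Nginx "combined" log format; the case study does not state the format.
NGINX_COMBINED = re.compile(
    r'(?P<remote_addr>\S+) \S+ \S+ \[(?P<time_local>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+)'
)


def parse_nginx_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = NGINX_COMBINED.match(line)
    if match is None:
        return None
    row = match.groupdict()
    row["status"] = int(row["status"])
    row["body_bytes_sent"] = int(row["body_bytes_sent"])
    return row


def etl_handler(event, context=None):
    """First stage (hypothetical): decode Kinesis records into newline-delimited JSON.

    A real function would write the returned text to Amazon S3; that step is
    omitted here.
    """
    rows = []
    for record in event["Records"]:
        # Kinesis delivers each record's payload base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
        row = parse_nginx_line(payload)
        if row is not None:  # skip lines that do not parse
            rows.append(row)
    return "\n".join(json.dumps(row) for row in rows)


def build_copy_statement(s3_event, table, iam_role_arn):
    """Second stage (hypothetical): build a Redshift COPY for a new S3 object.

    A real function would submit this statement to Amazon Redshift.
    """
    s3 = s3_event["Records"][0]["s3"]
    bucket = s3["bucket"]["name"]
    key = unquote_plus(s3["object"]["key"])  # object keys arrive URL-encoded
    return (
        f"COPY {table} FROM 's3://{bucket}/{key}' "
        f"IAM_ROLE '{iam_role_arn}' FORMAT AS JSON 'auto';"
    )
```

Keeping the parsing and statement-building steps as pure functions makes them easy to test locally; a production version would also need batching, error handling for failed loads, and validation of the table and key values before they are placed in SQL.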

With the autoscaling capabilities of Amazon Redshift Serverless, Cheetah Mobile only pays for the computing capacity consumed by the data warehouse when it is active, so the company has its costs under control. “After migrating to the near-real-time data warehouse using Amazon Redshift Serverless as the core, the cost of the application team in the clickstream analytics load was reduced by 30 percent,” says Han Feng, technical director of Cheetah Mobile. The company expects to process 20 TB of new logs per day in the future, so this cost optimization is crucial.
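To make the pay-for-active-compute model concrete, here is a small arithmetic sketch. It assumes that capacity is billed per second of activity in capacity-unit hours (RPU-hours) with a 60-second minimum per active period; the capacity, duration, and price values are placeholders for illustration only, not figures from Cheetah Mobile or from an AWS price list.

```python
def serverless_compute_cost(rpus, active_seconds, price_per_rpu_hour):
    """Estimate the cost of one period of warehouse activity.

    Assumes per-second billing with a 60-second minimum charge; idle time
    (active_seconds <= 0) costs nothing for compute.
    """
    if active_seconds <= 0:
        return 0.0
    billed_seconds = max(active_seconds, 60)
    return rpus * billed_seconds / 3600 * price_per_rpu_hour


def savings_fraction(old_cost, new_cost):
    """Fraction of cost saved when moving from old_cost to new_cost."""
    return (old_cost - new_cost) / old_cost


# Placeholder inputs: 8 RPUs active for 30 minutes at a hypothetical price.
cost = serverless_compute_cost(rpus=8, active_seconds=1800, price_per_rpu_hour=0.5)
```

Under this model, a warehouse that is queried for only part of the day accrues compute charges only for those active periods, which is the mechanism behind the cost control described above.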

Outcome | Continuing the Migration to AWS

Using AWS solutions, Cheetah Mobile has extended batch analysis capabilities to near-real-time analysis, and its data is stored in a data warehouse with low latency and high throughput. Additionally, Cheetah Mobile has complete control over its own data and can manage private data in any way it wants. AWS provides high security and encryption services to prevent unauthorized access and builds data-related services with ultrahigh data privacy and security standards.

This serverless database migration is just the beginning of Cheetah Mobile’s projects using AWS in this field. In the future, Cheetah Mobile will gradually migrate the rest of its applications and plans to build more new application workloads directly on AWS.

AWS Services Used

Amazon Redshift Serverless

Get insights from data in seconds without having to manage data warehouse infrastructure

Amazon S3

Object storage built to retrieve any amount of data from anywhere

Amazon Kinesis Data Streams

Amazon Kinesis Data Streams is a serverless streaming data service that makes it easy to capture, process, and store data streams at any scale.

Amazon Redshift Streaming Ingestion

Generate near-real-time insights through streaming data ingestion into your data warehouse and data visualizations
