In late 2020, CleverTap entered a hypergrowth phase. As it grew, it began experiencing linearly increasing costs for its in-house event processing and storage infrastructure, called CleverTap Data Store. Serving as a middle layer in CleverTap’s tech stack, Data Store clusters accounted for 60 percent of CleverTap’s compute infrastructure and were thus a primary infrastructure cost center. The business wanted a better solution to store its increasing data load in a cost-efficient manner. The CleverTap platform answers millions of aggregate queries per day with average response times within a second, so maintaining performance was a priority.
Furthermore, as CleverTap evolved its products over the years, it added new ML capabilities that created a need for high performance computing (HPC). The company began reevaluating its Data Store build, which was designed with an in-memory, time-series, row-based architecture.
CleverTap was born in the Amazon Web Services (AWS) Cloud and began looking for alternatives to an in-memory database backed by Amazon Elastic Block Store (Amazon EBS), which, at the time, was its main data storage service. It decided to change the way it stored data from a row-based to a column-based structure in Amazon Simple Storage Service (Amazon S3). “Amazon S3 allowed us to store more data in an efficient way that’s scalable for our growth rate, without affecting performance,” says Lalitha Duru, VP of Engineering at CleverTap. Engineers then optimized their query engine to read only required data with minimal disk seeks. This optimization and the switch from an in-memory database backed by Amazon EBS to Amazon S3 led to a reduction in the number of Amazon Elastic Compute Cloud (Amazon EC2) instances, which resulted in millions of dollars in savings per year.
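To illustrate why a column-based layout lets a query engine read only the data it needs, here is a minimal Python sketch. It is not CleverTap's implementation: the record fields (`ts`, `user`, `event`, `amount`) and the function names are hypothetical, and plain in-memory lists stand in for column files stored in Amazon S3.

```python
from collections import defaultdict

# Hypothetical event records in a row-based layout (field names are illustrative).
ROWS = [
    {"ts": 1, "user": "a", "event": "open", "amount": 0.0},
    {"ts": 2, "user": "b", "event": "purchase", "amount": 12.5},
    {"ts": 3, "user": "a", "event": "purchase", "amount": 7.5},
]


def to_columns(rows):
    """Pivot row records into one list per field (a column-based layout)."""
    columns = defaultdict(list)
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return dict(columns)


def total_purchases(columns):
    """Aggregate purchase amounts while touching only the two columns needed."""
    return sum(
        amount
        for event, amount in zip(columns["event"], columns["amount"])
        if event == "purchase"
    )


COLUMNS = to_columns(ROWS)
print(total_purchases(COLUMNS))  # 20.0
```

In the row-based form, answering the same question means scanning every field of every record. In the column-based form, the `ts` and `user` columns are never read, which is the property that makes aggregate queries cheaper as the number of fields grows.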
Next, CleverTap began scouting for more cost-effective compute engines than the x86 processors it had been using. The company initiated a proof of concept with AWS Graviton processors. “We narrowed in on the Graviton2 family of processors since it offered a good balance between network and CPU performance,” Duru says.
CleverTap subscribes to AWS Enterprise Support and received assistance from its AWS technical account manager and Graviton specialists from around the globe. It took six months to migrate 40 petabytes of data and more than 1,000 instances from x86 to Graviton, in line with the company’s timeline. “We got excellent support from the AWS team in facilitating a smooth migration to Graviton,” adds Duru.
CleverTap found that Graviton2 instances were up to 20 percent less expensive than their x86 counterparts and offered equal or better performance. The instances also excelled in terms of performance while executing vectorized ML algorithms.
By adopting a data lake approach to storage with Amazon S3 and converting to Graviton2 instances for queries, CleverTap has reduced its overall compute requirements by 50–75 percent. It has also laid a strong foundation for HPC to support increasingly compute-intensive ML workloads. Duru summarizes, “We were able to store and query much larger datasets much faster without incurring a substantial cost for storing data that is not queried frequently.”
Following the success of the initial migration to Graviton, CleverTap is now migrating other systems in its tech stack, including its MongoDB cluster,
clusters, and internal microservices. To orchestrate microservices, it has started using Amazon Elastic Container Service (Amazon ECS). Engineers are collaborating with AWS to increase serverless workloads running on
, which is also compatible with Graviton instances.
Duru concludes, “We were able to achieve the best cost-performance by switching to a network columnar architecture leveraging Amazon S3 and Graviton.”