Sprinklr Reduces Machine Learning Inference Costs on AWS Inferentia

Overview

Sprinklr provides a unified customer experience management (Unified-CXM) platform that combines different applications for marketing, advertising, research, customer care, sales, and social media engagement. As a cloud-first company, Sprinklr is always looking to improve efficiency and optimize its cost savings. To help it achieve its goals, the company uses Amazon Web Services (AWS)—specifically Amazon Elastic Compute Cloud (Amazon EC2), which provides secure, resizable compute capacity in the cloud.

In 2021, Sprinklr had the opportunity to try Amazon EC2 Inf1 Instances, which are powered by AWS Inferentia, a high-performance machine learning (ML) chip built from the ground up and optimized for ML inference applications. By migrating the real-time workloads on its Unified-CXM platform from GPU-based Amazon EC2 instances onto AWS Inferentia, Sprinklr has realized significant cost savings and has seen latency drop by more than 30 percent on those workloads. Moreover, by reducing latency, the company has also improved the performance of its products and services for its customers.

About Sprinklr

With advanced artificial intelligence, Sprinklr’s Unified-CXM platform helps companies deliver human experiences to every customer, every time, across any modern channel. Headquartered in New York City, Sprinklr works with over 1,000 global enterprises and over 50 percent of the Fortune 100.

Using ML to Create a Better Customer Experience

Sprinklr, founded in 2009, is an American software company with employees all over the world. The company is an early adopter of new AWS services, and its mission is to help organizations worldwide make their customers happier. It offers over 31 different software products across 4 robust product suites and has developed an advanced proprietary artificial intelligence engine for enterprises to analyze publicly available data and engage with customers across 30 digital and social channels. With Sprinklr, businesses can collaborate across teams internally and digital channels externally to create a better customer experience.

Sprinklr is always looking to improve its customer experience while lowering compute costs and optimizing efficiency. “Our goal is to always make use of the latest technology so that we can have greater cost savings,” says Jamal Mazhar, vice president of infrastructure and DevOps at Sprinklr. Sprinklr hoped to reduce latency while lowering its ML inference costs and looked to innovations from AWS to meet those challenges. “When we learned about AWS Inferentia, it was a natural process for us to take that into consideration for our cost initiative drives,” says Yogin Patel, senior director of product engineering, artificial intelligence at Sprinklr. With the goals of reducing compute costs and improving customer satisfaction, Sprinklr began testing Amazon EC2 Inf1 Instances in July 2021.

Working to Continually Improve Performance and Cost Savings

Sprinklr’s Unified-CXM platform uses ML algorithms on unstructured data sourced from many different channels to deliver insights to its customers. For example, the company’s natural language processing and computer vision ML models analyze different data formats sourced from social media posts, blog posts, video content, and other content available on public domains across more than 30 channels. Sprinklr is able to derive customer sentiment and intent from this content to deliver product insights to its customers. Currently, the company runs about 10 billion predictions per day across its more than 500 models. Sprinklr divides its workloads into two groups: latency optimized and throughput optimized. Latency refers to how long a single inference request takes to return its result, and throughput refers to the number of inferences that are processed within a specific time period. “If latency goes down by 20 percent in even one model, that compounds to very large cost savings,” says Patel.
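To put these figures in perspective, the following minimal Python sketch converts the stated daily volume into an average per-second rate and shows how latency and throughput differ when computed from the same set of request timings. The even spread of traffic across the day and across models is an assumption made for illustration only, and the function names and sample durations are hypothetical, not Sprinklr's.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_per_second(predictions_per_day: float) -> float:
    """Average predictions per second, assuming traffic is spread evenly over a day."""
    return predictions_per_day / SECONDS_PER_DAY


def latency_and_throughput(durations_s: list[float]) -> tuple[float, float]:
    """Return (mean latency in seconds, throughput in requests per second).

    Assumes the requests were processed one after another, so the total
    elapsed time is the sum of the individual durations.
    """
    total = sum(durations_s)
    mean_latency = total / len(durations_s)
    throughput = len(durations_s) / total
    return mean_latency, throughput


# Figures from the text: about 10 billion predictions per day, more than 500 models.
predictions_per_day = 10e9
model_count = 500  # lower bound; the text says "more than 500"

fleet_rate = average_rate_per_second(predictions_per_day)
per_model_rate = fleet_rate / model_count  # assumes equal load per model

print(f"Fleet average: {fleet_rate:,.0f} predictions per second")
print(f"Per-model average: {per_model_rate:,.1f} predictions per second")

# Hypothetical request durations, for illustration only.
mean_latency, throughput = latency_and_throughput([0.010, 0.020, 0.030])
print(f"Mean latency: {mean_latency * 1000:.1f} ms, throughput: {throughput:.1f} req/s")
```

Real traffic is rarely uniform, so peak rates would be higher than these averages. The sketch is only meant to show the order of magnitude involved and why a small per-request saving adds up at this volume.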

AWS Inferentia features a large amount of on-chip memory, which can be used for caching large models instead of storing them off chip. This can have a significant impact in reducing inference latency because the processing cores of AWS Inferentia, called NeuronCores, have high-speed access to models that are stored in on-chip memory and are not limited by the off-chip memory bandwidth. NeuronCores also provide high-performance inference in the cloud at significantly lower costs, and they make it easy for developers to integrate ML into their business applications.

When Sprinklr began migrating models to Amazon EC2 Inf1 Instances and running benchmark tests, the company saw latency drop by more than 30 percent on the latency-optimized workloads. “We’re always interested in testing new AWS services, experimenting with workloads, and benchmarking new instances,” says Patel. Seeing the significantly reduced latency that AWS Inferentia was able to deliver in tests, Sprinklr decided to migrate all of its latency-optimized workloads to Amazon EC2 Inf1 Instances. “The goal is always to have lower latency, which means a better customer experience,” Mazhar says. “Using Amazon EC2 Inf1 Instances, we are able to achieve that.”
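A benchmark comparison of this kind usually comes down to collecting latency samples on each instance type and comparing summary statistics. The sketch below shows one way to compute a relative latency reduction at the median and at the 95th percentile. The sample values are invented for illustration and are not Sprinklr's measurements, and the helper names are hypothetical.

```python
import math
import statistics


def summarize(samples_ms: list[float]) -> tuple[float, float]:
    """Return (median, 95th percentile) using the nearest-rank method."""
    ordered = sorted(samples_ms)
    p50 = statistics.median(ordered)
    rank = max(1, math.ceil(0.95 * len(ordered)))  # nearest-rank index, 1-based
    p95 = ordered[rank - 1]
    return p50, p95


def relative_reduction(baseline: float, candidate: float) -> float:
    """Fraction by which candidate is lower than baseline (0.30 means 30 percent)."""
    return (baseline - candidate) / baseline


# Hypothetical per-request latencies in milliseconds, for illustration only.
gpu_ms = [20.0, 21.0, 19.0, 22.0, 20.0]
inf1_ms = [13.0, 14.0, 13.5, 14.5, 13.0]

gpu_p50, gpu_p95 = summarize(gpu_ms)
inf1_p50, inf1_p95 = summarize(inf1_ms)

p50_reduction = relative_reduction(gpu_p50, inf1_p50)
p95_reduction = relative_reduction(gpu_p95, inf1_p95)

print(f"Median latency reduction: {p50_reduction:.1%}")
print(f"p95 latency reduction: {p95_reduction:.1%}")
```

Reporting a tail percentile alongside the median is a common choice for latency-sensitive workloads, because a lower average can hide slow outliers that customers still notice.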

After migrating about 20 models to Amazon EC2 Inf1 Instances, Sprinklr started migrating its computer vision and text models for improved efficiency and cost savings. The team is now able to deploy a model using Amazon EC2 Inf1 Instances in under 2 weeks. As the company migrated to AWS Inferentia, it found the process straightforward, thanks to the ample resources and support available. “We have been able to quickly get in touch with the right teams,” says Mazhar. “The support from AWS helps us boost our customer satisfaction and staff productivity.”

Innovating to Improve Efficiency

As Sprinklr continues to migrate models to AWS Inferentia, it will add more voice-related models, including automatic speech recognition and intent recognition, to help businesses engage with their customers further. Sprinklr expects that deploying these models on AWS Inferentia will give its customers the performance and low latency they need at significantly lower costs.

AWS Services Used

Amazon EC2 Inf1

Amazon EC2 Inf1 instances deliver high-performance ML inference at the lowest cost in the cloud. Inf1 instances are built from the ground up to support machine learning inference applications.

AWS Inferentia

AWS Inferentia is Amazon's first custom silicon designed to accelerate deep learning workloads. It is built to provide high-performance inference in the cloud, to drive down the total cost of inference, and to make it easy for developers to integrate machine learning into their business applications.

Get Started

Companies of all sizes across all industries are transforming their businesses every day using AWS. Contact our experts and start your own AWS Cloud journey today.
