Observe.AI Cuts Costs by Over 50% with Machine Learning on AWS
Observe.AI developed and open-sourced the One Load Audit Framework on AWS to optimize machine learning model costs, boost developer efficiency, and scale to meet data growth.
50%
lower costs by fine-tuning instance sizes
10x
higher data loads supported
From one week to hours
Reduced development time
Overview
Observe.AI uses conversation intelligence to uncover insights from live and completed customer interactions, helping companies increase contact center agent performance. The company developed and open-sourced the One Load Audit Framework (OLAF), which integrates with Amazon SageMaker to automatically find bottlenecks and performance problems in machine learning services.
Using OLAF to load-test Amazon SageMaker instances, Observe.AI reduced machine learning costs by over 50 percent, lowered development time from one week to hours, and facilitated on-demand scaling to support a tenfold growth in data load size.
Opportunity | Predicting ML Data Load Sizes for Enhanced Efficiency
Observe.AI optimizes the customer experience through an artificial intelligence (AI)-powered workforce platform. Employing a large language model (LLM) designed for contact centers, Observe.AI enhances contact center agent performance and extracts insights from customer interactions using conversation intelligence. Each month, the platform processes millions of conversations and generates hundreds of inferences per conversation.
As machine learning (ML) adoption continues to grow across industries, testing the performance of customers’ ML services under varying data loads has become increasingly crucial for Observe.AI. Aashraya Sachdeva, staff engineer in machine learning at Observe.AI, says, "While onboarding new customers, we were assessing our ML system's capability to handle a tenfold increase in data load, corresponding to the tenfold rise in conversations processed daily. Our ML engineers and scientists faced challenges in accurately predicting this capability when transitioning models from research to production."
The company sought to deploy a larger ML model in production for enhanced accuracy. Simultaneously, there was a careful effort to manage latency and control costs associated with the implementation. Achieving an optimal return on investment through fine-tuning its infrastructure was key, and the business wanted a solution compatible with its existing Amazon Web Services (AWS) environment.
"We sought a more straightforward method to identify the optimal infrastructure, assess our readiness for increased load, and determine the associated costs for serving code to customers. We also wanted precise insights into the developer time required for implementation," Aashraya explains.
"Through fine-tuning Amazon SageMaker instance sizes with OLAF while maintaining a constant data input load, we optimized costs for our LLM deployment by over 50 percent. This process ensured the best return on investment."
Aashraya Sachdeva
Staff Engineer, Machine Learning at Observe.AI
Solution | Building the One Load Audit Framework on AWS
To address its challenge of predicting ML load sizes, Observe.AI created and open-sourced the One Load Audit Framework (OLAF). Integrated with Amazon SageMaker, a service for building, training, and deploying ML models for any use case, OLAF identifies bottlenecks and performance issues in ML services, offering latency and throughput measurements under both static and dynamic data loads. The framework also seamlessly incorporates ML performance testing into the software development lifecycle, facilitating accurate provisioning and cost savings.
Aashraya explains, "OLAF provides our ML engineers and scientists with a plug-and-play model. They simply input their AWS credentials and the Amazon SageMaker endpoint, and the tool conducts load testing, providing latency numbers and expected errors for a particular model or instance."
Following the initial build, Observe.AI integrated Amazon SageMaker features into OLAF, including multi-container deployment and batch inferencing. "We wanted to understand how these incremental features affect scalability in terms of cost," adds Aashraya. Next, the company incorporated Amazon Simple Queue Service (Amazon SQS), a fully managed message queuing service for microservices, distributed systems, and serverless applications. By downloading Amazon SQS load traces, OLAF users can observe the rate at which ML messages enter the system to predict data load size. Aashraya notes, "This feature assists us in easily testing queue-based array processing systems, which are becoming more prevalent."
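The queue-trace idea can be sketched with a simple polling approach. `ApproximateNumberOfMessages` is a real Amazon SQS queue attribute, but the sampling loop and rate estimate below are illustrative assumptions, not OLAF's implementation.

```python
# Hedged sketch of building a load trace from an SQS queue. The attribute
# name is a real SQS attribute; the sampling loop and helpers are
# illustrative, not OLAF's actual implementation.
import time

def sample_queue_depth(queue_url, samples=5, interval=1.0, region="us-east-1"):
    """Poll ApproximateNumberOfMessages at fixed intervals to build a trace."""
    import boto3  # deferred: only needed when actually calling AWS
    sqs = boto3.client("sqs", region_name=region)
    trace = []
    for _ in range(samples):
        attrs = sqs.get_queue_attributes(
            QueueUrl=queue_url,
            AttributeNames=["ApproximateNumberOfMessages"],
        )
        trace.append(int(attrs["Attributes"]["ApproximateNumberOfMessages"]))
        time.sleep(interval)
    return trace

def arrival_rate(trace, interval=1.0):
    """Estimate net messages per second entering the queue between samples."""
    deltas = [b - a for a, b in zip(trace, trace[1:])]
    return sum(deltas) / (len(deltas) * interval) if deltas else 0.0
```

Feeding the estimated rate into a load test is what lets a team predict whether a queue-based pipeline will keep up when message volume grows.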
Finally, Observe.AI integrated Amazon Simple Notification Service (Amazon SNS), a fully managed service for application-to-application and application-to-person messaging that helps OLAF users replicate specific patterns within Amazon SNS.
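Replaying a recorded message pattern at a target rate might look like the following sketch. It uses the real SNS `publish` call, but the pacing helpers and function names are hypothetical, not OLAF's API.

```python
# Illustrative sketch of replaying a recorded message pattern through an
# SNS topic at a target rate; function names are hypothetical, not OLAF's API.
import time

def publish_offsets(count, messages_per_second):
    """Seconds from start at which each publish should fire for a steady rate."""
    return [i / messages_per_second for i in range(count)]

def replay_pattern(topic_arn, messages, messages_per_second, region="us-east-1"):
    """Publish recorded messages to an SNS topic, pacing them to the target rate."""
    import boto3  # deferred: only needed when actually calling AWS
    sns = boto3.client("sns", region_name=region)
    start = time.perf_counter()
    offsets = publish_offsets(len(messages), messages_per_second)
    for offset, msg in zip(offsets, messages):
        # Sleep until this message's scheduled offset, then publish.
        time.sleep(max(0.0, offset - (time.perf_counter() - start)))
        sns.publish(TopicArn=topic_arn, Message=msg)
```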
Outcome | Optimizing Costs and Boosting Developer Efficiency
Launched in 2022, OLAF by Observe.AI is now actively employed by dozens of ML engineers and researchers for testing and predicting data loads. By using OLAF, Observe.AI has cut LLM costs by conducting load tests on Amazon SageMaker instances, identifying the most suitable configuration aligned with the company's business metrics. Aashraya explains, "Our research team encountered higher costs than anticipated when deploying an LLM, as well as other ML models, with specific latency and throughput requirements into production. However, through fine-tuning Amazon SageMaker instance sizes with OLAF while maintaining a constant data input load, we optimized costs for our ML model deployments by over 50 percent. This process ensured the best return on investment."
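The trade-off behind instance fine-tuning reduces to simple arithmetic: for each candidate instance size, divide its hourly price by its measured throughput, then pick the cheapest option that still meets the required load. The instance types below are real SageMaker instance names, but the prices and throughput numbers are hypothetical placeholders, not actual SageMaker pricing or Observe.AI's measurements.

```python
# Back-of-the-envelope instance selection. All prices and throughput
# numbers below are hypothetical placeholders, not actual SageMaker
# pricing or Observe.AI's measurements.
CANDIDATES = {
    # instance type: (hypothetical $/hour, measured inferences/hour)
    "ml.g5.2xlarge": (1.52, 40_000),
    "ml.g5.xlarge": (1.01, 22_000),
    "ml.g4dn.xlarge": (0.74, 9_000),
}

def cost_per_million(price_per_hour, inferences_per_hour):
    """Dollars per one million inferences on a given instance."""
    return price_per_hour / inferences_per_hour * 1_000_000

def cheapest(candidates, min_throughput):
    """Lowest-cost instance that still meets the required throughput floor."""
    viable = {
        name: cost_per_million(price, rate)
        for name, (price, rate) in candidates.items()
        if rate >= min_throughput
    }
    return min(viable, key=viable.get)
```

Under these placeholder numbers, the larger `ml.g5.2xlarge` is actually cheaper per inference than the smaller sizes at high load, which is the kind of counterintuitive result load testing surfaces.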
Previously, Observe.AI developers had to write multiple scripts and construct numerous pipeline workflows, resulting in a complex array of onboarding data transfers and debugging systems. Aashraya notes, "Because OLAF is tightly integrated with AWS, it now only takes developers a few hours to determine the proper instance for use, a task that used to take one week. As a result, developers can allocate more time to testing data loads and creating new features."
With the integration of OLAF, Observe.AI can scale its services to accommodate a tenfold increase in data load. The company can now conduct stress testing more easily and accurately, providing valuable assistance to customers who have augmented their data loads. Aashraya explains, "If a customer doubles their data load, we now have a clearer understanding of our infrastructure's capacity. Using OLAF and AWS, we can replicate and precisely increase the load by 100 percent, anticipating potential breakpoints or database issues. This not only helps us better prepare our customers for such scenarios but also brings internal cost and development benefits."
Learn More
To learn more, visit aws.amazon.com/ai/machine-learning/.
About Observe.AI
Observe.AI is a solution for boosting contact center performance through live conversation intelligence. Utilizing a robust 30-billion-parameter contact center large language model (LLM) and a generative AI engine, Observe.AI extracts valuable insights from every customer interaction. Trusted by companies, Observe.AI is a valued partner in accelerating positive results across the entire business landscape.
AWS Services Used
Amazon SageMaker
Amazon SageMaker is a fully managed service that brings together a broad set of tools to enable high-performance, low-cost machine learning (ML) for any use case.
Amazon Simple Queue Service
Amazon Simple Queue Service (Amazon SQS) lets you send, store, and receive messages between software components at any volume, without losing messages or requiring other services to be available.
Amazon Simple Notification Service
Amazon Simple Notification Service (Amazon SNS) sends notifications two ways: application-to-application (A2A) and application-to-person (A2P). A2A provides high-throughput, push-based, many-to-many messaging between distributed systems, microservices, and event-driven serverless applications.
Get Started
Organizations of all sizes across all industries are transforming their businesses and delivering on their missions every day using AWS. Contact our experts and start your own AWS journey today.