
Observe.AI achieves 15x increase in production scale on Amazon Nova, processing 100M tokens per minute

Learn how Agentic CX solution provider Observe.AI scales production inference using Amazon Nova for high-volume workloads

Benefits

90,000 requests per minute supported across regions
100 million tokens processed per minute at production scale
15x increase in AI model consumption
40% faster delivery of new AI capabilities

Overview

To support the growing scale of its AI-powered contact center intelligence platform, Observe.AI needed to process large volumes of conversational data with predictable performance and cost efficiency. The company built its generative AI workloads on Amazon Web Services (AWS) to run large-scale inference for production AI features.

Using Amazon Nova Lite, Observe.AI achieved a 10–15x increase in AI model consumption over a year, supporting about 90,000 requests per minute and processing around 100 million tokens per minute across regions. The company also delivers new AI capabilities 40–50 percent faster while maintaining reliable performance.

About Observe.AI

Observe.AI is an AI agent platform for customer experience, helping enterprises automate customer interactions with natural conversations and predictable outcomes. It combines speech understanding, workflow automation, and governance to support AI agents, copilots, and quality insights at scale.

Opportunity | Sustaining high-volume conversational AI as demand rises

As Observe.AI onboarded customers with larger agent populations, inference request volumes increased significantly across its contact center intelligence platform. Because the platform relies on natural language understanding to analyze customer interactions at scale, maintaining inference performance, consistency, and cost predictability became increasingly critical as deployments expanded across customers and regions.

Observe.AI supports both internally hosted models and external models across its platform, allowing customers to choose the most appropriate model for different AI use cases. As usage expanded and inference volumes increased, managing inference efficiently across multiple model paths required greater operational coordination. Anup Pattnaik, ML Manager, Observe.AI, says, “Managing and scaling our own models took a lot of effort. As we added more use cases, our operational complexity increased and slowed how quickly we could experiment.” As customer deployments grew, Observe.AI sought greater flexibility in how it supported inference across its platform while balancing accuracy, throughput, and cost efficiency.

At the same time, the company needed to accelerate the pace at which it could prototype, test, and roll out new AI-powered features. Supporting higher request volumes and token throughput without increasing operational burden became a key priority, alongside maintaining reliable performance for enterprise customers. To continue scaling its platform efficiently, Observe.AI needed a more streamlined approach to large-scale inference that could support sustained expansion without compromising performance or cost control.

Solution | Establishing large-scale inference for conversational AI

To address the growing demands of large-scale inference while reducing operational complexity, Observe.AI turned to AWS to modernize how it runs AI workloads in production. The company adopted Amazon Bedrock to expand its external model options and simplify how selected AI use cases were developed and deployed in production.

Amazon Bedrock provides access to multiple models through a single API. Within this environment, Observe.AI chose Amazon Nova Lite, a model designed for high-throughput inference, to support its contact center intelligence platform. Amazon Nova Lite offered the balance of performance, cost efficiency, and scalability needed to process large volumes of conversational data consistently. Pattnaik explains, “Using managed models reduced the effort required to run some of our AI workloads. We could focus more on how we applied AI in the product rather than on the underlying infrastructure.”
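To illustrate what this single-API pattern looks like in practice, the minimal sketch below calls a Nova Lite model through the Bedrock Converse API using boto3. The region, model ID, prompt, and inference settings are illustrative assumptions, not details published by Observe.AI.

```python
# Minimal sketch: invoking Amazon Nova Lite through the Amazon Bedrock
# Converse API. Region, model ID, prompt, and settings are assumptions
# for illustration, not Observe.AI's actual configuration.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="amazon.nova-lite-v1:0",  # assumed on-demand model ID for Nova Lite
    messages=[
        {
            "role": "user",
            "content": [{"text": "Summarize the key topics in this call transcript: ..."}],
        }
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)

# The Converse API returns the reply as a list of content blocks.
print(response["output"]["message"]["content"][0]["text"])
```

Because the Converse API uses the same request shape across Bedrock models, switching models for a given use case is largely a matter of changing the model ID rather than rewriting integration code.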

With this foundation in place, Observe.AI integrated inference through Amazon Bedrock directly into its existing services, reducing the effort required to deploy and operate models across environments. Teams could connect AI capabilities into the platform without rearchitecting core systems, supporting both batch and latency-sensitive use cases. This approach also made it easier to iterate on prompts, workflows, and feature design as requirements evolved.
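One common way to achieve this kind of integration is to wrap Bedrock inference behind a small internal helper that existing services call, with client-side retry configuration to absorb transient throttling at high request rates. The sketch below shows that pattern; the helper name, timeout, and retry values are hypothetical, not Observe.AI's implementation.

```python
# Hedged sketch of a thin inference wrapper: services call one helper
# instead of managing model clients themselves. Retry and timeout values
# are assumptions, not Observe.AI's published configuration.
import boto3
from botocore.config import Config

_config = Config(
    retries={"max_attempts": 5, "mode": "adaptive"},  # client-side backoff for throttling
    read_timeout=60,
)
_client = boto3.client("bedrock-runtime", config=_config)

def run_inference(prompt: str, model_id: str = "amazon.nova-lite-v1:0") -> str:
    """Send a single prompt to Amazon Bedrock and return the model's text reply."""
    response = _client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 512},
    )
    return response["output"]["message"]["content"][0]["text"]
```

Centralizing the client this way keeps retry behavior, timeouts, and model selection in one place, which matters as request volumes grow across many calling services.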

By establishing inference through Amazon Bedrock with Amazon Nova Lite, Observe.AI introduced a more streamlined approach for running high-volume inference workloads, reducing the operational burden associated with model management and deployment.

Outcome | Supporting a 10–15x increase in AI usage at production scale

With Amazon Nova Lite running through Amazon Bedrock, Observe.AI scaled AI inference to support sustained, high-volume usage across its contact center intelligence platform. The company now supports approximately 90,000 requests per minute and processes around 100 million tokens per minute, meeting its enterprise customers’ throughput requirements across regions.
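For context, those figures imply an average of roughly 1,100 tokens per request (100 million tokens per minute divided by 90,000 requests per minute).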

As customer deployments expanded, Observe.AI experienced a 10–15x increase in AI model usage over roughly one year. This growth required inference performance to remain consistent as volumes rose, without introducing instability or unpredictable operating characteristics. “At the volumes we’re running today, reliability becomes non-negotiable. With Amazon Nova handling inference, our platform sustains high traffic without hindering performance,” Pattnaik says.

In parallel, Observe.AI improved the efficiency of running AI workloads at scale. Teams now deliver new AI capabilities 40–50 percent faster, reducing the time required to move from experimentation into production. “What mattered most was being able to scale confidently. As usage increased, we could focus on expanding AI use cases rather than worrying about whether the platform would keep up,” Pattnaik concludes.

At the volumes we’re running today, reliability becomes non-negotiable. With Amazon Nova handling inference, our platform sustains high traffic without hindering performance.

Anup Pattnaik

ML Manager, Observe.AI

