2025

Building a Cost-Effective, AI-Driven Voice Intelligence Solution on AWS with Modulate

Learn how voice technology startup Modulate detects harmful and fraudulent behavior in near real time using AWS serverless technology.

Benefits

50% reduction in toxicity exposure

300% return on investment for gaming studios

40 seconds or less to analyze voice chats

2 weeks to market for VoiceVault

Overview

Millions of conversations occur in multiplayer games every day. These communications can enhance the player experience and boost engagement, or, in certain cases, players can be subjected to online harassment and bullying. Voice technology startup Modulate solved this issue by building a voice intelligence solution that can detect harmful speech in near real time.

Delivering on its vision has meant analyzing voice conversations at scale, so Modulate built its foundational voice intelligence engine using serverless technology from Amazon Web Services (AWS). By designing a repeatable, event-driven architecture, Modulate can analyze millions of hours of voice conversations daily without actively managing its underlying infrastructure. The company is also broadening its impact with new products beyond moderation, helping financial services institutions and rideshare companies deliver safer experiences for both their users and employees.

About Modulate

Founded in 2019, Modulate is a prosocial voice technology company on a mission to create safer, more respectful experiences for users and employees.

Opportunity | Using AWS Lambda to Build a Voice Intelligence Prototype in 2 Days for Modulate

Modulate was founded at a technological turning point when artificial intelligence (AI) and machine learning (ML) were becoming practical and accurate at scale. The company’s cofounders saw the potential of using AI to analyze voice conversations in new ways, specifically within the online gaming community.

Online communities, particularly in video games, have always faced challenges in preserving their code of conduct and sense of community. This is especially true in voice chat, which is expensive and difficult to analyze robustly. “One of their biggest issues was that people were harassing one another and creating an environment that was unsafe,” says Carter Huffman, cofounder and chief technology officer at Modulate. But analyzing these conversations was a challenge. Modulate needed to seamlessly scale to massive volumes of traffic and train its AI models to accurately detect harmful speech in complex, dynamic conversations.

Modulate identified a serverless strategy as the key to cost-effective scaling. “We’ve been on AWS since day one,” says Huffman. “This familiarity made it clear that serverless technology would be an important part of our architecture.” So Modulate adopted solutions such as AWS Lambda, which businesses can use to run code without thinking about servers or clusters. In 2 days, the company built its first prototype for ToxMod, a product that is purpose-built for games to help proactively moderate voice chats.

Solution | Reducing Toxicity Exposure for Gaming Studios by up to 50 Percent

ToxMod runs on a modular, event-driven architecture that can automatically process snippets of voice conversations using serverless functions. To detect harmful speech, Modulate pairs the generative AI capabilities of large language models with bespoke audio analysis models. In addition, it uses commercially available technologies, such as Amazon Transcribe, a fully managed, automatic speech recognition service that makes it easy for developers to add speech-to-text capabilities to their applications.

To process incoming requests, Modulate uses Amazon API Gateway, which lets organizations create, maintain, and secure APIs at nearly any scale. The company’s data pipeline is based on custom logic to process jobs in parallel. Modulate automatically queues its jobs using Amazon Simple Queue Service (Amazon SQS), a fully managed message queuing service for microservices, distributed systems, and serverless applications. By supporting parallel processing, Modulate can analyze voice snippets and share the results with its end customers (gaming companies) in less than 40 seconds. “Latency is really important to us, and we want to detect harmful speech as it’s happening and stop it as soon as possible.”
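As a rough illustration of the queue-driven, parallel pattern described above, the following is a minimal sketch of an AWS Lambda handler that consumes a batch of Amazon SQS messages and processes them concurrently. It is not Modulate's code: the `analyze_snippet` function, the `snippet_id` field, the JSON message body, and the worker count are all assumptions made for the example. The return value uses Lambda's partial batch response format, which takes effect only when `ReportBatchItemFailures` is enabled on the event source mapping.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def analyze_snippet(job: dict) -> dict:
    """Hypothetical stand-in for the real analysis step (transcription, model scoring)."""
    if "snippet_id" not in job:
        raise ValueError("job is missing snippet_id")
    # Placeholder verdict; a real system would call its models here.
    return {"snippet_id": job["snippet_id"], "flagged": False}


def process_record(record: dict) -> dict:
    """Parse one SQS record body (assumed to be JSON) and analyze it."""
    return analyze_snippet(json.loads(record["body"]))


def handler(event: dict, context: object) -> dict:
    """Lambda entry point for an SQS-triggered function."""
    records = event.get("Records", [])
    failures = []
    # Process the records of the batch concurrently (worker count is illustrative).
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = {pool.submit(process_record, r): r["messageId"] for r in records}
        for future, message_id in futures.items():
            try:
                future.result()
            except Exception:
                # Report only the failed message so SQS redelivers it alone.
                failures.append({"itemIdentifier": message_id})
    return {"batchItemFailures": failures}
```

Reporting failures per message, rather than failing the whole batch, means one malformed job does not force the successfully analyzed snippets in the same batch to be reprocessed.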

After demoing the prototype for ToxMod to several gaming companies, Modulate teamed up with Activision to moderate the voice chats for its Call of Duty players. After deploying ToxMod, Activision reduced toxicity exposure by up to 50 percent in Call of Duty: Modern Warfare III. “That really drove an expansion of what our capabilities are,” says Huffman. “We realized that we could analyze almost any kind of voice conversation to help companies improve the user experience, increasing its effectiveness and safety.”

Modulate then developed a version of ToxMod for rideshare and food delivery companies. The company increased developer agility and accelerated its go-to-market efforts by replicating its serverless architecture. It also gave its developers access to Amazon SageMaker, which companies can use to build, train, and deploy ML and foundation models with fully managed infrastructure, tools, and workflows.

In addition to enhancing the sentiment analysis capabilities of AI, Modulate discovered that it could also use the technology to identify fraudulent behavior. In 2 weeks, Modulate developed VoiceVault, a voice analysis solution that detects attempts at scams and fraud for financial institution, insurance, delivery, and contact center settings. “We can tune our system for different use cases and redeploy the same architecture and services for our different products with relatively little development work,” says Huffman.
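One way to picture tuning a shared system for different use cases is a per-product configuration that the same analysis code reads at runtime. The sketch below is purely illustrative: the `ProductConfig` fields, the category names, and the threshold values are hypothetical and are not taken from Modulate's products.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductConfig:
    """Hypothetical per-product tuning for a shared analysis pipeline."""
    name: str
    categories: tuple[str, ...]  # behaviors this product looks for
    flag_threshold: float        # minimum model score needed to flag


# Illustrative values only; real categories and thresholds are not public.
PRODUCTS = {
    "ToxMod": ProductConfig("ToxMod", ("harassment", "bullying"), 0.8),
    "VoiceVault": ProductConfig("VoiceVault", ("scam", "fraud"), 0.9),
}


def flagged_categories(product: str, scores: dict[str, float]) -> list[str]:
    """Return the product's categories whose score meets its threshold."""
    cfg = PRODUCTS[product]
    return [c for c in cfg.categories if scores.get(c, 0.0) >= cfg.flag_threshold]
```

With this shape, adding a product means adding a configuration entry rather than new pipeline code, which matches the "relatively little development work" described in the quote.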

Outcome | Delivering Cost-Effective Voice Intelligence Solutions to More Industries

By going serverless, Modulate can deliver its products to customers under a pay-as-you-go model. “We built this architecture on AWS to analyze voice 10–100 times more cost effectively than what we could do with off-the-shelf tools,” says Huffman. And because Modulate’s products make environments safer for users, ToxMod improves retention and engagement, helping gaming studios unlock a 300 percent return on investment.

Modulate will continue to innovate and identify new use cases for intelligent voice moderation, such as detecting synthetic voices. “As voice becomes a medium for interacting not only with other people but also with technology, voice conversations are growing in importance,” says Huffman. “Expect to see a lot more products from us to triage voice interactions intelligently, all powered by our voice engine and serverless architecture on AWS.”

Figure 1. Modulate’s serverless architecture

