Overview
Developers and enterprises required a text-to-speech (TTS) system that could support natural, real-time voice interactions with low latency, high accuracy, and multilingual capability. To address this, Murf AI built Falcon—its next-generation TTS model—on Amazon Web Services (AWS) to provide fast, reliable performance across regions. Running Falcon on AWS enables Murf to achieve 55 ms model latency, 130 ms time-to-first-audio, 99.4% pronunciation accuracy, and scale to 10,000 concurrent calls—at a cost of just one cent per minute.
About Murf AI
Murf AI is a voice AI company offering natural text-to-speech, multilingual voice generation, and real-time conversational capabilities for developers and enterprises worldwide.
Opportunity | Responding to rising performance demands in enterprise voice AI
As Murf expanded its voice AI platform to support real-time conversational use cases, the team encountered constraints common across the industry: production latency often rose to 300–500 ms, pronunciation errors weakened user trust, and high inference costs made large-scale deployments difficult to sustain. These gaps prevented many developers and mid-market companies from delivering the natural, sub-second experiences users expect. The challenge grew across India and other multilingual markets, where mixed-language speech required accuracy and fluid switching that mainstream TTS systems could not reliably provide. Regulated customers also needed strict data residency and consistent performance across regions—capabilities that existing providers struggled to meet.
“If a voice bot is slow or mispronounces something, the conversation breaks instantly,” says Ankur Edkie, co-founder and CEO at Murf AI. Murf saw an opportunity to remove these trade-offs entirely by designing a model capable of delivering low latency, high accuracy, and cost-efficient performance at scale.
Solution | Building a real-time voice AI architecture on AWS
Murf built Falcon—a real-time text-to-speech (TTS) model engineered for natural, low-latency, multilingual voice interactions—entirely on AWS. Falcon powers applications such as customer service voice bots, training assistants, and other conversational experiences where sub-second responsiveness and accuracy are essential.
To develop the model, Murf ran large-scale deep learning training on Amazon Elastic Compute Cloud (Amazon EC2), using P4d, P5, P5en, and P6-B200 instances for training and G5 and G6 instances for inference. Amazon EC2 Capacity Blocks for ML provided guaranteed GPU availability for scheduled cluster runs, enabling more than 3,000 hours of continuous training without risk of interruption. “Finding guaranteed GPU capacity exactly when we need it is crucial. Amazon EC2 Capacity Blocks let us plan complex training runs with confidence,” says Edkie.
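As an illustration of that workflow, the sketch below shows how a team might search for and purchase an EC2 Capacity Block ahead of a scheduled training run using boto3. The instance type, count, duration, and dates are assumptions for illustration, not Murf's actual configuration.

```python
# Minimal sketch: reserving GPU capacity with EC2 Capacity Blocks for ML via boto3.
# Instance type, count, duration, and dates are illustrative placeholders.
import boto3
from datetime import datetime, timedelta, timezone

ec2 = boto3.client("ec2", region_name="us-east-1")

# Search for an available Capacity Block offering for a scheduled training run.
offerings = ec2.describe_capacity_block_offerings(
    InstanceType="p5.48xlarge",        # hypothetical choice for a training cluster
    InstanceCount=4,
    CapacityDurationHours=7 * 24,      # a one-week block
    StartDateRange=datetime.now(timezone.utc) + timedelta(days=7),
    EndDateRange=datetime.now(timezone.utc) + timedelta(days=21),
)

# Purchase the first matching offering so the GPUs are guaranteed when training starts.
offering = offerings["CapacityBlockOfferings"][0]
reservation = ec2.purchase_capacity_block(
    CapacityBlockOfferingId=offering["CapacityBlockOfferingId"],
    InstancePlatform="Linux/UNIX",
)
print(reservation)
```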
For high-throughput access to multilingual datasets, Murf used Amazon FSx for Lustre, while Amazon Simple Storage Service (Amazon S3) handled checkpointing and version control throughout experimentation. This architecture eliminated the storage limitations and performance bottlenecks the team previously faced.
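To make the checkpointing flow concrete, here is a minimal sketch of saving a training checkpoint to a local FSx for Lustre mount and copying it to Amazon S3 for versioning. The bucket name, mount path, and use of PyTorch are assumptions for illustration.

```python
# Minimal sketch: save a checkpoint locally (e.g., on an FSx for Lustre mount)
# and version it in Amazon S3. Bucket and paths are hypothetical.
import boto3
import torch  # assuming a PyTorch training loop

CHECKPOINT_DIR = "/fsx/checkpoints"            # FSx for Lustre mount path (illustrative)
BUCKET = "falcon-checkpoints-example"          # hypothetical S3 bucket

s3 = boto3.client("s3")

def save_checkpoint(model, optimizer, step: int) -> None:
    local_path = f"{CHECKPOINT_DIR}/falcon-step-{step}.pt"
    torch.save(
        {"model": model.state_dict(),
         "optimizer": optimizer.state_dict(),
         "step": step},
        local_path,
    )
    # Copy the checkpoint to S3 so every experiment step is durably versioned.
    s3.upload_file(local_path, BUCKET, f"falcon/step-{step}.pt")
```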
Once the model was trained, Murf deployed Falcon globally on Amazon EC2 instances to achieve real-time inference with the performance-to-cost efficiency required at scale. AWS global infrastructure delivered consistent latency across 11 Regions and met strict data residency requirements for regulated industries. Amazon Route 53 and Elastic Load Balancing provided low-hop, low-latency routing across geographies. Murf also offers Falcon as an on-premises or customer-VPC deployment using Amazon Elastic Container Registry (Amazon ECR) for customers requiring full control of their environments.
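For the routing layer, the sketch below shows one way to configure latency-based routing in Amazon Route 53, pointing a single domain at load balancers in multiple AWS Regions so callers reach the nearest deployment. The hosted zone, domain, and load balancer endpoints are hypothetical.

```python
# Minimal sketch: latency-based routing in Amazon Route 53 across two Regions.
# Hosted zone ID, domain, and ELB values are placeholders, not real endpoints.
import boto3

route53 = boto3.client("route53")
HOSTED_ZONE_ID = "Z0000000EXAMPLE"  # hypothetical hosted zone

endpoints = {
    # Region -> (ELB hosted zone ID, ELB DNS name); values are placeholders
    "us-east-1": ("Z35SXDOTRQ7X7K", "falcon-use1-123456.us-east-1.elb.amazonaws.com"),
    "ap-south-1": ("ZP97RAFLXTNZK", "falcon-aps1-123456.ap-south-1.elb.amazonaws.com"),
}

changes = []
for region, (elb_zone, elb_dns) in endpoints.items():
    changes.append({
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "tts.example.com",
            "Type": "A",
            "SetIdentifier": region,   # one record per Region
            "Region": region,          # enables latency-based routing
            "AliasTarget": {
                "HostedZoneId": elb_zone,
                "DNSName": elb_dns,
                "EvaluateTargetHealth": True,
            },
        },
    })

route53.change_resource_record_sets(
    HostedZoneId=HOSTED_ZONE_ID,
    ChangeBatch={"Comment": "Latency-based routing for real-time TTS", "Changes": changes},
)
```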
“AWS gives us consistent performance across regions, something we simply didn’t see with other providers. That reliability is essential for real-time voice AI,” Edkie says.
Outcome | Achieving 55 ms model latency, 99.4% accuracy, and global scale
Running Falcon on AWS allows Murf to deliver the level of responsiveness required for real-time voice interactions. Falcon achieves 55 ms model latency and a globally measured 130 ms time-to-first-audio (TTFA), performance that makes it 44 percent faster than the next closest alternative. “We’re in a competitive market, and we need to consistently be the fastest while scaling on demand. AWS supports us in doing that,” says Edkie.
This performance is matched by accuracy. Falcon reaches 99.4 percent pronunciation accuracy, a threshold that directly shapes how users perceive and trust voice agents. “If a bot mispronounces a name or location, the experience breaks instantly. Accuracy is one of the most important metrics for us,” Edkie adds.
The model’s multilingual capabilities have also expanded significantly. Murf now serves over 150 voices across 35+ languages, with smooth mid-sentence language switching that mirrors real patterns of mixed-language speech. By running Falcon across 11 AWS Regions, Murf can offer this quality consistently while meeting strict data residency requirements for customers in regulated industries such as healthcare, education, and customer service.
Scalability has improved as well. Falcon scales to support up to 10,000 concurrent calls while operating at one cent per minute, delivering real-time voice applications at 50 percent lower cost than comparable solutions.
Looking ahead, Murf expects to deepen its use of AWS as it builds new model families. “We’re training models in other domains, and AWS gives us the scalable, cost-effective infrastructure to keep moving quickly,” says Edkie.
AWS gives us consistent performance across regions, something we simply didn’t see with other providers. That reliability is essential for real-time voice AI.
Ankur Edkie
Co-founder and CEO, Murf AI

AWS Services Used
Amazon EC2, Amazon EC2 Capacity Blocks for ML, Amazon FSx for Lustre, Amazon Simple Storage Service (Amazon S3), Amazon Route 53, Elastic Load Balancing, Amazon Elastic Container Registry (Amazon ECR)
Get Started
Organizations of all sizes across all industries are transforming their businesses and delivering on their missions every day using AWS. Contact our experts and start your own AWS journey today.