Artificial Intelligence

Building AI-Powered Voice Applications: Amazon Nova Sonic Telephony Integration Guide

Organizations are increasingly seeking to enhance customer experiences through natural, responsive voice interactions across their telephony systems. Amazon Nova Sonic addresses this need as a speech-to-speech generative AI model that delivers real-time voice conversations with low latency and natural turn-taking. It understands speech across different accents and speaking styles, responds with expressive voices in multiple languages, and handles interruptions gracefully. Available through the Amazon Bedrock bidirectional streaming API, Nova Sonic can connect to your business data and external tools and can be integrated directly with telephony systems.

The speech modality makes Amazon Nova Sonic naturally well-suited for telephony applications where preserving conversational nuances and minimizing latency are critical. Nova Sonic is ideal for use cases like automated call centers that need human-like interactions, proactive phone call outreach campaigns, and AI receptionist use cases.

To integrate Amazon Nova Sonic with your telephony architecture, you will need an application server to connect and maintain a persistent bidirectional streaming connection to Nova Sonic. This post will introduce sample implementations for the most common telephony scenarios: Direct Session Initiation Protocol (SIP) integration with traditional phone infrastructure, direct integration with telephony providers like Vonage, Twilio, and Genesys, and open source frameworks for building telephony applications, like Pipecat and LiveKit. These approaches cover the spectrum from legacy PBX systems to modern cloud communications, giving you multiple paths to connect Nova Sonic with phone networks.

Common Amazon Nova Sonic telephony use cases

Nova Sonic can be used for these common telephony use cases:

  • Call center operations: Amazon Nova Sonic can handle customer service calls, technical support inquiries, and routine transactions through natural conversation, operating as the primary agent for inbound calls. It can also replace traditional IVR systems so customers can describe their needs instead of navigating phone menus. For high-volume periods, it can manage overflow calls and escalates complex issues to human agents with full conversation summaries.
  • Receptionist and outreach functions: Amazon Nova Sonic can connect to company systems like CRMs and calendars to handle scheduling, answer company questions, and route calls based on conversation content. For outbound use cases, it can conduct appointment reminders with rescheduling capabilities, follow-up calls for feedback collection, and survey campaigns. The speech-to-speech design maintains natural conversation flow while accessing real-time data to personalize interactions based on customer history.

Amazon Nova Sonic SIP integrations

Integrating Amazon Nova Sonic with Session Initiation Protocol (SIP) infrastructure requires an application server that serves as an intermediary layer. This server manages both SIP signaling and Real-time Transport Protocol (RTP) media streams, while maintaining the connection to the Nova Sonic bidirectional streaming API. The server bridges your existing telephony infrastructure with Nova Sonic to handle call session management and audio routing between both systems.

There are two sample implementations: a Java-based SIP gateway using the mjSIP stack and AWS SDK for Java, and a JavaScript SIP server using Node.js with SIP.js and the AWS SDK for JavaScript. Both samples demonstrate the same core architecture with language-specific implementations.

The core components include a SIP stack for call control signaling, an RTP handler for audio stream processing, and an Amazon Nova Sonic client that maintains persistent connections to Amazon Bedrock. When an inbound call arrives, the SIP Server answers via SIP, establishes RTP media sessions, and creates a corresponding Sonic streaming session. Audio flows bidirectionally:

  • RTP packets from the caller are decoded, converted to the appropriate audio format, and streamed to Nova Sonic
  • The Nova Sonic audio responses are encoded and transmitted back via RTP

For deployment, you can run the SIP Servers on Amazon Elastic Compute Cloud (Amazon EC2) instances with proper security group configuration for SIP signaling (port 5060) and RTP media streams (typically ports 10000-20000), or deploy containerized using Amazon Elastic Container Service (Amazon ECS) with host networking mode to access the required UDP port ranges. Both approaches:

  • Require IAM permissions for Amazon Bedrock access and proper credential management.
  • Support seamless integration with PBX systems, VoIP providers (like Vonage), or traditional telephony networks when you configure your existing telephony infrastructure to route calls to the gateway’s public endpoint

Integrations with telephony providers

Cloud telephony providers like Vonage, Twilio, Genesys, and Amazon Connect offer managed voice services that handle the complexity of traditional telephony infrastructure through simple APIs. Unlike direct SIP integration, these providers abstract the underlying protocols and offer features like global phone number provisioning, automatic failover, call analytics, and compliance capabilities.

Vonage

Vonage is a cloud communications platform that provides voice, messaging, and video APIs for businesses. An Amazon Nova Sonic integration with Vonage was announced in July 2025, providing a direct path to connect phone calls to conversational AI through the Vonage Voice API. With this integration businesses can deploy real-time voice agents across telephony channels without managing complex telephony infrastructure, as Vonage handles call routing, audio streaming, and protocol translation. The integration works by configuring Vonage webhooks that trigger when calls are received or initiated. Your application server receives these webhook events, establishes a Nova Sonic streaming session, and creates a bidirectional audio bridge between the Vonage call and Nova Sonic. Vonage manages the telephony complexities including codec conversion and network transport, while your server handles the AI conversation flow and connects to your business systems and data sources.

For detailed implementation guidance, see the Deploy conversational agents with Vonage and Amazon Nova Sonic blog post and the sample implementation in the aws-samples GitHub repository.

Twilio

Twilio is a cloud-based customer engagement platform that offers voice, SMS, email, and video capabilities. It provides APIs and SDKs for developers to build custom communication solutions, automate messaging, and implement real-time notifications. This platform serves as the foundation for businesses to create and manage their customer communications efficiently. Twilio integrates with AWS to combine communication expertise with cloud infrastructure and AI capabilities. The integration works through webhook-based event processing, real-time media streaming via WebSocket connections. When calls are received or initiated, Twilio webhooks trigger events that the customer’s application server receives. The server then establishes an Amazon Nova Sonic streaming session and creates a media streaming connection for real-time audio processing between Twilio calls and the application server. Twilio handles communication complexities like codec conversion and network transport, while Sonic handles the natural language conversation. This integration enables businesses to deploy AI-powered voice agents, implement predictive analytics, and create personalized customer experiences using comprehensive customer data across both Twilio and AWS.

For detailed implementation guidance, see the Deploy conversational agents with Vonage and Amazon Nova Sonic blog post and the sample implementation in the aws-samples GitHub repository.

Genesys

Genesys is a cloud-based customer experience orchestration platform, providing contact center and customer engagement solutions with omnichannel routing, workforce optimization, and AI-powered analytics. Genesys integrates with Amazon Nova Sonic through the Genesys Cloud platform APIs and the Amazon Bedrock integration available on the Genesys AppFoundry, where incoming calls trigger routing decisions that can direct conversations to Sonic-powered virtual agents. Your application server receives call events from Genesys Cloud, establishes a Nova Sonic streaming session, and creates a bidirectional audio bridge between the Genesys call and Nova Sonic. Genesys handles the contact center complexities including call routing, queue management, and agent orchestration, while your server manages the AI conversation flow and connects to business systems, with seamless transfers to live agents while maintaining complete conversation context and full visibility through Genesys’ reporting dashboards.

For detailed implementation guidance, see the Amazon Nova Sonic Connector on the Genesys AppFoundry.

Integrations with open source frameworks

Open source frameworks like Pipecat and LiveKit provide developers with powerful, community-supported tools that can significantly accelerate the development of conversational AI applications when integrated with Amazon Nova Sonic. These frameworks offer pre-built components, standardized interfaces, and abstraction layers that handle many of the technical complexities involved in building voice-enabled experiences. By using these integrations teams can focus on creating distinctive conversational experiences rather than reinventing fundamental infrastructure components.

Pipecat

Pipecat is an open source python framework designed to simplify the creation of intelligent conversational agents across various channels, including voice and text. It addresses the complexities of developing AI-powered communication systems providing developers with a unified framework for designing and managing conversational experiences. Pipecat supports flexible pipeline architecture which represents the flow of data and processing steps that transform user inputs into intelligent responses.It also offers seamless integration with advanced speech-to-speech models to enable high-quality voice interactions, including with Amazon Nova Sonic. The Sonic-Pipecat integration establishes a bidirectional audio streaming channel that handles all aspects of voice-based interactions. When a call arrives, Pipecat streams the audio directly to Nova Sonic, which processes the speech and generates voice responses in real-time. Pipecat manages the audio transport, buffering, and connection handling, while Nova Sonic handles the voice intelligence. The technical complexities happen automatically behind the scenes, letting developers focus on designing great conversations rather than managing infrastructure.

For detailed guidance, please refer to the blog posts Building intelligent AI voice agents with Pipecat and Amazon Bedrock Part 1 and Part 2 blog posts.

LiveKit

LiveKit is an open source platform for building real-time audio and video applications that provides developers with WebRTC infrastructure and APIs for creating interactive communication experiences with scalable, low-latency media streaming capabilities. With the Amazon Nova Sonic and LiveKit integration developers can build sophisticated conversational AI applications where LiveKit manages the real-time audio streaming and participant connections while Sonic handles the AI-powered conversation processing. This combination supports seamless voice-based interactions where LiveKit streams audio to Nova Sonic for processing, receives the AI-generated responses, and delivers them back to participants with minimal latency. The integration supports multi-party conversations and can scale to handle concurrent voice sessions, making it suitable for applications like virtual meetings with AI assistants and call center use cases.

For detailed implementation guidance, see the Build real-time conversational AI experiences using Amazon Nova Sonic and LiveKit blog post.

Clean up

To avoid incurring ongoing charges after implementing your Amazon Nova Sonic telephony solution, remember to delete all resources you created:

  • Terminate any EC2 instances used for hosting SIP Servers or application servers
  • Delete ECS tasks and services if you deployed containerized applications
  • Remove IAM permissions created specifically for this integration
  • Delete test phone numbers and configurations from telephony providers (Vonage, Twilio, Genesys)
  • Clean up any deployed sample applications from the aws-samples GitHub repositories

The specific resources to clean up will depend on your chosen integration approach. Always verify through your AWS Billing Dashboard that you’ve successfully removed all billable resources.

Conclusion

The speech-to-speech capabilities of Amazon Nova Sonic open new possibilities for building natural, responsive voice applications across diverse telephony architectures. Whether you’re working with legacy SIP infrastructure, modern cloud telephony providers, or open source frameworks, the integration paths covered in this guide provide flexible options to match your technical requirements and organizational constraints. The direct SIP integration approach gives you maximum control and works seamlessly with existing PBX systems and traditional telephony networks. Cloud telephony providers like Vonage, Twilio, Genesys, and Amazon Connect offer managed services that abstract infrastructure complexity while providing enterprise-grade reliability and global reach. Open source frameworks like Pipecat and LiveKit accelerate development by providing pre-built components and standardized interfaces for conversational AI applications. Each integration approach has its strengths: SIP integration for direct control and legacy compatibility, cloud providers for managed infrastructure and rapid deployment, and open-source frameworks for development velocity and community support. By understanding these options, you can select the path that best aligns with your use case, existing infrastructure, and team capabilities. To get started, explore the sample implementations linked throughout this guide, experiment with the integration approach that fits your needs, and use the low-latency, multilingual capabilities of Amazon Nova Sonic to create voice experiences that feel truly conversational. As you build, remember that these integration patterns can be combined and customized to meet your specific requirements. For your reference, here are key resources to help you get started with Amazon Nova Sonic:


About the authors

Reilly Manton is a Solutions Architect in AWS Telecoms specializing in AI & ML. He builds innovative AI solutions for customers, with a particular focus on speech-to-speech generative AI that enables more natural and intuitive human-machine interactions.

Dexter Doyle is a Senior Solutions Architect at Amazon Web Services, where he guides customers in designing secure, efficient, and high-quality cloud architectures. A lifelong music enthusiast, he loves helping customers unlock new possibilities with AWS services, with a particular focus on audio workflows.

Madhavi Evana is a Solutions Architect at Amazon Web Services (AWS), where she guides Enterprise customers through their cloud transformation journeys. She specializes in Artificial Intelligence and Machine Learning, with focus in Speech-to-speech translation and synthesis, and Natural Language Processing (NLP) technologies.

Kalindi Vijesh Parekh is a Solutions Architect at Amazon Web Services. As a Solutions Architect, she combines her expertise in analytics and data streaming with a commitment to helping customers realize their AWS potential.