Make your web apps hands-free with Amazon Nova Sonic

Graphical user interfaces have carried the torch for decades, but today’s users increasingly expect to talk to their applications. Amazon Nova Sonic is a state-of-the-art foundation model from Amazon Bedrock, that helps enable this shift by providing natural, low-latency, bidirectional speech conversations over a simple streaming API. Users can collaborate with the applications through voice and embedded intelligence rather than merely operating them.

In this post we show how we added a true voice-first experience to a reference application—the Smart Todo App—turning routine task management into a fluid, hands-free conversation.

Rethinking user interaction through collaborative AI voice agents

Important usability enhancements are often deprioritized—not because they aren’t valuable, but because they’re difficult to implement within traditional mouse-and-keyboard interfaces. Features like intelligent batch actions, personalized workflows, or voice-guided assistance are frequently debated but deferred due to UI complexity. This is about voice as an additional, general-purpose interaction mode—not a replacement for device-specific controls or an accessibility-only solution. Voice enables new interaction patterns, it also benefits users of assistive technologies, such as screen readers, by offering an additional, inclusive way to interact with the application.

Amazon Nova Sonic goes far beyond one-shot voice commands. The model can plan multistep workflows, call backend tools, and keep context across turns so that your application can collaborate with the users.

The following table shows voice interactions from different application domains, like task management, CRM, and help desk.

Voice interaction (example phrase)	Intent / goal	System action / behavior	Confirmation / UX
Mark all my tasks as complete.	Bulk-complete tasks	Find user’s open tasks → mark complete → archive if configured	All 12 open tasks are marked complete.
Create a plan for preparing the Q3 budget: break it into steps, assign owners, and set deadlines.	Create multistep workflow	Generate plan → create tasks → assign owners → set deadlines → surface review options	Plan created with 6 tasks. Notify owners?
Find enterprise leads in APAC with ARR over $1M and draft personalized outreach.	Build targeted prospect list and draft outreach	Query CRM → assemble filtered list → draft personalized messages for review	Drafted 24 personalized outreach messages. Review and send?
Prioritize all P1 tickets opened in the last 24 hours and assign them to on-call.	Triage and assign	Filter tickets → set priority → assign to on-call → log changes	12 P1 tickets prioritized and assigned to the on-call team.

Amazon Nova Sonic understands the intent, invokes the required APIs, and confirms the results—no forms required. This helps to create an environment where productivity is multiplied, and context becomes the interface. It’s not about replacing traditional UI, it’s about unlocking new capabilities through voice.

The sample application at a glance

With the Smart Todo reference application, users can create to-do lists and manage notes within those lists. The application offers a focused yet flexible interface for task tracking and note organization. With the addition of voice, the application becomes a hands-free experience that unlocks more natural and productive interactions. In Smart Todo App, users can say:

“Add a note to follow up on the project charter.”
“Archive all completed tasks.”

Behind each command are focused actions—like creating a new note, organizing content, or updating task status—executed through speech in a way that feels natural and efficient.

How Amazon Nova Sonic bidirectional APIs work

Amazon Nova Sonic implements a real-time, bidirectional streaming architecture. After a session is initiated with InvokeModelWithBidirectionalStream, audio input and model responses flow simultaneously over an open stream:

Session Start – Client sends a sessionStart event with model configuration (for example, temperature and topP).
Prompt and Content Start – Client sends structured events indicating whether upcoming data is audio, text, or tool input.
Audio Streaming – Microphone audio is streamed as base64-encoded audio input events.
Model Responses – As the model processes input, it streams the following responses asynchronously:
- Automatic speech recognition (ASR) results
- Tool use invocations
- Text responses
- Audio output for playback
Session Close – Conversations are explicitly closed by sending contentEnd, promptEnd, and sessionEnd events.

Nova Sonic Architecture Diagram

You can use this event-driven approach to interrupt the assistant (barge-in), enable multi-turn conversations, and support real-time adaptability.

Solution architecture

For this solution, we use a serverless application architecture pattern, where the UI is a React single page application. The React single page application is integrated with backend web APIs running on server-side containers. The Smart Todo App is deployed using a scalable and security-aware AWS architecture that’s designed to support real-time voice interactions. The following image provides an architecture overview of AWS services working together to support bidirectional streaming needs of a voice enabled application.

Key AWS services include:

Amazon Bedrock – Powers real-time, bidirectional speech interactions through the Amazon Nova Sonic foundation model.
Amazon CloudFront – A content delivery network (CDN) that distributes the application globally with low latency. It routes /(root) traffic to the React application hosted on an Amazon S3 bucket and /api and /novasonic traffic to the Application Load Balancer.
AWS Fargate for Amazon Amazon Elastic Container Service (Amazon ECS) – Runs the backend containerized services for WebSocket handling and REST APIs capable of supporting long lived bidirectional streams.
Application Load Balancer (ALB) – Forwards web traffic /api (HTTPS REST API calls) to backend ECS services, handling Smart Todo App APIs, and /novasonic (WebSocket connections) to ECS services managing real-time voice streaming with Amazon Nova Sonic.
Amazon Virtual Private Cloud (Amazon VPC) – Provides network isolation and security for backend services. The Public Subnets host the Application Load Balancer (ALB) and Private Subnets host ECS Fargate tasks running WebSocket and REST APIs.
NAT Gateway allows Amazon ECS tasks in private subnets to more securely connect to the internet for operations like Cognito JWT token verification endpoints.
Amazon Simple Storage Service (Amazon S3) –Hosts React frontend for user interactions
AWS WAF – Helps protect the Application Load Balancer (ALB) from malicious traffic and enforces security rules at the application layer.
Amazon Cognito – Manages authentication and issues tokens.
Amazon DynamoDB – Stores application data such as to-do lists and notes.

The following image illustrates how the user requests are served with support for low-latency bidirectional streaming.

Request Workflow

Deploying the solution

To evaluate this solution, we provided sample code of a Smart Todo App available at GitHub repository.

Smart Todo App consists of multiple independent Node.js projects, including a CDK infrastructure project, a React frontend application, and backend API services. The deployment workflow makes sure that the components are correctly built and integrated with AWS services like Amazon Cognito, Amazon DynamoDB, and Amazon Bedrock.

Prerequisites

AWS account with appropriate permissions that facilitate security best practices, including least-privilege permissions.
Docker Engine installed locally and running to build container image locally.
AWS CLI configured with AWS admin credentials.
Node.js >= 20.x and npm installed.
Amazon Nova Sonic enabled in Amazon Bedrock. For more information, see Add or remove access to Amazon Bedrock foundation models.

Deployment steps

Clone the following repository:

git clone https://github.com/aws-samples/sample-amazon-q-developer-vibe-coded-projects.git
cd NovaSonicVoiceAssistant

For first-time deployment, use the following automated script:

npm run deploy:first-time

This script will:

Install the dependencies using npm (node package manager)
Build the components and container image using locally installed docker engine
Deploy the infrastructure using CDK (CDK BootStrap ==> CDK Synth ==> CDK Deploy)
Update environment variables with Amazon Cognito settings
Rebuild the UI with updated environment variables
Deploy the final infrastructure (CDK Deploy)

Verifying deployment

After deployment is successful, complete the following steps:

Access the Amazon CloudFront URL provided in the CDK outputs.
Note: The URL shown in the image is for reference only, every deployment will get a unique URL.

Successful deployment screen shot
Create a new user by signing up using the Create Account section.

Create User and Log in
Test the voice functionality to verify the integration with Amazon Nova Sonic. The following image illustrates a conversation between the signed-in user and the Amazon Bedrock agent. The AI agent is able to invoke existing APIs, and the UI is updated in real time to reflect agent’s actions.

Granting Microphone access to the application

Voice interaction in Smart Todo App

Clean up

You can remove the stacks with the following command.

# move to the infra folder, assuming you are in the project’s root folder
cd infra
# Removes the AWS stack
npm run destroy

Next steps

Voice isn’t just an accessibility add-on—it’s becoming the primary interface for complex workflows.
Turns out talking is faster than selecting—especially when your app talks back.

Try these resources to get started.

Sample Code repo – A working Amazon Nova Sonic integration
you can run locally. See how real-time voice interactions, intent handling, and multistep flows are
implemented end to end.
Amazon Nova Sonic hands-on workshop – A guided lab that walks you
through deploying Amazon Nova Sonic in your AWS account and testing voice-native features.
Amazon Nova Sonic docs – Provides API reference, streaming examples, and best
practices to help you design and deploy voice-driven workflows.
Contact your AWS account team to learn more about how AI-driven solutions can transform your operations.

About the authors

Manu Mishra is a Senior Solutions Architect at AWS, specializing in artificial intelligence, data and analytics, and security. His expertise spans strategic oversight and hands-on technical leadership, where he reviews and guides the work of both internal and external customers. Manu collaborates with AWS customers to shape technical strategies that drive impactful business outcomes, providing alignment between technology and organizational goals.

AK Soni is a Senior Technical Account Manager with AWS Enterprise Support, where he empowers enterprise customers to achieve their business goals by offering proactive guidance on implementing innovative cloud and AI/ML-based solutions aligned with industry best practices. With over 19 years of experience in enterprise application architecture and development, he uses his expertise in generative AI technologies to enhance business operations and overcome existing technological limitations.

Raj Bagwe is a Senior Solutions Architect at Amazon Web Services, based in San Francisco, California. With over 6 years at AWS, he helps customers navigate complex technological challenges and specializes in Cloud Architecture, Security and Migrations. In his spare time, he coaches a robotics team and plays volleyball. He can be reached at X handle @rajesh_bagwe.

Artificial Intelligence

Make your web apps hands-free with Amazon Nova Sonic

Rethinking user interaction through collaborative AI voice agents

The sample application at a glance

How Amazon Nova Sonic bidirectional APIs work

Solution architecture

Deploying the solution

Prerequisites

Deployment steps

Verifying deployment

Clean up

Next steps

About the authors

Resources

Blog Topics

Follow

Learn

Resources

Developers

Help