Make your web apps hands-free with Amazon Nova Sonic
Graphical user interfaces have carried the torch for decades, but today’s users increasingly expect to talk to their applications. Amazon Nova Sonic is a state-of-the-art foundation model available in Amazon Bedrock that helps enable this shift by providing natural, low-latency, bidirectional speech conversations over a simple streaming API. Users can collaborate with applications through voice and embedded intelligence rather than merely operating them.
In this post, we show how we added a true voice-first experience to a reference application, the Smart Todo App, turning routine task management into a fluid, hands-free conversation.
Rethinking user interaction through collaborative AI voice agents
Important usability enhancements are often deprioritized, not because they aren’t valuable, but because they’re difficult to implement within traditional mouse-and-keyboard interfaces. Features like intelligent batch actions, personalized workflows, or voice-guided assistance are frequently debated but deferred due to UI complexity. Voice here is an additional, general-purpose interaction mode, not a replacement for device-specific controls or an accessibility-only solution. Beyond enabling new interaction patterns, it also benefits users of assistive technologies, such as screen readers, by offering an additional, inclusive way to interact with the application.
Amazon Nova Sonic goes far beyond one-shot voice commands. The model can plan multistep workflows, call backend tools, and keep context across turns so that your application can collaborate with its users.
The following table shows voice interactions from different application domains, like task management, CRM, and help desk.
| Voice interaction (example phrase) | Intent / goal | System action / behavior | Confirmation / UX |
|---|---|---|---|
| Mark all my tasks as complete. | Bulk-complete tasks | Find user’s open tasks → mark complete → archive if configured | All 12 open tasks are marked complete. |
| Create a plan for preparing the Q3 budget: break it into steps, assign owners, and set deadlines. | Create multistep workflow | Generate plan → create tasks → assign owners → set deadlines → surface review options | Plan created with 6 tasks. Notify owners? |
| Find enterprise leads in APAC with ARR over $1M and draft personalized outreach. | Build targeted prospect list and draft outreach | Query CRM → assemble filtered list → draft personalized messages for review | Drafted 24 personalized outreach messages. Review and send? |
| Prioritize all P1 tickets opened in the last 24 hours and assign them to on-call. | Triage and assign | Filter tickets → set priority → assign to on-call → log changes | 12 P1 tickets prioritized and assigned to the on-call team. |
Amazon Nova Sonic understands the intent, invokes the required APIs, and confirms the results, with no forms required. This helps create an environment where productivity is multiplied and context becomes the interface. It’s not about replacing traditional UI; it’s about unlocking new capabilities through voice.
The sample application at a glance
With the Smart Todo reference application, users can create to-do lists and manage notes within those lists. The application offers a focused yet flexible interface for task tracking and note organization. With the addition of voice, the application becomes a hands-free experience that unlocks more natural and productive interactions. In Smart Todo App, users can say:
- “Add a note to follow up on the project charter.”
- “Archive all completed tasks.”
Behind each command are focused actions—like creating a new note, organizing content, or updating task status—executed through speech in a way that feels natural and efficient.
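Each of these spoken commands ultimately resolves to a tool call that the backend executes. The following is a minimal sketch of that mapping, assuming an in-memory store. The `Note` shape, the tool names `addNote` and `archiveCompletedTasks`, and the `runTool` helper are hypothetical illustrations, not the Smart Todo App's actual code.

```typescript
// Hypothetical sketch: the Note shape and tool names are assumptions made
// for illustration; they are not taken from the Smart Todo App source.
interface Note {
  id: number;
  text: string;
  completed: boolean;
  archived: boolean;
}

type ToolInput = Record<string, unknown>;
type ToolHandler = (input: ToolInput) => string;

// In-memory stand-in for the application's data store.
const notes: Note[] = [
  { id: 1, text: "Send agenda", completed: true, archived: false },
  { id: 2, text: "Book meeting room", completed: false, archived: false },
];

// Each tool the model may request maps to one focused backend action.
const tools: Record<string, ToolHandler> = {
  addNote: (input) => {
    const text = String(input["text"] ?? "").trim();
    if (text === "") return "Cannot add an empty note.";
    notes.push({ id: notes.length + 1, text, completed: false, archived: false });
    return "Added note: " + text;
  },
  archiveCompletedTasks: () => {
    const done = notes.filter((n) => n.completed && !n.archived);
    done.forEach((n) => {
      n.archived = true;
    });
    return "Archived " + done.length + " completed task(s).";
  },
};

// Run the tool named in a tool-use request. The returned string is the
// confirmation the assistant could speak back to the user.
function runTool(name: string, input: ToolInput = {}): string {
  const handler = Object.prototype.hasOwnProperty.call(tools, name)
    ? tools[name]
    : undefined;
  if (!handler) return "Unknown tool: " + name;
  return handler(input);
}
```

Keeping each tool small and returning a plain confirmation string makes it straightforward to expose existing API operations to the model without changing how those operations behave.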
How Amazon Nova Sonic bidirectional APIs work
Amazon Nova Sonic implements a real-time, bidirectional streaming architecture. After a session is initiated with InvokeModelWithBidirectionalStream, audio input and model responses flow simultaneously over an open stream:
- Session Start – Client sends a sessionStart event with model configuration (for example, temperature and topP).
- Prompt and Content Start – Client sends structured events indicating whether upcoming data is audio, text, or tool input.
- Audio Streaming – Microphone audio is streamed as base64-encoded audio input events.
- Model Responses – As the model processes input, it streams the following responses asynchronously:
  - Automatic speech recognition (ASR) results
  - Tool use invocations
  - Text responses
  - Audio output for playback
- Session Close – Conversations are explicitly closed by sending contentEnd, promptEnd, and sessionEnd events.
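The client side of this sequence can be sketched as an ordered list of events for a single audio turn. The event names follow the steps above. The nested field names (such as `inferenceConfiguration`, `promptName`, and `contentName`), the configuration values, and the assumption of 16-bit little-endian PCM audio are illustrative and should be checked against the Amazon Nova Sonic documentation. The helpers `floatToPcm16Base64` and `buildAudioTurn` are hypothetical names.

```typescript
// Sketch only: event names follow the steps listed above; nested field names
// and values are assumptions to verify against the Amazon Nova Sonic docs.
type SonicEvent = { event: Record<string, unknown> };

// Convert float samples in the range -1..1 to base64-encoded 16-bit
// little-endian PCM (assumed input format for this sketch).
function floatToPcm16Base64(samples: Float32Array): string {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Build the ordered client events for one audio turn, from session start
// through explicit session close.
function buildAudioTurn(
  promptName: string,
  contentName: string,
  chunks: Float32Array[],
): SonicEvent[] {
  return [
    { event: { sessionStart: { inferenceConfiguration: { temperature: 0.7, topP: 0.9 } } } },
    { event: { promptStart: { promptName } } },
    { event: { contentStart: { promptName, contentName, type: "AUDIO" } } },
    ...chunks.map((chunk) => ({
      event: { audioInput: { promptName, contentName, content: floatToPcm16Base64(chunk) } },
    })),
    { event: { contentEnd: { promptName, contentName } } },
    { event: { promptEnd: { promptName } } },
    { event: { sessionEnd: {} } },
  ];
}
```

In a real session these events would be written to the open stream as they become available rather than built up front. The list form is used here only to make the ordering explicit.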
Nova Sonic Architecture Diagram
You can use this event-driven approach to interrupt the assistant (barge-in), enable multi-turn conversations, and support real-time adaptability.
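As a rough illustration of how a client might react to the asynchronous responses, including barge-in, consider the following sketch. `ModelEvent` is a simplified stand-in for the output events on the stream, the `interrupted` flag is an assumed barge-in signal, and `VoiceSession` is a hypothetical class. None of these names come from the Amazon Nova Sonic API or the sample application.

```typescript
// Hypothetical sketch: ModelEvent simplifies the streamed output events, and
// `interrupted` is an assumed signal that the user spoke over the assistant.
type ModelEvent =
  | { kind: "asr"; text: string }
  | { kind: "text"; text: string; interrupted?: boolean }
  | { kind: "audio"; base64Audio: string }
  | { kind: "toolUse"; toolName: string };

class VoiceSession {
  transcript: string[] = [];
  pendingTools: string[] = [];
  private playback: string[] = []; // audio chunks waiting to be played

  handle(evt: ModelEvent): void {
    switch (evt.kind) {
      case "asr":
        this.transcript.push("user: " + evt.text);
        break;
      case "text":
        if (evt.interrupted) {
          // Barge-in: discard queued audio so the assistant stops talking.
          this.playback = [];
        } else {
          this.transcript.push("assistant: " + evt.text);
        }
        break;
      case "audio":
        this.playback.push(evt.base64Audio);
        break;
      case "toolUse":
        this.pendingTools.push(evt.toolName);
        break;
    }
  }

  queuedAudioChunks(): number {
    return this.playback.length;
  }
}
```

Because every response type arrives as its own event, the client can update the transcript, queue audio, and schedule tool calls independently, and clearing the playback queue is enough to make an interruption feel immediate.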
Solution architecture
For this solution, we use a serverless application architecture pattern in which the UI is a React single-page application integrated with backend web APIs running in server-side containers. The Smart Todo App is deployed on a scalable, security-aware AWS architecture designed to support real-time voice interactions. The following image provides an overview of how AWS services work together to support the bidirectional streaming needs of a voice-enabled application.

Key AWS services include:
- Amazon Bedrock – Powers real-time, bidirectional speech interactions through the Amazon Nova Sonic foundation model.
- Amazon CloudFront – A content delivery network (CDN) that distributes the application globally with low latency. It routes / (root) traffic to the React application hosted in an Amazon S3 bucket, and /api and /novasonic traffic to the Application Load Balancer.
- AWS Fargate for Amazon Elastic Container Service (Amazon ECS) – Runs the containerized backend services for WebSocket handling and REST APIs, which can support long-lived bidirectional streams.
- Application Load Balancer (ALB) – Forwards /api traffic (HTTPS REST API calls) to the backend ECS services that handle the Smart Todo App APIs, and /novasonic traffic (WebSocket connections) to the ECS services that manage real-time voice streaming with Amazon Nova Sonic.
- Amazon Virtual Private Cloud (Amazon VPC) – Provides network isolation and security for backend services. The public subnets host the Application Load Balancer (ALB), and the private subnets host the ECS Fargate tasks running the WebSocket and REST APIs.
- NAT Gateway – Allows Amazon ECS tasks in private subnets to more securely connect to the internet for operations like reaching Cognito JWT token verification endpoints.
- Amazon Simple Storage Service (Amazon S3) – Hosts the React frontend for user interactions.
- AWS WAF – Helps protect the Application Load Balancer (ALB) from malicious traffic and enforces security rules at the application layer.
- Amazon Cognito – Manages authentication and issues tokens.
- Amazon DynamoDB – Stores application data such as to-do lists and notes.
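The path-based routing described in this list can be summarized in a few lines. This is only an illustration of the rules: in the deployed solution the routing lives in CloudFront and ALB configuration, not in application code. The `resolveTarget` function and the target labels are hypothetical.

```typescript
// Illustrative only: mirrors the routing rules described above. The real
// behavior is defined in CloudFront and ALB configuration, not in app code.
type Target = "s3-react-frontend" | "alb-rest-api" | "alb-voice-websocket";

// Expects a URL path without a query string, for example "/api/todos".
function resolveTarget(path: string): Target {
  if (/^\/api(\/|$)/.test(path)) return "alb-rest-api";
  if (/^\/novasonic(\/|$)/.test(path)) return "alb-voice-websocket";
  return "s3-react-frontend";
}
```

Separating REST and WebSocket traffic by path lets short-lived API calls and long-lived voice streams be handled by different backend services behind the same domain.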
The following image illustrates how the user requests are served with support for low-latency bidirectional streaming.
Request Workflow
Deploying the solution
To evaluate this solution, we provide sample code for the Smart Todo App in a GitHub repository.
The Smart Todo App consists of multiple independent Node.js projects, including a CDK infrastructure project, a React frontend application, and backend API services. The deployment workflow makes sure that the components are correctly built and integrated with AWS services like Amazon Cognito, Amazon DynamoDB, and Amazon Bedrock.
Prerequisites
- AWS account with appropriate permissions that facilitate security best practices, including least-privilege permissions.
- Docker Engine installed and running locally, to build the container image.
- AWS CLI configured with AWS admin credentials.
- Node.js >= 20.x and npm installed.
- Amazon Nova Sonic enabled in Amazon Bedrock. For more information, see Add or remove access to Amazon Bedrock foundation models.
Deployment steps
- Clone the following repository:
- For first-time deployment, use the following automated script:
This script will:
- Install the dependencies using npm (Node package manager)
- Build the components and container image using the locally installed Docker Engine
- Deploy the infrastructure using CDK (CDK bootstrap → CDK synth → CDK deploy)
- Update environment variables with Amazon Cognito settings
- Rebuild the UI with updated environment variables
- Deploy the final infrastructure (CDK Deploy)
Verifying deployment
After deployment is successful, complete the following steps:
- Access the Amazon CloudFront URL provided in the CDK outputs.
Note: The URL shown in the image is for reference only; every deployment gets a unique URL.
Successful deployment screen shot
- Create a new user by signing up using the Create Account section.
Create User and Log in
- Test the voice functionality to verify the integration with Amazon Nova Sonic. The following image illustrates a conversation between the signed-in user and the Amazon Bedrock agent. The AI agent can invoke existing APIs, and the UI is updated in real time to reflect the agent’s actions.
Granting Microphone access to the application
Voice interaction in Smart Todo App
Clean up
You can remove the stacks with the following command.
Next steps
Voice isn’t just an accessibility add-on—it’s becoming the primary interface for complex workflows.
Turns out talking is faster than selecting—especially when your app talks back.
Try these resources to get started.
- Sample Code repo – A working Amazon Nova Sonic integration you can run locally. See how real-time voice interactions, intent handling, and multistep flows are implemented end to end.
- Amazon Nova Sonic hands-on workshop – A guided lab that walks you through deploying Amazon Nova Sonic in your AWS account and testing voice-native features.
- Amazon Nova Sonic docs – Provides API reference, streaming examples, and best practices to help you design and deploy voice-driven workflows.
- Contact your AWS account team to learn more about how AI-driven solutions can transform your operations.
About the authors
Manu Mishra is a Senior Solutions Architect at AWS, specializing in artificial intelligence, data and analytics, and security. His expertise spans strategic oversight and hands-on technical leadership, where he reviews and guides the work of both internal and external customers. Manu collaborates with AWS customers to shape technical strategies that drive impactful business outcomes, providing alignment between technology and organizational goals.
AK Soni is a Senior Technical Account Manager with AWS Enterprise Support, where he empowers enterprise customers to achieve their business goals by offering proactive guidance on implementing innovative cloud and AI/ML-based solutions aligned with industry best practices. With over 19 years of experience in enterprise application architecture and development, he uses his expertise in generative AI technologies to enhance business operations and overcome existing technological limitations.
Raj Bagwe is a Senior Solutions Architect at Amazon Web Services, based in San Francisco, California. With over 6 years at AWS, he helps customers navigate complex technological challenges and specializes in cloud architecture, security, and migrations. In his spare time, he coaches a robotics team and plays volleyball. He can be reached on X at @rajesh_bagwe.