Overview
Grok
The xAI Voice APIs offer a range of powerful voice capabilities, all powered by Grok, with enterprise-grade reliability and sub-second latency. Grok Voice excels at complex, ambiguous, multi-step workflows across customer support, sales, and enterprise applications. It is especially well-suited for high-stakes scenarios that demand precise data entry and high-volume tool calling to address the user's request.
Grok Voice combines top-tier intelligence with low response latency and organic conversational ability. Our model prioritizes snappy responses and unparalleled cost effectiveness without compromising on accuracy or tool orchestration. The result is a model that lets teams confidently deploy complex, multi-turn voice experiences across almost any conceivable use case: customer support, phone sales, appointment booking, restaurant reservations, and more.
The model has been battle-tested in the toughest real-world conditions: telephony audio, background noise, heavy accents, and frequent interruptions. It natively supports 25+ languages, making it ideal for global deployments.
Highlights
- Voice Agent API: Build real-time, speech-to-speech voice agents over WebSockets, with low-latency turn-taking and tool use. For client-side apps, use Ephemeral Tokens to connect securely without exposing your API key.
- Text-to-Speech (TTS): Convert text to spoken audio in 5 expressive voices, with inline speech tags (laughter, whispers, pauses) and output formats from high-fidelity MP3 to telephony u-law. Supports unary requests or WebSocket streaming.
- Speech-to-Text (STT): Transcribe audio files in a single call or stream over WebSocket. Supports 12 audio formats, word-level timestamps, multichannel audio, speaker diarization, Smart Turn end-of-turn detection, and 25 languages.
- Custom Voices: Clone a voice from a short reference clip, then use the resulting voice_id anywhere a built-in voice works.
Details
Pricing
| Dimension | Description | Cost/unit |
|---|---|---|
| realtime_voice_min | Realtime / $ per minute | $0.05 |
| realtime_voice_hour | Realtime / $ per hour | $3.00 |
| realtime_text | Realtime Text / $ per message | $0.004 |
| tts_chars | Text-to-Speech / $ per 1M chars | $15.00 |
| stt_rest_hr | Speech-to-Text / $ per hr (REST) | $0.10 |
| stt_streaming_hr | Speech-to-Text / $ per hr (Streaming) | $0.20 |
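
As a rough illustration of how these unit prices combine, the following Python sketch estimates a usage bill from the rates listed above. The function name `estimate_cost` and its parameters are hypothetical, not part of any xAI or AWS API. It also assumes that usage is billed pro rata with no rounding to whole units, minimums, or discounts, which this listing does not specify.

```python
from decimal import Decimal

# Unit prices in USD, copied from the pricing table above.
RATES = {
    "realtime_voice_min": Decimal("0.05"),   # per minute
    "realtime_voice_hour": Decimal("3.00"),  # per hour
    "realtime_text": Decimal("0.004"),       # per message
    "tts_chars": Decimal("15.00"),           # per 1M characters
    "stt_rest_hr": Decimal("0.10"),          # per hour (REST)
    "stt_streaming_hr": Decimal("0.20"),     # per hour (streaming)
}


def estimate_cost(realtime_minutes=0, text_messages=0, tts_characters=0,
                  stt_rest_hours=0, stt_streaming_hours=0):
    """Estimate a bill in USD (hypothetical helper).

    Assumption: charges are strictly proportional to usage; actual
    metering granularity and rounding are not stated in the listing.
    """
    def d(value):
        # Convert via str so floats such as 1.5 do not carry binary noise.
        return Decimal(str(value))

    total = Decimal("0")
    total += RATES["realtime_voice_min"] * d(realtime_minutes)
    total += RATES["realtime_text"] * d(text_messages)
    total += RATES["tts_chars"] * d(tts_characters) / Decimal("1000000")
    total += RATES["stt_rest_hr"] * d(stt_rest_hours)
    total += RATES["stt_streaming_hr"] * d(stt_streaming_hours)
    return total


if __name__ == "__main__":
    # Illustrative workload: 10,000 realtime minutes, 2M TTS characters,
    # and 100 hours of streaming transcription.
    bill = estimate_cost(realtime_minutes=10_000,
                         tts_characters=2_000_000,
                         stt_streaming_hours=100)
    print(f"Estimated cost: ${bill:.2f}")
```

The per-minute and per-hour realtime rates are equivalent (60 minutes at $0.05 is $3.00), so the sketch meters realtime voice in minutes only.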
Vendor refund policy
All fees are non-refundable and non-cancellable except as required by law.
Delivery details
Software as a Service (SaaS)
SaaS delivers cloud-based software applications directly to customers over the internet. You can access these applications through a subscription model. You will pay recurring monthly usage fees through your AWS bill, while AWS handles deployment and infrastructure management, ensuring scalability, reliability, and seamless integration with other AWS services.
Support
Vendor support
Please contact xAI sales at for information about custom agreements and specialized pricing. Technical support requests should be directed towards
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.