Overview
AssemblyAI offers Speech AI models via an API that product teams and developers can use to build powerful AI solutions based on voice data for their users. Thousands of developers build on AssemblyAI's Speech AI models every day to run Speech-to-Text on multilingual speech, and harness the power of Large Language Models to extract the full value from that voice data - including answering questions from voice data, generating content, and extracting metadata in seconds. AssemblyAI offers async transcription, with most audio files completing in well under 45 seconds regardless of audio duration, as well as real-time transcription with high accuracy and <600 ms of latency.
AssemblyAI gives you access to state-of-the-art Speech AI models and capabilities for real-world use cases, so you can build smarter applications in a fraction of the time. Models and features include:
- Speech recognition
- Speaker diarization
- Auto punctuation and casing
- Auto language detection
- Summarization
- Content moderation
- Sentiment analysis
- Auto highlights
- PII redaction
- Topic detection (IAB classification)
- Entity detection
- Auto chapters
- Custom spelling
- Custom vocabulary
- Dual channel transcription
- Export SRT or VTT caption files
- Filler word filtering
- Profanity filtering
In addition, LeMUR lets users apply Large Language Models to audio transcripts from one or many audio files, for tasks like summarization, question answering, and AI coaching feedback.
Our Speech AI products support 33 different audio and video file types and 99+ languages. Our models are used by thousands of breakthrough startups and dozens of global enterprises for mission-critical workloads.
In the pricing below, one unit is equivalent to one hour. For Enterprise pricing, please contact sales: www.assemblyai.com/contact
Highlights
- Human-level accuracy: Universal-1, our latest multilingual AI model for speech recognition, is 93% accurate and achieves state-of-the-art accuracy compared to other ASR models on a wide variety of academic and real-world datasets.
- More than just a model: Designed for real-world applications, our API includes critical features that help you understand human speech. Our API processes terabytes of audio data every day with over 99.9% uptime and success, and is compliant with SOC 2 Type 2, PCI DSS, and GDPR.
- Build smarter apps: Summarize, diarize, detect sentiment, moderate content, redact PII, and more with our set of Audio Intelligence models. Or leverage LeMUR, our framework to build LLM-powered apps on spoken data.
Details
Pricing
Dimension | Description | Cost/month |
---|---|---|
Pay As You Go | State-of-the-art production-ready AI models | $0.00 |
The following dimensions are not included in the contract terms and will be charged based on your usage.
Dimension | Cost/unit |
---|---|
Async Transcription (core) | $0.37 |
Nano Speech-to-Text (core) | $0.12 |
Real-Time Transcription (core) | $0.47 |
Auto Chapters (Audio Intelligence) | $0.08 |
Content Moderation (Audio Intelligence) | $0.15 |
Entity Detection (Audio Intelligence) | $0.08 |
Key Phrases (Auto Highlights) | $0.01 |
PII Redaction (Audio Intelligence) | $0.08 |
PII Audio Redaction (Audio Intelligence) | $0.05 |
Sentiment Analysis (Audio Intelligence) | $0.02 |
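
As a rough illustration of how the per-unit rates above combine, the sketch below estimates a usage charge in Python. The rates are copied from the table, and one unit is one hour as stated in the Overview. The function name `estimate_cost` and the assumption that charges scale linearly with hours and simply add up across dimensions are ours, not part of the listing; actual billing granularity and rounding may differ.

```python
from decimal import Decimal

# Per-unit (per-hour) rates copied from the usage table above.
RATES_PER_HOUR = {
    "Async Transcription": Decimal("0.37"),
    "Nano Speech-to-Text": Decimal("0.12"),
    "Real-Time Transcription": Decimal("0.47"),
    "Auto Chapters": Decimal("0.08"),
    "Content Moderation": Decimal("0.15"),
    "Entity Detection": Decimal("0.08"),
    "Key Phrases": Decimal("0.01"),
    "PII Redaction": Decimal("0.08"),
    "PII Audio Redaction": Decimal("0.05"),
    "Sentiment Analysis": Decimal("0.02"),
}


def estimate_cost(audio_hours, dimensions):
    """Estimate the charge for `audio_hours` of audio across the given dimensions.

    Assumption (hypothetical, not from the listing): each dimension is billed
    linearly per hour and the charges are summed, rounded to whole cents.
    """
    hours = Decimal(str(audio_hours))
    total = sum((RATES_PER_HOUR[name] * hours for name in dimensions), Decimal("0"))
    return total.quantize(Decimal("0.01"))


if __name__ == "__main__":
    # 10 hours of async transcription with sentiment analysis enabled:
    # 10 * (0.37 + 0.02) = 3.90
    print(estimate_cost(10, ["Async Transcription", "Sentiment Analysis"]))
```

Using `Decimal` rather than floats keeps the cent-level arithmetic exact, which matters for rates as small as $0.01 per unit.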
Vendor refund policy
All fees are non-refundable and non-cancellable except as required by law.
Legal
Vendor terms and conditions
Content disclaimer
Delivery details
Software as a Service (SaaS)
SaaS delivers cloud-based software applications directly to customers over the internet. You can access these applications through a subscription model. You will pay recurring monthly usage fees through your AWS bill, while AWS handles deployment and infrastructure management, ensuring scalability, reliability, and seamless integration with other AWS services.
Resources
Vendor resources
Support
Vendor support
Support is available via chat and email 24/7: support@assemblyai.com
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.
Customer reviews
Accurate and consistent ASR
Overall, the Perfect Product!
Assembly has kept up with our startup's high volume of requests. We grew quickly from 0 to 180,000 users within 7 months and have used Assembly from our very first MVP through to our full-scale production versions. We have had practically no issues throughout this process, and if we did, support was quick to provide a solution.
In addition to all of the above, the price for the product is perfect.
The other downside is the number of customizable outputs from the API. We wanted our Speech-to-Text output to deliver an SRT file that displays only one word at a time on the screen (since this is a popular format on social media). Assembly does not support these kinds of customizations. However, they did offer a solution that requires custom post-processing code on the API response, which we appreciated.
Therefore, while there are a few small drawbacks, we have been able to accurately deliver speech-to-text recognition to our 180,000+ users thanks to Assembly.