Amazon Polly - AI Voice Generator

Deploy high-quality, natural-sounding human voices in dozens of languages

What is Amazon Polly?

Amazon Polly is a fully managed service that generates voice on demand, converting any text to an audio stream. It uses deep learning technologies to convert articles, web pages, PDF documents, and other text to speech (TTS). Polly provides dozens of lifelike voices across a broad set of languages so you can build speech-enabled applications that engage and convert. Meet the diverse linguistic, accessibility, and learning needs of users across geographies and markets. Powerful neural networks and generative voice engines work in the background, synthesizing speech for you. Integrate the Amazon Polly API into your existing applications to become voice-ready quickly.
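As a rough illustration of that API integration, here is a minimal Python sketch built on the AWS SDK for Python (boto3) and its `synthesize_speech` call. The helper names, the `Joanna` voice, and the output path are illustrative assumptions, and the sketch assumes AWS credentials and a region are already configured.

```python
def build_request(text, voice_id="Joanna", output_format="mp3"):
    """Assemble keyword arguments for Polly's synthesize_speech call."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return {"Text": text, "VoiceId": voice_id, "OutputFormat": output_format}


def synthesize_to_file(text, path="speech.mp3"):
    """Send text to Amazon Polly and write the returned audio to `path`.

    Needs boto3 plus configured AWS credentials and region.
    """
    import boto3  # imported lazily so build_request works without the SDK

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_request(text))
    # AudioStream is a streaming body; read() returns the audio bytes.
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
    return path
```

Calling `synthesize_to_file("Hello from Amazon Polly.")` would write an MP3 next to the script. Keeping the request builder separate from the network call makes the parameters easy to inspect without contacting AWS.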

Capabilities

Amazon Polly has a variety of capabilities, including those listed below.

Lifelike voices

Deliver conversational user experiences with consistently fast response times

When requesting Amazon Polly output, you can choose from dozens of lifelike voices and various languages. Each voice is created using native speakers, with voice-to-voice variations even within the same language. Most languages include one or more male and female voices, so you can choose the best fit for your use case.
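To compare the voices available for a language, one option is the `describe_voices` call in boto3. The sketch below is an assumption about how you might organize the results, not a prescribed workflow: `list_voices` and `group_voices_by_gender` are hypothetical helper names, and the sample data is hand-written in the shape of the API response rather than captured from a live call.

```python
from collections import defaultdict


def group_voices_by_gender(voices):
    """Map gender -> sorted voice IDs for entries shaped like describe_voices output."""
    grouped = defaultdict(list)
    for voice in voices:
        grouped[voice["Gender"]].append(voice["Id"])
    return {gender: sorted(ids) for gender, ids in grouped.items()}


def list_voices(language_code="en-US"):
    """Fetch every voice for one language from Amazon Polly, following pagination."""
    import boto3  # lazy import: only needed for the live call

    polly = boto3.client("polly")
    voices, kwargs = [], {"LanguageCode": language_code}
    while True:
        page = polly.describe_voices(**kwargs)
        voices.extend(page["Voices"])
        token = page.get("NextToken")
        if not token:
            return voices
        kwargs["NextToken"] = token


# Hand-written sample in the response shape, not live API output.
sample = [
    {"Id": "Joanna", "Gender": "Female", "LanguageCode": "en-US"},
    {"Id": "Matthew", "Gender": "Male", "LanguageCode": "en-US"},
    {"Id": "Ivy", "Gender": "Female", "LanguageCode": "en-US"},
]
print(group_voices_by_gender(sample))
# prints {'Female': ['Ivy', 'Joanna'], 'Male': ['Matthew']}
```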

Customizable output

Customize and control speech output as needed

Amazon Polly allows you to create custom text-to-speech output that attracts and holds your audience's attention. Use custom lexicons to modify the pronunciation of acronyms, company names, internal terminology, or any other words you choose. Amazon Polly’s Speech Synthesis Markup Language (SSML) tags also allow you to adjust emphasis, intonation, phrasing, and style. Generate voice AI output that best suits your business.
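A custom lexicon is an XML document in the W3C Pronunciation Lexicon Specification (PLS) format. The following sketch shows one possible way to generate a lexicon that expands acronyms through `<alias>` entries and upload it with boto3's `put_lexicon`. The helper names and the example entries are hypothetical.

```python
from xml.sax.saxutils import escape

PLS_NAMESPACE = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_lexicon(aliases, language="en-US"):
    """Return a PLS document mapping each written form to its spoken alias."""
    lexemes = "".join(
        f"  <lexeme><grapheme>{escape(written)}</grapheme>"
        f"<alias>{escape(spoken)}</alias></lexeme>\n"
        for written, spoken in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<lexicon version="1.0" xmlns="{PLS_NAMESPACE}" '
        f'alphabet="ipa" xml:lang="{language}">\n'
        f"{lexemes}</lexicon>\n"
    )


def upload_lexicon(name, aliases):
    """Store the lexicon in Amazon Polly (needs boto3 and AWS credentials).

    Polly restricts lexicon names to short alphanumeric strings.
    """
    import boto3  # lazy import: only needed for the live call

    boto3.client("polly").put_lexicon(Name=name, Content=build_lexicon(aliases))
```

After `upload_lexicon("acronyms", {"W3C": "World Wide Web Consortium"})`, a later `synthesize_speech` request can apply the lexicon by passing `LexiconNames=["acronyms"]`.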

Gen AI power

Access built-in gen AI capabilities at a fraction of the cost

Amazon Polly supports multiple voice engines that you can choose from to convert text to speech. The generative engine deploys a billion-parameter transformer to generate voices in an incremental, streamable manner. This AI voice generator creates synthetic speech that is assertive, emotionally engaged, and highly colloquial, similar to a real human voice.
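In the boto3 SDK, the engine is selected per request through the `Engine` parameter of `synthesize_speech`. The sketch below assumes the engine names boto3 accepts at the time of writing and uses hypothetical helper names. Not every voice supports every engine, so the `Matthew` default is only an illustrative choice that should be checked against the `SupportedEngines` field returned by `describe_voices`.

```python
# Engine names accepted by the Engine parameter (assumed current; check the SDK docs).
SUPPORTED_ENGINES = ("standard", "neural", "long-form", "generative")


def supports_engine(voice, engine):
    """Check a describe_voices entry for engine support."""
    return engine in voice.get("SupportedEngines", [])


def request_for_engine(text, engine, voice_id="Matthew"):
    """Build synthesize_speech arguments that pin a specific voice engine."""
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    return {
        "Text": text,
        "VoiceId": voice_id,
        "OutputFormat": "mp3",
        "Engine": engine,
    }
```

The resulting dictionary can be passed as `polly.synthesize_speech(**request_for_engine("Hello", "generative"))` once a boto3 Polly client exists.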

Control and security

Securely store and redistribute speech in standard formats 

Store your text-to-speech output in standard audio files like MP3 and OGG for redistribution, analysis, archiving, or any other use case at no extra cost. Cache your files for faster retrieval if needed. Your content's security, trust, and privacy are AWS’s highest priorities. Amazon Polly does not retain the content of your text submissions.
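One way to store the output is to copy the audio stream returned by `synthesize_speech` to a local file in chunks, then serve or cache that file as needed. This is a sketch under assumptions: `save_audio_stream` and the `EXTENSIONS` mapping are hypothetical names, and the helper works with any file-like object that offers `read(size)`, which includes the `AudioStream` in a boto3 response.

```python
# Suggested file extensions for some Polly OutputFormat values (illustrative mapping).
EXTENSIONS = {"mp3": ".mp3", "ogg_vorbis": ".ogg", "pcm": ".pcm"}


def save_audio_stream(stream, path, chunk_size=8192):
    """Copy a file-like audio stream to `path` and return the bytes written."""
    written = 0
    with open(path, "wb") as out:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:  # empty bytes signals end of stream
                break
            out.write(chunk)
            written += len(chunk)
    return written
```

With a live response, the call would look like `save_audio_stream(response["AudioStream"], "greeting" + EXTENSIONS["mp3"])`. Reading in chunks avoids holding a long recording in memory all at once.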

Use cases

Add speech to applications with a global audience, such as RSS feeds, websites, or videos. Make your mobile and IoT applications voice-ready for the future.

Learn more about speech generation.

Store and replay Amazon Polly speech output to prompt callers through interactive or automated voice response systems. Use AI capabilities to generate voices that emotionally connect with your customers.

Learn more about voice engines

Create voiceovers for animations, games, and other media directly from your scripts. Use SSML, a W3C standard XML-based markup language, to adjust phrasing, emphasis, and intonation to match the scene. Automatically adjust speech duration to facilitate multilingual dubbing.
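The SSML adjustments described here can be sketched with a few small Python helpers that assemble a `<speak>` document. The function names are hypothetical. `<break>` and `<prosody rate>` are standard SSML, while `amazon:max-duration` is a Polly-specific attribute for limiting how long a phrase may take; tag support differs between voice engines, so treat this as an assumption to verify against the SSML documentation.

```python
from xml.sax.saxutils import escape, quoteattr


def plain(text):
    """Escape plain text so it is safe inside an SSML document."""
    return escape(text)


def pause(milliseconds):
    """Insert a silent pause of the given length."""
    return f'<break time="{int(milliseconds)}ms"/>'


def paced(text, rate="medium"):
    """Speak text at a named rate such as 'slow' or 'fast'."""
    return f"<prosody rate={quoteattr(rate)}>{escape(text)}</prosody>"


def fit_to(text, seconds):
    """Ask Polly to fit the text into at most `seconds` of speech."""
    return f'<prosody amazon:max-duration="{seconds}s">{escape(text)}</prosody>'


def speak(*parts):
    """Wrap already-built SSML fragments in the root <speak> element."""
    return "<speak>" + "".join(parts) + "</speak>"


line = speak(paced("Stay close.", rate="slow"), pause(400), plain("Tom & Ann run."))
print(line)
# prints <speak><prosody rate="slow">Stay close.</prosody><break time="400ms"/>Tom &amp; Ann run.</speak>
```

To synthesize the result, pass it as `Text` together with `TextType="ssml"` in a `synthesize_speech` request. For dubbing, `fit_to("Welcome back.", 2)` would wrap a translated line so it does not overrun a two-second slot.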

Learn more about SSML

FAQs

Is Amazon Polly free to use?
Yes. Amazon Polly offers free text-to-speech AI services for one year after you sign up, up to a usage limit. The limit ranges from 100 thousand characters to 5 million characters depending on the voice engine you choose. For more details, see Amazon Polly pricing.

How many voices and languages does Amazon Polly offer?
Amazon Polly offers 60+ male and female standard voices in 40+ languages and language variants. AWS regularly updates and adds to its voice capabilities.

Which audio formats does Amazon Polly produce?
Amazon Polly produces MP3, OGG, and other standard audio file formats sampled at 8,000 Hz, 16,000 Hz, and 22,050 Hz.

Is Amazon Polly the same as Alexa?
No. Alexa and Amazon Polly are different technologies. Alexa is a virtual voice assistant that communicates directly with the user. Amazon Polly is a text-to-speech converter that organizations use to build voice AI apps at scale.

Can I download and run Amazon Polly in my own environment?
No. Amazon Polly is a fully managed cloud AI service. You communicate with it using APIs in your code. You cannot download or deploy Amazon Polly source code in your environment. However, you can use Amazon Polly for free (up to a predetermined usage limit) for 12 months after you sign up. For more details, see Amazon Polly pricing.
