Amazon Polly - AI Voice Generator

Deploy high-quality, natural-sounding human voices in dozens of languages

What is Amazon Polly?

Amazon Polly is a fully managed service that generates voice on demand, converting any text to an audio stream. It uses deep learning technologies to convert articles, web pages, PDF documents, and other text to speech (TTS). Polly provides dozens of lifelike voices across a broad set of languages, so you can build speech-activated applications that engage and convert. Meet the diverse linguistic, accessibility, and learning needs of users across geographies and markets. Powerful neural networks and generative voice engines work in the background, synthesizing speech for you. Integrate the Amazon Polly API into your existing applications to become voice-ready quickly.
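
As a rough illustration of what integrating the API can look like, the sketch below uses the AWS SDK for Python (boto3) and its `synthesize_speech` call. The helper names `build_request` and `synthesize_to_file`, the voice `Joanna`, and the default engine are assumptions made for this example, and the code presumes that AWS credentials and a region are already configured.

```python
from pathlib import Path


def build_request(text, voice_id="Joanna", engine="neural", output_format="mp3"):
    """Assemble keyword arguments for Polly's SynthesizeSpeech operation."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "Text": text,
        "VoiceId": voice_id,            # assumed voice; choose any voice Polly offers
        "Engine": engine,               # e.g. "standard", "neural", "long-form", "generative"
        "OutputFormat": output_format,  # e.g. "mp3", "ogg_vorbis", "pcm"
    }


def synthesize_to_file(text, path, **options):
    """Call Polly and write the returned audio stream to a local file."""
    import boto3  # imported lazily so build_request works without the SDK installed

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_request(text, **options))
    target = Path(path)
    target.write_bytes(response["AudioStream"].read())
    return target
```

A call such as `synthesize_to_file("Hello from Polly", "hello.mp3")` would then produce a playable file, provided the chosen voice supports the chosen engine.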

Capabilities

Amazon Polly offers a variety of capabilities, including those listed below.

Lifelike voices

Deliver conversational user experiences with consistently fast response times

When requesting Amazon Polly output, you can choose from dozens of lifelike voices and various languages. Each voice is created using native speakers, with voice-to-voice variations even within the same language. Most languages include one or more male and female voices, so you can choose the best fit for your use case.
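
Choosing a voice can be sketched as a filter over the voice catalog. The example below assumes records shaped like the `Voices` list returned by the `DescribeVoices` operation (fields `Id`, `Gender`, `LanguageCode`, `SupportedEngines`). The function name `pick_voices` and the sample records are hypothetical and do not describe real Polly voices.

```python
def pick_voices(voices, language_code, gender=None, engine=None):
    """Return the sorted IDs of voices matching language, gender, and engine."""
    matches = []
    for voice in voices:
        if voice.get("LanguageCode") != language_code:
            continue
        if gender is not None and voice.get("Gender") != gender:
            continue
        if engine is not None and engine not in voice.get("SupportedEngines", []):
            continue
        matches.append(voice["Id"])
    return sorted(matches)


# Hypothetical sample data. A live list would come from
# boto3.client("polly").describe_voices()["Voices"].
sample_voices = [
    {"Id": "ExampleVoiceA", "Gender": "Female", "LanguageCode": "en-US",
     "SupportedEngines": ["neural", "standard"]},
    {"Id": "ExampleVoiceB", "Gender": "Male", "LanguageCode": "en-US",
     "SupportedEngines": ["standard"]},
    {"Id": "ExampleVoiceC", "Gender": "Female", "LanguageCode": "es-ES",
     "SupportedEngines": ["neural"]},
]

print(pick_voices(sample_voices, "en-US", engine="neural"))
```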

Customizable output

Customize and control speech output as needed

Amazon Polly allows you to create custom text-to-speech output that attracts and holds your audience's attention. Use custom lexicons to modify the pronunciation of acronyms, company names, internal terminology, or any other words you choose. Amazon Polly’s Speech Synthesis Markup Language (SSML) tags also allow you to adjust emphasis, intonation, phrasing, and style. Generate voice AI output that best suits your business.
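
The two customization routes mentioned above can be sketched in a few lines of Python. The helpers `build_ssml` and `build_lexicon` are hypothetical names for this example. The first wraps sentences in standard SSML tags (`<speak>`, `<prosody>`, `<s>`, `<break>`), and the second emits a lexicon in the W3C Pronunciation Lexicon Specification format, which is what Polly lexicons use. Tag support can differ between voice engines, so treat the output as a starting point to verify against the engine you pick.

```python
from xml.sax.saxutils import escape

NAMED_RATES = {"x-slow", "slow", "medium", "fast", "x-fast"}


def build_ssml(sentences, rate="medium", pause_ms=300):
    """Wrap sentences in <speak>, applying a speaking rate and pauses between them."""
    if rate not in NAMED_RATES:
        raise ValueError(f"unsupported rate: {rate!r}")
    pause = f'<break time="{int(pause_ms)}ms"/>'
    body = pause.join(f"<s>{escape(s)}</s>" for s in sentences)
    return f'<speak><prosody rate="{rate}">{body}</prosody></speak>'


def build_lexicon(aliases, lang="en-US"):
    """Build a PLS lexicon mapping written forms to the words to speak instead."""
    lexemes = "".join(
        f"<lexeme><grapheme>{escape(word)}</grapheme>"
        f"<alias>{escape(spoken)}</alias></lexeme>"
        for word, spoken in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<lexicon version="1.0" '
        'xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" '
        f'alphabet="ipa" xml:lang="{lang}">{lexemes}</lexicon>'
    )


print(build_ssml(["Welcome back.", "Here is today's update."], rate="slow"))
print(build_lexicon({"W3C": "World Wide Web Consortium"}))
```

With boto3, SSML text is passed to `synthesize_speech` with `TextType="ssml"`, and a lexicon is uploaded with `put_lexicon` and then referenced through the `LexiconNames` parameter.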

Gen AI power

Access built-in gen AI capabilities at a fraction of the cost

Amazon Polly supports multiple voice engines that you can choose from to convert text to speech. The generative engine deploys a billion-parameter transformer to generate voices in an incremental, streamable manner. This AI voice generator creates synthetic speech that is assertive, emotionally engaged, and highly colloquial, similar to a real human voice.

Control and security

Securely store and redistribute speech in standard formats 

Store your text-to-speech output in standard audio files like MP3 and OGG for redistribution, analysis, archiving, or any other use case at no extra cost. Cache your files for faster retrieval if needed. Your content's security, trust, and privacy are AWS’s highest priorities. Amazon Polly does not retain the content of your text submissions.
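
One way to store and cache output is to key each audio file on a hash of the request, so identical text is synthesized only once. The sketch below is an assumption about how such a cache could be organized, not a Polly feature. `cache_path`, `cached_speech`, and the `synthesize` callback are hypothetical names, and the callback stands in for whatever code actually calls Polly and returns audio bytes.

```python
import hashlib
from pathlib import Path

# Polly output format names mapped to the file extensions used on disk.
EXTENSIONS = {"mp3": ".mp3", "ogg_vorbis": ".ogg"}


def cache_path(cache_dir, text, voice_id, output_format):
    """Derive a stable file path from the voice, format, and text."""
    key = "\x1f".join([voice_id, output_format, text]).encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return Path(cache_dir) / f"{digest}{EXTENSIONS[output_format]}"


def cached_speech(cache_dir, text, voice_id, output_format, synthesize):
    """Return the cached audio file, calling synthesize() only on a cache miss."""
    path = cache_path(cache_dir, text, voice_id, output_format)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        # synthesize is a caller-supplied function returning raw audio bytes.
        path.write_bytes(synthesize(text, voice_id, output_format))
    return path
```

Including the voice and format in the key keeps an MP3 rendering from being served when an OGG one was requested.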

Use cases

Add speech to applications with a global audience, such as RSS feeds, websites, or videos. Make your mobile and IoT applications voice-ready for the future.

Learn more about speech generation.

Store and replay Amazon Polly speech output to prompt callers through interactive or automated voice response systems. Use AI capabilities to generate voices that emotionally connect with your customers.

Learn more about voice engines

Create voiceovers for animations, games, and other media directly from your scripts. Use SSML, a W3C standard XML-based markup language, to adjust phrasing, emphasis, and intonation to match the scene. Automatically adjust speech duration to facilitate multilingual dubbing.
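
For dubbing, the duration adjustment is expressed in SSML through Polly's `amazon:max-duration` attribute on the `<prosody>` tag. The helper below is a minimal sketch with a hypothetical name, `fit_to_duration`. Support for this attribute depends on the voice engine, so check it against the voice you plan to use.

```python
from xml.sax.saxutils import escape


def fit_to_duration(line, max_seconds):
    """Wrap a script line so Polly speeds it up to fit within max_seconds."""
    if max_seconds <= 0:
        raise ValueError("max_seconds must be positive")
    millis = round(max_seconds * 1000)
    return (
        f'<speak><prosody amazon:max-duration="{millis}ms">'
        f"{escape(line)}</prosody></speak>"
    )


print(fit_to_duration("The gates open at dawn.", 2.5))
```

The resulting string would be sent with `TextType="ssml"`, in the same way as any other SSML input.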

Learn more about SSML