Amazon Polly

Amazon Polly features

Simple-to-Use API

Amazon Polly provides an API that enables you to quickly integrate speech synthesis into your application. You simply send the text you want converted into speech to the Amazon Polly API, and Amazon Polly immediately returns the audio stream to your application so your application can begin streaming it directly or store it in a standard audio file format, such as MP3.

Sample text: "Hi. My name is Joanna."

Sample code:

from boto3 import client

polly = client("polly", region_name="us-east-1")
response = polly.synthesize_speech(
    Text="Hi. My name is Joanna.",
    OutputFormat="mp3",
    VoiceId="Joanna",
)
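The sample above stops once the response arrives. As a minimal sketch of the "store it in a standard audio file format" step, the helper below writes the returned audio stream to an MP3 file. The function name `save_speech` and the output path are illustrative assumptions; the client is passed in as an argument so the helper can be exercised without AWS credentials.

```python
from contextlib import closing


def save_speech(polly, text, path, voice_id="Joanna"):
    """Synthesize `text` and write the MP3 bytes to `path`.

    `polly` is expected to behave like boto3.client("polly").
    Returns the number of bytes written.
    """
    response = polly.synthesize_speech(
        Text=text, OutputFormat="mp3", VoiceId=voice_id
    )
    # AudioStream is a file-like streaming body; close it after reading.
    with closing(response["AudioStream"]) as stream, open(path, "wb") as out:
        return out.write(stream.read())
```

With a real client this could be called as `save_speech(client("polly", region_name="us-east-1"), "Hi. My name is Joanna.", "joanna.mp3")`.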

Wide Selection of Voices and Languages

Amazon Polly includes dozens of lifelike voices and support for a variety of languages, so you can select the ideal voice and distribute your speech-enabled applications in many countries. In addition to Standard and Neural Text-to-Speech (NTTS) voices, Amazon Polly now offers Long-Form and Generative voices that improve speech quality for more natural, human-like speech. Danielle, Gregory, Ruth, Patrick, Alba, and Raúl are available in a long-form variant. Ruth, Matthew, Amy, Joanna, Danielle, Stephen, Olivia, Ayanda, Lucia, Lupe, Léa, Mía, Vicki, Bianca, Kajal, Pedro, Andrés, Sergio, Daniel, Rémi, Salli, Isabelle, Céline, Liam, Gabrielle, Ola, Ewa, Laura, Lisa, Niamh, Hannah, Camila, Seoyeon, Brian, Aria, Jasmine, Tiffany, Ambre, Florian, Sabrina, Lennart, Beatrice, and Lorenzo are available in a generative variant.

Language or Language variant	Female	Male
Arabic-MSA	Zeina (Standard)
Arabic - Gulf	Hala (Neural)	Zayd (Neural)
Cantonese	Hiujin (Neural)
Catalan	Arlet (Neural)
Danish	Sofie (Neural)	Mads (Standard)
Danish	Naja (Standard)
Dutch (Netherlands)	Laura (Neural)	Ruben (Standard)
Dutch (Netherlands)	Lotte (Standard)
Dutch (Netherlands)	Laura (Generative)
Dutch (Flemish) - Belgium	Lisa (Generative)
Dutch (Flemish) - Belgium	Lisa (Neural)
English - India	Kajal (Neural)
English - India	Raveena (Standard)
English - India	Aditi (Standard)
English - India	Kajal (Generative)
English - Ireland	Niamh (Neural)
English - Ireland	Niamh (Generative)
English - New Zealand	Aria (Generative)
English - New Zealand	Aria (Neural)
English - Singapore	Jasmine (Generative)
English - Singapore	Jasmine (Neural)
English - South Africa	Ayanda (Generative)
English - South Africa	Ayanda (Neural)
English – UK	Amy (Generative)	Brian (Generative)
English – UK	Amy (Neural)	Brian (Neural)
English – UK	Amy (Standard)	Brian (Standard)
English – UK	Emma (Neural)	Arthur (Neural)
English – UK	Emma (Standard)
English – US	Ruth (Generative)	Patrick (Long-Form)
English – US	Ruth (Long-Form)	Gregory (Long-Form)
English – US	Ruth (Neural)	Gregory (Neural)
English – US	Danielle (Generative)	Stephen (Generative)
English – US	Danielle (Long-Form)	Stephen (Neural)
English – US	Joanna (Generative)	Matthew (Generative)
English – US	Joanna (Neural)	Matthew (Neural)
English – US	Tiffany (Generative)
English – US	Joanna (Standard)	Matthew (Standard)
English – US	Salli (Generative)
English – US	Salli (Neural)	Justin (Neural)
English – US	Salli (Standard)	Justin (Standard)
English – US	Kendra (Neural)	Joey (Neural)
English – US	Kendra (Standard)	Joey (Standard)
English – US	Kimberly (Neural)
English – US	Kimberly (Standard)
English – US	Ivy (Neural)
English – US	Ivy (Standard)
English - Wales		Geraint (Standard)
English - Australia	Olivia (Generative)	Russell (Standard)
English - Australia	Olivia (Neural)
English - Australia	Nicole (Standard)
Finnish	Suvi (Neural)
French - Belgium	Isabelle (Generative)
French - Belgium	Isabelle (Neural)
French - Canada	Gabrielle (Generative)	Liam (Generative)
French - Canada	Gabrielle (Neural)	Liam (Neural)
French - Canada	Chantal (Standard)
French - France	Léa (Generative)	Mathieu (Standard)
French - France	Léa (Neural)	Rémi (Generative)
French - France	Léa (Standard)	Rémi (Neural)
French - France	Céline (Generative)
French - France	Céline (Neural)
French - France	Ambre (Generative)	Florian (Generative)
German - Austria	Hannah (Generative)
German - Austria	Hannah (Neural)
German - Germany	Vicki (Generative)	Daniel (Generative)
German - Germany	Vicki (Neural)	Daniel (Neural)
German - Germany	Vicki (Standard)	Hans (Standard)
German - Germany	Marlene (Standard)	Lennart (Generative)
German - Swiss	Sabrina (Generative)
German - Swiss	Sabrina (Neural)
Hindi - India	Kajal (Neural)
Hindi - India	Aditi (Standard)
Icelandic	Dóra (Standard)	Karl (Standard)
Italian	Bianca (Generative)
Italian	Bianca (Neural)	Adriano (Neural)
Italian	Bianca (Standard)	Giorgio (Standard)
Italian	Carla (Standard)
Italian	Beatrice (Generative)	Lorenzo (Generative)
Japanese	Kazuha (Neural)	Takumi (Neural)
Japanese	Tomoko (Neural)	Takumi (Standard)
Japanese	Mizuki (Standard)
Korean	Seoyeon (Generative)
Korean	Seoyeon (Neural)
Korean	Jihye (Neural)
Korean	Seoyeon (Standard)
Mandarin	Zhiyu (Neural)
Mandarin	Zhiyu (Standard)
Norwegian	Ida (Neural)
Norwegian	Liv (Standard)
Polish	Ola (Generative)
Polish	Ola (Neural)	Jacek (Standard)
Polish	Ewa (Generative)
Polish	Ewa (Standard)	Jan (Standard)
Polish	Maja (Standard)
Portuguese - Brazil	Vitória (Neural)	Ricardo (Standard)
Portuguese - Brazil	Vitória (Standard)	Thiago (Neural)
Portuguese - Brazil	Camila (Generative)
Portuguese - Brazil	Camila (Neural)
Portuguese - Brazil	Camila (Standard)
Portuguese - Portugal	Inês (Neural)	Cristiano (Standard)
Portuguese - Portugal	Inês (Standard)
Romanian	Carmen (Standard)
Russian	Tatyana (Standard)	Maxim (Standard)
Spanish - Mexico	Mia (Generative)	Andrés (Generative)
Spanish - Mexico	Mia (Neural)	Andrés (Neural)
Spanish - Mexico	Mia (Standard)
Spanish - Spain	Alba (Long-Form)	Raúl (Long-Form)
Spanish - Spain	Lucia (Generative)	Sergio (Neural)
Spanish - Spain	Lucia (Neural)	Enrique (Standard)
Spanish - Spain	Lucia (Standard)	Sergio (Generative)
Spanish - US	Conchita (Standard)
Spanish - US	Lupe (Generative)	Pedro (Generative)
Spanish - US	Lupe (Standard)	Pedro (Neural)
Spanish - US	Lupe (Neural)	Miguel (Standard)
Spanish - US	Penélope (Standard)
Swedish	Elin (Neural)
Swedish	Astrid (Standard)
Turkish	Filiz (Standard)
Turkish	Burcu (Neural)
Welsh	Gwyneth (Standard)
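The list of voices changes over time, so an application may prefer to query it rather than hard-code the table above. The following is a minimal sketch that groups voice names by language for one engine using the `DescribeVoices` operation. The helper name `voices_by_language` is an illustrative assumption, and `polly` is assumed to behave like `boto3.client("polly")`.

```python
from collections import defaultdict


def voices_by_language(polly, engine="neural"):
    """Return {language name: sorted voice names} for one engine.

    `engine` is passed through to DescribeVoices, for example
    "standard", "neural", "long-form", or "generative".
    """
    grouped = defaultdict(list)
    kwargs = {"Engine": engine}
    while True:
        page = polly.describe_voices(**kwargs)
        for voice in page["Voices"]:
            grouped[voice["LanguageName"]].append(voice["Name"])
        # Results may be paginated; follow NextToken until it is absent.
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
    return {lang: sorted(names) for lang, names in sorted(grouped.items())}
```

Calling `voices_by_language(client, engine="generative")` would return a dictionary keyed by the language names that the service reports.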

Synchronize Speech for an Enhanced Visual Experience

Amazon Polly makes it easy to request an additional stream of metadata that provides information about when particular sentences, words and sounds are being pronounced. Using this metadata stream alongside the synthesized speech audio stream, you can now build your applications with an enhanced visual experience, such as speech-synchronized facial animation or karaoke-style word highlighting.

Please visit the documentation to learn more about how to use Speech Marks.
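As a minimal sketch of how the metadata stream might be consumed, the code below requests word-level speech marks and turns them into (time, word) pairs that could drive karaoke-style highlighting. Speech marks arrive as one JSON object per line when `OutputFormat` is `"json"`. The helper names are illustrative assumptions, and `polly` is assumed to behave like `boto3.client("polly")`.

```python
import json


def parse_speech_marks(payload):
    """Parse line-delimited JSON speech marks into a list of dicts."""
    text = payload.decode("utf-8") if isinstance(payload, bytes) else payload
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def word_timings(polly, text, voice_id="Joanna"):
    """Return (offset in milliseconds, word) pairs for `text`."""
    response = polly.synthesize_speech(
        Text=text,
        VoiceId=voice_id,
        OutputFormat="json",       # return speech marks instead of audio
        SpeechMarkTypes=["word"],  # other types: "sentence", "viseme", "ssml"
    )
    marks = parse_speech_marks(response["AudioStream"].read())
    return [(m["time"], m["value"]) for m in marks if m["type"] == "word"]
```

The audio itself is requested in a separate call with an audio output format; the `time` offsets in the marks are then used to line up the two streams.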

Optimize Your Streaming Audio

With Amazon Polly, you can stream all kinds of information through your application to users in near real time. You can also choose from various sampling rates to optimize bandwidth and audio quality for your application. Amazon Polly supports MP3, Ogg Vorbis, and raw PCM audio stream formats.

Sampling rate	MP3 size	OGG size	PCM size
24.00 kHz	19.31 kB	18.11 kB	N/A
22.05 kHz	19.33 kB	17.62 kB	N/A
16.00 kHz	16.22 kB	15.48 kB	100.68 kB
8.00 kHz	13.26 kB	9.72 kB	50.34 kB
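The sketch below shows one way to select a sampling rate in a request and why the PCM column scales as it does. The allowed-rate mapping mirrors the table above (PCM at 8 and 16 kHz only), and the helper names are illustrative assumptions. PCM output is uncompressed 16-bit mono audio, so its size is the sampling rate times 2 bytes times the duration. This is why the 8 kHz PCM sample in the table (50.34 kB) is exactly half the size of the 16 kHz one (100.68 kB).

```python
# Sampling rates per output format, mirroring the table above.
VALID_RATES = {
    "mp3": {"8000", "16000", "22050", "24000"},
    "ogg_vorbis": {"8000", "16000", "22050", "24000"},
    "pcm": {"8000", "16000"},
}


def synthesis_params(text, output_format, sample_rate_hz, voice_id="Joanna"):
    """Build keyword arguments for synthesize_speech with a checked SampleRate."""
    rate = str(sample_rate_hz)  # the API expects the rate as a string
    if rate not in VALID_RATES.get(output_format, set()):
        raise ValueError(f"{rate} Hz is not listed for format {output_format!r}")
    return {
        "Text": text,
        "OutputFormat": output_format,
        "SampleRate": rate,
        "VoiceId": voice_id,
    }


def pcm_size_bytes(sample_rate_hz, seconds):
    """Size of 16-bit (2-byte) mono PCM audio of the given duration."""
    return int(sample_rate_hz * 2 * seconds)
```

The returned dictionary can be passed as `polly.synthesize_speech(**synthesis_params("Hi.", "pcm", 16000))`.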

Adjust Speaking Style, Speech Rate, Pitch, and Loudness

Amazon Polly supports Speech Synthesis Markup Language (SSML), a W3C standard, XML-based markup language for speech synthesis applications, and supports common SSML tags for phrasing, emphasis, and intonation. Custom Amazon SSML tags provide unique options, such as the ability to make certain voices speak in a Newscaster speaking style. This flexibility helps you create lifelike speech that will attract and hold the attention of your audience.

Sample	SSML
This is how I speak normally.	(none)
I can also speak in a Newscaster style, as if I were reading a news article or delivering a flash briefing.	<speak><amazon:domain name="news">I can also speak in a Newscaster style, as if I were reading a news article or delivering a flash briefing.</amazon:domain></speak>
I can speak in a higher pitched voice, or I can speak in a lower pitched voice.	<speak>I can speak in a <prosody pitch="high">higher pitched voice</prosody>, or I can speak <prosody pitch="low">in a lower pitched voice</prosody></speak>
I can speak really slowly, or I can speak really fast.	<speak>I can speak <prosody rate="x-slow">really slowly</prosody>, or I can speak <prosody rate="x-fast">really fast</prosody></speak>
I can also speak very loudly, or I can speak very quietly.	<speak>I can also speak <prosody volume="x-loud">very loudly</prosody>, or I can speak <prosody volume="x-soft">very quietly</prosody>. </speak>
I can whisper.	<speak>I have a secret to tell you, I will whisper it to you.<amazon:effect name="whispered"><prosody rate="x-slow"> <prosody volume="loud">I am not human.</prosody></prosody></amazon:effect>Can you believe it?</speak>

To learn more, visit the Amazon Polly documentation on SSML tags.
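When SSML is generated from user-supplied text, the text needs XML escaping before it is wrapped in tags. The following minimal sketch builds the prosody samples from the table above and prepares a request with `TextType` set to `"ssml"`. The helper names `prosody`, `speak`, and `ssml_params` are illustrative assumptions.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody(text, **attrs):
    """Wrap escaped text in <prosody>, e.g. prosody("hi", rate="x-slow")."""
    rendered = "".join(
        f" {name}={quoteattr(value)}" for name, value in attrs.items()
    )
    return f"<prosody{rendered}>{escape(text)}</prosody>"


def speak(*fragments):
    """Join ready-made SSML fragments inside the required <speak> root."""
    return "<speak>" + "".join(fragments) + "</speak>"


def ssml_params(ssml, voice_id="Joanna"):
    """Keyword arguments for synthesize_speech with SSML input."""
    return {
        "Text": ssml,
        "TextType": "ssml",  # tells Polly to interpret the tags
        "OutputFormat": "mp3",
        "VoiceId": voice_id,
    }
```

For example, `speak("I can speak ", prosody("really slowly", rate="x-slow"))` reproduces the first half of the speech-rate sample.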

Newscaster Speaking Style

Amazon Polly can synthesize speech as if it were spoken by a TV or radio newscaster. This can be a great way to read news articles or deliver flash briefing updates. The Newscaster style is currently available for the US English (en-US) Matthew and Joanna voices, the British English (en-GB) Amy voice, and the US Spanish (es-US) Lupe voice, using Neural text-to-speech. Listen to an audio sample in US English, British English or US Spanish.
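A request for the Newscaster style combines the `amazon:domain` tag with the neural engine. The sketch below assembles such a request and rejects voices other than the four named above. The helper name `newscaster_params` and the voice check are illustrative assumptions, not part of the service API.

```python
from xml.sax.saxutils import escape

# Voices listed above as supporting the Newscaster style.
NEWSCASTER_VOICES = {"Matthew", "Joanna", "Amy", "Lupe"}


def newscaster_params(text, voice_id="Joanna"):
    """Keyword arguments for synthesize_speech in the Newscaster style."""
    if voice_id not in NEWSCASTER_VOICES:
        raise ValueError(f"{voice_id} is not listed as a Newscaster voice")
    ssml = (
        '<speak><amazon:domain name="news">'
        f"{escape(text)}"
        "</amazon:domain></speak>"
    )
    return {
        "Text": ssml,
        "TextType": "ssml",
        "Engine": "neural",  # the style is offered through Neural TTS
        "OutputFormat": "mp3",
        "VoiceId": voice_id,
    }
```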

Adjust the Maximum Duration of Speech

With a feature called time-driven prosody, Amazon Polly can automatically adjust the speech rate so that the speech fits within a maximum duration that you define. This is beneficial for many use cases, especially localization.

For example, suppose you have US English speech embedded in your training video and want to localize this video into German. Let’s say you translate the text using Amazon Translate and voice it with Polly. The localized German speech must play over the same frames of the video as the original, so it cannot be longer than the US English speech. Time-driven prosody makes this dubbing process easier.
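In SSML, the maximum duration is expressed with the `amazon:max-duration` attribute of the `prosody` tag. The following minimal sketch wraps a sentence so that it is spoken within a given number of seconds. The helper name `fit_to_duration` is an illustrative assumption, and which engines and voices honor the attribute should be checked against the SSML documentation.

```python
from xml.sax.saxutils import escape


def fit_to_duration(text, max_seconds):
    """Return SSML asking Polly to speak `text` within `max_seconds`."""
    if max_seconds <= 0:
        raise ValueError("max_seconds must be positive")
    # The duration is written in milliseconds to avoid fractional seconds.
    millis = int(round(max_seconds * 1000))
    return (
        f'<speak><prosody amazon:max-duration="{millis}ms">'
        f"{escape(text)}"
        "</prosody></speak>"
    )
```

For the dubbing scenario above, `max_seconds` would be the length of the original US English segment, and the resulting SSML would be sent with `TextType="ssml"` and a German voice.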

Platform and Programming Language Support

Amazon Polly supports all the programming languages included in the AWS SDK (Java, Node.js, .NET, PHP, Python, Ruby, Go, and C++) and AWS Mobile SDK (iOS/Android). Polly also supports an HTTP API so you can implement your own access layer.

Speech Synthesis via API, Console, or Command Line

Amazon Polly can be accessed via the Polly API (and various language-specific SDKs), the AWS Management Console, and the AWS command-line interface (CLI). You have full control over all the capabilities of Amazon Polly, whether you use the service through the console, the API, or the CLI. The Generative engine now also offers a Bidirectional Streaming API that allows streaming input and output at the same time. This functionality is available through the AWS SDKs. Please visit the documentation to learn more about how to use it.

Custom Lexicons

With Amazon Polly’s custom lexicons, or vocabularies, you can modify the pronunciation of particular words, such as company names, acronyms, foreign words and neologisms (e.g., “ROTFL”, “C’est la vie” when spoken in a non-French voice). To customize these pronunciations, you upload an XML file with lexical entries. For example, you can customize the pronunciation of Nguyen by providing a phoneme using this XML:

Nguyen (before)
<lexeme>
  <grapheme>Nguyen</grapheme>
  <grapheme>nguyen</grapheme>
  <grapheme>NGUYEN</grapheme>
</lexeme>
Nguyen (after)
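A lexeme like the one above lives inside a complete Pronunciation Lexicon Specification (PLS) document, which is uploaded with `PutLexicon` and then referenced by name at synthesis time. The sketch below builds such a document and uploads it. Because the phoneme for the Nguyen example is not reproduced here, the sketch uses an `<alias>` substitution instead. The lexicon name, the W3C example entry, and the helper names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def build_alias_lexicon(graphemes, alias, lang="en-US"):
    """Return a PLS document that speaks each grapheme as `alias`."""
    entries = "".join(
        f"    <grapheme>{escape(g)}</grapheme>\n" for g in graphemes
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<lexicon version="1.0"\n'
        '    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"\n'
        f'    alphabet="ipa" xml:lang={quoteattr(lang)}>\n'
        "  <lexeme>\n"
        f"{entries}"
        f"    <alias>{escape(alias)}</alias>\n"
        "  </lexeme>\n"
        "</lexicon>\n"
    )
    ET.fromstring(document.encode("utf-8"))  # raises ParseError if malformed
    return document


def speak_with_lexicon(polly, name, document, text, voice_id="Joanna"):
    """Upload the lexicon, then synthesize `text` with it applied."""
    polly.put_lexicon(Name=name, Content=document)
    return polly.synthesize_speech(
        Text=text, OutputFormat="mp3", VoiceId=voice_id, LexiconNames=[name]
    )
```

For example, `build_alias_lexicon(["W3C"], "World Wide Web Consortium")` yields a lexicon that expands the acronym when it is spoken. A lexicon applies only to voices whose language matches its `xml:lang` value.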

Brand Voice

Brand Voice is a custom engagement where you work with the Amazon Polly team to build a Neural Text-to-Speech (NTTS) voice for the exclusive use of your organization. Brand Voice allows you to differentiate your products and applications with a unique vocal identity in a wide variety of use cases, including Amazon Connect Customer and Alexa Skills integrations. We work with you throughout the entire process to identify the persona, identify an actor or actress and record their speech, and ultimately build and train a model to produce the voice. The voice is then made available to your AWS account ID(s).

Listen to National Australia Bank Brand Voice

Listen to Bank of New Zealand Brand Voice

If you are interested in building a Brand Voice using Polly, please reach out to your AWS Account Manager or contact us for more information.

Contact center integrations

Amazon Connect Customer

Amazon Polly is natively integrated with Connect Customer, AWS’ cloud-based contact center solution that you use to set up and manage a customer contact center and provide reliable customer engagement at any scale. To learn more about adding text-to-speech prompts to your conversational interactive voice response system, please see how to use Polly voices within Connect Customer.

Genesys Cloud CX

Genesys Cloud CX is a cloud contact center solution that unifies customer and agent experiences across multiple channels such as phone, text and chat. You can deploy your voice bots using any of the existing Polly voices. Please refer to the Genesys Cloud documentation for more information.

Amazon Chime SDK

The Amazon Chime SDK is a set of real-time communications components that developers can use to quickly add audio calling, video calling, and screen sharing capabilities to their own web, mobile or telephony applications. The Amazon Chime SDK supports native integration with Amazon Polly, making it easy for builders to create applications that turn text and numerical data into lifelike speech and automatically play the output to a phone caller.

AWS Contact Center Intelligence (CCI)

Amazon Polly is used by several AWS CCI partners, so you can seamlessly create self-service customer service virtual agents, informational bots or application bots. Amazon Polly partners include Genesys, Vonage, and Accenture. To learn more about partners, visit the AWS CCI and AWS CCI Partners pages.