AWS Machine Learning Blog

Create softer speech with the new Amazon Polly phonation tag

Speech Synthesis Markup Language (SSML) is a standardized markup language that enables developers to modify Text-to-Speech (TTS) audio. With SSML, you can control various vocal characteristics of TTS output, such as pronunciation, speech rate, and other elements, to produce a more natural-sounding voice experience.

Today, we are excited to announce a new phonation SSML tag that you can use with Amazon Polly. The new phonation tag enables you to produce softer-sounding speech.

Using the new phonation tag

The amazon:effect tag, combined with the new phonation="soft" attribute, allows Amazon Polly to generate softer speech. Notice in the following sample that amazon:effect requires a closing tag. In this case, the first sentence of the synthesized speech is spoken in a normal voice, whereas the portion inside the amazon:effect tag is spoken more softly.

<speak>
     This is Matthew speaking in my normal voice. <amazon:effect phonation="soft"> This is Matthew speaking in my softer voice. </amazon:effect>
</speak>
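If you generate SSML programmatically, any user-supplied text has to be XML-escaped before it is placed inside the markup, or characters such as & and < will break the document. The following is a minimal sketch in Python; soft_phonation_ssml is a hypothetical helper name, and the only markup it emits is the speak and amazon:effect phonation="soft" structure shown in the sample above.

```python
from xml.sax.saxutils import escape


def soft_phonation_ssml(normal_text: str, soft_text: str) -> str:
    """Build SSML that speaks normal_text in the default voice and
    soft_text inside <amazon:effect phonation="soft">.

    Hypothetical helper for illustration only. escape() converts
    &, < and > to entities so arbitrary text cannot break the markup.
    """
    return (
        "<speak>"
        f"{escape(normal_text)} "
        '<amazon:effect phonation="soft">'
        f"{escape(soft_text)}"
        "</amazon:effect>"
        "</speak>"
    )


if __name__ == "__main__":
    # Reproduces the Matthew sample as a single-line SSML string.
    print(soft_phonation_ssml(
        "This is Matthew speaking in my normal voice.",
        "This is Matthew speaking in my softer voice.",
    ))
```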

Copy the example above and paste it into the Amazon Polly console, and try it with any of the Amazon Polly voices.
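Outside the console, you can send the same SSML through the Amazon Polly SynthesizeSpeech API. The sketch below assumes the AWS SDK for Python (boto3), whose Polly client exposes synthesize_speech with the Text, TextType, OutputFormat, and VoiceId parameters. The wrapper name synthesize_ssml is hypothetical, and the client is passed in as an argument so the function can be exercised without AWS credentials.

```python
def synthesize_ssml(client, ssml: str, voice_id: str = "Matthew") -> bytes:
    """Send SSML to Amazon Polly and return the MP3 audio bytes.

    `client` is assumed to be a boto3 Polly client, for example
    boto3.client("polly"). Hypothetical wrapper for illustration.
    """
    response = client.synthesize_speech(
        Text=ssml,
        TextType="ssml",      # interpret Text as SSML markup, not plain text
        OutputFormat="mp3",
        VoiceId=voice_id,     # any Polly voice ID; "Matthew" matches the sample
    )
    # AudioStream is a streaming body; read() returns the whole audio payload.
    return response["AudioStream"].read()
```

With boto3 installed and credentials configured, you might call synthesize_ssml(boto3.client("polly"), ssml) and write the returned bytes to an .mp3 file.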

Amazon Polly supports standard SSML tags such as prosody, which enables you to control the volume, rate, and pitch of the delivery of the text. Amazon Polly also has unique tags you can use for cool effects, such as whispered voice, dynamic range compression, and vocal tract length, which further enhance your ability to modify Amazon Polly voices to best suit your needs.
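These tags nest inside a single speak element, so a small builder can combine them. The following is a minimal sketch assuming the tag forms documented for Amazon Polly: prosody with rate and volume, amazon:effect name="whispered", and amazon:effect vocal-tract-length. The wrap helper is hypothetical, and the attribute values are illustrative choices rather than recommendations.

```python
from xml.sax.saxutils import escape, quoteattr


def wrap(text: str, tag: str, **attrs: str) -> str:
    """Wrap already-built SSML `text` in <tag attr="...">...</tag>.

    Hypothetical helper. Python keyword names use underscores, so they are
    converted to SSML's hyphenated form (vocal_tract_length -> vocal-tract-length).
    quoteattr() returns each value quoted and escaped for use as an attribute.
    """
    attr_str = "".join(
        f" {name.replace('_', '-')}={quoteattr(value)}"
        for name, value in attrs.items()
    )
    return f"<{tag}{attr_str}>{text}</{tag}>"


# Slow, soft delivery via prosody, then a whisper, then a longer vocal tract.
ssml = wrap(
    wrap(escape("Let me tell you a secret."), "prosody", rate="slow", volume="soft")
    + " "
    + wrap(escape("Nobody else can hear this."), "amazon:effect", name="whispered")
    + " "
    + wrap(escape("Now I sound larger."), "amazon:effect", vocal_tract_length="+15%"),
    "speak",
)
print(ssml)
```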


About the Author

Binny Peh is a Sr. Product Marketing Manager for AWS machine learning solutions. In her spare time, she indulges in too much television and is an aspiring foodie. Binny’s glass is always half-full, and she believes in the power of positive thinking.