Amazon Polly Introduces a New Whispered Voice Tag and Speech Marks for Synchronization with Visual Animation

Posted on: Apr 19, 2017

You can now add a whispered speech effect to your Text-to-Speech output with Amazon Polly, and synchronize speech with visual animation using Speech Marks. To produce the whispered effect, use the “whispered” SSML tag to mark the input text that should be spoken in a hushed voice. The tag can be applied to any of the 47 voices in Amazon Polly’s Text-to-Speech portfolio. Visit the Amazon Polly documentation for more information on how to use the new “whispered” SSML tag.
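As a rough illustration of the tag described above, the following Python sketch builds an SSML document that wraps text in the whispered effect. In Polly's SSML the effect is written as an amazon:effect element with name="whispered". The helper names whisper_ssml and synthesize_whisper, the voice "Joanna", and the output file name are assumptions made for this example, not part of the announcement. The second function assumes the AWS SDK for Python (boto3) is installed and that AWS credentials are configured.

```python
from xml.sax.saxutils import escape


def whisper_ssml(text):
    """Wrap plain text in SSML that asks Polly to whisper it."""
    # escape() replaces &, < and > so the text cannot break the SSML markup.
    return (
        "<speak>"
        f'<amazon:effect name="whispered">{escape(text)}</amazon:effect>'
        "</speak>"
    )


def synthesize_whisper(text, voice_id="Joanna", out_path="whisper.mp3"):
    """Hypothetical helper: send the SSML to Polly and save the audio.

    Requires boto3 and AWS credentials; it is not called in this sketch.
    """
    import boto3  # third-party AWS SDK, imported lazily

    polly = boto3.client("polly")
    response = polly.synthesize_speech(
        Text=whisper_ssml(text),
        TextType="ssml",  # tell Polly the input is SSML, not plain text
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    with open(out_path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
    return out_path


if __name__ == "__main__":
    print(whisper_ssml("Keep this between you & me."))
```

Escaping the text before embedding it is a small design choice that keeps characters such as "&" from producing invalid SSML.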

In addition, the new Speech Marks feature in Amazon Polly allows you to build visual animations that are synchronized with speech output. With Speech Marks, you can request a stream of metadata that gives the offset of specific text elements in the generated speech, including sentences, words, SSML tags, and visemes (which describe the facial cues that correspond to the spoken sounds). Combining this metadata stream with the synthesized speech audio stream, you can build applications with an enhanced visual experience, such as avatars with speech-synchronized facial animations or karaoke-style word highlighting. Speech Marks metadata requires a separate API request and is billed at the same per-character rate as speech output: $4.00 per 1 million characters outside the free tier. The free tier, which includes 5 million characters per month for the first 12 months starting from your first request for speech, still applies. Visit the Amazon Polly pricing page for details.
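The karaoke-style highlighting mentioned above can be sketched in a few lines of Python. Speech Marks are returned as line-delimited JSON objects with time (milliseconds from the start of the audio), type, start, end, and value fields, where Polly's documentation describes start and end as byte offsets into the input text. The sample payload below is made up for illustration, and the helper names are hypothetical. With boto3, a payload like this would typically be requested through synthesize_speech with OutputFormat="json" and SpeechMarkTypes such as ["sentence", "word"]; that call is left out here so the sketch runs offline.

```python
import json
from bisect import bisect_right

TEXT = "Mary had a little lamb."

# Illustrative payload only: the timings are invented for this example.
SAMPLE_MARKS = """\
{"time":0,"type":"sentence","start":0,"end":23,"value":"Mary had a little lamb."}
{"time":6,"type":"word","start":0,"end":4,"value":"Mary"}
{"time":373,"type":"word","start":5,"end":8,"value":"had"}
{"time":604,"type":"word","start":9,"end":10,"value":"a"}
{"time":643,"type":"word","start":11,"end":17,"value":"little"}
{"time":882,"type":"word","start":18,"end":22,"value":"lamb"}
"""


def parse_speech_marks(payload):
    """Parse line-delimited JSON speech marks into a list of dicts."""
    return [json.loads(line) for line in payload.splitlines() if line.strip()]


def word_at(marks, playback_ms):
    """Return the most recent word mark at or before playback_ms, or None."""
    words = [m for m in marks if m["type"] == "word"]
    times = [m["time"] for m in words]
    index = bisect_right(times, playback_ms) - 1
    return words[index] if index >= 0 else None


def highlight(text, mark):
    """Bracket the marked word, treating start/end as UTF-8 byte offsets."""
    raw = text.encode("utf-8")
    before = raw[: mark["start"]].decode("utf-8")
    word = raw[mark["start"] : mark["end"]].decode("utf-8")
    after = raw[mark["end"] :].decode("utf-8")
    return f"{before}[{word}]{after}"


if __name__ == "__main__":
    marks = parse_speech_marks(SAMPLE_MARKS)
    for playback_ms in (100, 400, 700, 900):
        current = word_at(marks, playback_ms)
        print(playback_ms, highlight(TEXT, current))
```

In a real player, the playback position would come from the audio element or audio library in use, and the same lookup could drive viseme-based mouth shapes instead of word highlighting.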

Visit the Amazon Polly documentation on Speech Marks for more information, and try both new features through the Amazon Polly console today!