AWS Machine Learning Blog

How Amazon Polly Breathed Life into Dan Brown’s Digital Assistant

This is a guest post by Damian Dutton, CEO and Founder of BeeLiked. BeeLiked is, in their own words, “a digital marketing platform offering a wide range of campaigns to help brands engage with their existing audiences and reach new customers through the viral and social nature of the contests and games.”

To support the October 3 release of Dan Brown’s new novel, Origin, Doubleday, a division of Penguin Random House, asked BeeLiked to develop a digital campaign that would appeal to Dan’s large social media fan base. We knew we had to create something compelling and unique. Initially, we just planned to hold a contest to design a cover for a limited edition of the novel. We knew this would appeal to artists, but we wanted to involve all of Dan’s six million Facebook fans. So we decided to reward those who voted on their favorite cover design with a video virtual book signing by Dan Brown himself.

We wanted to create a really intimate feel, something that combined the fantasy of a Robert Langdon novel with a private meeting with Dan. Using Amazon Polly and our video platform, we created exactly the experience that we wanted. After voting on a book cover, fans are transported to their own private book signing where Brian, Amazon Polly’s British English male voice, welcomes them by name into Dan’s workspace with the following words:

‘Welcome {first_name}, we’ve been looking forward to your visit’.

Then Dan signs a personalized message in a copy of the book with their chosen cover. The video is short, simple, and focused.

Reasons for Choosing Amazon Polly

The voice was important because we wanted to emulate the novel, which includes a personal assistant with a British accent.  To preserve the illusion that we were creating, the voice also needed to be authentic and natural. Clearly, a robotic voice would have distracted from the experience.

We investigated a number of ways to add a natural, personalized voice, including using an actor. However, we soon realized that using an actor isn’t scalable, because it is impossible to anticipate and record the names of all the contest participants in advance. Amazon Polly provided us with a very simple and flexible text-to-speech solution that we were able to quickly demo and evaluate.

Implementing the Solution

To ensure that Brian’s voice has the perfect intonation, we tuned the string for the spoken sentence used in the video with SSML (Speech Synthesis Markup Language). SSML includes tags that allow you to change the speed and intonation of speech so that the spoken sentence sounds more natural.

To generate the personalized welcome sentence when an individual enters the campaign, we used the Amazon Polly API. When an entrant votes on a cover and submits their name along with the entry, it triggers an automated call to the SynthesizeSpeech operation in the Amazon Polly API, sending the SSML string with the entrant’s first name and the selected voice for speech synthesis. The call returns the rendered audio within milliseconds. We store the recording locally as an MP3 file in a cache and reuse it for people with the same name.
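The request-and-cache flow described above could look roughly like the following Python sketch. It is an illustration under stated assumptions, not our production code: the function names, the hashed cache filename, and the case-insensitive cache key are all hypothetical choices. The only real API used is boto3's `synthesize_speech` call, which is imported lazily so the caching logic can be exercised without AWS credentials.

```python
import hashlib
from pathlib import Path
from typing import Callable
from xml.sax.saxutils import escape

# SSML from the final settings in this post; {name} is a placeholder we fill in.
SSML_TEMPLATE = (
    '<speak><prosody rate="130%">Welcome {name}.</prosody> '
    '<prosody rate="110%">We\'ve been looking forward to your visit</prosody></speak>'
)


def polly_synthesize(ssml: str) -> bytes:
    """Render SSML to MP3 bytes with Amazon Polly (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the cache logic below works offline

    polly = boto3.client("polly")
    response = polly.synthesize_speech(
        Text=ssml, TextType="ssml", VoiceId="Brian", OutputFormat="mp3"
    )
    return response["AudioStream"].read()


def welcome_audio_path(
    first_name: str,
    cache_dir: Path,
    synthesize: Callable[[str], bytes] = polly_synthesize,
) -> Path:
    """Return the cached MP3 for this name, rendering it only on a cache miss."""
    name = first_name.strip()
    # Assumption: names differing only in case or padding share one recording.
    # Hashing also keeps the filename safe for any characters in the name.
    key = hashlib.sha256(name.casefold().encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.mp3"
    if not path.exists():
        # Escape XML special characters so the name cannot break the SSML.
        ssml = SSML_TEMPLATE.format(name=escape(name))
        path.write_bytes(synthesize(ssml))
    return path
```

Passing the synthesizer in as a parameter is a design choice made for this sketch: it lets a stub stand in for Amazon Polly during testing, while the default still calls the real service.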

In the video module, we added options that allow us to easily switch between the voices used in the audio, that is, Dan Brown’s own voice recording and Brian, the British English male voice provided by Amazon Polly. Finally, when generating the video, the MP3 file is passed as a parameter to the video renderer, so it is included in the video at the appropriate placeholder.

Getting the Desired Speech Output

To check for bad language before we pass the entrant’s first name to Amazon Polly, we pass the name through the WebPurify application. If the name fails, we replace the first name with a generic name, such as Friend.
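As a rough sketch of this fallback step, the check can be wrapped in a small Python function. The `is_profane` callable is a hypothetical stand-in for whatever client calls the moderation service; we do not show WebPurify's actual API here. Treating a failed check as a flagged name is also an assumption made for this example.

```python
from typing import Callable

FALLBACK_NAME = "Friend"


def safe_first_name(raw_name: str, is_profane: Callable[[str], bool]) -> str:
    """Return the entrant's name, or a generic name if it is empty or flagged."""
    name = raw_name.strip()
    if not name:
        return FALLBACK_NAME
    try:
        flagged = is_profane(name)
    except Exception:
        # Assumption: if the moderation check itself fails, fall back to the
        # generic name rather than risk speaking an unchecked one.
        flagged = True
    return FALLBACK_NAME if flagged else name
```

The returned value is what would then be substituted into the SSML string before it is sent for synthesis.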

To make the phrase sound more natural, we tuned the voice using SSML. We tried many different settings until we found the perfect result. Ultimately, we chose to use two different speech rates (prosody rates) to control intonation, as well as a longer pause after the comma.

The original audio file:

<speak><prosody rate="100%">Welcome {{entrant_first_name}}.</prosody> <prosody rate="100%">We've been looking forward to your visit</prosody></speak>

A variation with a longer pause after the comma:

<speak><prosody rate="130%">Welcome {{entrant_first_name}}. <break strength="strong"/> </prosody> <prosody rate="100%">We've been looking forward to your visit</prosody></speak>

We achieved exactly what we wanted with the following settings:

<speak><prosody rate="130%">Welcome {{entrant_first_name}}.</prosody> <prosody rate="110%">We've been looking forward to your visit</prosody></speak>
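For readers who want to experiment with different rates the way we did, a small helper along these lines can generate each variation. This is a minimal Python sketch: the function name and parameters are hypothetical, and the XML escaping of the name is an added precaution that is not part of the SSML strings shown above.

```python
from xml.sax.saxutils import escape


def build_welcome_ssml(
    first_name: str, name_rate: int = 130, sentence_rate: int = 110
) -> str:
    """Build the welcome SSML with one prosody rate per sentence."""
    # Escape &, < and > so an unusual name cannot produce invalid SSML.
    safe = escape(first_name.strip())
    return (
        f'<speak><prosody rate="{name_rate}%">Welcome {safe}.</prosody> '
        f'<prosody rate="{sentence_rate}%">We\'ve been looking forward '
        f"to your visit</prosody></speak>"
    )
```

Calling `build_welcome_ssml("Ada")` with the defaults yields the final settings above, and `build_welcome_ssml("Ada", 100, 100)` yields the original version.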

#DanBrownOrigin Campaign

The video was shared widely under the hashtag #DanBrownOrigin and helped create a real buzz for the book launch. We got a lot of positive feedback from users. Many were impressed that their name was pronounced correctly.

Even Dan Brown got involved in the campaign. He personalized a video for his cat and put it on Facebook.

Our Overall Experience with Amazon Polly

We were impressed with how easy it was to add Amazon Polly to our platform. It’s a very powerful marketing tool, and we’ve already used it for another campaign. The fact that we could offer this service with such a short turnaround time is a big plus.

Additional Reading

Learn how Duolingo uses Amazon Polly to power language learning for more than 170 million users.