AWS Machine Learning Blog
Whooshkaa and Amazon Polly: Combining Eyes and Ears to Widen Publishing Horizons
This is a guest blog post by Robert Loewenthal, CEO & Founder of Whooshkaa.
Based in Australia, Whooshkaa is a creative audio-on-demand podcast platform that helps publishers and advertisers reach their audiences. We’re always trying new products and techniques, and combining them to pioneer new solutions for our customers.
The Amazon Polly Text-To-Speech (TTS) feature is a great example of this. Already, we have top-tier publishers, sporting bodies, and Australia’s biggest telecommunications company using Amazon Polly to augment their established delivery methods.
Those traditional information providers are finding that today’s customers don’t just want to read information; they want to listen to it. With Amazon Polly TTS, Whooshkaa gives information providers the ability to speak to their audiences – in any of 52 voices and 25 languages.
Earlier this year, Amazon Polly gave a voice to The Australian, our country’s premier national newspaper. Amazon Polly will read aloud newspaper stories, recipes, or sports scores to subscribers while they drive, exercise, or otherwise keep their hands and eyes busy.
Powered by Amazon Polly, Whooshkaa makes it easy for selected partners to pick any news story and convert it into a podcast episode in seconds. We also provide the tools to merge multiple stories and to customize the sound by changing its accent, pitch, rate, and volume.
Whooshkaa has an extensive distribution network, so listeners can choose from a wide range of ways to consume content. The most obvious choice is their favorite podcast app. However, because Whooshkaa has a unique partnership with Facebook, our podcast episodes can also be played through Facebook’s native audio player. Our customizable web player is supported on Twitter and can be embedded on any website.
We believe that when the technology is ready, publishers will be able to make their news stories available in any language, in every part of the world. News stories could be customized to their listeners’ preferences and needs.
We are also working with Australia’s largest telecommunications company, Telstra, and the National Rugby League, to deliver live sport results of a user’s favorite team through any connected smart speaker. Our users can simply ask their device for the current score, and they’ll get it read back to them instantly.
Our developer Christian Carlsson believes the immediacy of Amazon Polly TTS and the range of languages bring limitless opportunities to any type of publisher.
“By integrating artificial intelligence with Whooshkaa’s already-powerful platform it’s now possible to create a fully automated podcast episode from text in less than 30 seconds – and this is just the beginning,” Carlsson says.
Technical implementation of the AFL integration
The Australian Football League (AFL) wanted their fans to be able to follow their favorite team through voice commands to a smart speaker. To do so, Whooshkaa needed to create an RSS feed that is updated every 2 minutes with the latest results. The following diagram shows a simplified overview of our implementation.
To trigger a crawl of AFL’s API that contained the data we needed, we set up a simple AWS Lambda function that would call our API. The Whooshkaa API would fetch the data, parse it, convert the text to speech, and publish the newly created RSS feed to Amazon S3.
First, we have the serverless.yml file, which is responsible for initiating the requests every 2 minutes. Nothing fancy here.
Serverless.yml:
This triggers the following code:
WhooshkaaAPI.js
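As a rough illustration of this trigger step, the following is a minimal sketch of a scheduled handler that posts to an API endpoint. It is written in Python rather than the JavaScript of WhooshkaaAPI.js, and the base URL, the `/afl/feeds` path, the `team_id` event field, and the `API_TOKEN` environment variable are all hypothetical names chosen for the example, not Whooshkaa's actual API.

```python
import json
import os
import urllib.request

# Hypothetical base URL; the real Whooshkaa endpoint is not shown in this post.
API_BASE = "https://api.example.com"


def build_feed_request(team_id, token):
    """Build the POST request that asks the API to refresh one team's feed."""
    body = json.dumps({"team_id": team_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/afl/feeds",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def handler(event, context, send=urllib.request.urlopen):
    """Lambda-style entry point, invoked by the schedule every 2 minutes.

    `send` is injectable so the handler can be exercised without a network.
    """
    token = os.environ.get("API_TOKEN", "")  # assumed configuration name
    request = build_feed_request(event["team_id"], token)
    with send(request, timeout=30) as response:
        return {"status": response.status}
```

Keeping the function this thin means the scheduler only decides when to run, while fetching, parsing, and publishing stay in the API.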
Next, the createAFLFeedByTeamID method sends a POST request to our endpoint which does the following:
- Fetches the data from the AFL API. The data normalization is abstracted to a separate AFL package to make this method as readable as possible. A few different conditions determine what data to parse. A team’s match data is fetched if the team is playing or has played in the last 24 hours, otherwise we default to the latest news about the team.
- Makes sure that the returned data is new by storing its hash in Amazon S3. $this->publisher is once again an abstracted class that contains three different storage adapters: local, Whooshkaa S3 bucket, and AFL S3 bucket. When working with the data we use the local adapter, we store the hash in the Whooshkaa S3 bucket, and we publish the generated RSS feed to the AFL S3 bucket.
- Takes the text and converts it to an audio stream through Amazon Polly. You can see in the makeAudio method how we manipulate some of the words so they sound the way we expect them to. For example, MCG, which is a sports stadium, was interpreted as ‘McGee,’ so we tell Amazon Polly to spell it out instead.
- Creates the RSS feed and publishes it to AFL’s S3 bucket.
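The first two steps above can be sketched as follows. This is a Python illustration under stated assumptions, not the PHP in AFLController.php: `choose_source`, `content_hash`, `is_new`, and the `InMemoryStore` stand-in for the S3-backed storage adapter are hypothetical names, and SHA-256 over canonical JSON is one possible hashing choice rather than the one Whooshkaa necessarily uses.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone


def choose_source(now, match_start, match_end):
    """Return "match" if the team is playing or finished within the last
    24 hours, otherwise "news"."""
    if match_start is None or match_start > now:
        return "news"  # no match on record, or it has not started yet
    if match_end is None:
        return "match"  # started and still in progress
    if now - match_end <= timedelta(hours=24):
        return "match"  # finished recently
    return "news"


def content_hash(payload):
    """Hash a JSON-serializable payload; sorted keys make the hash stable."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class InMemoryStore:
    """Stand-in for a storage adapter (for example, one backed by S3)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


def is_new(store, key, payload):
    """Return True and remember the hash if the payload has changed."""
    digest = content_hash(payload)
    if store.get(key) == digest:
        return False
    store.put(key, digest)
    return True
```

With a check like this, a run that finds unchanged data can stop before the text-to-speech step, so audio is only regenerated when the score or news actually changes.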
AFLController.php:
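The word manipulation described for makeAudio can be illustrated with SSML, which Amazon Polly accepts as input. The sketch below is in Python rather than PHP, and the `to_ssml` helper and `SPELL_OUT` list are hypothetical names. It wraps selected abbreviations in a `say-as` tag so they are read letter by letter, and it assumes the terms are plain alphanumeric words.

```python
import re
from xml.sax.saxutils import escape

# Abbreviations that should be read letter by letter (assumed list).
SPELL_OUT = ("MCG",)


def to_ssml(text, spell_out=SPELL_OUT):
    """Escape plain text for XML and mark abbreviations to be spelled out."""
    ssml = escape(text)  # &, < and > must be escaped inside SSML
    for term in spell_out:
        pattern = r"\b" + re.escape(term) + r"\b"  # whole-word matches only
        tagged = '<say-as interpret-as="spell-out">' + term + "</say-as>"
        ssml = re.sub(pattern, lambda match: tagged, ssml)
    return "<speak>" + ssml + "</speak>"
```

The resulting string would then be passed to Amazon Polly as SSML rather than plain text, for example through the `TextType="ssml"` parameter of `synthesize_speech` in boto3.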
Technical implementation of The Australian’s ‘Daily News’
The Australian is a newspaper publisher under the News Corp umbrella. They wanted their top 10 news headlines of the day available to their listeners in audio. They required the headlines to be updated five times a day as a podcast episode, which our integration with Amazon Polly made fairly easy to implement. The following diagram provides a simplified overview of our implementation.
This implementation is strikingly similar to the AFL integration, with one exception. Instead of generating an RSS feed, we publish the episode to a specified show on The Australian’s account on Whooshkaa. This makes the episode almost immediately available on iTunes, Pocket Casts, or any other podcast player.
To build this implementation, we set up an AWS Lambda function, as we did for the AFL implementation, because we need to trigger our ‘Daily News’ endpoint at specific times throughout the day.
Serverless.yml
WhooshkaaAPI.js
Next, the createDailyNewsStory handler calls the createDailyNewsStory function which triggers the dailyNews endpoint on our API, as follows.
NewsCorpController.php
The DailyNewsStory extends a StoryBase class that in turn has a dependency injection of a NewsCorpApi class. The values from DailyNewsStory are passed through to the NewsCorpApi class where they fetch and normalize the data.
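The relationship between these classes might look roughly like the sketch below. It is a Python illustration of the dependency-injection pattern described, not the original PHP. The method names, the `section` and `limit` attributes, and the `headline` and `text` fields of the raw items are assumptions made for the example.

```python
from typing import Callable, Dict, List


class NewsCorpApi:
    """Hypothetical client that fetches raw items and normalizes them."""

    def __init__(self, fetch_raw: Callable[[str, int], List[Dict[str, str]]]):
        # The transport is passed in so it can be swapped out in tests.
        self._fetch_raw = fetch_raw

    def get_stories(self, section: str, limit: int) -> List[Dict[str, str]]:
        raw_items = self._fetch_raw(section, limit)
        return [
            {"title": item["headline"].strip(), "body": item["text"].strip()}
            for item in raw_items[:limit]
        ]


class StoryBase:
    """Base class: subclasses only declare which stories they want."""

    section = ""
    limit = 0

    def __init__(self, api: NewsCorpApi):
        self._api = api  # injected dependency

    def get_body(self) -> List[str]:
        stories = self._api.get_stories(self.section, self.limit)
        return [f"{story['title']}. {story['body']}" for story in stories]


class DailyNewsStory(StoryBase):
    section = "top-stories"  # assumed section name
    limit = 10  # the top 10 headlines of the day
```

Because the API client is injected, a different story type only needs to override a couple of values, and the fetching and normalization logic stays in one place.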
Next, we generate the audio for all of the stories we have fetched and publish it as a single episode. This is done in the StoryBuilder class, as follows.
StoryBuilder.php
We loop through $this->story->getBody() because it’s an array containing all ten stories previously mentioned. This creates a continuous audio stream from Amazon Polly. The audio stream is then uploaded as an mp3 file to our S3 bucket, and the filename, with the rest of the information, is saved to the database and returned in the request.
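The loop described above can be sketched as follows, in Python rather than the PHP of StoryBuilder.php. The `build_episode_audio` name and the injected `synthesize` callable are hypothetical. In a real setup, `synthesize` would wrap a call to Amazon Polly and return the audio bytes for one story. The sketch also assumes that every segment uses the same voice and output format, so the chunks can simply be appended.

```python
from typing import Callable, Iterable


def build_episode_audio(
    stories: Iterable[str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Synthesize each story in order and join the audio into one stream."""
    audio = bytearray()
    for body in stories:
        audio.extend(synthesize(body))  # append this story's audio bytes
    return bytes(audio)
```

Passing the synthesizer in as a parameter keeps the loop independent of the text-to-speech service, so it can be checked with a fake function that returns known bytes.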
Many of our customers generate significant amounts of rich content. We give them a platform, powered by Amazon Polly, to convert their content to audio and then distribute, analyze, and commercialize. One news publisher plans to make its recipe library available through Whooshkaa and Amazon Polly text-to-voice.
Whooshkaa is always looking for ways to innovate with audio. We seek new markets and technology to give our creators the widest distribution network possible. We’ve found that traditional publishers and Amazon Polly are a winning combination.
About the Author
Robert Loewenthal is the CEO & Founder of Whooshkaa. Based in Sydney, Australia, Whooshkaa is a full-service, audio-on-demand company that helps creators and brands produce, host, share, track, and monetize content.