AWS Developer Tools Blog
Cross-Platform Text-to-Speech for C++ with Amazon Polly
Amazon Polly launched at re:Invent 2016. Because C++ gives us direct access to sound drivers, we decided to try using Amazon Polly for cross-platform text-to-speech applications. The result of our experiment is the new text-to-speech library for the AWS SDK for C++.
Let’s look at some code examples.
List available output devices
#include <aws/core/Aws.h>
#include <aws/text-to-speech/TextToSpeechManager.h>
#include <iostream>

using namespace Aws::Polly;
using namespace Aws::TextToSpeech;

static const char* ALLOCATION_TAG = "PollySample::Main";

int main()
{
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    {
        // Keep SDK objects in this inner scope so they are destroyed before ShutdownAPI.
        auto client = Aws::MakeShared<PollyClient>(ALLOCATION_TAG);
        TextToSpeechManager manager(client);

        std::cout << "available devices are: " << std::endl;
        auto devices = manager.EnumerateDevices();
        // Each entry pairs a device description (first) with its driver (second).
        for (auto& device : devices)
        {
            std::cout << "[" << device.first.deviceId << "] " << device.first.deviceName << " Driver: "
                      << device.second->GetName() << std::endl;
        }
    }
    Aws::ShutdownAPI(options);
    return 0;
}
Here, the manager lists the output devices available on your system, along with the driver that handles each one. You can then iterate over those devices and select the best output device for your application.
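As a rough illustration of that selection step, the sketch below picks the first device that offers a capability matching a requested sample rate and channel count. The DeviceInfo and CapabilityInfo structs are simplified stand-ins written for this example, and FindDevice is a hypothetical helper. The field names in the SDK’s own headers may differ, so treat the shapes here as assumptions rather than the library’s API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins for the SDK's device description types (assumed shape).
struct CapabilityInfo
{
    unsigned channels;
    unsigned sampleRate;
    unsigned sampleWidthBits;
};

struct DeviceInfo
{
    std::string deviceId;
    std::string deviceName;
    std::vector<CapabilityInfo> capabilities;
};

// Hypothetical helper: returns {device index, capability index} for the first
// device that supports the requested format, or {-1, -1} if none does.
std::pair<int, int> FindDevice(const std::vector<DeviceInfo>& devices,
                               unsigned wantedRate, unsigned wantedChannels)
{
    for (std::size_t d = 0; d < devices.size(); ++d)
    {
        const auto& caps = devices[d].capabilities;
        for (std::size_t c = 0; c < caps.size(); ++c)
        {
            if (caps[c].sampleRate == wantedRate && caps[c].channels == wantedChannels)
            {
                return { static_cast<int>(d), static_cast<int>(c) };
            }
        }
    }
    return { -1, -1 };
}
```

With real data, the indices returned by a helper like this would identify the device, device description, and capability to hand to the manager. The 16 kHz mono request used when trying it out is only an example value.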
List available voices
#include <aws/core/Aws.h>
#include <aws/text-to-speech/TextToSpeechManager.h>
#include <iostream>

using namespace Aws::Polly;
using namespace Aws::TextToSpeech;

static const char* ALLOCATION_TAG = "PollySample::Main";

int main()
{
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    {
        // Keep SDK objects in this inner scope so they are destroyed before ShutdownAPI.
        auto client = Aws::MakeShared<PollyClient>(ALLOCATION_TAG);
        TextToSpeechManager manager(client);

        std::cout << "available voices are: " << std::endl;
        // Each entry pairs a voice (first) with its language (second).
        for (auto& voice : manager.ListAvailableVoices())
        {
            std::cout << voice.first << " language: " << voice.second << std::endl;
        }
    }
    Aws::ShutdownAPI(options);
    return 0;
}
In this example, the manager retrieves all available voices from Amazon Polly and prints them to the standard output.
Finally, after we’ve selected an audio output device and a voice to use, we can send text to Amazon Polly. The synthesized speech is played directly to our audio output device.
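Choosing a voice from that list can be as simple as matching on the language. The sketch below assumes the list behaves like a sequence of (voice, language) string pairs, as the loop above suggests. VoiceList and FindVoiceForLanguage are hypothetical names introduced for this illustration, and the exact language strings the service returns are not shown in this post, so the values you match against are an assumption too.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Assumed shape of the voice list: (voice name, language) pairs.
using VoiceList = std::vector<std::pair<std::string, std::string>>;

// Hypothetical helper: returns the name of the first voice whose language
// matches, or an empty string when there is no match.
std::string FindVoiceForLanguage(const VoiceList& voices, const std::string& language)
{
    for (const auto& voice : voices)
    {
        if (voice.second == language)
        {
            return voice.first;
        }
    }
    return std::string();
}
```

An empty result is a signal to fall back to a default voice or to report that the language is not available.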
#include <aws/core/Aws.h>
#include <aws/text-to-speech/TextToSpeechManager.h>
#include <iostream>

using namespace Aws::Polly;
using namespace Aws::TextToSpeech;

static const char* ALLOCATION_TAG = "PollySample::Main";

int main()
{
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    {
        auto client = Aws::MakeShared<PollyClient>(ALLOCATION_TAG);
        TextToSpeechManager manager(client);

        // Iterate devices and select the device and capabilities you'd like to play to.
        // device, deviceInfo, and capability stand for the selection made here.
        //...
        manager.SetActiveDevice(device, deviceInfo, capability);

        // Iterate voices and select the one you wish to use.
        // selectedVoice stands for the voice chosen here.
        //...
        manager.SetActiveVoice(selectedVoice);

        // This is a callback for handling the result, since SendTextToOutputDevice is an
        // asynchronous operation.
        SendTextCompletedHandler handler;
        manager.SendTextToOutputDevice("Hello World", handler);
    }
    Aws::ShutdownAPI(options);
    return 0;
}
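Because the send call is asynchronous, a program may want to wait for the completion handler before leaving the scope and shutting down the SDK. One way to do that, sketched below with standard library types only, is to have the handler fulfil a std::promise that the caller waits on. FakeSendTextAsync and CompletionHandler are stand-ins invented for this example. The SDK’s real handler type takes different arguments, so the lambda would need to be adapted to it.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <string>
#include <thread>

// Assumed, simplified callback shape: the text that was sent and a success flag.
using CompletionHandler = std::function<void(const std::string&, bool)>;

// Stand-in for an asynchronous send: invokes the handler from another thread.
std::thread FakeSendTextAsync(const std::string& text, CompletionHandler handler)
{
    return std::thread([text, handler]() { handler(text, true); });
}

// Blocks until the handler has fired and returns the reported success flag.
bool SendAndWait(const std::string& text)
{
    std::promise<bool> done;
    std::future<bool> result = done.get_future();

    std::thread worker = FakeSendTextAsync(text,
        [&done](const std::string&, bool success) { done.set_value(success); });

    bool success = result.get();  // wait here instead of tearing down immediately
    worker.join();
    return success;
}
```

The promise is captured by reference, which is safe here only because SendAndWait does not return until the handler has run.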
We’ve also created an Amazon Polly sample console application to demonstrate how to use this API.
Platform Support
We’ve provided default implementations for various platforms.
- On Windows, we use the Waveform Audio API. This should work for both desktop and mobile Windows applications.
- For most POSIX systems, we’ve provided a PulseAudio implementation. To use this in your builds, you need to install the header files for PulseAudio. Also be sure your deployment targets have a Pulse server installed and configured. The development packages can most likely be installed via apt-get install libpulse-dev or yum install pulseaudio-libs-devel.
- On Apple platforms, we’ve integrated with the Core Audio frameworks. This works out of the box for macOS and iOS devices.
Of course, we’ve also provided a way for you to use your own audio driver implementations. All you need to do is pass your own implementation of Aws::TextToSpeech::PCMOutputDriverFactory to the constructor for Aws::TextToSpeech::TextToSpeechManager.
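To give a feel for the factory approach, here is a minimal sketch of a driver that collects PCM bytes in memory instead of playing them, together with a factory that hands it out. OutputDriver and OutputDriverFactory are deliberately simplified stand-ins, not the SDK’s PCMOutputDriver and PCMOutputDriverFactory. The real interfaces are likely to require more methods (device enumeration and activation, for example), so check the SDK headers before adapting this.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Simplified stand-ins for the SDK's driver interfaces (assumed shape).
class OutputDriver
{
public:
    virtual ~OutputDriver() = default;
    virtual bool WriteBufferToDevice(const unsigned char* buffer, std::size_t size) = 0;
    virtual const char* GetName() const = 0;
};

class OutputDriverFactory
{
public:
    virtual ~OutputDriverFactory() = default;
    virtual std::vector<std::shared_ptr<OutputDriver>> LoadDrivers() const = 0;
};

// Example driver that stores the raw PCM bytes it receives.
class MemoryDriver : public OutputDriver
{
public:
    bool WriteBufferToDevice(const unsigned char* buffer, std::size_t size) override
    {
        assert(buffer != nullptr || size == 0);
        m_samples.insert(m_samples.end(), buffer, buffer + size);
        return true;
    }

    const char* GetName() const override { return "Memory"; }
    std::size_t BytesWritten() const { return m_samples.size(); }

private:
    std::vector<unsigned char> m_samples;
};

// Factory that offers the in-memory driver as the only available driver.
class MemoryDriverFactory : public OutputDriverFactory
{
public:
    std::vector<std::shared_ptr<OutputDriver>> LoadDrivers() const override
    {
        return { std::make_shared<MemoryDriver>() };
    }
};
```

A driver like this can be handy in tests, where you want to inspect the audio bytes without touching real sound hardware.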
We’re really excited to see what kinds of innovative applications our users will apply this to. Currently, we’ve only provided the capability to use raw audio. Depending on the use cases we see customers implement, we’ll likely go back and add MP3 and OGG Vorbis support. Please let us know how you’re using this text-to-speech library now and how you would like to use it!