Building your personal translator with Amazon Translate and Amazon Polly
The most common challenge we can face when traveling abroad is the language barrier. Whether lost or not, we’ll have to say at least one of these: “Where is the best place to eat and drink?”, “Where is this hotel?”, and “Where is the bathroom?”
Now imagine a more difficult scenario: We’re traveling to Spain for business soon and can’t prepare our bank of translated text ahead of time and learn it. If we don’t speak a common language in the meeting, we need to rely on human translators. They’d do an admirable job, but what if we could work around the language barrier and make this a bit more interactive?
Amazon Translate is a machine translation service that delivers fast, high-quality, and affordable language translation. Amazon Polly is a text-to-speech service that uses advanced deep-learning technologies to synthesize speech that sounds like a human voice. With it, you can create applications that talk, and you can build entirely new categories of speech-enabled products.
This tutorial makes the following assumptions:
- You already have an active Amazon Web Services (AWS) account
- You have access to the AWS Tools for PowerShell console
- You have an AWS Identity and Access Management (IAM) user account with the permissions that enable you to create a test IAM user account
The tutorial uses PowerShell to showcase Amazon Translate and Amazon Polly, but you can use the AWS CLI or any supported AWS SDK to achieve the same result. I list the AWS CLI equivalents where relevant, but the code is exclusively PowerShell. PowerShell is open source and available for Windows, Linux, and macOS.
For quick but limited testing, you can run the code on any Windows Server instance, .NET Core with Amazon Linux 2 LTS Candidate, or .NET Core with Ubuntu Server 16.04. These Amazon Machine Images (AMIs) are preconfigured with AWS Tools for PowerShell. These instances have no sound service, so you can’t play back the Amazon Polly audio on them, but Amazon Translate will still do a great job.
Step 1: Create an IAM user account and set permissions
To start, we need an IAM user account that has all the necessary permissions to interact with Amazon Translate and Amazon Polly.
This exercise requires permissions for the following actions:
- translate:TranslateText – Grants permissions to use the TranslateText action
- comprehend:DetectDominantLanguage – Allows Amazon Comprehend to enable automatic language detection
- polly:DescribeVoices – Returns the list of voices that are available for use
- polly:SynthesizeSpeech – Synthesizes input to a stream of bytes
All of these permissions, among others, are included in the AWS managed policies.
You can use the IAM console to create a new user that has the preceding policies. Because we want to focus on PowerShell, you can instead execute the following commands from a PowerShell console.
First, make sure you’re using an account that has permissions to add IAM users:
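The original listing is not reproduced in this text, so what follows is only a minimal sketch of how the user could be created. The function name New-TranslatorDemoUser, the user name TranslatePollyDemo, and the default profile are hypothetical. The two managed policy ARNs are an assumption about which AWS managed policies cover the four actions listed above, so check them in the IAM console before relying on them. The administrative account is selected through the -ProfileName parameter that AWS cmdlets accept.

```powershell
# Sketch only: the function name, user name, and profile name are hypothetical.
function New-TranslatorDemoUser {
    param(
        [Parameter(Mandatory)] [string] $UserName,
        # Assumption: a stored credential profile that is allowed to manage IAM users.
        [string] $AdminProfile = 'default',
        # Assumption: these AWS managed policies cover the four actions listed above.
        [string[]] $PolicyArn = @(
            'arn:aws:iam::aws:policy/TranslateReadOnly',
            'arn:aws:iam::aws:policy/AmazonPollyReadOnlyAccess'
        )
    )

    # Create the test user with the administrative profile.
    New-IAMUser -UserName $UserName -ProfileName $AdminProfile | Out-Null

    # Attach each managed policy to the new user.
    foreach ($arn in $PolicyArn) {
        Register-IAMUserPolicy -UserName $UserName -PolicyArn $arn -ProfileName $AdminProfile
    }

    # Return the new access key; the secret key is only readable at creation time.
    New-IAMAccessKey -UserName $UserName -ProfileName $AdminProfile
}

# Example call (commented out because it creates real IAM resources):
# $key = New-TranslatorDemoUser -UserName 'TranslatePollyDemo'
```

The returned object carries the AccessKeyId and SecretAccessKey properties that the next step needs.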
Next, save your AccessKeyId and SecretAccessKey into a new user profile and switch to it:
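As a hedged sketch of that step, the helper below stores the key pair under a named profile with Set-AWSCredential -StoreAs. The function name Save-TranslatorDemoProfile and the profile name TranslatePollyDemo are hypothetical, and $key stands for whatever object holds the AccessKeyId and SecretAccessKey you received when the access key was created.

```powershell
# Sketch only: the function name and profile name are hypothetical.
function Save-TranslatorDemoProfile {
    param(
        # Any object exposing AccessKeyId and SecretAccessKey properties,
        # such as the result of New-IAMAccessKey.
        [Parameter(Mandatory)] $AccessKey,
        [string] $ProfileName = 'TranslatePollyDemo'
    )

    # Store the key pair in the local credential store under the given name.
    Set-AWSCredential -AccessKey $AccessKey.AccessKeyId `
                      -SecretKey $AccessKey.SecretAccessKey `
                      -StoreAs $ProfileName

    # Return the profile name so the caller can switch to it.
    $ProfileName
}

# Example (commented out because it writes to your credential store):
# Save-TranslatorDemoProfile -AccessKey $key
# Set-AWSCredential -ProfileName 'TranslatePollyDemo'   # switch the session to the new profile
```

Running the final Set-AWSCredential call at the prompt, rather than inside the function, keeps the profile switch in the session scope where the later cmdlets run.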
Step 2: Specify the source and target languages
The Get-AWSCmdletName -Service Translate command tells us that there’s only one Amazon Translate cmdlet available:
- ConvertTo-TRNTargetLanguage (AWS CLI: translate-text). That is all we need.
Note – If you want to use the AWS CLI, start with the aws translate translate-text help command.
ConvertTo-TRNTargetLanguage requires the following parameters: the source language code, the target language code, and the text to be translated.
At this time, Amazon Translate supports translation between English and the following languages: Arabic, Chinese (simplified), French, German, Portuguese, and Spanish. Six additional languages will be supported soon: Japanese, Russian, Italian, traditional Chinese, Turkish, and Czech. We use English and Spanish in our examples.
You’ve likely noticed that we use two-letter codes to designate language in Amazon Translate. Valid values are en, es, de, fr, pt, ar, and zh. People who have used Amazon Polly before will immediately notice the difference. Language codes in Amazon Polly look like es-ES, de-DE, fr-FR, and pt-PT. Besides English, only these four languages are currently supported by both services, although Amazon Polly includes over 50 lifelike voices and support for 25 languages. To see all of Amazon Polly’s voices, run Get-POLVoice | Format-Table (AWS CLI: describe-voices).
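Because the two services use different code formats, a small lookup can bridge them. The sketch below is not part of the original script. Get-PollyLanguageCode is a hypothetical helper that maps the two-letter Amazon Translate codes to the Amazon Polly codes mentioned above, and the en to en-US entry is an added assumption (Joanna, used later, is an en-US voice).

```powershell
# Sketch only: Get-PollyLanguageCode is a hypothetical helper.
function Get-PollyLanguageCode {
    param(
        [Parameter(Mandatory)] [string] $TranslateCode   # for example 'es'
    )

    # Two-letter Amazon Translate codes mapped to Amazon Polly language codes.
    $map = @{
        es = 'es-ES'
        de = 'de-DE'
        fr = 'fr-FR'
        pt = 'pt-PT'
        en = 'en-US'   # assumption: not listed in the text above
    }

    # Return the Polly code, or $null when the language has no match.
    if ($map.ContainsKey($TranslateCode)) { $map[$TranslateCode] } else { $null }
}

Get-PollyLanguageCode -TranslateCode 'es'
```

Codes such as ar and zh return $null here, which a caller can treat as "translate only, no speech".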
Step 3: Translate using the TranslateText API
Let’s go back to our meeting in Spain. Person A, Justin, speaks only English. Person B, Penelope, speaks only Spanish. Rather than one complex script to handle everything for both people, you can create one simple script for each person. The full script and a video of it in action are included at the end of this post. Use the following code to capture the text input and pass it to Amazon Translate:
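The original listing is missing from this text, so here is a minimal stand-in rather than the author's code. Convert-Phrase is a hypothetical wrapper around ConvertTo-TRNTargetLanguage. As far as I know, the cmdlet returns either the translated string or a response object with a TranslatedText property depending on the module version, so the sketch handles both.

```powershell
# Sketch only: Convert-Phrase is a hypothetical wrapper.
function Convert-Phrase {
    param(
        [Parameter(Mandatory)] [string] $Text,
        [Parameter(Mandatory)] [string] $From,   # source language code, for example 'en'
        [Parameter(Mandatory)] [string] $To      # target language code, for example 'es'
    )

    $result = ConvertTo-TRNTargetLanguage -Text $Text `
                                          -SourceLanguageCode $From `
                                          -TargetLanguageCode $To

    # Assumption: the return type varies by module version, so accept
    # either a plain string or an object with a TranslatedText property.
    if ($result -is [string]) { $result } else { $result.TranslatedText }
}

# Justin's script (English to Spanish), commented out because it calls AWS:
# Convert-Phrase -Text (Read-Host 'Justin') -From en -To es
# Penelope's script (Spanish to English):
# Convert-Phrase -Text (Read-Host 'Penelope') -From es -To en
```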
It doesn’t get any simpler. The only difference is the swapped language positions, and the only variable is the text to translate.
Looks good, and it’s pretty easy to pull off in real time, as shown in this image.
Step 4: Use the SynthesizeSpeech API in Amazon Polly
You can make the conversation more interactive by adding speech to the translated texts so Justin and Penelope don’t have to look at the screen the whole time. For that, you need Amazon Polly.
Amazon Polly comes with a variety of voices. It has four available for Spanish and 16 for English:
Justin needs to “speak” in Spanish, and he’ll use the voice of Enrique. Penelope will have Joanna speak her translations to English.
Now for the fun part.
Get-POLSpeech (AWS CLI: synthesize-speech) calls the Amazon Polly SynthesizeSpeech API operation and synthesizes the text into a stream of bytes. The longer the translation, the longer the stream. Some limitations are the size of the input text and the length of the output audio stream (the synthesis): 3000 characters and 10 minutes, respectively. You won’t get close to either limit in this exercise. For the current limits in Amazon Polly, see Limits in Amazon Polly.
The following code copies the synthesis into an .mp3 file in the temporary location and makes it ready for playback:
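The listing itself is not included in this text. As a hedged sketch of the same idea, the hypothetical Save-Speech function below calls Get-POLSpeech and copies the returned AudioStream into an .mp3 file in the temporary folder. The file name translation.mp3 is an assumption.

```powershell
# Sketch only: Save-Speech and the default file name are hypothetical.
function Save-Speech {
    param(
        [Parameter(Mandatory)] [string] $Text,
        [Parameter(Mandatory)] [string] $VoiceId,   # for example 'Enrique' or 'Joanna'
        [string] $Path = (Join-Path ([System.IO.Path]::GetTempPath()) 'translation.mp3')
    )

    # Ask Amazon Polly for the synthesized audio as an MP3 byte stream.
    $speech = Get-POLSpeech -Text $Text -VoiceId $VoiceId -OutputFormat mp3

    # Copy the stream to disk and release both streams afterward.
    $file = [System.IO.File]::Create($Path)
    try {
        $speech.AudioStream.CopyTo($file)
    }
    finally {
        $file.Dispose()
        $speech.AudioStream.Dispose()
    }

    # Return the path so it can be handed to a player.
    $Path
}

# Example (commented out because it calls AWS):
# $mp3 = Save-Speech -Text 'Hola, ¿cómo estás?' -VoiceId Enrique
```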
Step 5: Set up automatic audio playback
You can use any player you want, but VLC offers useful benefits for this kind of translation. Command-line parameters such as --play-and-exit let it run in the background, without taking away the application focus, and exit automatically when the playback is complete:
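The playback listing is missing here as well, so this is only a sketch. Start-SpeechPlayback is a hypothetical helper that assumes vlc is on your PATH. The --qt-start-minimized option is my own addition to keep the player window out of the way, while --play-and-exit comes from the text above. The function ends with Remove-Item so the temporary file is cleaned up after playback.

```powershell
# Sketch only: Start-SpeechPlayback is a hypothetical helper.
function Start-SpeechPlayback {
    param(
        [Parameter(Mandatory)] [string] $Path,
        # Assumption: the VLC executable can be found on PATH under this name.
        [string] $Player = 'vlc'
    )

    # Quote the path so file names with spaces survive argument joining.
    $playerArgs = @('--qt-start-minimized', '--play-and-exit', "`"$Path`"")

    # Wait for the player to exit so the file is not deleted mid-playback.
    Start-Process -FilePath $Player -ArgumentList $playerArgs -Wait

    # Clean up the temporary audio file.
    Remove-Item -Path $Path
}

# Example (commented out because it needs VLC and an existing .mp3 file):
# Start-SpeechPlayback -Path $mp3
```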
And that’s it! Type what you want to say, and Amazon Translate gives Amazon Polly the translation to speak out loud for you. Keep typing responses to enjoy a functional, real-time conversation in a language you don’t speak.
This video is an example of how it may look.
Step 6: Clean up the AWS resources
The last command in the preceding script, Remove-Item, cleans up the .mp3 file after you’re done with it. Now that the exercise is complete, it’s a good idea to remove all of the AWS resources you’ve created.
Your footprint is pretty light. All you need to do is the following:
- Remove the user profile from your system
- Detach the AWS managed policies from the IAM user account and remove the access keys
- Remove the IAM user account
Use the Set-AWSCredential cmdlet to return to your main IAM user account and run the following commands:
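The cleanup commands are not reproduced in this text, so the following is a minimal sketch of the three steps above. Remove-TranslatorDemoUser, the profile names, and the policy ARNs are hypothetical or assumed, and the ARNs should match whatever you actually attached. The main account is selected through -ProfileName on each IAM call, and -Force is deliberately left out.

```powershell
# Sketch only: the function name, profile names, and ARNs are assumptions.
function Remove-TranslatorDemoUser {
    param(
        [Parameter(Mandatory)] [string] $UserName,
        [string] $DemoProfile  = 'TranslatePollyDemo',
        [string] $AdminProfile = 'default',
        [string[]] $PolicyArn = @(
            'arn:aws:iam::aws:policy/TranslateReadOnly',
            'arn:aws:iam::aws:policy/AmazonPollyReadOnlyAccess'
        )
    )

    # 1. Remove the stored profile from this system.
    Remove-AWSCredentialProfile -ProfileName $DemoProfile

    # 2. Detach the managed policies and delete the user's access keys.
    foreach ($arn in $PolicyArn) {
        Unregister-IAMUserPolicy -UserName $UserName -PolicyArn $arn -ProfileName $AdminProfile
    }
    Get-IAMAccessKey -UserName $UserName -ProfileName $AdminProfile | ForEach-Object {
        Remove-IAMAccessKey -UserName $UserName -AccessKeyId $_.AccessKeyId -ProfileName $AdminProfile
    }

    # 3. Remove the IAM user itself.
    Remove-IAMUser -UserName $UserName -ProfileName $AdminProfile
}

# Example (commented out because it deletes real IAM resources):
# Remove-TranslatorDemoUser -UserName 'TranslatePollyDemo'
```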
Note that you’re intentionally not using the -Force parameter to skip confirmation prompts anywhere in the removal process. This way, you have to confirm every action and make sure that the user account you’re removing is the one you want.
Most people can type pretty fast. Maybe not at the same speed as they can talk, but it’s usually close enough. But now you can type in your own language and have the translation come out as a natural voice in the language of your business partner. It’s like having your own Babel fish, thanks to Amazon Translate and Amazon Polly.
In this post, you’ve seen how to use machine translation to turn your language into another one that could be completely unknown to you. Then you used deep-learning technologies to synthesize that new text into lifelike speech, allowing real-time communication between people who share no common language.
You’ll probably still need to do some hand-waving when you go into souvenir shops. After all, nobody comes prepared with questions like “Can I see that green elephant made of glass…? There, on the far left of your second shelf.” Or will you? Bring the laptop and let Amazon Translate and Amazon Polly help you have an AWSome vacation! You can bring your tablet instead and establish a connection to your Amazon WorkSpace. It will run PowerShell and speak for you! Take a look.
The business meeting discussed here, the tourist in the foreign land, and help with learning a new language are all just the tip of an iceberg. This quick demonstration of the power of machine learning will surely have you coming up with other real-world scenarios that you can address in a similar fashion.
For the full script of this exercise with all the added options, see Building your personal translator with Amazon Translate and Amazon Polly. Here is how it looks in action.
Please note that the script is provided as-is. Although all it does is make the API calls, it hasn’t been tested on every operating system. You might need to tweak the code, which is licensed under the Apache License, Version 2.0. See the license for the specific language governing permissions and limitations.
About the author
Sinisa Mikasinovic is a cloud support engineer who helps AWS customers using automation tools such as AWS CloudFormation and AWS OpsWorks for Puppet Enterprise as well as Amazon WorkSpaces (as a subject matter expert) and Amazon EC2 Windows. He’s a PowerShell enthusiast and the owner of a Windows phone. Happily married and the father of a young, loud boy, he spends a lot of free time drowned in audiobooks.