AWS Machine Learning Blog

Translate documents in real time with Amazon Translate

A critical component of business success is the ability to connect with customers. Businesses today want to connect with their customers by offering their content across multiple languages in real time. For most customers, the content creation process is disconnected from the localization effort of translating content into multiple target languages. These disconnected processes delay the business ability to simultaneously publish content in multiple languages, inhibiting their outreach efforts which negatively impacts time to market and revenue.

Amazon Translate is a neural machine translation service that delivers fast, high-quality, and affordable language translation. Now, Amazon Translate offers real-time document translation to seamlessly integrate and accelerate content creation and localization. You can submit a document from the AWS Management Console, AWS Command Line Interface (AWS CLI), or AWS SDK and receive the translated document in real time while maintaining the format of the original document. This feature eliminates the wait for documents to be translated in asynchronous batch mode.

Real-time document translation currently supports plain text and HTML documents. You can use other Amazon Translate features such as custom terminology, profanity masking, and formality as part of the real-time document translation.

In this post, we will show you how to use this new feature.

Solution overview

This post walks you through the steps required to use real-time document translation with the console, AWS CLI, and Amazon Translate SDK. As an example, we will translate this sample text file from English to French.

Use Amazon Translate via the console

Follow these steps to try out real-time document translation on the console:

  1. On the Amazon Translate console, choose Real-time translation in the navigation pane.
  2. Choose the Document tab.
  3. Specify the language of the source file as English.
  4. Specify the language of the target file as French.

Note: Source or Target language should be English for real-time document translation.

  1. Select Choose file and upload the file you want to translate.
  2. Specify the document type.

Text and HTML formats are supported at the time of this writing.

  1. Under Additional settings, you can use other Amazon Translate features in conjunction with real-time document translation.

For more information about Amazon Translate features, refer to the following resources:

  1. Choose Translate and Download.

The translated file is automatically saved to your browser’s downloaded folder, usually to Downloads. The target language code will be prefixed to the translated file’s name. For example, if your source file name is lang.txt and your target language is French (fr), then the translated file will be named fr.lang.txt.

Use Amazon Translate with the AWS CLI

You can translate the contents of a file using the following AWS CLI command. In this example, the contents of source-lang.txt will be translated into target-lang.txt.

aws translate translate-document --source-language-code en --target-language es 
--document-content fileb://source-lang.txt 
--document ContentType=text/plain 
--query "TranslatedDocument.Content" 
--output text | base64 
--decode > target-lang.txt

Use the Amazon Translate SDK (Python Boto3)

You can use the following Python code to invoke Amazon Translate SDK API to translate text or HTML documents synchronously:

import boto3
import argparse

# Initialize parser
parser = argparse.ArgumentParser()
parser.add_argument("SourceLanguageCode")
parser.add_argument("TargetLanguageCode")
parser.add_argument("SourceFile")
args = parser.parse_args()


translate = boto3.client('translate’)

localFile = args.SourceFile
file = open(localFile, "rb")
data = file.read()
file.close()


result = translate.translate_document(
    Document={
            "Content": data,
            "ContentType": "text/html"
        },
    SourceLanguageCode=args.SourceLanguageCode,
    TargetLanguageCode=args.TargetLanguageCode
)
if "TranslatedDocument" in result:
    fileName = localFile.split("/")[-1]
    tmpfile = f"{args.TargetLanguageCode}-{fileName}"
    with open(tmpfile,  'w', encoding='utf-8') as f:
     
    f.write(str(result["TranslatedDocument"]["Content"]))

    print("Translated document ", tmpfile)

This program accepts three arguments: source language, target language, and file path. Use the following command to invoke this program:

python syncDocumentTranslation.py en es source-lang.txt

Conclusion

The real-time document translation feature in Amazon Translate can expedite time to market by enabling easy integration with content creation and localization. Real-time document translation improves content creation and the localization process.

For more information about Amazon Translate, visit Amazon Translate resources to find video resources and blog posts, and refer to AWS Translate FAQs.


About the Authors

Sathya Balakrishnan is a Senior Consultant in the Professional Services team at AWS, specializing in data and ML solutions. He works with US federal financial clients. He is passionate about building pragmatic solutions to solve customers’ business problems. In his spare time, he enjoys watching movies and hiking with his family.

RG Thiyagarajan is a Senior Consultant in Professional Services at AWS, specializing in application migration, security, and resiliency with US federal financial clients.

Sid Padgaonkar is the Senior Product Manager for Amazon Translate, AWS’s natural language processing service. On weekends, you will find him playing squash and exploring the food scene in the Pacific Northwest.