AWS Machine Learning Blog

Translate multiple source language documents to multiple target languages using Amazon Translate

Enterprises need to translate business-critical content such as marketing materials, instruction manuals, and product catalogs across multiple languages to communicate with a global audience of customers, partners, and stakeholders. Identifying the source language in each document before calling a translate job creates complexities and adds another step to your workflow. For example, an international product company with its customer support operations located in their corporate office requires their agents to translate emails or documents to support customer requests. Previously, they had to set up workflows to identify dominant language in each document, group them by language type, and set up a batch translate job for each source language. Now, Amazon Translate’s automatic language detection feature for batch translation jobs allows you to translate a batch of documents in various languages with a single translate job. This removes the need for you to orchestrate the document translate workflow that required dominant language identification and grouping. Amazon Translate also allows translation to multiple target languages for translation (up to 10 languages). A single translation job can translate documents to multiple target languages. This feature eliminates the need to create separate batch jobs for individual target languages. Customers can now create documentation in multiple languages, all with a single API call.

In this post, we demonstrate how to translate documents into multiple target languages in a batch translation job.

Solution overview

Automatic detection of source language for batch translate jobs allows you to translate documents written in various supported languages in a single operation. You can also provide up to 10 languages as targets. The job processes each document, identifies the dominant source language, and translates it to the target language. Amazon Translate uses Amazon Comprehend to determine the dominant language in each of your source documents, and uses it as the source language.

In the following sections, we demonstrate how to create a batch translation job via the AWS Management Console or the AWS SDK.

Create a batch translation job via console

In this example, we configure Amazon Translate batch translation to automatically detect the source language and translate it to English and Hindi, using the input and output Amazon Simple Storage Service (Amazon S3) bucket locations provided.

create translation job

Next, we create an AWS Identity and Access Management (IAM) role that gets provisioned as part of the configuration. The role is given access to the input and output S3 buckets.

After the job is created, you can monitor the progress of the batch translation job in the Translation jobs section.

translation jobs section

When the translation job is complete, you can navigate to the output S3 bucket location and observe that the documents have been translated to their target language. Our input consisted of two files, sample-doc.txt and sample-doc-2.txt, in two different languages. Each document was translated into two target languages, for a total of four documents.

output S3 bucket

Create a batch translation job via the AWS SDK

The following Python Boto3 code uses the batch translation call to translate documents in your source S3 bucket. Specify the following parameters:

  • InputDataConfig – Provide the S3 bucket location of your input documents
  • OutputDataConfig – Provide the S3 bucket location of your output documents
  • DataAccessRoleArn – Create an IAM role that gives Amazon Translate permission to access your input and output S3 buckets
  • SourceLanguageCode: Use auto
  • TargetLanguageCodes: Choose up to 10 target languages
import boto3

client = boto3.client('translate')


def lambda_handler(event, context):

    response = client.start_text_translation_job(
        JobName='auto-translate-multi-language-sdk',
        InputDataConfig={
            'S3Uri': 's3://<<REPLACE-WITH-YOUR-INPUT-BUCKET>>/input-sdk',
            'ContentType': 'text/plain'
        },
        OutputDataConfig={
            'S3Uri': 's3://<<REPLACE-WITH-YOUR-OUTPUT-BUCKET>>/output-sdk',
        },
        DataAccessRoleArn='<<REPLACE-WITH-THE-IAM-ROLE-ARN>>',
        SourceLanguageCode='auto',
        TargetLanguageCodes=[
            'en', 'hi'
        ]
    )

Clean up

To clean up after using this solution, complete the following steps:

  1. Delete the S3 buckets that you created.
  2. Delete IAM roles that you set up.
  3. Delete any other resources that you set up for this post.

Conclusion

With today’s need to have a global reach with limited resources, Amazon Translate helps you simplify your multi-language processing workflows. With the introduction of automatically detecting the dominant language in your source document for batch translation jobs, and translating them to up to 10 target languages, you can focus on your business logic rather than dealing with the operational burden of sorting documents and managing multiple batch translation jobs.

We strive to add features to our service that make it easier for our customers innovate. Try this solution and let us know how this helped simplify your document processing workloads.


About the authors

Kishore Dhamodaran is a Senior Solutions Architect at AWS. Kishore helps strategic customers with their cloud enterprise strategy and migration journey, leveraging his years of industry and cloud experience.

Sid Padgaonkar is the Sr. Product Manager for Amazon Translate, AWS’s natural language processing service. On weekends you will find him playing squash and exploring the food scene in the Pacific NW.