AWS Machine Learning Blog
Introducing Amazon Translate Custom Terminology
Amazon Translate is a neural machine translation service that delivers fast, high-quality, and affordable language translation. Today, we are introducing Custom Terminology, a feature that customers can use to customize Amazon Translate output to use company- and domain-specific vocabulary. By uploading and invoking Custom Terminology with translation requests, customers have the ability to ensure that their unique content, such as brand names, character names, and model names, is translated exactly the way they need it, regardless of context and the Amazon Translate algorithm’s decision.
To illustrate, consider the following example. “Amazon Family” is a collection of benefits that offers Amazon Prime members exclusive offers, such as up to 20% off subscriptions to diapers, baby food, and more. This is very useful if you have a couple of diaper-wearers at home like I do. In France, we call it “Amazon Famille.” If I try to translate “Amazon Family” into French using Amazon Translate without any additional context, I get the output “Famille Amazon.” This is an accurate translation, but it is not what the team in France needs. Now, if I try adding context, for example “Have you ever shopped with Amazon Family?”, the service determines that the program name does not need to be translated, and leaves it as is: “Avez-vous déjà fait des achats avec Amazon Family?”. This is a good translation too, but still not what our team is looking for. To solve this and similar problems, we are introducing the Custom Terminology feature. By adding an entry to their Custom Terminology that says the term “Amazon Family” should be translated as “Amazon Famille,” the team can make sure the term is translated that way regardless of context. “Amazon Family” on its own will now be translated into “Amazon Famille,” and “Have you ever shopped with Amazon Family?” will now be translated into “Avez-vous déjà fait des achats avec Amazon Famille?”
Why is this important?
All of our customers want accurate and fluent translations regardless of where and how they use Amazon Translate. But some customers tell us that when they use the service to translate company-authored content like product documentation, website strings, functional content, knowledge bases, and help pages, they have another requirement. They need translations to adhere to the company’s specific vocabulary, and in some cases to the industry or domain jargon. In tests we ran, we saw that customizing output with Custom Terminology more than doubled the number of times the service gets specific terminology right. To our customers, this means more accurate translations that translate (no pun intended) into better engagement with applications that use Amazon Translate to power multilingual content. It also means fewer translations that need to be edited by professional translators, which cuts costs and time to market.
How does it work?
Generally speaking, the engine works as follows: When a translation request comes in, Amazon Translate reads the source sentence, creates a semantic representation of the content (simply put, it “understands” it), and generates a translation into the target language word by word.
When a Custom Terminology is invoked as part of the translation request, the engine scans the terminology file before returning the final result. When it identifies an exact match between a terminology entry and a string in the source text, it locates the appropriate string in the proposed translation and replaces it with the terminology entry. In the Amazon Family example, it first generates the translation “Avez-vous déjà fait des achats avec Amazon Family?” but stops and replaces “Amazon Family” with “Amazon Famille”, before providing the response.
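The override step described above can be pictured with a small toy sketch. This is not the service's implementation: the function name `override_terms` is hypothetical, and the sketch assumes the simplest case from the example, where the draft translation still contains the source term unchanged. The article does not describe how the real engine locates the matching string in the draft when the term has already been translated.

```python
def override_terms(source_text, draft_translation, terminology):
    """Toy illustration of a match-and-replace override.

    terminology maps a source term to the required target term.
    Assumption: the draft translation carries the source term verbatim,
    as in the "Amazon Family" example. Matching is exact and case-sensitive.
    """
    result = draft_translation
    for source_term, target_term in terminology.items():
        # Only act when the term appears verbatim in the source text.
        if source_term in source_text:
            result = result.replace(source_term, target_term)
    return result


terminology = {"Amazon Family": "Amazon Famille"}
source = "Have you ever shopped with Amazon Family?"
draft = "Avez-vous déjà fait des achats avec Amazon Family?"
print(override_terms(source, draft, terminology))
```

Because the match is a plain string comparison, the sketch has no notion of context: every verbatim occurrence of the term is replaced, and a differently capitalized occurrence is not.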
When should I use Custom Terminology?
First, note that Amazon Translate is trained on billions of parallel words from a wide range of domains. As in the Amazon Family example, in many cases Amazon Translate can distinguish named entities and handle them as required “out of the box”. Second, understand that, at this point, the Custom Terminology feature is an override mechanism. It does NOT train a custom model based on your organization’s terminology. It finds a match and replaces it. It does not transform content in any way, nor does it behave differently depending on the context. For example, in the Amazon Family case, if I had references to the Amazon Family brand and also to the Amazon family of employees (and for some reason the word Family was capitalized in the latter) within the same body of text, applying the terminology would have degraded the translation quality, because both occurrences would be replaced. Therefore, while we do not limit the acceptable types of input, we strongly recommend that users follow the best practices below. Deviating from them is likely to degrade translation quality.
Best practices
- Do keep your terminology minimal. Only include completely unambiguous words that you want to control/preserve. These should be words that you want to be translated in only one way. Ideally, you should limit the list to proper names, like brand names and product names.
- For every term, do include separately any variants of the source phrase that you want to control, such as plural and possessive forms in that language (e.g., Amazon, Amazon’s) or different capitalization (e.g., AMAZON, amazon).
- Do NOT include different translations for the same source phrase (e.g., entry #1: EN: Amazon, FR: Amazon; entry #2: EN: Amazon, FR: Amazone).
- Some languages do not change the shape of a word based on sentence context. For these languages, applying Custom Terminology per these guidelines is most likely to improve overall translation quality. Other languages have extensive word shape changes. We do NOT recommend applying the feature to those languages, but do not restrict you from doing so. The following list of languages can help guide you:
| Languages | Compatibility |
| --- | --- |
| East Asian Languages (e.g., Chinese, Japanese, Korean, Indonesian) | Compatible |
| Germanic Languages (German, Dutch, English, Swedish, Danish) | Compatible |
| Romance Languages (Italian, French, Spanish, Portuguese) | Compatible |
| Hebrew | Compatible |
| Slavic Languages (Russian, Polish, Czech) | Incompatible |
| Finno-Ugric Languages (Finnish) | Incompatible |
| Arabic | Incompatible |
| Turkish | Incompatible |
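One of the best practices above, never giving the same source phrase two different translations, is easy to check before you upload anything. The following is a minimal sketch of such a check. The helper name `find_conflicts` and the list-of-pairs input format are assumptions made for illustration and are not part of Amazon Translate.

```python
from collections import defaultdict


def find_conflicts(entries):
    """Return source terms that map to more than one distinct target.

    entries is an iterable of (source_term, target_term) pairs.
    Matching is case-sensitive, so "Amazon" and "AMAZON" count as
    different source terms and may have their own entries.
    """
    targets = defaultdict(set)
    for source_term, target_term in entries:
        targets[source_term].add(target_term)
    return {
        source_term: sorted(found)
        for source_term, found in targets.items()
        if len(found) > 1
    }


entries = [
    ("Amazon", "Amazon"),
    ("Amazon", "Amazone"),  # conflicts with the entry above
    ("Amazon Family", "Amazon Famille"),
]
print(find_conflicts(entries))
```

An empty result means every source phrase has exactly one translation. Anything else is worth resolving before the terminology is put to use.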
How do I use it?
Get started with Custom Terminology by reviewing the documentation to understand the best practices and the formatting requirements that ensure your files are readable. Then create and upload your terminology using the console or a supported SDK. Once your terminology file is accepted, you can make translation requests to the service that reference your Custom Terminology. When matches are found, the service automatically replaces the applicable content in the translation results with your terminology entries. For more details, visit the documentation page.
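As a rough sketch of that workflow with the AWS SDK for Python (boto3), the snippet below builds a small terminology file in CSV form, uploads it with `import_terminology`, and references it in a `translate_text` request through `TerminologyNames`. The helper names, the terminology name `amazon-family-demo`, and the choice of the `OVERWRITE` merge strategy are assumptions made for illustration. Check the documentation for the exact file format and limits before relying on it.

```python
import csv
import io


def build_terminology_csv(source_lang, target_lang, entries):
    """Build CSV bytes: a header row of language codes, then one row per term."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([source_lang, target_lang])
    writer.writerows(entries)
    return buffer.getvalue().encode("utf-8")


def upload_and_translate(text, name, csv_bytes, source_lang="en", target_lang="fr"):
    """Upload the terminology, then translate text with it applied.

    Requires boto3 plus AWS credentials and a region to be configured.
    """
    import boto3  # imported here so the CSV helper works without the SDK

    client = boto3.client("translate")
    client.import_terminology(
        Name=name,
        MergeStrategy="OVERWRITE",  # replace any terminology with the same name
        TerminologyData={"File": csv_bytes, "Format": "CSV"},
    )
    response = client.translate_text(
        Text=text,
        SourceLanguageCode=source_lang,
        TargetLanguageCode=target_lang,
        TerminologyNames=[name],
    )
    return response["TranslatedText"]


csv_bytes = build_terminology_csv("en", "fr", [("Amazon Family", "Amazon Famille")])
# Uncomment to call the service (this makes billable API requests):
# print(upload_and_translate(
#     "Have you ever shopped with Amazon Family?", "amazon-family-demo", csv_bytes
# ))
```

Keeping the CSV construction separate from the API calls makes it possible to inspect or validate the file locally before uploading it.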
To get started with Amazon Translate, go to Getting Started with Amazon Translate or check out this 10-minute video tutorial.
About the Author
Yoni Friedman is a Sr. Technical Product Manager in the AWS Artificial Intelligence team where he leads product management for Amazon Translate. He spends his free time reading, running, playing ball, and doing other stuff his two toddlers ask him to.