AWS Machine Learning Blog
Protagonist adopts Amazon Translate to expand analytics to multilingual content
This is a guest blog post by Bryan Pelley, COO of Protagonist. Protagonist, in their own words “helps organizations communicate more effectively through a data-driven understanding of public discourse.”
Protagonist is a pioneer of the art and science of understanding narratives. We define narratives as the beliefs that an audience holds that are composed of an interrelated set of concepts, themes, images, and ideas that coalesce into a story. Narratives matter because they reflect the deeply held needs, wants, and desires that weigh heavily, both consciously and unconsciously, on human decision-making. Using Amazon Translate, Protagonist can analyze narratives in languages other than English, which enables us to win global customers.
The Protagonist Narrative Analytics platform uses natural language processing (NLP) and machine learning (ML), guided by human expertise, to surface, measure, and track the narratives that matter to our customers across traditional, social, and other types of online media. The following diagram illustrates our Narrative Analytics solution.
Protagonist has been limited, with a few exceptions, to analyzing English-only content, which we’ve seen as a limitation on the long-term growth of our business. Numerous customers and prospective customers have expressed serious interest in projects involving international narratives. To create these narratives we would need to work with native language content.
In the past, we were able to do a small number of projects in foreign languages, primarily French and Spanish, thanks to fluent speakers on staff. In these cases, our team would either run the analysis on the content without translation, which limited the range of NLP tools we were able to use. Or, we manually translated a sample set of the overall corpus of content and ran our full suite of tools on the translated set. Sometimes we used a combination of both processes. However, this staff-based manual solution didn’t scale, and it was not efficient. Manually translating a sample of 1,000 media articles took about two weeks. This was a significant delay in providing timely narrative analysis to our customers.
Amazon Translate has changed that for us, enabling us to quickly and effectively translate multilingual content into English for analysis on our narrative platform. We tried a few other machine translation services in the past, but were unhappy with the performance, cost, and, in some cases, the requirement to commit to a long-term contract. Amazon Translate gives us the right combination of speed, accuracy of translation, cost effectiveness, and on-demand flexibility to meet our needs. What used to take two weeks or more to translate now can be done in minutes using Amazon Translate.
We piloted the Amazon Translate service in 2018 on a project for one of our customers, Omidyar Network (ON). One of ON’s major focus areas is property rights. They want to address the fact that a large percentage of the world’s population has limited or nonexistent protections for their property and resources. Naturally, to address a global issue like this, ON wants to understand the narratives that local populations around the world have about their rights, or lack of rights, to land and other property. Using international English language media sources, we were able to help ON gain an understanding of the narratives at play. As the following illustration indicates, analysis of English-only content showed that property rights narratives differed significantly by region, which prompted a desire for a deeper analysis of content in local native languages. For this reason, we saw ON’s property rights work as an ideal place to test Amazon Translate.
Peter Rabley, Venture Partner at Omidyar Network, describes their property rights efforts and the role of Protagonist:
“More than one billion people around the world lack legal rights to their land and property. However, it’s an issue that not enough people pay attention to because it seems too complicated, too complex to wrap your head around. We believe that by simplifying language and telling human interest stories, those in the field can raise greater awareness of the need—sparking innovative solutions, more financing and greater overall engagement. We needed a way to see what the initial conversations around property rights looked like globally in more than one language, and understand how better storytelling may have impacted those conversations over time. This is what Protagonist’s Narrative Analytics allows us to do, helping underscore the value of our investments and unlocking valuable insights for all of us working to advance property rights around the world. Importantly, Protagonist has been able to provide its Narrative Analytics in multiple languages including Spanish.”
As Peter notes, we initially chose to work with Amazon Translate on Spanish language content. We had experience working with Spanish content in the past and access to fluent Spanish speakers, so we could double-check the Amazon Translate outputs and easily identify and troubleshoot issues as they arose. Ultimately, that was not needed because the accuracy of the translations performed by the Amazon Translate service performed was high.
The performance of the Amazon Translate service met or exceeded our expectations. Initially, the API’s rate limit caused some concurrency issues for us because we kept unknowingly exceeding the limit. Since our pilot of the Amazon Translate service, AWS has added a metrics dashboard to the AWS Management Console that makes it easy for us to know if we’re exceeding the rate limit and make adjustments as necessary.
We noticed that AWS has been very thoughtful in keeping Amazon Translate API parameters very flexible, so that when languages are added we can easily integrate the newly supported languages in our data workflows. Specifically, AWS keeps the Python package Boto3 very stable, which allows us to update to the latest version of Boto3 without the worry of breaking existing functionality.
Overall, using Amazon Translate provided several advantages over our previous human-based translation solution. We were able to eliminate the need for time-consuming manual translation. Amazon Translate was able to complete in a matter of minutes translation tasks that would have taken us 60 hours or more in the past. This meant we could expand the amount of content we analyzed with our full suite of tools from a sample of a few hundred articles to tens or hundreds of thousands of articles. We were able to effectively leverage our NLP tools that were trained with only English language corpuses, such as Narrative Richness, cluster analysis, sentiment scoring, and topic modeling. The ability to accurately analyze large amounts of foreign language content using our English-language-trained NLP tools on the translated materials represents a significant cost and time savings for us.
But perhaps most importantly, Amazon Translate provides cost-effective access to a range of languages that we haven’t been able to work with before, including Arabic, Chinese, and Russian. This opens up a wide range of customers and opportunities that we couldn’t have supported before. We’re in active discussions with several large customers on global narrative projects that would make extensive use of the Amazon Translate capabilities. We’re excited to continue working with Amazon Translate and exploring the new opportunities that the service brings.