The Wall Street Journal Empowers Readers with Search Tool Using Amazon Kendra
As the 2020 US presidential election approached, The Wall Street Journal (WSJ) wanted to empower readers to more easily access and understand what candidates said, as well as candidates’ positions on topics that matter to readers. A June 5, 2019, Pew Research Center study found that almost 80 percent of Americans say they have checked the facts in news stories themselves to find the original source of information.
The WSJ’s Product and Technology team turned to Amazon Web Services (AWS) to build a new customer experience. Through collaboration with the AWS Digital Innovation program and AWS Professional Services, a global team of experts that can help businesses realize their desired outcomes on AWS, the WSJ team was able to accelerate the development of Talk2020, an intelligent search tool that helps readers quickly search and analyze 30 years of public statements made by presidential candidates. It enables deeper investigation into issues over time by exploring speech patterns and performing text analyses. Key to WSJ’s success was its use of Amazon Kendra, a highly accurate intelligent search service powered by machine learning.
Empowering Readers with Accurate Information
The Wall Street Journal is a global news organization that provides news, information, commentary, and analysis, engaging readers across print, digital, mobile, social, audio, and video platforms. Building on its heritage as a source of global business and financial news, WSJ includes coverage of US and world news, politics, arts, culture, lifestyle, sports, and health and holds 38 Pulitzer Prizes for outstanding journalism. Ahead of the election, it saw an opportunity to deliver new functionality and reach new audience members by enabling readers to explore a database of candidate transcripts. “We wanted to build something that readers could use to look up what Joe Biden, Donald Trump, and their running mates said verbatim and draw their own conclusions,” says Dion Bailey, VP, Head of WSJ Technology and Architecture.
WSJ journalists writing investigative stories already use Factiva, Dow Jones’s global news database, for research and fact-checking. Factiva aggregates content from more than 32,000 sources and enables users to search by free text, region, subject, author, and metadata. The WSJ’s R&D team had worked with journalists in Washington, DC, to build an effective search tool for candidate transcripts. With Talk2020, the WSJ wanted to make this tool simpler to use and available to a broader audience, helping readers make informed decisions during the 2020 presidential election. The publication wanted readers to be able to pose natural language questions (such as “What did Trump say about healthcare?”) and receive results that directly answered them. A well-structured solution would also have the potential to increase site traffic and attract new subscribers.
The WSJ team, which was already using AWS, regularly engaged AWS Professional Services during the build through daily stand-ups, weekly meetings, and architectural deep dives. “AWS helped us build a solution that met our timelines,” says Bailey. “Having that direct access to experts enabled us to put the right services around Amazon Kendra and deliver the level of quality that we wanted.”
Marrying Content Strategy and Product Strategy
Using AWS, the WSJ team quickly built Talk2020 and met its goal of launching in September 2020, prior to the first presidential debate. The solution used Amazon Kendra to provide reliable enterprise search capabilities. “The fact that Amazon Kendra could do the natural language processing in real time was a big draw for us,” says Bailey. The search solution’s front end consisted of an API gateway and Amazon CloudFront, a fast, highly secure, and programmable content delivery network. When a user conducts a search, Amazon Kendra returns an identified topic and related quotes. These results are then augmented by cross-referencing them with the cleaned Factiva transcripts stored in Amazon DynamoDB, a NoSQL database service that supports key-value and document data structures.
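The search-then-enrich flow described above can be illustrated with a minimal Python sketch. It is not the WSJ's implementation. The function name `search_statements`, the two injected callables, and the record fields `speaker`, `topic`, and `date` are all assumptions made for illustration. The response keys `ResultItems`, `DocumentId`, and `DocumentExcerpt` follow the shape of the Amazon Kendra Query API response; in a deployed system, `kendra_query` might wrap a call such as boto3's `kendra.query(IndexId=..., QueryText=question)` and `get_transcript` might wrap a DynamoDB lookup keyed on the document ID.

```python
from typing import Callable, Dict, List, Optional


def search_statements(
    question: str,
    kendra_query: Callable[[str], dict],
    get_transcript: Callable[[str], Optional[dict]],
) -> List[Dict[str, Optional[str]]]:
    """Run a natural language query, then enrich each hit from a transcript store.

    Both callables are injected so the sketch runs without AWS credentials.
    """
    response = kendra_query(question)
    results = []
    for item in response.get("ResultItems", []):
        doc_id = item.get("DocumentId")
        # Cross-reference the search hit with the stored transcript record.
        # The record layout (speaker/topic/date) is a hypothetical schema.
        record = (get_transcript(doc_id) if doc_id else None) or {}
        results.append(
            {
                "document_id": doc_id,
                "excerpt": item.get("DocumentExcerpt", {}).get("Text"),
                "speaker": record.get("speaker"),
                "topic": record.get("topic"),
                "date": record.get("date"),
            }
        )
    return results
```

Passing the search and lookup functions in as arguments keeps the enrichment logic independent of any particular AWS client, so it can be exercised with in-memory stand-ins such as a plain dictionary's `get` method.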
AWS Lambda, a serverless compute service that lets users run code without provisioning or managing servers, manages the flow of data between the AWS services. “We had to create an ingestion layer between Factiva and the data layer,” says Bailey. AWS Lambda functions trigger requests to cleanse and format the transcripts (identifying quotes, the speaker, and the topic) before sending them to Amazon Kendra and Amazon DynamoDB. “Relying on Lambda functions for those tasks means we can shut down the process when we’re not using them, so it’s cost efficient,” adds Bailey.
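As a rough illustration of such an ingestion step, the Python sketch below cleans a raw transcript and extracts one record per quote. Everything about it is assumed: the article does not describe the Factiva transcript format or how topics were identified, so the `SPEAKER: text` line convention, the `TOPIC_KEYWORDS` map, and the event key `transcript` are hypothetical. The `handler(event, context)` signature is the standard shape of an AWS Lambda function in Python.

```python
import re
from typing import Dict, List

# Hypothetical keyword map; the article does not say how topics were identified.
TOPIC_KEYWORDS = {
    "healthcare": ("healthcare", "insurance", "medicare"),
    "economy": ("jobs", "taxes", "economy"),
}

# Assumed transcript convention: an upper-case speaker name, a colon, then the quote.
SPEAKER_LINE = re.compile(r"^\s*([A-Z][A-Z .'-]+):\s*(.+)$")


def tag_topic(text: str) -> str:
    """Return the first topic whose keywords appear in the text, else 'other'."""
    lowered = text.lower()
    for topic, keywords in TOPIC_KEYWORDS.items():
        if any(word in lowered for word in keywords):
            return topic
    return "other"


def parse_transcript(raw: str) -> List[Dict[str, str]]:
    """Turn raw transcript text into speaker/quote/topic records."""
    records = []
    for line in raw.splitlines():
        match = SPEAKER_LINE.match(line)
        if not match:
            continue  # skip blank lines and non-speech lines such as "(APPLAUSE)"
        speaker, quote = match.groups()
        quote = " ".join(quote.split())  # collapse runs of whitespace
        records.append(
            {"speaker": speaker.strip().title(), "quote": quote, "topic": tag_topic(quote)}
        )
    return records


def handler(event: dict, context: object = None) -> dict:
    """Lambda-style entry point; expects the raw text under event['transcript']."""
    records = parse_transcript(event.get("transcript", ""))
    # A deployed function would now send the records to the search index
    # and the database; that step is omitted here.
    return {"count": len(records), "records": records}
```

Keeping the parsing in pure functions means the cleansing rules can be tested locally, while the Lambda handler stays a thin wrapper that runs only when new transcripts arrive.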
Data from the Talk2020 tool showed spikes in usage during and after the presidential debates, the vice presidential debate, and town hall events. Many people even used the search tool as a second screen during debates to research statements that candidates had made in the past. Engagement with Talk2020 was strong, with individual users often asking multiple questions and browsing several topics during the same visit. “That shows us we created a tool that met our readers’ needs, and we have an opportunity to keep experimenting with new ways to engage our users,” says Bailey.
Inspiring Future Intelligent Search Use Cases
Engaging the AWS team and using innovative services like Amazon Kendra helped WSJ launch Talk2020 in just 5 months, driving site traffic, encouraging engagement, and attracting new subscribers. “The AWS team was available anytime we needed,” says Bailey, “and it helped us resolve every issue that arose.”
About The Wall Street Journal
Founded in 1889 and owned by Dow Jones & Company, The Wall Street Journal is a New York–based global news organization focusing on business, finance, economics, and global forces. It engages readers across print, digital, mobile, social, audio, and video platforms. The Wall Street Journal has won over three dozen Pulitzer Prizes, and its circulation is in the millions.
Benefits of AWS
- Launched Talk2020 search tool in 5 months
- Created a search tool with natural language processing
- Increased engagement
AWS Services Used
Amazon Kendra is an intelligent search service powered by machine learning. Kendra reimagines enterprise search for your websites and applications so your employees and customers can easily find the content they are looking for, even when it’s scattered across multiple locations and content repositories within your organization.
Amazon CloudFront is a fast content delivery network (CDN) service that securely delivers data, videos, applications, and APIs to customers globally with low latency and high transfer speeds, all within a developer-friendly environment.
Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale. It's a fully managed, multi-region, multi-active, durable database with built-in security, backup and restore, and in-memory caching for internet-scale applications.
AWS Lambda is a serverless compute service that lets you run code without provisioning or managing servers, creating workload-aware cluster scaling logic, maintaining event integrations, or managing runtimes. With Lambda, you can run code for virtually any type of application or backend service - all with zero administration.