The Wall Street Journal Empowers Readers with Search Tool Using Amazon Kendra

Empowering Readers with Accurate Information
The Wall Street Journal is a global news organization that provides news, information, commentary, and analysis, engaging readers across print, digital, mobile, social, audio, and video platforms. Building on its heritage as a source of global business and financial news, the WSJ includes coverage of US and world news, politics, arts, culture, lifestyle, sports, and health, and it holds 38 Pulitzer Prizes for outstanding journalism. In the run-up to the 2020 US presidential election, it saw an opportunity to deliver new functionality and reach new audience members by enabling readers to explore a database of transcripts. “We wanted to build something that readers could use to look up what Joe Biden, Donald Trump, and their running mates said verbatim and draw their own conclusions,” says Dion Bailey, VP, Head of WSJ Technology and Architecture.
WSJ journalists writing investigative stories already use Factiva, Dow Jones’s global news database, for research and fact-checking. Factiva aggregates content from more than 32,000 sources and enables users to search by free text, region, subject, author, and metadata. The WSJ’s R&D team had worked with journalists in Washington, DC, to build an effective search tool for these transcripts. With Talk2020, the WSJ wanted to make this tool simpler to use and available to a broader audience, helping to inform readers’ decision-making during the 2020 presidential election. The publication wanted readers to be able to pose natural language questions, such as “What did Trump say about healthcare?”, and receive results that directly answered them. A well-structured solution would also have the potential to increase site traffic and attract new subscribers.
The WSJ team, which was already using AWS, regularly engaged AWS Professional Services during the build through daily stand-ups, weekly meetings, and architectural deep dives. “AWS helped us build a solution that met our timelines,” says Bailey. “Having that direct access to experts enabled us to put the right services around Amazon Kendra and deliver the level of quality that we wanted.”
Marrying Content Strategy and Product Strategy
Using AWS, the WSJ team quickly built Talk2020 and met its goal of launching in September 2020, prior to the first presidential debate. The solution used Amazon Kendra to provide reliable enterprise search capabilities. “The fact that Amazon Kendra could do the natural language processing in real time was a big draw for us,” says Bailey. The search solution’s front end consisted of an API gateway and Amazon CloudFront, a fast, highly secure, and programmable content delivery network. When a user ran a search, Amazon Kendra returned an identified topic and related quotes. These results were then augmented by cross-referencing them with the cleaned Factiva transcripts stored in Amazon DynamoDB, a NoSQL database service that supports key-value and document data structures.
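The query path described above can be sketched roughly as follows. This is a minimal illustration, not the WSJ's actual implementation: the function name `search_quotes`, the `quote_id` key, and the stored fields (`speaker`, `topic`, `quote`) are assumptions made for the example. The `kendra` and `table` arguments are expected to behave like a boto3 Kendra client and a boto3 DynamoDB `Table` resource.

```python
from typing import Any, Dict, List


def search_quotes(kendra: Any, table: Any, index_id: str,
                  question: str) -> List[Dict[str, Any]]:
    """Query Kendra, then enrich each hit with its stored transcript record."""
    response = kendra.query(IndexId=index_id, QueryText=question)
    results: List[Dict[str, Any]] = []
    for item in response.get("ResultItems", []):
        doc_id = item["DocumentId"]
        # Assumed schema: one DynamoDB item per quote, keyed by "quote_id".
        stored = table.get_item(Key={"quote_id": doc_id}).get("Item")
        if stored is None:
            continue  # skip hits that have no matching transcript record
        results.append({
            "quote_id": doc_id,
            "excerpt": item.get("DocumentExcerpt", {}).get("Text", ""),
            "speaker": stored.get("speaker"),
            "topic": stored.get("topic"),
            "quote": stored.get("quote"),
        })
    return results
```

With boto3, the two dependencies could be created as `boto3.client("kendra")` and `boto3.resource("dynamodb").Table(...)`, using whatever index ID and table name a given deployment defines. Passing them in as arguments keeps the lookup logic easy to exercise with stand-in objects.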
AWS Lambda, a serverless compute service that lets users run code without provisioning or managing servers, manages the data flow between AWS services. “We had to create an ingestion layer between Factiva and the data layer,” says Bailey. AWS Lambda functions trigger requests to cleanse and format the transcripts (identifying quotes, the speaker, and the topic) before sending them to Amazon Kendra and Amazon DynamoDB. “Relying on Lambda functions for those tasks means we can shut down the process when we’re not using them, so it’s cost efficient,” adds Bailey.
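As a rough sketch of what such an ingestion step might look like, the code below parses a raw transcript into quote records and then writes them to both stores. Everything specific here is assumed for illustration: the `SPEAKER: text` line format, the keyword-based `guess_topic` helper, the record fields, and the function names. The article does not describe how the WSJ actually identified speakers or topics. The `kendra` and `table` arguments are expected to behave like a boto3 Kendra client and a boto3 DynamoDB `Table` resource.

```python
import re
from typing import Any, Dict, List

# Hypothetical keyword lists; a real system would likely use something richer.
TOPIC_KEYWORDS = {
    "healthcare": ("healthcare", "health care", "insurance", "medicare"),
    "economy": ("economy", "jobs", "taxes", "trade"),
}

# Assumed transcript format: one "SPEAKER: quote" statement per line.
LINE_PATTERN = re.compile(r"^\s*([A-Z][A-Za-z .'-]*?):\s*(.+?)\s*$")


def guess_topic(text: str) -> str:
    """Return the first topic whose keywords appear in the text."""
    lowered = text.lower()
    for topic, keywords in TOPIC_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return topic
    return "other"


def parse_transcript(transcript_id: str, raw: str) -> List[Dict[str, str]]:
    """Cleanse a raw transcript into one record per attributed quote."""
    records: List[Dict[str, str]] = []
    for line in raw.splitlines():
        match = LINE_PATTERN.match(line)
        if not match:
            continue  # drop stage directions and other unattributed lines
        speaker, quote = match.groups()
        records.append({
            "quote_id": f"{transcript_id}-{len(records)}",
            "speaker": speaker.strip(),
            "quote": quote,
            "topic": guess_topic(quote),
        })
    return records


def ingest(records: List[Dict[str, str]], kendra: Any, table: Any,
           index_id: str) -> int:
    """Store each record in DynamoDB and index its text in Kendra."""
    for record in records:
        table.put_item(Item=record)
    documents = [{
        "Id": record["quote_id"],
        "Title": record["speaker"],
        "Blob": record["quote"].encode("utf-8"),
        "ContentType": "PLAIN_TEXT",
    } for record in records]
    # Send documents in small batches (10 per BatchPutDocument call).
    for start in range(0, len(documents), 10):
        kendra.batch_put_document(IndexId=index_id,
                                  Documents=documents[start:start + 10])
    return len(records)
```

A Lambda handler could call `parse_transcript` on each incoming transcript and pass the result to `ingest`. Because the function runs only when invoked, this matches the on-demand cost model Bailey describes.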
Data from the Talk2020 tool showed spikes in usage during and after the presidential debates, the vice presidential debate, and town hall events. Many people even used the search tool as a second screen during debates to research statements that candidates had made in the past. Engagement with Talk2020 was strong, with individual users often asking multiple questions and browsing several topics during the same visit. “That shows us we created a tool that met our readers’ needs, and we have an opportunity to keep experimenting with new ways to engage our users,” says Bailey.
Inspiring Future Intelligent Search Use Cases
Engaging the AWS team and using innovative services like Amazon Kendra helped the WSJ launch Talk2020 in just 5 months, driving site traffic, encouraging engagement, and attracting new subscribers. “The AWS team was available anytime we needed,” says Bailey, “and it helped us resolve every issue that arose.”