C-SPAN Case Study

2017

C-SPAN is a not-for-profit service funded by the United States cable television industry. Its mission is to open Washington, D.C. to the public and make government more transparent by broadcasting and archiving government proceedings. The C-SPAN Archives records eight C-SPAN networks 24/7, including live coverage of the Senate and House of Representatives, as well as non-broadcast, online-only streams. Programs are extensively indexed, creating a unique public resource for education, research, review, and citizen access. Visitors to C-SPAN.org can search more than 200,000 hours of footage by person or phrase. Journalists and fact checkers also use C-SPAN to verify statements and historical timelines.

The Challenge

The C-SPAN Archives is responsible for indexing speakers in C-SPAN video content. In the past, this manual process was performed by a team of two to three indexers. “Because it was so time-consuming, we could only index a few key programs to the level of when specific individuals were speaking on camera,” says Alan Cloutier, technical manager of the C-SPAN Archives.

For several years, the team at C-SPAN Archives had been evaluating a tool that could match an image captured from a video against a collection of known people. In early 2016, C-SPAN developed a tool that could perform such a task, but it was relatively slow. Two human indexers used this tool, plus closed-captioning data, to index faces in about 50 percent of C-SPAN’s footage, at a rate of about one hour of work per hour of video.

Why Amazon Web Services

Amazon Rekognition is a highly scalable image-analysis service. It uses proven deep-learning technology developed by Amazon’s computer-vision scientists to analyze billions of images daily for Amazon Prime Photos. By using Amazon Rekognition’s simple API, companies can add sophisticated visual search and image classification to applications.

Cloutier immediately recognized the potential of Amazon Rekognition for C-SPAN. “I saw Amazon Rekognition mentioned in the keynote for AWS re:Invent 2016, and I was testing it within an hour,” says Cloutier. He searched for “John McCain” and found that Rekognition could pick the right face from a collection of more than 100,000 images with a high degree of certainty.

Within three weeks of the announcement, the team had a working solution using Amazon Rekognition. Adding an image or running a comparison takes only one line of code, so very little development work was needed. The initial image collection, consisting of 97,000 known individuals, was indexed by Amazon Rekognition in less than two hours.
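The two operations described above, adding a known face and comparing a new image against the collection, can be sketched as follows. This is a minimal illustration, not C-SPAN's code: it assumes `client` is a boto3 Rekognition client (for example, `boto3.client("rekognition")`), and the function names, collection ID, and person identifier are hypothetical.

```python
def add_known_face(client, collection_id, image_bytes, person_id):
    """Add one reference image of a known person to a face collection.

    person_id is stored as ExternalImageId, which Rekognition returns
    with later matches. It may not contain spaces.
    """
    response = client.index_faces(
        CollectionId=collection_id,
        Image={"Bytes": image_bytes},
        ExternalImageId=person_id,
    )
    # One FaceRecord is returned per face that was indexed from the image.
    return [record["Face"]["FaceId"] for record in response["FaceRecords"]]


def find_matches(client, collection_id, image_bytes, threshold):
    """Search the collection for faces similar to the largest face in the image.

    Returns (person_id, similarity) pairs at or above the threshold.
    """
    response = client.search_faces_by_image(
        CollectionId=collection_id,
        Image={"Bytes": image_bytes},
        FaceMatchThreshold=threshold,
        MaxFaces=5,
    )
    return [
        (match["Face"].get("ExternalImageId"), match["Similarity"])
        for match in response["FaceMatches"]
    ]
```

Each function wraps a single API call, which is consistent with the "one line of code" description. The collection itself would need to be created once beforehand, for instance with `client.create_collection(CollectionId=...)`.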

The solution uploads screenshots taken at six-second intervals from all eight C-SPAN feeds and matches them against an image collection. The six-second interval was chosen because the same speaker commonly stays on camera for extended periods in C-SPAN videos. To reduce costs, shot detection is used to avoid analyzing duplicate images.
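The sampling and de-duplication idea can be illustrated with a short sketch. The case study does not say how C-SPAN's shot detection works, so the frame-difference check below is a simple stand-in, and the threshold value and function names are assumptions. Frames are modeled as flat lists of grayscale pixel values.

```python
SAMPLE_INTERVAL_SECONDS = 6  # interval named in the case study


def sample_times(duration_seconds, interval=SAMPLE_INTERVAL_SECONDS):
    """Return the timestamps (in seconds) at which a screenshot is taken."""
    return list(range(0, int(duration_seconds), interval))


def mean_abs_diff(frame_a, frame_b):
    """Average absolute pixel difference between two equally sized frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def frames_to_analyze(frames, threshold=10.0):
    """Return indices of frames that differ enough from the last kept frame.

    Stand-in for shot detection: a frame that is nearly identical to the
    previous kept frame is treated as a duplicate and skipped.
    """
    kept = []
    last = None
    for index, frame in enumerate(frames):
        if last is None or mean_abs_diff(frame, last) > threshold:
            kept.append(index)
            last = frame
    return kept
```

Only the kept frames would be sent for face matching, so a speaker who stays on camera for several minutes costs one analysis call rather than dozens.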

The team also uses Amazon Simple Queue Service (Amazon SQS), a fully managed message-queuing service, to communicate when a new image is ready to be checked. Amazon Simple Storage Service (Amazon S3) sends a notification to the Amazon SQS queue whenever an image is uploaded. By selecting an image-matching confidence threshold of 96 percent, the C-SPAN Archives team has been able to achieve a high degree of accuracy.
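A consumer of that queue might look roughly like the sketch below. It assumes the SQS message body is a standard S3 event notification in JSON, and that face matches have the shape returned by Rekognition's `SearchFacesByImage`. The function names are hypothetical, and only the 96 percent figure comes from the case study.

```python
import json
from urllib.parse import unquote_plus

MATCH_THRESHOLD = 96.0  # threshold named in the case study


def uploaded_objects(sqs_message_body):
    """Extract (bucket, key) pairs from an S3 event notification body."""
    event = json.loads(sqs_message_body)
    objects = []
    # S3 test events carry no "Records" key, so default to an empty list.
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def confident_matches(face_matches, threshold=MATCH_THRESHOLD):
    """Keep only matches whose similarity meets the threshold."""
    return [m for m in face_matches if m["Similarity"] >= threshold]
```

In a running system the body would come from an SQS `receive_message` call, and each (bucket, key) pair would be passed to Rekognition as an `S3Object` image reference. The threshold can also be applied on the service side through the `FaceMatchThreshold` parameter.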

The Benefits

“We weren’t expecting the high degree of facial-recognition accuracy we’re getting,” says Cloutier. “It’s very exciting, and setting up Amazon Rekognition was shockingly easy.” C-SPAN found that the service was as accurate as the human indexers but at least twice as fast, which will enable C-SPAN to index 100 percent of its live content. “Before using Rekognition, it took an hour of labor to index an hour of video,” says Cloutier. “We should be able to get that down to an hour of video in 20 minutes of work.”

C-SPAN is already examining additional use cases for Amazon Rekognition, such as using its label detection feature (the DetectLabels operation) to identify posters and other exhibits shown during legislative and judicial proceedings. Additionally, C-SPAN plans to use Rekognition to index 100,000 hours of archive footage.
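As an illustration of the label-detection use case, a call might look like the following sketch. It assumes `client` is a boto3 Rekognition client. The function name and the confidence cutoff are hypothetical, and the case study does not describe how C-SPAN would implement this.

```python
def labels_in_image(client, image_bytes, min_confidence=90.0):
    """Return (label name, confidence) pairs detected in one image."""
    response = client.detect_labels(
        Image={"Bytes": image_bytes},
        MaxLabels=10,
        MinConfidence=min_confidence,
    )
    return [(label["Name"], label["Confidence"]) for label in response["Labels"]]
```

A frame could then be flagged for indexers when a label of interest appears in the result. Which label names correspond to posters or exhibits would need to be checked against real footage.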

About C-SPAN

C-SPAN is a not-for-profit service funded by the United States cable television industry.

Benefits of AWS

More than doubled the amount of video that could be indexed annually, from 3,500 to 7,500 hours
Reduced labor required to index an hour of video from 1 hour to 20 minutes
Uploaded 97,000 images in less than two hours
Highly accurate facial recognition with unlimited scalability at affordable cost
Deployed in less than three weeks

AWS Services Used

Amazon S3

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance.

Amazon Rekognition

Easily add intelligent image and video analysis to your applications.

Amazon SQS

Amazon Simple Queue Service (SQS) is a fully managed message queuing service that enables you to decouple and scale microservices, distributed systems, and serverless applications.