Search inside videos using AWS media and AI/ML services

September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. See details.

In today’s content-driven world, it’s difficult to search for relevant content without losing productivity. Content always requires metadata to give it context and make it searchable. Tagging is a means to classify content to make it structured, indexed, and—ultimately—useful. Much time is spent on managing content and tagging it.

Manually assigning content tags is workable when you produce little new content or when searchability is not a priority. However, manual tagging becomes impractical when you are creating lots of new content.

Content gets generated in various formats, such as text, audio, and video. A text-based search engine ranks relevancy based on tags or text inside the content. However, searches on audiovisual (AV) files are based only on the tags and text associated with the file, not on what is said during playback.

To unlock the true power of AV content, we must use the audio dialogue to provide a more granular level of metadata for the content and make it searchable.

With an AV search solution, you should be able to perform the following tasks:

Tag the dialogue.
Make your AV contents searchable based on the content tagging.
Jump directly to the position in the content where the searched keyword was used.

In this post, I describe how to search within AV files using the following AWS media and AI/ML services:

Amazon Transcribe: An automatic speech recognition (ASR) service that makes it easy for you to add speech-to-text capability to your applications.
Amazon Elastic Transcoder: Media transcoding in the cloud. It is designed to be a highly scalable, easy-to-use, and cost-effective way for you to convert (or “transcode”) media files from a source format. You can create versions that play back on devices like smartphones, tablets, and PCs.
Amazon CloudSearch: A managed service in the AWS Cloud that makes it simple and cost-effective to set up, manage, and scale a search solution for your website or application.

Other AWS Services:

AWS Lambda: Lets you run code without provisioning or managing servers. You pay only for the compute time that you consume.
Amazon API Gateway: A fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale.
Amazon S3: An object storage service that offers industry-leading scalability, data availability, security, and performance.

Solution overview

This solution gets created in two parts:

Ingesting the AV file
- Uploading the MP4 file to an S3 bucket.
- Transcoding the media file into audio format.
- Transcribing the audio file into text.
- Indexing the contents into Amazon CloudSearch.
- Testing the index in CloudSearch.
Searching for content
- Creating a simple HTML user interface for querying content.
- Listing the results and rendering them from an S3 bucket using CloudFront.

Prerequisites

To walk through this solution, you need the following AWS resources:

An S3 bucket (videosearchdemo) with the following folder structure:
/inputaudio
/inputvideo
/static
A CloudFront distribution with the S3 bucket as the origin. Make note of the domain name URL (for example, cloudfront.net).
An S3 bucket policy that allows the contents to be served using the CloudFront distribution only. Use the following policy:

{
   "Version": "2008-10-17",
   "Id": "PolicyForCloudFrontPrivateContent",
   "Statement": [
      {
         "Sid": "1",
         "Effect": "Allow",
         "Principal": {
            "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity xxxxxxxxxxx"
         },
         "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::<<bucketname>>/*"
      }
   ]
}

An Elastic Transcoder pipeline (VideoSearchDemo). Make a note of the pipeline ID.
A CloudSearch domain (videosearch-domain). Make a note of the search endpoint.
A Lambda execution role that has access to relevant services like Elastic Transcoder, Transcribe, CloudSearch, and S3.
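
If you prefer to generate the bucket policy rather than edit it by hand, the following is a minimal sketch. The helper name buildBucketPolicy and its two parameters are hypothetical, and the origin access identity ID used in the example is a placeholder, not a real identity.

```javascript
// Hypothetical helper: fills the bucket name and the CloudFront origin
// access identity (OAI) ID into the policy shown above.
function buildBucketPolicy(bucketName, oaiId) {
  return {
    Version: "2008-10-17",
    Id: "PolicyForCloudFrontPrivateContent",
    Statement: [
      {
        Sid: "1",
        Effect: "Allow",
        Principal: {
          AWS: "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity " + oaiId
        },
        Action: "s3:GetObject",
        Resource: "arn:aws:s3:::" + bucketName + "/*"
      }
    ]
  };
}

// Placeholder values for illustration only.
console.log(JSON.stringify(buildBucketPolicy("videosearchdemo", "EXAMPLEOAIID"), null, 2));
```

The printed JSON can be pasted into the bucket policy editor in the S3 console.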

Step 1: Uploading a video to the S3 bucket

Upload a video file in MP4 format into the /inputvideo folder.

Step 2: Transcoding the video into audio format

Create a Lambda function (VideoSearchDemo-Transcoder) with the runtime of your choice.
Associate the Lambda execution role with the function to access the S3 bucket and Amazon CloudWatch Logs.
Add an S3 trigger to the Lambda function on all ObjectCreated events in the /inputvideo folder. For more information, see Event Source Mapping for AWS Services.

Use the following Node.js 6.10 Lambda code to initiate the Elastic Transcoder job and put the transcoded audio file to the /inputaudio folder.

'use strict';
var aws = require("aws-sdk");

var eltr = new aws.ElasticTranscoder({
   region: "us-east-1"
});

exports.handler = (event, context, callback) => {
   console.log('Received event:', JSON.stringify(event, null, 2));

   // S3 event keys are URL-encoded, with spaces sent as '+'.
   var key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));

   var pipelineId = "ElasticTranscoder Pipeline ID";
   var audioLocation = "inputaudio";

   // Derive the output name: drop the folder prefix and the file extension.
   var newKey = key.split('.')[0];
   var str = newKey.lastIndexOf("/");
   newKey = newKey.substring(str+1);

   var params = {
      PipelineId: pipelineId,
      Input: {
         Key: key,
         FrameRate: "auto",
         Resolution: "auto",
         AspectRatio: "auto",
         Interlaced: "auto",
         Container: "auto"
      },
      Outputs: [
         {
            Key: audioLocation+'/'+ newKey +".mp3",
            PresetId: "1351620000001-300010" //mp3 320
         }
      ]
   };

   // Start the transcoding job and report the outcome back to Lambda.
   eltr.createJob(params, function(err, data){
      if (err){
         console.log('Received event:Error = ',err);
         callback(err);
      } else {
         console.log('Received event:Success =',data);
         callback(null, data);
      }
   });
};
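
The key handling is the part of this function that is easiest to get subtly wrong, so the following sketch isolates it as a pure function that runs without any AWS resources. The name toAudioKey and its parameters are hypothetical. Unlike the split('.')[0] approach, it strips only the final extension, so a file name with extra dots keeps its full base name; if you adopt it, apply the same rule in the later steps so that the names still match.

```javascript
const path = require("path");

// Hypothetical helper: maps the S3 event key of an uploaded video to the
// key of the transcoded audio file.
function toAudioKey(eventKey, audioFolder) {
  // S3 event notifications URL-encode the key and send spaces as '+'.
  const key = decodeURIComponent(eventKey.replace(/\+/g, " "));
  // Strip the folder prefix and only the final extension.
  const baseName = path.posix.basename(key, path.posix.extname(key));
  return audioFolder + "/" + baseName + ".mp3";
}

console.log(toAudioKey("inputvideo/bezos.mp4", "inputaudio"));
console.log(toAudioKey("inputvideo/my+talk.v2.mp4", "inputaudio"));
```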

Step 3: Transcribing the audio file into JSON format

Create another Lambda function with an S3 trigger on all ObjectCreated events for the /inputaudio folder. This function starts the transcription job. Fill in values for inputaudiolocation and outputbucket.

'use strict';
var aws = require('aws-sdk');
var transcribeservice = new aws.TranscribeService({apiVersion: '2017-10-26'});

exports.handler = (event, context, callback) => {
   console.log('Received event:', JSON.stringify(event, null, 2));

   // S3 event keys are URL-encoded, with spaces sent as '+'.
   var key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));

   // Drop the folder prefix and the file extension. The result doubles as the job name.
   var newKey = key.split('.')[0];
   var str = newKey.lastIndexOf("/");
   newKey = newKey.substring(str+1);

   var inputaudiolocation = "https://s3.amazonaws.com/<<bucket name>>/<<input file location>>/";
   var mp3URL = inputaudiolocation+newKey+".mp3";
   var outputbucket = "<<bucket name>>";
   var params = {
      LanguageCode: "en-US", /* required */
      Media: { /* required */
         MediaFileUri: mp3URL
      },
      MediaFormat: "mp3", /* required */
      TranscriptionJobName: newKey, /* required */
      MediaSampleRateHertz: 44100,
      OutputBucketName: outputbucket
   };

   // Start the transcription job and report the outcome back to Lambda.
   transcribeservice.startTranscriptionJob(params, function(err, data){
      if (err){
         console.log('Received event:Error = ',err);
         callback(err);
      } else {
         console.log('Received event:Success = ',data);
         callback(null, data);
      }
   });
};

The following is a transcribed JSON file example:

{
   "jobName":"JOB ID",
   "accountId":" AWS account Id",
   "results":{
   "transcripts":[
   {
   "transcript":"In nineteen ninety four, young Wall Street hotshot named Jeff Bezos was at a crossroads in his career"
   }
   ],
   "items":[
   {
   "start_time":"0.04",
   "end_time":"0.17",
   "alternatives":[
   {
   "confidence":"1.0000",
   "content":"In"
   }
   ],
   "type":"pronunciation"
   },
   {
   "start_time":"0.17",
   "end_time":"0.5",
   "alternatives":[
   {
   "confidence":"1.0000",
   "content":"nineteen"
   }
   ],
   "type":"pronunciation"
   },
   {
   "start_time":"0.5",
   "end_time":"0.71",
   "alternatives":[
   {
   "confidence":"1.0000",
   "content":"ninety"
   }
   ],
   "type":"pronunciation"
   },
   {
   "start_time":"0.71",
   "end_time":"1.11",
   "alternatives":[
   {
   "confidence":"0.7977",
   "content":"four"
   }
   ],
   "type":"pronunciation"
   },
   {
   "alternatives":[
   {
   "confidence":null,
   "content":","
   }
   ],
   "type":"punctuation"
   },
   {
   "start_time":"1.18",
   "end_time":"1.59",
   "alternatives":[
   {
   "confidence":"0.9891",
   "content":"young"
   }
   ],
   "type":"pronunciation"
   },
   {
   "start_time":"1.59",
   "end_time":"1.78",
   "alternatives":[
   {
   "confidence":"0.8882",
   "content":"Wall"
   }
   ],
   "type":"pronunciation"
   },
   {
   "start_time":"1.78",
   "end_time":"2.01",
   "alternatives":[
   {
   "confidence":"0.8725",
   "content":"Street"
   }
   ],
   "type":"pronunciation"
   },
   {
   "start_time":"2.01",
   "end_time":"2.51",
   "alternatives":[
   {
   "confidence":"0.9756",
   "content":"hotshot"
   }
   ],
   "type":"pronunciation"
   },
   {
   "start_time":"2.51",
   "end_time":"2.75",
   "alternatives":[
   {
   "confidence":"0.9972",
   "content":"named"
   }
   ],
   "type":"pronunciation"
   },
   {
   "start_time":"2.76",
   "end_time":"3.07",
   "alternatives":[
   {
   "confidence":"1.0000",
   "content":"Jeff"
   }
   ],
   "type":"pronunciation"
   },
   {
   "start_time":"3.08",
   "end_time":"3.56",
   "alternatives":[
   {
   "confidence":"1.0000",
   "content":"Bezos"
   }
   ],
   "type":"pronunciation"
   },
   {
   "start_time":"3.57",
   "end_time":"3.75",
   "alternatives":[
   {
   "confidence":"1.0000",
   "content":"was"
   }
   ],
   "type":"pronunciation"
   },
   {
   "start_time":"3.75",
   "end_time":"3.83",
   "alternatives":[
   {
   "confidence":"0.9926",
   "content":"at"
   }
   ],
   "type":"pronunciation"
   },
   {
   "start_time":"3.83",
   "end_time":"3.88",
   "alternatives":[
   {
   "confidence":"1.0000",
   "content":"a"
   }
   ],
   "type":"pronunciation"
   },
   {
   "start_time":"3.88",
   "end_time":"4.53",
   "alternatives":[
   {
   "confidence":"1.0000",
   "content":"crossroads"
   }
   ],
   "type":"pronunciation"
   },
   {
   "start_time":"4.53",
   "end_time":"4.6",
   "alternatives":[
   {
   "confidence":"1.0000",
   "content":"in"
   }
   ],
   "type":"pronunciation"
   },
   {
   "start_time":"4.6",
   "end_time":"4.75",
   "alternatives":[
   {
   "confidence":"1.0000",
   "content":"his"
   }
   ],
   "type":"pronunciation"
   },
   {
   "start_time":"4.75",
   "end_time":"5.35",
   "alternatives":[
   {
   "confidence":"1.0000",
   "content":"career"
   }
   ],
   "type":"pronunciation"
   }
   ]
   },
   "status":"COMPLETED"
}
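
To make the structure of this output concrete, the following sketch pulls the spoken words and their timings out of a transcript object. The function name extractWords and the property names of the returned entries are hypothetical; the input fields are the ones shown in the example above.

```javascript
// Hypothetical helper: returns one entry per spoken word, skipping
// punctuation items, which carry no start_time or end_time.
function extractWords(transcript) {
  return transcript.results.items
    .filter(function (item) { return item.type === "pronunciation"; })
    .map(function (item) {
      return {
        word: item.alternatives[0].content,
        start: parseFloat(item.start_time),
        end: parseFloat(item.end_time),
        confidence: parseFloat(item.alternatives[0].confidence)
      };
    });
}

// A trimmed-down version of the example output above.
const sampleTranscript = {
  results: {
    items: [
      { start_time: "0.71", end_time: "1.11", alternatives: [{ confidence: "0.7977", content: "four" }], type: "pronunciation" },
      { alternatives: [{ confidence: null, content: "," }], type: "punctuation" },
      { start_time: "1.18", end_time: "1.59", alternatives: [{ confidence: "0.9891", content: "young" }], type: "pronunciation" }
    ]
  }
};

console.log(extractWords(sampleTranscript));
```

Converting the timestamps to numbers at this point makes it easier to sort or filter words by position later.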

Step 4: Indexing the contents into Amazon CloudSearch

The outputbucket value from the previous step is the S3 location where the transcription job writes its JSON output. Add another S3 trigger for a Lambda function on all ObjectCreated events for that location, and use the following code to ingest the JSON output into the CloudSearch domain.

'use strict';
var aws = require("aws-sdk");
var s3 = new aws.S3();

exports.handler = (event, context, callback) => {
   console.log('Received event:', JSON.stringify(event, null, 2));
   var cloudsearchdomain = new aws.CloudSearchDomain({endpoint: '<<Mention CloudSearch End Point>>'});

   var bucket = event.Records[0].s3.bucket.name;
   // S3 event keys are URL-encoded, with spaces sent as '+'.
   var key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));
   var newKey = key.split('.')[0];
   var str = newKey.lastIndexOf("/");
   newKey = newKey.substring(str+1);

   var params = {
      Bucket: bucket,
      Key: key
   };

   s3.getObject(params, function (err, data) {
      if (err) {
         console.log(err);
         return callback(err);
      }
      var body = JSON.parse(data.Body.toString());
      var fileEndPoint = "http://<<CloudFront End Point URL>>/inputvideo/"+newKey+".mp4";
      var documents = [];

      // Index one document per spoken word. Punctuation items carry no
      // timestamps, so the start_time and end_time checks skip them.
      body.results.items.forEach(function (item, i) {
         if (item.start_time != null &&
             item.end_time != null &&
             item.alternatives[0].confidence != null &&
             item.alternatives[0].content != null &&
             item.type != null) {
            documents.push({
               type: "add",
               // Prefix the ID with the file name so that words from
               // different videos do not overwrite each other.
               id: newKey + "_" + i,
               fields: {
                  start_time: item.start_time,
                  end_time: item.end_time,
                  confidence: item.alternatives[0].confidence,
                  content: item.alternatives[0].content,
                  type: item.type,
                  url: fileEndPoint
               }
            });
         }
      });

      // JSON.stringify escapes quotes in the content and always yields a valid batch.
      var csparams = {contentType: 'application/json', documents: JSON.stringify(documents)};

      cloudsearchdomain.uploadDocuments(csparams, function(err, data) {
         if (err) {
            console.log('Error uploading documents to cloudsearch', err, err.stack);
            return callback(err);
         }
         console.log("Uploaded Documents to cloud search successfully!");
         callback(null, data);
      });
   });
};
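
A long video produces thousands of word documents, and CloudSearch limits the size of a single document batch (the documented limit is 5 MB; check the current service limits for your domain). The following sketch splits an array of documents into batches that stay under a byte limit. The function name toBatches and the maxBytes parameter are hypothetical; each returned string could be passed as the documents value of a separate uploadDocuments call.

```javascript
// Hypothetical helper: groups documents into JSON array strings whose
// serialized size stays at or below maxBytes. A single document larger
// than maxBytes still ends up in a batch of its own.
function toBatches(documents, maxBytes) {
  const batches = [];
  let current = [];
  let currentBytes = 2; // the enclosing "[" and "]"

  documents.forEach(function (doc) {
    const docBytes = Buffer.byteLength(JSON.stringify(doc), "utf8");
    // A document that is not first in its batch also needs a separating comma.
    if (current.length > 0 && currentBytes + 1 + docBytes > maxBytes) {
      batches.push(JSON.stringify(current));
      current = [];
      currentBytes = 2;
    }
    currentBytes += docBytes + (current.length > 0 ? 1 : 0);
    current.push(doc);
  });

  if (current.length > 0) {
    batches.push(JSON.stringify(current));
  }
  return batches;
}

// Tiny documents and a tiny limit, for illustration only.
console.log(toBatches([{ id: "a" }, { id: "b" }, { id: "c" }], 25));
```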

Step 5: Testing the index in CloudSearch

In the CloudSearch console, on your domain dashboard, validate that the contents are indexed in the CloudSearch domain. The Searchable documents field should show a nonzero number.

Run a test search on the CloudSearch domain, using a keyword that you know is in the video. You should see the intended results, as shown in the following test search screenshot:
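
Outside the console, the same test can be run as an HTTP GET request against the domain's search endpoint. The following sketch builds such a URL for the 2013-01-01 search API. The function name buildSearchUrl and the example endpoint are hypothetical, the returned field names are the ones indexed in Step 4, and the sketch assumes that the domain's access policy allows the request.

```javascript
// Hypothetical helper: builds a search request URL for a CloudSearch domain.
function buildSearchUrl(searchEndpoint, keyword, size) {
  const params = new URLSearchParams({
    q: keyword,
    size: String(size),
    // Limit the response to the fields that the results page needs.
    return: "content,start_time,url"
  });
  return "https://" + searchEndpoint + "/2013-01-01/search?" + params.toString();
}

// Placeholder endpoint for illustration only.
console.log(buildSearchUrl("search-videosearch-domain-example.us-east-1.cloudsearch.amazonaws.com", "Wall Street", 10));
```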

Step 6: Querying contents with a simple HTML user interface

Now it’s time to search for the content keywords. In API Gateway, create an API to query on the CloudSearch domain using the following Lambda code:

var aws = require('aws-sdk');

var csd = new aws.CloudSearchDomain({
   endpoint: '<<CloudSearch Endpoint>>',
   apiVersion: '2013-01-01'
});

var headers = { "Access-Control-Allow-Origin": "*", "Content-Type": "application/json" };

exports.handler = (event, context, callback) => {
   var query;
   if (event.queryStringParameters !== null && event.queryStringParameters !== undefined) {
      if (event.queryStringParameters.query !== undefined &&
          event.queryStringParameters.query !== null &&
          event.queryStringParameters.query !== "") {
         console.log("Received query: " + event.queryStringParameters.query);
         query = event.queryStringParameters.query;
      }
   }

   // The search call requires a query, so reject requests without one.
   if (query === undefined) {
      return callback(null, {
         "statusCode": 400,
         "body": JSON.stringify({ "message": "Missing query parameter" }),
         "headers": headers
      });
   }

   var params = {
      query: query /* required */
   };

   csd.search(params, function (err, data) {
      if (err) { // an error occurred
         console.log(err, err.stack);
         callback(null, {
            "statusCode": 500,
            "body": JSON.stringify({ "message": "Search failed" }),
            "headers": headers
         });
      } else { // successful response
         callback(null, {
            "statusCode": 200,
            "body": JSON.stringify(data['hits']['hit']),
            "headers": headers
         });
      }
   });
};
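
Any client of this API has to read values out of each hit's fields object. As the AWS SDK for JavaScript documentation describes the search response, each field value comes back as an array of strings; treat that shape as an assumption and verify it against your own responses. The following sketch normalizes a hit into plain strings on the client side and accepts either shape. The name flattenHit and the example values are hypothetical.

```javascript
// Hypothetical helper: converts one search hit into plain strings.
// Accepts field values given either as strings or as arrays of strings.
function flattenHit(hit) {
  const flat = { id: hit.id };
  Object.keys(hit.fields || {}).forEach(function (name) {
    const value = hit.fields[name];
    flat[name] = Array.isArray(value) ? value[0] : value;
  });
  return flat;
}

// Example hit with made-up values, mixing both shapes.
console.log(flattenHit({
  id: "5",
  fields: { content: ["young"], start_time: ["1.18"], url: "http://example.com/inputvideo/bezos.mp4" }
}));
```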

Deploy and test the API in API Gateway. The following screenshot shows an example execution diagram.

Host a static HTML user interface component in S3 with public access under the /static folder. Use the following code to build the HTML file, replacing the placeholder in api_gateway_url with your API Gateway endpoint.

When you test the page, enter a keyword in the search box and choose Search. The page calls the API, which queries the CloudSearch domain.

<!DOCTYPE html>
<html lang='en'>
  <head>
   <meta charset='utf-8'>
   <meta http-equiv='X-UA-Compatible' content='IE=edge'>
   <meta name='viewport' content='width=device-width, initial-scale=1'>
   <title>CloudSearch - Contents</title>
   <link rel='stylesheet' href='https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css' integrity='sha384-BVYiiSIFeK1dGmJRAkycuHAHRg32OmUcww7on3RYdg4Va+PmSTsz/K68vbdEjh4u' crossorigin='anonymous'>
  </head>
  <body>
   <div class='container'>
   <h1>Video Search - Contents</h1>
   <p>This is video search capability with CloudSearch, Lambda, API Gateway and static web hosting </p>

   <div class="form-group">
   <label for="usr">Search String:</label>
   <input type="text" class="form-control" id="query">
   <button id="search">Search</button>

</div>

   <div class='table-responsive'>
   <table class='table table-striped' style='display: none'>
   <tr>
   <th>Content</th>
   <th>Confidence</th>
   <th>Start Time</th>
   <th>Video</th>
   </tr>
   </table>
   </div>
   </div>
   <script src='https://ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js'></script>
   <script>

$('#search').click(function() {
   $('table.table').empty();
   var query = $( "#query" ).val();
   if (!query) {
      alert('Please enter search string');
      return false;
   }
   // Encode the search string so that spaces and symbols survive in the URL.
   var api_gateway_url = 'https://<<Mention API Endpoint>>/prod/?query='+encodeURIComponent(query);
   var rows = [];
   $.get( api_gateway_url, function( data ) {
      if (data.length>0){
         rows.push(` <tr> \
         <th>Content</th> \
         <th>Confidence</th> \
         <th>Start Time</th> \
         <th>Video</th> \
         </tr> `);

         data.forEach(function(item) {
            console.log('Search hit:', item);
            var start = item['fields']['start_time'];
            var source = item['fields']['url']+"#t="+start;
            rows.push(`<tr> \
            <td>${item['fields']['content']}</td> \
            <td>${item['fields']['confidence']}</td> \
            <td>${item['fields']['start_time']}</td> \
            <td><video controls id="myVideo" width="320" height="176"><source src=${source} type="video/mp4"></video> </td> \
            </tr>`);
         });
         // Show the filled table. Join with an empty separator so that no commas end up between rows.
         $('table.table').append(rows.join('')).show();
      }
   });
});
   </script>
  </body>
</html>

Step 7: Listing the resulting videos and rendering them from the S3 location using CloudFront

The results page should look something like the following:

The query response JSON has a link to the CloudFront video and the position in the video where the keyword was mentioned. The following HTML video control tag, already included in the static HTML code earlier, lets you start the video at the point where the keyword was mentioned.

var source = item['fields']['url']+"#t="+start;
<td><video controls id="myVideo" width="320" height="176"><source src=${source} type="video/mp4"></video> </td> \
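
The #t= suffix is a temporal media fragment: the browser seeks to that time, in seconds, before playing. Starting exactly on the keyword can clip the sentence that contains it, so the following sketch backs up a few seconds first. The function name withStartTime, the leadIn parameter, and the example URL are hypothetical.

```javascript
// Hypothetical helper: appends a temporal media fragment to a video URL,
// starting leadIn seconds before the keyword (but never before 0).
function withStartTime(url, startTime, leadIn) {
  const start = Math.max(0, parseFloat(startTime) - leadIn);
  return url + "#t=" + start.toFixed(2);
}

console.log(withStartTime("http://example.com/inputvideo/bezos.mp4", "3.08", 2));
console.log(withStartTime("http://example.com/inputvideo/bezos.mp4", "1.18", 2));
```

A lead-in of one or two seconds is usually enough for the viewer to hear the keyword in context.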

Cleanup

To avoid incurring future costs after you’re done with this walkthrough, delete the resources that you created:

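For AWS services and resources that invoke your Lambda functions directly, first delete the trigger in the service where you originally configured it. For this example, delete the Lambda function triggers in S3 and API Gateway. Then, delete the Lambda functions in the Lambda console. Also delete the CloudSearch domain, the Elastic Transcoder pipeline, the CloudFront distribution, and the S3 bucket that you created as prerequisites.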

Conclusion

This post showed you how to build an AV search solution using various AWS media and AI/ML services such as Transcribe, Elastic Transcoder, CloudSearch, Lambda, S3, and CloudFront.

You can easily integrate the solution with any existing search functionality. Give your viewers the ability to search within AV files and find relevant audio content without losing productivity.

Some representative use cases for this solution could include the following:

EdTech content search. The education tech industry publishes the majority of its content in video format for schools, colleges, and competitive exams. Tagging each file is a time-consuming task and may not even be comprehensive. This video search functionality could improve the productivity of both content authors and end users.
Customer service and sentiment analysis. The most common use of media analytics is to mine customer sentiment to support marketing and customer service activities. Customer service call recordings give great insight into customer sentiment. This solution can improve the productivity of call center agents by giving them the ability to search historic audio content and learn how to provide a better customer experience.

If you have any comments or questions about this post, please let us know in the comments.

AWS for M&E Blog