AWS Media Blog

Start Building: Make Content Archives Searchable by On-camera Talent, Dialog, Activity, and More With This New Solution from AWS

For news, sports, and other content owners with vast media archives, your content library is only as valuable as it is discoverable and searchable. For example, a sports producer creating a highlight reel around a specific player would need to sit through hours of game tape to find that player's moments. Watching the footage on fast forward might speed things up, but a producer who also wants soundbites from the sportscasters has to listen as well as watch.

Ideally, these producers would go right to the point in the game where the player made an exciting play that sent the crowd and sportscasters wild! Until recently, however, generating metadata such as spoken dialog as text or the appearance of a specific celebrity on camera was a highly manual, time-consuming process.

Through machine learning, however, content owners and distributors can automate many of these processes. Over the past two years, AWS has launched several individual services to tackle specific metadata tasks, such as:

  • Amazon Rekognition for object and scene detection, on-field player pathing, and celebrity recognition
  • Amazon Transcribe to convert spoken dialog (currently for English and Spanish) into text
  • Amazon Comprehend to discover insights and relationships in text

We are happy to announce the launch of our Media Analysis Solution, which brings these individual services together to automate metadata extraction from video, audio, and text. With one-click deployment, your developers can start running video or audio files to automatically generate JSON files for time-coded speech-to-text and people, object, and action pathing. These JSON files can then be plugged into your production process for subtitling, or into your digital asset management platform to make your content searchable (e.g., go to exactly the part of the video where a certain actor says a specific line).
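
To illustrate how time-coded speech-to-text output can make a clip searchable, here is a minimal Python sketch that finds where a phrase is spoken. The JSON shape shown (a list of words with `start` and `end` times in seconds) and the `find_phrase` helper are assumptions made for this example; the actual files generated by the solution may be structured differently.

```python
import json

# Hypothetical, simplified transcript: one entry per spoken word with
# start/end times in seconds. This shape is an assumption for illustration,
# not the solution's documented schema.
SAMPLE_TRANSCRIPT = json.loads("""
[
  {"start": 12.4, "end": 12.7, "word": "What"},
  {"start": 12.7, "end": 12.8, "word": "a"},
  {"start": 12.8, "end": 13.6, "word": "goal!"},
  {"start": 14.0, "end": 14.9, "word": "Unbelievable."},
  {"start": 30.1, "end": 30.3, "word": "what"},
  {"start": 30.3, "end": 30.4, "word": "a"},
  {"start": 30.4, "end": 31.0, "word": "save"}
]
""")


def find_phrase(words, phrase):
    """Return a (start, end) time pair for each place the phrase is spoken."""
    target = [w.lower() for w in phrase.split()]
    if not target:
        return []
    # Normalize case and strip trailing punctuation so "goal!" matches "goal".
    tokens = [w["word"].lower().strip(".,!?") for w in words]
    hits = []
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            last = words[i + len(target) - 1]
            hits.append((words[i]["start"], last["end"]))
    return hits


if __name__ == "__main__":
    for start, end in find_phrase(SAMPLE_TRANSCRIPT, "what a goal"):
        print(f"Phrase spoken from {start}s to {end}s")
```

A media asset management platform would typically index this metadata in a search service rather than scan it in memory, but the idea is the same: each match comes back with the timecodes needed to jump straight to that moment in the footage.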

Testing is easy, and you only pay for the services you use. For 1080p HD video, the cost is less than $8 per hour of footage run through the Media Analysis Solution to generate time-coded people/object tags and the speech-to-text transcript. The solution is available in several sizes based on the amount of metadata that will be indexed for your media library. These sizes range from $300 to $1,000 per month, a cost that consists primarily of Amazon Elasticsearch Service compute and storage for indexing, searching, and storing media metadata. Check out our solution page below:

Try the Media Analysis Solution