AWS for M&E Blog

Add Hive content moderation to your Amazon IVS video streams

Amazon Interactive Video Service (Amazon IVS) is a managed live streaming solution, designed for quick and easy setup, that handles interactive video content from ingestion to delivery. With the IVS Player SDK, it is easier than ever for platforms of all sizes to integrate low-latency live streaming capabilities into their applications.

Platforms of all types have expressed a growing need for content moderation to protect their users, their brands, and the brands of their advertisers. And from a moderation perspective, live streaming is the most challenging and unpredictable type of user-generated content. Instead of ready-made content hosted on the web, the action happens in near-real-time. Live video is synchronous, linear, and long form, making it uniquely difficult for people to moderate. Even for platforms that do have human moderators flipping between channels, it is impossible to be everywhere.

Enter Hive, a leading cloud-based artificial intelligence (AI) model provider that is already used by hundreds of the largest content platforms in the world. Hive offers robust machine learning (ML) approaches to moderating user-generated content, including both visual and audio moderation capabilities. This post describes how platforms built on Amazon IVS can use Hive APIs to pass streaming content to moderation models and use model predictions as a real-time signal to automate moderation decisions with some simple starter code.

In this post, we’ll cover how to set up a Hive project, how to submit classification tasks to Hive APIs, and how you might use model responses in enforcement decisions. We also preview Hive’s new Moderation Dashboard as an interface-based alternative for visual moderation.

Prerequisites

To follow along, you’ll need an AWS account with an Amazon IVS channel, plus a Hive account with a project created for each moderation product you plan to use.

Getting started with Hive

Once you have created a project with Hive, log in to your project dashboard to access your API key. The API key is a unique authorization token for your project’s endpoint, and you will need to include it as a header in each request made to the API to submit tasks (protect it like a password!).

Accessing the API key from a Hive project dashboard

On the project overview page shown in Screenshot 1, select the “API KEY” button in the menu bar to bring up a modal with your project’s API key, then copy the token into your program. Note that the API key for each of your Hive projects—visual, audio, and/or Moderation Dashboard—will be unique.
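As an illustration, each request carries the project API key in an authorization header. Here is a minimal sketch in Python; the “Token” scheme shown is an assumption, so confirm the exact header format against your Hive project’s API documentation:

```python
# Minimal sketch: attach a Hive project API key to a request header.
# The "Token" scheme is an assumption -- confirm the exact format in
# your Hive project's API documentation.

def hive_auth_headers(api_key: str) -> dict:
    """Build the headers dict sent with every Hive API request."""
    return {"Authorization": f"Token {api_key}"}
```

Remember to use the key for the specific project (visual, audio, or Moderation Dashboard) you are submitting to.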

Recording your Amazon IVS stream content to Amazon S3

Amazon IVS provides native solutions for both recording stream content to an Amazon S3 bucket and sampling thumbnails (still image frames) from the stream video. This provides an easy way to send stream content to Hive APIs for classification.

To start, you’ll need to initiate recording on all stream channels you want to moderate. You can do this in the Amazon IVS console, through the AWS Command Line Interface (AWS CLI), or by a call to the Amazon IVS API. You can learn how to initiate stream recording to Amazon S3 and customize recording configurations in this IVS guide.

To moderate on thumbnail images extracted from your video streams, enable thumbnail capture in the recording configuration and specify the interval at which to sample the video stream. The default interval is 1 image per minute, and you can configure it up to 12 images per minute (one image every 5 seconds) for more comprehensive monitoring.

When recording begins, video segments and thumbnail images are saved to the Amazon S3 bucket specified in the recording configuration for the channel. You can find more details on storage contents, storage schemes, and file paths for the auto-record-to-Amazon S3 feature of Amazon IVS in this user guide.
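As a sketch of what this configuration might look like with the AWS SDK for Python (boto3), the helper below builds parameters for the Amazon IVS CreateRecordingConfiguration API, with auto-record-to-S3 and thumbnail capture every 5 seconds (12 images per minute). The bucket and configuration names are placeholder values:

```python
# Sketch: build parameters for the Amazon IVS CreateRecordingConfiguration
# API, enabling auto-record-to-S3 plus interval thumbnail capture.
# Bucket and configuration names are placeholder values.

def recording_config_params(bucket_name: str, config_name: str,
                            interval_seconds: int = 5) -> dict:
    """5-second intervals correspond to 12 thumbnails per minute."""
    return {
        "name": config_name,
        "destinationConfiguration": {
            "s3": {"bucketName": bucket_name},
        },
        "thumbnailConfiguration": {
            "recordingMode": "INTERVAL",
            "targetIntervalSeconds": interval_seconds,
        },
    }

# With AWS credentials configured, you would then call:
#   import boto3
#   ivs = boto3.client("ivs")
#   ivs.create_recording_configuration(
#       **recording_config_params("my-moderation-bucket", "moderated-channels"))
```

The returned recording configuration ARN is then attached to each channel you want to record.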

How to submit stream content to Hive moderation

For near-real-time moderation, we recommend submitting live video content as either short video clips—for example, maximum 30 seconds—or as thumbnails sampled from the video stream (visual only). You can find more details on recording stream clips with Amazon IVS capabilities in this demo implementation.

When submitting stream content, you’ll need to provide a public or signed URL for your content that is readable by Hive APIs. We recommend setting up an Amazon CloudFront distribution with Hive designated as an origin access identity (OAI) to provide access to your Amazon S3 bucket as needed. More details are available at the bottom of this user guide.

Then, you can define the following functions to (1) automatically populate requests to Hive APIs with Amazon CloudFront links to your stream content in response to recording events, (2) process Hive’s API response, and (3) shut down streams where inappropriate content is detected.

Code snippet for the example implementation

Screenshot 2 shows an example code snippet for this implementation. You can find a full example implementation of these functions in Hive’s documentation. The example calls the Visual Moderation API using Hive’s synchronous submission protocol, which is recommended for near-real-time moderation needs.
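As an illustrative sketch of step (1), the functions below build and send a synchronous classification request using only the Python standard library. The endpoint URL, the “url” form field, and the “Token” authorization scheme are assumptions based on Hive’s public documentation, so verify them against the current API reference:

```python
import json
import urllib.parse
import urllib.request

# Hive's synchronous task submission endpoint (assumption -- verify
# against Hive's current API reference).
HIVE_SYNC_URL = "https://api.thehive.ai/api/v2/task/sync"

def build_hive_request(media_url: str, api_key: str) -> urllib.request.Request:
    """Build a sync classification request for a CloudFront URL to a
    recorded clip or thumbnail. The 'url' form field and 'Token' auth
    scheme are assumptions based on Hive's docs."""
    body = urllib.parse.urlencode({"url": media_url}).encode("utf-8")
    return urllib.request.Request(
        HIVE_SYNC_URL,
        data=body,
        headers={"Authorization": f"Token {api_key}"},
        method="POST",
    )

def submit_clip(media_url: str, api_key: str) -> dict:
    """Send the request and return Hive's parsed JSON response."""
    with urllib.request.urlopen(build_hive_request(media_url, api_key)) as resp:
        return json.load(resp)
```

In practice you would call `submit_clip` from the handler that fires when a new clip or thumbnail lands in your S3 bucket, passing the corresponding CloudFront URL.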

Submitting content works similarly for audio moderation and Moderation Dashboard; simply use the correct API key for the project that you want to submit to.

Model responses from Hive APIs

For a more complete understanding of this solution, we should explain Hive’s moderation models and show the API responses and moderation workflow in more detail.

As a rule, Hive does not remove content or ban users itself. Instead, Hive APIs return classification scores from models that describe any sensitive subject matter in your content. Hive customers then use these classification metrics in their own moderation logic, designed around their content policies.

Let’s look at some example API requests and model responses to show how to build this for Amazon IVS video streams.

Visual moderation: Example task and response

Hive’s visual classifier includes a set of submodels, called model heads, that each identify different types of sensitive visual subject matter—for example, explicit content, weapons, or drugs. The model response includes predictions from each model head as a confidence score for each class.

To see this in action, here’s a relatively tame example task that you can run with a short clip from the James Bond movie Die Another Day.

https://d3jyesz2uwgbpr.cloudfront.net/bond-sub30.mp4 

Because we’re submitting a video clip, our backend will split the video into representative frames, sampled at a default rate of 1 FPS, and run the visual model on each frame. Note that Hive’s sample rate for videos is configurable on request.

We’ll look at an abbreviated visual moderation API response object for a couple different frames from the Bond clip, focusing on two relevant visual moderation categories: suggestive content and smoking.

Simplified JSON response from the Hive API after submitting the Bond clip

The Hive API divides the video clip into representative frames and runs the visual model on each frame. For a frame when the character Jinx surfaces in a swimsuit, Hive’s visual model classifies the image into the “suggestive” and “female swimwear” moderation categories with high confidence scores as shown in Screenshot 3.

This frame, in which the character Jinx first emerges from the water in a swimsuit approximately 4 seconds into the clip, scores close to zero in ‘general_nsfw’, which captures explicit sexual content, but very highly in the milder ‘general_suggestive’ and ‘yes_female_swimwear’ classes.
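To make the response shape concrete, here is a heavily abbreviated frame result in Python form. The nesting and the exact score values are illustrative assumptions; only the class names and rough magnitudes come from the example above:

```python
# Illustrative, abbreviated shape of a visual moderation result for the
# swimsuit frame (~4 seconds in). Scores and nesting are assumptions
# for illustration, not Hive's exact schema.
frame_output = {
    "time": 4,
    "classes": [
        {"class": "general_nsfw", "score": 0.01},
        {"class": "general_suggestive", "score": 0.96},
        {"class": "yes_female_swimwear", "score": 0.97},
    ],
}

def score_for(frame: dict, class_name: str) -> float:
    """Look up one class score in a frame's output."""
    return next(c["score"] for c in frame["classes"]
                if c["class"] == class_name)
```

An accessor like `score_for` keeps downstream moderation logic independent of where in the full response object the class list sits.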

Now, here’s the same example response object for another frame, about 22 seconds in, where Mr. Bond takes a puff on his cigar.

Simplified JSON response from the Hive API after submitting the Bond clip

For this later frame, when Bond takes a puff on his cigar, Hive’s visual model classifies the image into the “smoking” moderation category with a high confidence score, as shown in Screenshot 4.

Pierce Brosnan’s looks notwithstanding, the fact that he is fully clothed takes scores in the suggestive classes down to near zero. Instead, we receive a high confidence score, 0.96, in the ‘yes_smoking’ class.

As you can see, each model head generates confidence scores independently, and the scores across the classes within a single model head sum to one. We consider a threshold confidence score of 0.90 a good starting point for flagging content in sensitive classes. You can always adjust thresholds based on your desired sensitivity and on how the visual model performs on your content.

NOTE: For clarity, this example truncated many of the classes and scores that would appear in an actual response object. A real API response is longer but takes the same format. You can see a full list of classes recognized by Hive’s visual models here.

For video, the API response object combines the set of scores for each sampled frame into a single response. In this case, the time response field represents a timestamp for that frame, relative to the start of the clip. You can reproduce the full API response in any of the following ways:

  • populating the request syntax with your visual moderation API key and the URL above and sending a task to Hive;
  • uploading the clip as a task in your project dashboard; or
  • uploading the clip to Hive’s Visual Moderation Demo.

If you instead submit thumbnail images recorded to Amazon S3 rather than clips, the API response will be a bit simpler: you’ll receive a model response object with one set of class scores for each image that you submit, and time is set to 0 by default.

Speech moderation: Example task and response

Hive’s speech moderation API combines our speech-to-text transcription and text moderation functionality into a single endpoint. Our backend extracts audio from an input video segment, transcribes speech into text, and then sends the transcript to our text moderation model. For audio, the model response object includes a time-stamped transcript of detected speech, as well as classifications for different categories of sensitive speech.

For moderation purposes, you’ll want to focus on the classifications and severity scores of detected speech that are returned in the model response object. Our audio models identify speech in four main classes: ‘sexual’, ‘hate’, ‘violence’, and ‘bullying’. Below is a simplified audio response object from an example task. A full version is available for download at our Audio Moderation demo.

Simplified JSON response from the Hive audio moderation API after submitting a short audio example file (“You are so ugly and lame”)

The Hive API transcribes the audio into a text string and then runs a text classifier on the transcript. The input scores a 2 in the “bullying” class and a 0 in “sexual,” “hate,” and “violence” classes.

As shown in the above image, Hive models will return a separate score corresponding to each class, an integer value ranging from zero (clean) to three (most severe). This allows your moderation actions to align with the level of severity. As an example, speech that scores a three for ‘hate’ might include a racial slur while speech that scores a two might include negative stereotypes without overtly racialized language.
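Because audio scores are severity integers rather than probabilities, enforcement logic can key off the level directly. Below is a small sketch of one possible policy; the class names come from the response above, while the action mapping itself is a hypothetical example, not a Hive recommendation:

```python
# Hypothetical severity-based policy for Hive audio moderation scores.
# Each class score is an integer from 0 (clean) to 3 (most severe).
AUDIO_CLASSES = ("sexual", "hate", "violence", "bullying")

def audio_action(scores: dict) -> str:
    """Map per-class severity scores to an example enforcement action:
    suspend at severity 3, route to human review at 2, else allow."""
    worst = max(scores.get(c, 0) for c in AUDIO_CLASSES)
    if worst >= 3:
        return "suspend"
    if worst == 2:
        return "review"
    return "allow"
```

Under this example policy, the transcript above (“You are so ugly and lame”), with a ‘bullying’ score of 2, would be routed to human review rather than automatically suspended.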

How to process our API responses and take moderation actions

Now we’re ready to look at how to use the Hive API response to moderate your Amazon IVS stream content.

Here’s a simple implementation of moderation logic for visual moderation based on the example task that we sent in above. We’ll simply define restricted classes and check whether the score for any of those classes exceeds a threshold of 0.9.

Example code

This example function returns “true” for either of the example responses from the Bond clip: the first frame exceeds the 0.9 threshold in the ‘general_suggestive’ class; the second exceeds the threshold on ‘yes_smoking’.

If our hypothetical content policy allows either of these content types, we can simply omit them from our list of restricted classes. This moderation logic is, of course, fully customizable based on your content policy—for example, you can use different thresholds for different classes of sensitive content or build in different moderation actions for different classes.
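A minimal sketch of that logic, assuming frame scores arrive as a list of class/score pairs as in the examples above; the particular set of restricted classes is illustrative, not a recommendation:

```python
# Hypothetical content policy: restricted classes and a 0.9 threshold.
RESTRICTED_CLASSES = {"general_nsfw", "general_suggestive", "yes_smoking"}
THRESHOLD = 0.9

def frame_is_restricted(frame_classes: list) -> bool:
    """Return True if any restricted class meets the threshold."""
    return any(c["class"] in RESTRICTED_CLASSES and c["score"] >= THRESHOLD
               for c in frame_classes)

def clip_is_restricted(frames: list) -> bool:
    """Apply the per-frame check across every sampled frame of a clip."""
    return any(frame_is_restricted(f["classes"]) for f in frames)
```

Dropping a class from `RESTRICTED_CLASSES`, or giving it its own threshold, is all it takes to tune the policy per category.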

If stream content scores highly in restricted classes, we can call the “suspend_stream” function on the channel to stop stream playback and revoke the stream key to prevent reconnection. This uses the Stop Stream API call as an example, but you can also use other Amazon IVS API calls to label stream sessions as containing mature content or ban a user by deleting a channel. Again, you can find a complete, commented version of our example end-to-end moderation implementation here.
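The suspension step might look like the following sketch. StopStream and DeleteStreamKey are real Amazon IVS API operations, but the surrounding function is illustrative; the IVS client is passed in so it can be stubbed in tests, and in production you would pass `boto3.client("ivs")`:

```python
def suspend_stream(ivs_client, channel_arn: str, stream_key_arn: str) -> None:
    """Stop playback on a channel and revoke its stream key so the
    broadcaster cannot immediately reconnect.

    ivs_client is expected to expose the Amazon IVS StopStream and
    DeleteStreamKey operations (e.g., boto3.client("ivs")).
    """
    ivs_client.stop_stream(channelArn=channel_arn)        # IVS StopStream
    ivs_client.delete_stream_key(arn=stream_key_arn)      # IVS DeleteStreamKey
```

Wiring this to the classification check closes the loop: flag the clip, then suspend the offending channel automatically.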

Alternate approach: Moderating stream video with Moderation Dashboard

Alternatively, Amazon IVS customers can moderate stream video content with Moderation Dashboard, an interface solution for setting up moderation policies. Sending content to Moderation Dashboard works the same way as the visual and audio API requests shown above—simply use your dashboard API key in the request header instead of a visual or audio API key.

With Moderation Dashboard, there’s no need to directly process Hive’s API response. Instead, you can set up rules, classification thresholds, and enforcement actions directly in the interface. Moderation Dashboard also allows you to route certain content to a review feed for a final decision by your Trust and Safety team. For a more complete walkthrough of how to configure Moderation Dashboard, check out our separate quickstart guide.

Task viewer in Moderation Dashboard

Moderation Dashboard renders the video and shows any moderation categories flagged in the video clip under “Moderation Results.” The “Moderation Results” section also shows a timestamp for the first occurrence of each moderation class in the video to simplify human moderator review.

With Amazon IVS, platforms can set up specific callbacks to call the Amazon IVS API directly to take actions, like shutting down a stream or banning a user, when your moderation rules are activated or when moderators make manual review decisions.

Final thoughts

Together, Hive and Amazon IVS make it easy to build robust, automated content moderation solutions for live video into your application. We hope that this guide has helped illuminate what’s possible with Hive models and provides a starting point to customize these solutions to your moderation needs. If you have questions, please feel free to reach out to sales@thehive.ai and we would be happy to help you design your solution.

This blog was co-authored by Kevin Guo (CEO, Hive); Max Chang (Senior Product Manager, Hive); and Jenny Oshima and Chris Zhang at AWS.

Jenny Oshima


Jenny Oshima is a Technical Account Manager for AWS in Northern California. She works with AWS customers to help them build highly reliable, resilient, and cost-effective systems and achieve operational excellence for their workloads on AWS. She enjoys writing about various innovative technologies around AWS.

Chris Zhang


Chris Zhang is a Solutions Architect for AWS Elemental