AWS for M&E Blog

How-to: Capturing and amplifying user engagement through real-time sentiment analysis

Often in the evening, I come across a random television commercial that has nothing to do with my interests and completely misses the mark. Several weeks back, I began thinking: what if there was a way for advertisers to dynamically adjust content based on how I was feeling? What if advertisements could be targeted based on your mood?

In 2017, Adobe hosted a panel of experts on the “Future of Advertising” during the Advertising Week conference in New York City. The major focus for the panelists was the future impact of targeted advertisements and personalization. Kelly Andresen, Senior Vice President at USA Today, said, “Agencies, brands and publishers alike, will all have teams of psychologists on staff to really take on understanding of what is that human connection, what is the emotion.” The panel agreed that by 2022, media firms will need to rely on artificial intelligence, specifically facial recognition, to provide sentiment analysis that delivers advertisements that resonate with their target audiences.

“Sentiment analysis aims to determine the attitude of a speaker, writer, or other subject with respect to some topic or the overall contextual polarity or emotional reaction to a document, interaction, or event.” (Wikipedia)

Through extensive research, Netflix has found that its users spend an average of 1.8 seconds considering what content to watch. This year, the company is on track to spend upwards of $8B on content. If users are truly judging a book by its cover, then Netflix needs a creative solution to ensure its content is properly showcased. Today, the company uses sentiment analysis via computer vision to determine which image from a title is most likely to resonate with its audience when they are choosing content in the Netflix user interface. This spring, I gave a talk about sentiment analysis and how AWS can help customers address this requirement.

In this blog post, I am going to share some ways to work with an off-the-shelf camera to capture sentiment in near real time.

Solution Components

Amazon Rekognition

Amazon Rekognition is a fully managed AWS service, developed by our computer vision scientists, that allows you to extract contextual metadata from a video or image. It is extremely easy to use and provides scene detection, facial recognition, facial analysis, person tracking, unsafe content detection, and much more. We will be using Rekognition to provide sentiment analysis for our content.
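
If you want to see what Rekognition returns before wiring anything else up, here is a minimal sketch (not part of the project code) that calls DetectFaces on a local image file and prints the emotions it detects. The file name is just a placeholder.

import boto3

# Minimal sketch: call Rekognition DetectFaces on a local image and print
# the detected emotions. 'snapshot.jpg' is a placeholder file name.
client = boto3.client('rekognition', region_name='us-west-2')

with open('snapshot.jpg', 'rb') as f:
    response = client.detect_faces(
        Image={'Bytes': f.read()},
        Attributes=['ALL']
    )

for face in response['FaceDetails']:
    for emotion in face['Emotions']:
        print("{}: {:.0f}%".format(emotion['Type'], emotion['Confidence']))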

Camera Capture

Amazon Rekognition requires an image or a video asset to begin extracting metadata. The camera you use is ultimately up to you, but I was looking for one with a simple setup and a fully documented REST API, like AWS DeepLens. For this post, I am using the Amcrest ProHD 1080P (~$70 USD on Amazon.com) because it is easy to set up (not covered in this post) and has a fully documented REST API for capturing content, along with other interesting features such as motion detection, audio output, and the ability to capture video to a file share. [Editor’s note: Links provided for convenience and should not be construed as an endorsement of Amcrest]
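
Before diving into the wrapper classes below, you can confirm the camera’s snapshot endpoint responds with a quick sketch like this; the IP address and credentials are example placeholders.

import requests
from requests.auth import HTTPDigestAuth

# Quick check of the Amcrest snapshot endpoint; IP and credentials are examples.
response = requests.get(
    "http://192.168.86.107/cgi-bin/snapshot.cgi",
    auth=HTTPDigestAuth("admin", "XXXXXX")
)

# Save the returned JPEG locally
with open("snapshot.jpg", "wb") as f:
    f.write(response.content)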

Code

To simplify interacting with both the Amcrest camera and Rekognition, I wrote custom Python classes that reduce the camera interaction to fewer than 10 lines of code.

from amcrest import AmcrestCamera
from rekognition import Rekognition
import boto3
import json

camera = AmcrestCamera("192.168.86.107", "admin", "XXXXXX")
image = camera.snapshot()
rek = Rekognition()
emotion = rek.get_emotion(image)
print(json.dumps(emotion, indent=4, sort_keys=True))

As you can see above, after importing the custom classes (AmcrestCamera and Rekognition), we also import standard libraries: the AWS Boto3 SDK, which the custom classes use under the hood, and json, which is used to format the output.

Breaking it down, the following is happening:

  1. Login to the camera using camera credentials (default is admin/admin).
  2. Take a picture with the camera using the Amcrest REST API.
  3. Extract sentiment from Rekognition’s output after analyzing the picture.

Note that I am smiling here in the picture. Take a look at what Rekognition returns below.

[
    { 
        "Confidence": 99,
        "Emotion": "HAPPY"
    },
    {
        "Confidence": 0,
        "Emotion": "ANGRY"
    },
    {
        "Confidence": 0,
        "Emotion": "DISGUSTED"
    }
]

Let’s try one more but now expressing confusion.

This time around we see:

[
    {
        "Confidence": 81,
        "Emotion": "CONFUSED"
    },
    {
        "Confidence": 66,
        "Emotion": "HAPPY"
    },
    {
        "Confidence": 1,
        "Emotion": "SAD"
    }
]

The Rekognition confidence score is a percentage (0-100) that indicates how confident the service is in its prediction. Some Rekognition APIs, such as DetectLabels, accept a MinConfidence parameter so that only results at or above your threshold are returned; for emotion results like those above, you can apply your own threshold on the client side.
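
For example, a minimal client-side filter over the emotion output might look like the following sketch. The threshold of 80 is an arbitrary example value, not a Rekognition default.

import json

from amcrest import AmcrestCamera
from rekognition import Rekognition

# Arbitrary example threshold; not a Rekognition default.
MIN_CONFIDENCE = 80

camera = AmcrestCamera("192.168.86.107", "admin", "XXXXXX")
image = camera.snapshot()

rek = Rekognition()
emotions = rek.get_emotion(image)

# Keep only emotions at or above the threshold
confident = [e for e in emotions if e["Confidence"] >= MIN_CONFIDENCE]
print(json.dumps(confident, indent=4, sort_keys=True))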

How It Works

The Amcrest camera I am working with is not connected to the Internet; it is effectively an internal IoT device. There are several ways to solve this connectivity problem. I chose to use Amazon Simple Queue Service (SQS) and a local Docker instance on my laptop, which resides on the same network as the camera. I also explored using a Raspberry Pi Zero W board as my local compute platform, but once the small Python application I wrote to poll SQS was placed inside the Docker container, it no longer mattered where it ran.

I’m using an AWS IoT 1-Click Enterprise Button to kick off my workflow. Every time the button is clicked, it publishes a message to SQS via a Lambda function that I created. From there, my small Python app running in the Docker container sees the message and issues a REST API call to the Amcrest camera. The Amcrest camera takes a picture, and I send the data off to S3 for storage. The local Docker instance calls out to Amazon Rekognition to capture my current sentiment and then stores that metadata in an Amazon DynamoDB table. Finally, that emotional sentiment is sent to our text-to-speech service, Amazon Polly, which uses a synthetic voice to tell you how you are feeling via the camera’s built-in speaker.

Additionally, you could certainly adjust this architecture to use an Alexa device instead of an AWS IoT button to call the same API.
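
For reference, a rough sketch of the storage and playback steps described above could look like the following. The bucket name, table name, and object key are placeholders, and pushing the Polly MP3 to the camera’s speaker is not shown here.

import time

import boto3

# Placeholder names for illustration only.
BUCKET = 'my-sentiment-bucket'
TABLE = 'SentimentResults'

s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb', region_name='us-west-2')
polly = boto3.client('polly', region_name='us-west-2')

def store_and_speak(image_bytes, emotions):
    '''Store the snapshot in S3, persist sentiment in DynamoDB, and synthesize speech'''
    timestamp = time.strftime("%Y-%m-%d-%H-%M-%S")
    key = "snapshots/snapshot_{}.jpg".format(timestamp)

    # Store the raw snapshot in S3
    s3.put_object(Bucket=BUCKET, Key=key, Body=image_bytes)

    # Persist the sentiment metadata in DynamoDB
    dynamodb.Table(TABLE).put_item(Item={
        'SnapshotKey': key,
        'Timestamp': timestamp,
        'Emotions': emotions
    })

    # Have Polly describe the top emotion; playing the MP3 through the
    # camera's built-in speaker is left out of this sketch
    top = emotions[0]['Emotion'] if emotions else 'UNKNOWN'
    speech = polly.synthesize_speech(
        Text="You look {}.".format(top.lower()),
        OutputFormat='mp3',
        VoiceId='Joanna'
    )
    with open('sentiment.mp3', 'wb') as f:
        f.write(speech['AudioStream'].read())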

Diving Deeper

Lambda Function that IoT Button Calls

import boto3
import logging


logger = logging.getLogger()
logger.setLevel(logging.INFO)

def send_message(message):
    sqs = boto3.resource('sqs', region_name='us-west-2')

    #SQS Queue You Wish To Use
    queue = sqs.get_queue_by_name(QueueName='XXXXX')

    response = queue.send_message(MessageBody=message)

    logger.info("MessageId created: {0}".format(response.get('MessageId')))
    logger.info("MD5 created: {0}".format(response.get('MD5OfMessageBody')))

def lambda_handler(event, context):
    '''Triggered by the AWS IoT 1-Click button'''
    # The message body matches what the Docker poller below expects
    send_message("TakePicture")

IAM Execution Role Policy for Lambda

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "logs:CreateLogGroup",
            "Resource": "arn:aws:logs:us-west-2:XXXXXXXXX:*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:us-west-2:XXXXXXXX:log-group:/aws/lambda/Amcrest_Take_Picture:*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "sqs:GetQueueUrl",
                "sqs:SendMessage"
            ],
            "Resource": "arn:aws:sqs:us-west-2:XXXXXXXXXX:Amcrest"
        }
    ]
}

Docker Instance Code

from amcrest import AmcrestCamera
from rekognition import Rekognition
import boto3
import json

sqs = boto3.resource('sqs', region_name='us-west-2')
queue = sqs.get_queue_by_name(QueueName='Amcrest')

def take_picture(ip, user, password):
    '''Take a picture on the Amcrest Camera'''
    camera = AmcrestCamera(ip, user, password)

    # Take Picture
    image = camera.snapshot()
    # Move Camera to Say Picture Was Taken
    camera.pan("up")
    camera.pan("down")

    # Analyze with Rek
    rek = Rekognition()
    emotion = rek.get_emotion(image)
    labels = rek.get_labels(image)
    text = rek.get_text(image)
    face = rek.get_face(image)

    return json.dumps(emotion, indent=4, sort_keys=True)

while True:
    print("... waiting for messages ...")
    messages = queue.receive_messages(WaitTimeSeconds=20)
    for message in messages:
        # Add your logic here for your scripts
        # Could be a subprocess call or similar.
        if message.body == "TakePicture":
            print("Taking Picture Now")
            sentiment = take_picture("192.168.86.107", "admin", "XXXXXXXX")
            print(sentiment)
        print("Message: {}".format(message.body))
        message.delete()

The above code demonstrates real-time sentiment analysis without going through the entire S3 PutObject pipeline.

Amcrest Control Class

import requests
import time
from requests.auth import HTTPDigestAuth
import os
import subprocess

class AmcrestCamera(object):
    '''Control the Amcrest Camera'''

    def __init__(self, host, user, password):

        self._host = host
        self._user = user
        self._password = password

    def go(self, uri):
        '''Capture URI for Specific Action for Amcrest API'''

        url = "http://" + self._host + "/" + uri
        response = requests.get(url, auth=HTTPDigestAuth(self._user, self._password))
        return response

    def snapshot(self):
        '''Take a snapshot and save it to a timestamped file in the current directory'''

        uri = "cgi-bin/snapshot.cgi"

        localtime   = time.localtime()
        timeString  = time.strftime("%Y-%m-%d-%H-%M-%S", localtime)

        #directory to save snapshot
        cwd = os.getcwd()
        filename = cwd + "/" + "snapshot_" + timeString + ".jpg"

        response = self.go(uri)

        if response.status_code == 200:
            with open(filename, 'wb') as f:
                f.write(response.content)

            return response.content

    def pan(self, direction):
        '''Pan Camera In Specific Direction'''
        # e.g. http://192.168.86.107/cgi-bin/ptz.cgi?action=start&channel=0&code=Right&arg1=0&arg2=1&arg3=0
        directions = {"left": "Left", "right": "Right", "up": "Up", "down": "Down"}
        if direction not in directions:
            raise ValueError("direction must be one of: left, right, up, down")

        code = directions[direction]
        uri = "cgi-bin/ptz.cgi?action=start&channel=0&code={}&arg1=0&arg2=1&arg3=0".format(code)
        response = self.go(uri)

        time.sleep(0.5)
        stop_uri = "cgi-bin/ptz.cgi?action=stop&channel=0&code={}&arg1=0&arg2=1&arg3=0".format(code)
        stop = self.go(stop_uri)

    def get_video(self, snapshot_time=10):
        '''Capture Video for snapshot_time Seconds (Default 10) and Encode Using ffmpeg'''

        localtime   = time.localtime()
        timeString  = time.strftime("%Y-%m-%d-%H-%M-%S", localtime)

        video = "snapshot_" + timeString + ".mp4"

        command = '/usr/local/bin/ffmpeg -loglevel debug -t {} -i "rtsp://{}:{}@{}/cam/realmonitor?channel=1&subtype=0" -r 24 -crf 28 {}'.format(snapshot_time, self._user, self._password, self._host, video)

        subprocess.call(command, shell=True)

Rekognition Class

import boto3
import base64

class Rekognition(object):
    '''Define Boto Methods for Rek here'''

    def __init__(self):
        self.client = boto3.client('rekognition')

    def get_emotion(self, image):
        '''Capture Emotion from an image'''

        response = self.client.detect_faces(
            Image={
                'Bytes': image,
                },
            Attributes= [
                'ALL'
            ]
        )

        emotion_array = []

        try:
            for item in response['FaceDetails'][0]['Emotions']:
                confidence = int(item['Confidence'])
                name = str(item['Type'])
                # print(name)
                # print(confidence)
                emotion = {
                    "Emotion": name,
                    "Confidence": confidence
                }

                emotion_array.append(emotion)

            return emotion_array
        except IndexError:
            return "No Face in Picture"

    def get_labels(self, image):
        '''Capture Rek Labels from an image'''

        response = self.client.detect_labels(
            Image={
                'Bytes': image,
                },
                MaxLabels=10,
                MinConfidence=50
        )

        return response

    def get_text(self, image):
        '''Return Text Detected in Image'''

        response = self.client.detect_text(
            Image={
                'Bytes': image,
            }
        )

        return response

    def get_face(self, image):
        '''Capture All Facial Features from an image'''

        response = self.client.detect_faces(
            Image={
                'Bytes': image,
                },
            Attributes= [
                'ALL'
            ]
        )

        return response

What’s Next?

Looking to the future, real-time sentiment analysis for advertising and audience feedback is a growing use case.

Out-of-home (OOH) advertising is a $7.7B market focused on billboards and other forms of advertising, such as the dynamic content you may see in an airport or train station. Internally, we describe this use case as smart billboards that present content to customers based on who they are and their current emotional state. Additionally, by capturing sentiment, a media company could use this information to insert an advertisement that will resonate with a viewer based on how they are feeling.

One could imagine leveraging this technology to provide feedback to public speakers as they deliver their content. Imagine, at re:Invent for example, if the speaker could know which section of the audience was enjoying his or her content based on a visual cue shown on the comfort monitors, color coded to match the level of engagement (green, yellow, or red). Based on the color, the speaker could adjust their stage presence or even their voice inflections to better resonate with the audience.

re:Invent Presentation with comfort monitors

Wrap Up

This how-to post can help developers get started on the mechanics of capturing images or video for sentiment analysis. As you look to put this into production, you will want to expand the concepts here to, for example, integrate with Amazon Kinesis Video Streams for more real-time analysis.
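
As a starting point, here is a hedged sketch of wiring Rekognition to a Kinesis Video Stream using a stream processor. The stream ARNs, collection ID, processor name, and role ARN are placeholders, and note that stream processors are oriented around face search against a collection rather than the per-image emotion calls used above.

import boto3

rekognition = boto3.client('rekognition', region_name='us-west-2')

# Placeholder ARNs, collection ID, and processor name for illustration only.
response = rekognition.create_stream_processor(
    Name='camera-stream-processor',
    Input={
        'KinesisVideoStream': {
            'Arn': 'arn:aws:kinesisvideo:us-west-2:XXXXXXXXXX:stream/amcrest/1234567890'
        }
    },
    Output={
        'KinesisDataStream': {
            'Arn': 'arn:aws:kinesis:us-west-2:XXXXXXXXXX:stream/rekognition-results'
        }
    },
    Settings={
        'FaceSearch': {
            'CollectionId': 'my-face-collection',
            'FaceMatchThreshold': 80.0
        }
    },
    RoleArn='arn:aws:iam::XXXXXXXXXX:role/RekognitionStreamRole'
)

# Start processing frames from the video stream
rekognition.start_stream_processor(Name='camera-stream-processor')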

Advertising, deeper audience engagement, and meaningful customer interactions will continue to get better with artificial intelligence. Sixteen years ago, Steven Spielberg considered what advertising in the future could look like… The future is here, and it is up to you to build the next great customer engagement platform using technology like Amazon Rekognition.

Paul Roberts

Paul Roberts is a Strategic Solutions Architect for Amazon Web Services. When he is not working on serverless applications, DevOps, Open Source, or Artificial Intelligence, he is often found exploring the mountains near Lake Tahoe with his family.