AWS Machine Learning Blog

Understand Movie Star Social Networks Using Amazon Rekognition and Graph Databases

Amazon Rekognition is an AWS service that makes it easy to add image analysis to your applications. The latest feature added to the API of this deep-learning-powered computer vision service is Celebrity Recognition. This simple-to-use functionality detects and recognizes thousands of individuals who are famous, noteworthy, or prominent in their field. You can use it to index and search digital image libraries for celebrities of particular interest.

One common way we have seen our customers store data about individuals is within graph databases. As a previous blog post discusses in detail, companies such as Facebook, LinkedIn, and Twitter have revolutionized the way society interacts through their ability to manage a huge network of relationships. The purpose of this blog post is to demonstrate how simple it can be to pair Rekognition’s Celebrity and Face Recognition functionality with the relationship information stored in graph databases.

The pairing of these technologies allows customers to start with a picture and understand how the person in the picture is related to another person of interest. Users can even submit two pictures and quickly determine how the people in the two pictures might be related to each other. One comical example of this relationship mapping is the popular Six Degrees of Kevin Bacon game. However, the business value of such an application is enormous. Law enforcement agencies can start with two pictures, use Rekognition to identify the people, and then query a graph database to understand whether the two people of interest might know each other. Similarly, hospitality companies can use Rekognition and a graph database to quickly identify any celebrities on the premises and understand which other celebrities they might know who are staying nearby.

In this blog post, we demonstrate how to use Rekognition with a graph database (Neo4j Community Edition) from a Jupyter Notebook, using the D3.js library for visualization.


To get started with this exciting combination of technologies, first get a copy of the project from the AWS Labs GitHub repository. The project structure has two main areas:

  • <project root> – This is where the actual Jupyter Notebook is located with all dependencies.
  • <project root>/cft – The AWS CloudFormation templates, sample properties, and sample commands to create the infrastructure.

You’ll need a new or existing EC2 SSH key. The AWS CloudFormation template installs the Community Edition of Neo4j, downloads a Jupyter Notebook from AWS Labs containing example Python code to interact with Rekognition, and configures a few other Amazon EC2 settings that are necessary to quickly get started. The AWS CloudFormation template also automatically loads the popular Movie Graph Database, which can be queried from either the Neo4j browser or a Jupyter Notebook.

Run the AWS CloudFormation template named rek-neo4j-blogpost-git.template. With this template, all you need is the name of your EC2 SSH key (see below):

aws cloudformation create-stack --stack-name rekognitionblog \
  --template-body file://rek-neo4j-blogpost-git.template \
  --parameters ParameterKey=KeyName,ParameterValue=<YOURKEYHERE> \
  --capabilities CAPABILITY_NAMED_IAM

After waiting the few minutes necessary for AWS CloudFormation to complete the installation, you can get the DNS and IP address of your new server by executing the following and viewing the Outputs section of the response (a sample follows):

> aws cloudformation describe-stacks --stack-name rekognitionblog
. . .
. . .
"Outputs": [

. . . 

Remember the IP address and the DNS of your new server for future use.
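
If you prefer to read the stack outputs from Python rather than scanning the CLI response, the following is a minimal sketch. It assumes boto3 is installed and AWS credentials are configured; the helper names stack_outputs and fetch_stack_outputs are illustrative and are not part of the project. Because the output key names depend on the template, the sketch returns every key it finds rather than looking up a particular one.

```python
def stack_outputs(response):
    """Flatten the Outputs of the first stack in a describe_stacks
    response into a plain {OutputKey: OutputValue} dictionary."""
    stacks = response.get("Stacks", [])
    if not stacks:
        return {}
    return {
        output["OutputKey"]: output["OutputValue"]
        for output in stacks[0].get("Outputs", [])
    }


def fetch_stack_outputs(stack_name="rekognitionblog"):
    """Call CloudFormation for the stack created above and return its outputs.

    Requires boto3 and valid AWS credentials; boto3 is imported lazily so
    that stack_outputs() can be used on a saved response without it.
    """
    import boto3

    client = boto3.client("cloudformation")
    return stack_outputs(client.describe_stacks(StackName=stack_name))
```

Calling fetch_stack_outputs() once the stack has finished creating returns a dictionary that you can print to find the DNS name and IP address.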

Browse to port 7474 on your newly created EC2 instance by entering <public DNS>:7474 into your favorite browser. Log in for the first time with neo4j / neo4j, and you will be prompted to change the password. This is the password you will use in the notebook (we use the password password for this demo, but feel free to use a more secure password of your choice). If you have trouble opening the Neo4j browser, try logging out of your VPN.

Access the Notebook

For this exercise, we will use an SSH tunnel to the EC2 instance to be able to browse to the Jupyter Notebook. The default user on the instance is ‘ubuntu’. Remember to use the SSH key you specified when setting up the instance:

> sudo ssh -i <your-private-key-file> -N -L 8888:localhost:8888 ubuntu@<publicDNS>

Next, start the Jupyter Notebook from the place where the notebook is located using the following commands within your EC2 instance.

> cd /opt/notebook
> jupyter notebook --no-browser
. . . 
. . .
Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:

Take note of the token printed in the output. Use it in the browser of your choice by navigating to localhost:8888/?token=<your token> in a new browser window.

You should now have the Jupyter Notebook open that includes examples of using Python to interact with the Rekognition API. The structure of the code is as follows:

  1. Download, install, and import the necessary Python modules.
  2. Retrieve an image of choice and store the object as ‘image1.jpg’.
  3. Use the Rekognition API to detect the number of faces in the image (celebrity or not), print the bounding boxes around those faces, and demonstrate printing the cropped face of the largest face in the image. Rekognition identifies faces starting from the largest and moving iteratively to the smaller faces, to a maximum of 15 faces in an image.
  4. Use the Rekognition API to detect if any celebrities are in the image and print their names and a link to find additional information about the person.
  5. Authenticate to the Neo4j database and create a graph object.
  6. Confirm that the movie graph database is accessible by querying how many degrees of separation are between Kevin Bacon and whichever movie star is shown in image1.jpg. Note that Rekognition identifies celebrities from all walks of life, but the movie graph database only stores information about those who have been on television or in movies. In the default code, image1.jpg shows Michael Dorman from the Amazon Original Show ‘Patriot’.
  7. Store a second picture as image2.jpg and print the downloaded object.
  8. Use the Rekognition API to detect and print the names of any celebrities in image2.jpg. The default code demonstrates detecting Titus Welliver and Jamie Hector from the Amazon Original Show ‘Bosch.’
  9. Query the Neo4j Movie Graph database for the relationship between the movie stars in image1.jpg and image2.jpg. As an added bonus, the code also renders an interactive d3.js visualization of the relationship.
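
The face detection and celebrity recognition steps in the list above can be sketched roughly as follows. This is not the notebook's own code; it is a minimal illustration that assumes boto3 is installed and AWS credentials are configured. The helper names (to_pixel_box, largest_face, celebrity_names, analyze) and the default Region are assumptions made for this example. Rekognition reports bounding boxes as ratios of the image dimensions, so they must be scaled to pixels before cropping.

```python
def to_pixel_box(bounding_box, image_width, image_height):
    """Convert a ratio-based Rekognition BoundingBox into
    (left, top, right, bottom) pixel coordinates."""
    left = int(bounding_box["Left"] * image_width)
    top = int(bounding_box["Top"] * image_height)
    right = int((bounding_box["Left"] + bounding_box["Width"]) * image_width)
    bottom = int((bounding_box["Top"] + bounding_box["Height"]) * image_height)
    # Faces at the image edge can yield coordinates outside the image; clamp them.
    return (max(left, 0), max(top, 0),
            min(right, image_width), min(bottom, image_height))


def largest_face(face_details):
    """Return the face with the largest bounding-box area, or None if empty."""
    return max(
        face_details,
        key=lambda f: f["BoundingBox"]["Width"] * f["BoundingBox"]["Height"],
        default=None,
    )


def celebrity_names(response):
    """Return (name, first URL or None) for each entry in CelebrityFaces."""
    return [
        (celeb["Name"], (celeb.get("Urls") or [None])[0])
        for celeb in response.get("CelebrityFaces", [])
    ]


def analyze(image_path="image1.jpg", region="us-east-1"):
    """Send one local image to Rekognition; returns (face details, celebrities).

    The Region default is an assumption; use the Region you deployed to.
    """
    import boto3  # imported lazily so the helpers above work without boto3

    with open(image_path, "rb") as image_file:
        image = {"Bytes": image_file.read()}
    client = boto3.client("rekognition", region_name=region)
    faces = client.detect_faces(Image=image)["FaceDetails"]
    celebrities = celebrity_names(client.recognize_celebrities(Image=image))
    return faces, celebrities
```

The tuple returned by to_pixel_box is ordered as (left, top, right, bottom), which is the order that common imaging libraries such as Pillow expect for a crop box.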


This blog post demonstrates interactions with the Rekognition API using Python, querying a Neo4j database with Rekognition output, and rendering d3.js visualizations in interactive Jupyter Notebooks. Pairing such technology allows you to find not only who is in a picture, but also how the identified person is related to another person of interest. The customer implications for such a solution are enormous for industries ranging from law enforcement to hospitality.
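
To give a feel for the graph query side, here is a minimal sketch of a shortest-path lookup between two actors in the Movie Graph. It is not the notebook's code, which may use a different client library. The sketch assumes the official neo4j Python driver, a Bolt endpoint on the default port 7687, and the demo password; the driver version must also be compatible with the installed Neo4j server. The names SHORTEST_PATH_QUERY, degrees_of_separation, and how_related are illustrative.

```python
# Person nodes carry a "name" property and Movie nodes a "title" property
# in the Movie Graph, so coalesce() yields a readable label for each hop.
SHORTEST_PATH_QUERY = (
    "MATCH p = shortestPath("
    "(a:Person {name: $first})-[:ACTED_IN*]-(b:Person {name: $second})) "
    "RETURN [n IN nodes(p) | coalesce(n.name, n.title)] AS hops"
)


def degrees_of_separation(hops):
    """Count degrees in a path that alternates person, movie, person, ...

    Each shared movie between two people counts as one degree.
    """
    return (len(hops) - 1) // 2


def how_related(first, second, uri="bolt://localhost:7687",
                user="neo4j", password="password"):
    """Return the list of hops between two actors, or None if no path exists.

    The two names must differ; shortestPath does not accept identical
    start and end nodes by default.
    """
    from neo4j import GraphDatabase  # assumed client: pip install neo4j

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            record = session.run(
                SHORTEST_PATH_QUERY, first=first, second=second
            ).single()
            return record["hops"] if record else None
    finally:
        driver.close()
```

Passing a name returned by Rekognition as first and "Kevin Bacon" as second, then feeding the result to degrees_of_separation, gives the number used in the Kevin Bacon game. Restricting the pattern to ACTED_IN relationships is a design choice for this sketch; dropping the relationship type would also follow the other relationship types in the Movie Graph, such as directing.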

If you have any questions or suggestions, please leave a comment.

About the Authors

Derek Graeber is a Senior Consultant in Big Data & Analytics for AWS Professional Services. He works with enterprise customers to provide leadership on big data projects, helping them realize their business goals when running on AWS. In his spare time, he enjoys spending time with his wife and family, occasionally getting out to reaffirm that he will never be a good golfer.

Kyle Johnson is a Data Scientist with AWS Professional Services. He enjoys building repeatable artificial intelligence solutions to solve customer business problems. In his spare time, he enjoys going to Phipps Conservatory with his family and scheming ideas for Amazon Rekognition enabled Halloween decorations.