Build an Intelligent Index of Similar Images with Amazon SageMaker

In this tutorial, you will learn how to use Amazon SageMaker to train a model that indexes images that are visually similar to each other. You will then test the model with an example image to see how well it identifies other images that look like it. In oil and gas, this can help you find analogous reservoirs by comparing similar projects quickly and without manual intervention. For instance, if you identify new features or patterns in seismic surveys that indicate field potential, you can use SageMaker to find related images in terabytes of archived seismic data based on similar patterns. The same kind of intelligent search applies to many other image archives across the industry.

Machine learning often feels harder than it should to most developers, because the process of building and training models and then deploying them into production is often complicated and slow. Amazon SageMaker is a fully managed platform that enables developers and data scientists to quickly and easily build, train, and deploy machine learning models at any scale. It removes barriers that typically slow down developers who want to use machine learning.

In the next several minutes, you'll launch an Amazon SageMaker notebook instance and load an example notebook. The notebook contains the code to download some sample image data, prepare test and evaluation datasets, use a pre-trained model to create vectors of the test images, and then train a new model that recognizes similar image vectors. The example code then deploys the new model to Amazon SageMaker for hosting, where you will be able to generate inference requests and measure the accuracy of the model.
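The idea of finding similar images through their vectors can be illustrated with a minimal sketch. It assumes each image has already been reduced to a fixed-length feature vector by a pre-trained model. The file names, the vector values, and the choice of cosine similarity are illustrative assumptions and not necessarily what the tutorial notebook uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query, index, k=2):
    """Return the names of the k indexed vectors closest to the query."""
    ranked = sorted(
        index,
        key=lambda name: cosine_similarity(query, index[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical feature vectors; real ones would have many more dimensions.
index = {
    "survey_a.png": [0.9, 0.1, 0.0],
    "survey_b.png": [0.8, 0.2, 0.1],
    "survey_c.png": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]
print(most_similar(query, index, k=2))
```

A brute-force scan like this is only practical for small collections; a trained index model exists precisely to avoid comparing the query against every stored vector.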

As part of the AWS Free Tier, you can get started with Amazon SageMaker for free. For the first two months after sign-up, you are offered a monthly free tier of 250 hours of t2.medium notebook usage for building your models, plus 50 hours of m4.xlarge for training, and 125 hours of m4.xlarge for hosting your machine learning models with Amazon SageMaker.
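As a rough way to reason about these allowances, the sketch below compares a month of usage against the hours quoted above and reports any excess. The usage figures are hypothetical, and the function is only an arithmetic illustration, not a billing tool.

```python
# Monthly Free Tier allowances quoted in this tutorial, in hours.
FREE_TIER_HOURS = {"notebook": 250, "training": 50, "hosting": 125}


def billable_hours(usage_hours):
    """Return the hours in each category that exceed the monthly allowance."""
    return {
        kind: max(0, usage_hours.get(kind, 0) - allowance)
        for kind, allowance in FREE_TIER_HOURS.items()
    }


# Hypothetical month: the hosting endpoint was left running for a full week.
usage = {"notebook": 40, "training": 2, "hosting": 7 * 24}
print(billable_hours(usage))
```

In this example the 168 hours of hosting exceed the 125-hour allowance by 43 hours, which is why removing the endpoint at the end of the tutorial matters.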

This tutorial requires an AWS account.

There is no additional charge for Amazon SageMaker. The resources you create in this tutorial are Free Tier eligible.

Step 1. Enter the Amazon SageMaker Console

a. Open the AWS Management Console in a new browser tab, so you can keep this step-by-step guide open. When the screen loads, enter your user name and password to get started. Then type SageMaker in the search bar and select SageMaker to open the SageMaker console.

Step 2. Set up an Amazon SageMaker notebook instance

In this step, you will set up and configure an Amazon SageMaker notebook instance.

a.  Launch an Amazon SageMaker notebook by selecting Create notebook instance in the Get started section on the right side of the screen.

b. Under the Notebook instance settings section, complete the following steps: 1) Give your notebook a name (for example, “ImageIndexing”), 2) Select a Notebook instance type (for example, ml.t2.medium, which is the smallest and lowest-cost option), and 3) Select Create a new role.

c. On the Create an IAM role screen, under S3 buckets you specify - optional, select None. Keep all the other default values as they are, so that SageMaker-related buckets and objects can still be accessed. Select Create role.
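For context, a role that SageMaker can use carries a trust policy naming the SageMaker service as the principal allowed to assume it. The console creates this for you in the step above; the sketch below only shows what such a trust policy document looks like, built as a Python dictionary. The function name is a hypothetical helper.

```python
import json


def sagemaker_trust_policy():
    """Trust policy that lets the SageMaker service assume a role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Print the document in the JSON form that IAM displays.
print(json.dumps(sagemaker_trust_policy(), indent=2))
```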

d. Back on the Create notebook instance screen, select the IAM role that you just created in step 2c, and then select Create notebook instance. The VPC and KMS settings are not necessary for this simple tutorial, but you should configure them when working with real private data.
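The same settings can also be expressed as a request to the SageMaker CreateNotebookInstance API. The following is a sketch under stated assumptions: it uses the boto3 SDK, the role ARN is a placeholder, and the helper names are hypothetical. The console steps above remain the path this tutorial follows.

```python
def notebook_instance_request(name, role_arn, instance_type="ml.t2.medium"):
    """Build keyword arguments for the CreateNotebookInstance API call."""
    return {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,
        "RoleArn": role_arn,
    }


def create_notebook_instance(request):
    """Send the request. Requires boto3 and configured AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("sagemaker")
    return client.create_notebook_instance(**request)


# Placeholder ARN; substitute the ARN of the role created in step 2c.
request = notebook_instance_request(
    name="ImageIndexing",
    role_arn="arn:aws:iam::123456789012:role/example-sagemaker-role",
)
print(request)
```

Only the request is built and printed here; calling create_notebook_instance(request) would actually launch the instance.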

e. While the instance is starting, its status is shown as Pending; it changes to InService when the instance is ready. In the meantime, add the permissions that the instance needs to access Amazon EC2 Container Registry. To do this, navigate to the IAM console and select Roles.

f. Scroll down the list of roles and select the SageMaker role that you just created. Select Attach policies.

g. On the Attach policies page, search for the policy that grants access to Amazon EC2 Container Registry, select its check box, and then select Attach policy.

h. On the summary page that follows, ensure that the new policy has been added to the role. Once the policy has been added, your notebook instance will have permission to access Amazon EC2 Container Registry.
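The check in this step can also be scripted. The sketch below inspects a response in the shape returned by the IAM ListAttachedRolePolicies API and reports whether a given policy is attached. The tutorial does not name the policy, so the AWS managed policy AmazonEC2ContainerRegistryFullAccess is used purely as an assumed example, and the sample response is hand-written.

```python
# Assumed example policy; the tutorial does not state which policy to attach.
ECR_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess"


def policy_is_attached(response, policy_arn):
    """Check a ListAttachedRolePolicies response for the given policy ARN."""
    attached = response.get("AttachedPolicies", [])
    return any(p["PolicyArn"] == policy_arn for p in attached)


def fetch_attached_policies(role_name):
    """Query IAM for the role. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the check above works without boto3

    return boto3.client("iam").list_attached_role_policies(RoleName=role_name)


# Hand-written response for illustration only.
sample_response = {
    "AttachedPolicies": [
        {
            "PolicyName": "AmazonSageMakerFullAccess",
            "PolicyArn": "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",
        },
        {
            "PolicyName": "AmazonEC2ContainerRegistryFullAccess",
            "PolicyArn": ECR_POLICY_ARN,
        },
    ]
}
print(policy_is_attached(sample_response, ECR_POLICY_ARN))
```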

i. Return to the Amazon SageMaker Console and check that the notebook status is listed as InService. Once the notebook is ready, select Open. This will open the Jupyter web application on your instance.
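Instead of refreshing the console, the status can be polled programmatically. This is a minimal sketch: the polling loop takes any function that returns the current status, so it can be exercised offline with a fake. The notebook_status helper is an assumption that uses the boto3 describe_notebook_instance call, and the polling interval and limit are arbitrary.

```python
import time


def wait_until_in_service(get_status, poll_seconds=15, max_polls=40,
                          sleep=time.sleep):
    """Poll get_status() until it returns 'InService'.

    Returns True on success and False if max_polls is exhausted.
    Raises RuntimeError if the instance reports 'Failed'.
    """
    for _ in range(max_polls):
        status = get_status()
        if status == "InService":
            return True
        if status == "Failed":
            raise RuntimeError("notebook instance failed to start")
        sleep(poll_seconds)
    return False


def notebook_status(name):
    """Fetch the live status. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the loop above works without boto3

    client = boto3.client("sagemaker")
    response = client.describe_notebook_instance(NotebookInstanceName=name)
    return response["NotebookInstanceStatus"]


# Offline demonstration with a fake status sequence and no real sleeping.
fake_statuses = iter(["Pending", "Pending", "InService"])
print(wait_until_in_service(lambda: next(fake_statuses), sleep=lambda s: None))
```

Against a real account, the call would look like wait_until_in_service(lambda: notebook_status("ImageIndexing")).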

Step 3. Import and configure the example Image Indexing notebook

In this step, you will work within the example Jupyter notebook, to prepare some data, train a model and launch an indexing service.

a. Once the Jupyter notebook is open, select New and then Terminal to open a new terminal, which you will use to retrieve the tutorial notebook. You will use the tutorial notebook to train a custom model.

b. Now, you will retrieve the notebook from the Git repository. In the terminal, change to the SageMaker directory by entering: cd SageMaker. Next, clone the repository from GitHub by entering: git clone

c. Return to the Jupyter Home screen and navigate to the notebook by selecting the seismic-vision-search folder.

d. Open the notebook named Seismic-Vision-Search.ipynb. Scroll through the notebook to read both the comments and the code, which explain all of the steps required to produce an image indexing service.

e. The model needs an S3 bucket to write its output to. Open the S3 console and select Create bucket.

f. Give the bucket a name that starts with the sagemaker- prefix, which allows Amazon SageMaker to access it, and append your name to make it unique. Under Region, select the region in which you wish to create this bucket. Note that the bucket should be in the same region as the notebook instance. Select Create in the bottom left corner.
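The naming convention in this step can be sketched as a small helper that adds the sagemaker- prefix and applies a basic validity check. The helper name is hypothetical, and the pattern covers only the core S3 naming rules (3 to 63 characters; lowercase letters, digits, hyphens, and dots; starting and ending with a letter or digit), not every restriction S3 enforces.

```python
import re

# Basic S3 bucket name pattern; deliberately not exhaustive.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def sagemaker_bucket_name(suffix):
    """Prefix the suffix with 'sagemaker-' and validate the result."""
    name = "sagemaker-" + suffix.strip().lower()
    if not _BUCKET_RE.match(name):
        raise ValueError(f"invalid S3 bucket name: {name!r}")
    return name


print(sagemaker_bucket_name("Mamoon"))  # prints sagemaker-mamoon
```

Bucket names are globally unique across AWS, so a valid name can still be rejected at creation time if someone else already holds it.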

Step 4. Update and then execute your notebook

a. Next, scroll down the notebook cells to the first code cell with the S3 Bucket name, and assign the name of the S3 bucket you created (for example, sagemaker-mamoon).

b. From the Cell menu at the top of the Jupyter page select the option Run all.

Step 5. Clean up

a. SageMaker model hosting incurs hourly billing charges that may exceed your Free Tier allowance if the endpoint is left in service. To ensure that you don’t use any unnecessary billable time, run the last cell to tear down the hosting instance(s) and remove the SageMaker endpoint.
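If you need to remove an endpoint outside the notebook, the teardown can be sketched with the boto3 SageMaker client as follows. This is an assumption about how one might do it, not a copy of the notebook's last cell; the function name is hypothetical, and the endpoint name must be whatever your deployment actually used.

```python
def tear_down_endpoint(client, endpoint_name):
    """Delete an endpoint and its endpoint configuration.

    `client` is expected to behave like boto3.client("sagemaker").
    Returns the name of the endpoint configuration that was removed.
    """
    # Look up the configuration before the endpoint disappears.
    description = client.describe_endpoint(EndpointName=endpoint_name)
    config_name = description["EndpointConfigName"]

    # Deleting the endpoint stops the hosting instances and the billing.
    client.delete_endpoint(EndpointName=endpoint_name)
    client.delete_endpoint_config(EndpointConfigName=config_name)
    return config_name


# Usage against a real account (requires boto3 and AWS credentials):
#   import boto3
#   tear_down_endpoint(boto3.client("sagemaker"), "<your-endpoint-name>")
```

The client is passed in as a parameter so the function can be tried against a stub before it is pointed at a real account.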

You have created a custom model for content-based image indexing and retrieval using Amazon SageMaker!

You can now customize this model for your own images to experiment with inferring similarity for any input image.