AWS Machine Learning Blog

Building an NLU-powered search application with Amazon SageMaker and the Amazon OpenSearch Service KNN feature

September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. This post has also been updated with changes required for SageMaker SDK v2 and an improved notebook experience.

The rise of semantic search engines has made it easier for customers of ecommerce and retail businesses to find what they are looking for. Search engines powered by natural language understanding (NLU) let you speak or type into a device in your preferred conversational language, rather than having to find the right keywords to get the best results. You can query using words or sentences in your native language and leave it to the search engine to deliver the best results.

Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly. Amazon OpenSearch Service (formerly Amazon Elasticsearch Service, or Amazon ES) is a fully managed service that makes it easy for you to deploy, secure, and run Elasticsearch cost-effectively at scale. Amazon ES offers k-nearest neighbor (KNN) search, which can enhance search in use cases such as product recommendations, fraud detection, image and video search, and specific semantic scenarios like document and query similarity. Alternatively, you can choose Amazon Kendra, a highly accurate and easy-to-use enterprise search service powered by machine learning that requires no machine learning experience. In this post, we explain how you can implement an NLU-based product search for certain types of applications using Amazon SageMaker and the Amazon ES KNN feature.

In the post Building a visual search application with Amazon SageMaker and Amazon ES, we shared how to build a visual search application using Amazon SageMaker and the Euclidean distance metric of Amazon ES KNN. Amazon ES now supports open-source Elasticsearch version 7.7 and includes the cosine similarity metric for KNN indexes. Cosine similarity measures the cosine of the angle between two vectors: the smaller the angle, the closer the cosine is to 1 and the more similar the vectors are. Because cosine similarity depends on the orientation of the vectors rather than their magnitude, it is a good choice for some semantic search applications. The highly distributed architecture of Amazon ES enables you to implement an enterprise-grade search engine with enhanced KNN ranking, high recall, and high performance.
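A small illustration (not part of the original notebook) of why orientation matters: the sketch below compares cosine similarity with Euclidean distance on made-up NumPy vectors. Scaling a vector keeps its cosine similarity to the original at 1.0 but moves it far away in Euclidean terms.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between a and b (1.0 means same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Straight-line (L2) distance between a and b."""
    return float(np.linalg.norm(a - b))


# Toy 3-dimensional vectors; the sentence embeddings in this post have 768.
query = np.array([1.0, 2.0, 0.0])
same_direction = 3 * query          # same orientation, three times the magnitude
other = np.array([0.0, 1.0, 2.0])   # different orientation

print(cosine_similarity(query, same_direction))   # about 1.0
print(euclidean_distance(query, same_direction))  # about 4.47, despite the identical direction
print(cosine_similarity(query, other))            # about 0.4
```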

In this post, you build a very simple search application that demonstrates the potential of using KNN with Amazon ES compared to the traditional Amazon ES ranking method, including a web application for testing the KNN-based search queries in your browser. The application also compares the search results with Elasticsearch match queries to demonstrate the difference between KNN search and full-text search.

Overview of solution

Regular Elasticsearch text-matching search works well when the query shares keywords with the indexed text, but KNN-based search matches on meaning, which is a more natural way to search. For example, when you search for a wedding dress using a KNN-based search application, it gives you similar results whether you type “wedding dress” or “marriage dress.” Implementing this KNN-based search application consists of two phases:

  • KNN reference index – In this phase, you pass a set of corpus documents through a deep learning model to extract their features, or embeddings. Text embeddings are a numerical representation of the text. You save those features into a KNN index on Amazon ES. The concept underpinning KNN is that similar data points lie close together in the vector space. As an example, “summer dress” and “summer flowery dress” are similar, so their text embeddings are close to each other, whereas the embeddings of “summer dress” and “wedding dress” are farther apart.
  • KNN index query – This is the inference phase of the application. In this phase, you submit a text search query through the deep learning model to extract the features. Then, you use those embeddings to query the reference KNN index. The KNN index returns similar text embeddings from the KNN vector space. For example, if you pass a feature vector of “marriage dress” text, it returns “wedding dress” embeddings as a similar item.
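To make the two phases concrete, here is a minimal brute-force sketch of a KNN lookup by cosine similarity. It only illustrates the idea: the KNN index in Amazon OpenSearch Service uses an approximate nearest-neighbor structure instead of scanning every vector, and the item names and 4-dimensional vectors below are invented for the example.

```python
from typing import List

import numpy as np


def knn_cosine(index_vectors: np.ndarray, query: np.ndarray, k: int) -> List[int]:
    """Return the row indices of the k vectors most cosine-similar to query."""
    # Normalize rows so that a dot product equals cosine similarity.
    index_norm = index_vectors / np.linalg.norm(index_vectors, axis=1, keepdims=True)
    query_norm = query / np.linalg.norm(query)
    scores = index_norm @ query_norm
    # argsort is ascending, so reverse it to put the most similar first.
    return [int(i) for i in np.argsort(scores)[::-1][:k]]


# Phase 1 (reference index): made-up embeddings for three indexed items.
items = ["wedding dress", "summer dress", "summer flowery dress"]
index_vectors = np.array([
    [0.9, 0.1, 0.0, 0.2],
    [0.1, 0.8, 0.5, 0.0],
    [0.1, 0.7, 0.6, 0.1],
])

# Phase 2 (query): a made-up embedding for "marriage dress" that points
# in roughly the same direction as "wedding dress".
marriage_dress = np.array([0.85, 0.15, 0.05, 0.25])
top = knn_cosine(index_vectors, marriage_dress, k=1)
print([items[i] for i in top])  # ['wedding dress']
```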

Next, let’s take a closer look at each phase in detail, with the associated AWS architecture.

KNN reference index creation

For this use case, you use dress images and their visual descriptions from the Feidegger dataset. This dataset is a multi-modal corpus that focuses specifically on the domain of fashion items and their visual descriptions in German. The dataset was created as part of ongoing research at Zalando into text-image multi-modality in the area of fashion.

In this step, you translate each dress description from German to English using Amazon Translate. From each English description, you extract a feature vector, an n-dimensional vector of numerical features that represents the dress. You use a pre-trained BERT HuggingFace model hosted in Amazon SageMaker to extract a 768-dimensional feature vector from each visual description of the dress, and store the vectors in a KNN index in an Amazon OpenSearch Service domain.

The following diagram illustrates the workflow for creating the KNN index.

The process includes the following steps:

  1. Users interact with a Jupyter notebook on an Amazon SageMaker notebook instance. An Amazon SageMaker notebook instance is an ML compute instance running the Jupyter Notebook app. Amazon SageMaker manages creating the instance and related resources.
  2. Each item description, originally open-sourced in German, is translated to English using Amazon Translate.
  3. A pre-trained BERT HuggingFace model is downloaded, and the model artifact is serialized and stored in Amazon Simple Storage Service (Amazon S3). The model is then served from a PyTorch model server on an Amazon SageMaker real-time endpoint.
  4. Translated descriptions are pushed through the SageMaker endpoint to extract fixed-length features (embeddings).
  5. The notebook code writes the text embeddings, along with each product's Amazon S3 URI, to the KNN index in an Amazon ES domain.

KNN search from a query text

In this step, you send a search query text string from the application through the Amazon SageMaker hosted model to extract a 768-dimensional feature vector. You use this vector to query the KNN index in Amazon OpenSearch Service. KNN for Amazon OpenSearch Service lets you search for points in a vector space and find the nearest neighbors of those points; this index uses cosine similarity (the default is Euclidean distance). When it finds the nearest neighbor vectors (for example, k = 3 nearest neighbors) for a given query text, it returns the associated Amazon S3 images to the application. The following diagram illustrates the KNN search full-stack application architecture.

The process includes the following steps:

  1. The end-user accesses the web application from their browser or mobile device.
  2. A user-provided search query string is sent to Amazon API Gateway and AWS Lambda.
  3. The Lambda function invokes the Amazon SageMaker real-time endpoint, and the model returns a vector of the search query embeddings. Amazon SageMaker hosting provides a managed HTTPS endpoint for predictions and can automatically scale it to the performance your application needs using Application Auto Scaling.
  4. The function passes the search query embedding vector as the search value for a KNN search in the index in the Amazon ES domain. A list of k similar items and their respective Amazon S3 URIs is returned.
  5. The function generates pre-signed Amazon S3 URLs and returns them to the client web application, which uses them to display similar items in the browser.
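The Lambda logic in steps 3 to 5 can be sketched as three small helpers. This is a hedged sketch, not the function deployed by the AWS SAM template: the helper names, the `text/plain` content type (chosen to match the notebook's `StringPredictor`), and the default `k=3` are assumptions. The SageMaker runtime client is passed in so the code can run without AWS access; in Lambda you would pass `boto3.client('sagemaker-runtime')`.

```python
import json
from typing import Any, Dict, List, Tuple


def embed_query(runtime_client: Any, endpoint_name: str, text: str) -> List[float]:
    """Step 3: send the raw query text to the SageMaker endpoint and parse the vector."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/plain",
        Body=text.encode("utf-8"),
    )
    # The response body is a stream; the endpoint is assumed to return a JSON list.
    return json.loads(response["Body"].read())


def build_knn_query(vector: List[float], k: int = 3) -> Dict[str, Any]:
    """Step 4: k-NN query body against the zalando_nlu_vector field."""
    return {
        "size": k,
        "query": {"knn": {"zalando_nlu_vector": {"vector": vector, "k": k}}},
    }


def parse_s3_uri(uri: str) -> Tuple[str, str]:
    """Step 5 helper: split 's3://bucket/key' into (bucket, key) for presigning."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key
```

With real clients, step 5 would pass each `(bucket, key)` pair to boto3's `s3_client.generate_presigned_url('get_object', Params={'Bucket': bucket, 'Key': key}, ExpiresIn=...)`, and step 4 would send the query body to the index's `_search` endpoint.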

Prerequisites

For this walkthrough, you should have an AWS account with appropriate AWS Identity and Access Management (IAM) permissions to launch the AWS CloudFormation template.

Deploying your solution

You use a CloudFormation stack to deploy the solution. The stack creates all the necessary resources, including the following:

  • An Amazon SageMaker notebook instance to run Python code in a Jupyter notebook
  • An IAM role associated with the notebook instance
  • An Amazon ES domain to store and retrieve sentence embedding vectors into a KNN index
  • Two S3 buckets: one for storing the source fashion images and another for hosting a static website

From the Jupyter notebook, you also deploy the following:

  • An Amazon SageMaker endpoint for getting fixed-length sentence embedding vectors in real time.
  • An AWS Serverless Application Model (AWS SAM) template for a serverless backend using API Gateway and Lambda.
  • A static front-end website hosted on an S3 bucket to demonstrate a real-world, end-to-end ML application. The front-end code uses ReactJS and the AWS Amplify JavaScript library.

To get started, complete the following steps:

  1. Sign in to the AWS Management Console with your IAM user name and password.
  2. Choose Launch Stack and open it in a new tab.
  3. On the Quick create stack page, select the check box to acknowledge the creation of IAM resources.
  4. Choose Create stack.
  5. Wait for the stack to complete.

You can examine various events from the stack creation process on the Events tab. When the stack creation is complete, you see the status CREATE_COMPLETE.

You can check the Resources tab to see all the resources the CloudFormation template created.

  6. On the Outputs tab, choose the SageMakerNotebookURL link.

This hyperlink opens the Jupyter notebook on your Amazon SageMaker notebook instance that you use to complete the rest of the lab.

You should be on the Jupyter notebook landing page.

  7. Choose nlu-based-item-search.ipynb.

Building a KNN index on Amazon ES

For this step, you should be at the beginning of the notebook with the title NLU based Item Search. Follow the steps in the notebook and run each cell in order.

You use a pre-trained BERT model (distilbert-base-nli-stsb-mean-tokens) from HuggingFace and host it on an Amazon SageMaker PyTorch model server endpoint to generate fixed-length sentence embeddings. The embeddings are saved to the Amazon OpenSearch Service domain created in the CloudFormation stack. For more information, see the markdown cells in the notebook.
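The post does not show the inference script in `./code`, so the following is an assumption about how it produces a fixed-length vector: models named `*-mean-tokens` are meant to be used with mean pooling, which averages the token embeddings of a sentence (ignoring padding) into one vector. A NumPy sketch of that pooling step, with random arrays standing in for real model output:

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sentence, ignoring padded positions.

    token_embeddings: (batch, seq_len, hidden) model output
    attention_mask:   (batch, seq_len) with 1 for real tokens, 0 for padding
    returns:          (batch, hidden) sentence embeddings
    """
    mask = attention_mask[:, :, np.newaxis].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts


# Two "sentences" padded to 5 tokens; hidden size 768 matches the
# dimension of the KNN index mapping used later in the post.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 5, 768))
mask = np.array([[1, 1, 1, 0, 0],
                 [1, 1, 1, 1, 1]])
sentence_vectors = mean_pool(tokens, mask)
print(sentence_vectors.shape)  # (2, 768)
```

Whatever the input length, each sentence ends up as one 768-value vector, which is what lets every product share a single `knn_vector` field.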

Continue when you reach the cell Deploying a full-stack NLU search application in your notebook.

The notebook contains several important cells; we walk you through a few of them.

Download the multi-modal corpus dataset from Feidegger, which contains fashion images and descriptions in German. See the following code:

## Data Preparation

import os 
import shutil
import json
import tqdm
import urllib.request
from tqdm import notebook
from multiprocessing import cpu_count
from tqdm.contrib.concurrent import process_map

images_path = 'data/feidegger/fashion'
filename = 'metadata.json'

my_bucket = s3_resource.Bucket(bucket)

if not os.path.isdir(images_path):
    os.makedirs(images_path)

# download metadata.json to the local notebook (skipped if it already exists)
def download_metadata(url):
    if not os.path.exists(filename):
        urllib.request.urlretrieve(url, filename)

# collect the image URL of every product listed in metadata.json
def generate_image_list(filename):
    with open(filename, 'r') as metadata:
        data = json.load(metadata)
    url_lst = []
    for item in data:
        url_lst.append(item['url'])
    return url_lst

# save each image under images_path, named after the last segment of its URL
def download_image(url):
    urllib.request.urlretrieve(url, images_path + '/' + url.split("/")[-1])

# generate image list
url_lst = generate_image_list(filename)

workers = 2 * cpu_count()

#downloading images to local disk
process_map(download_image, url_lst, max_workers=workers)

Upload the dataset to Amazon S3:

# Uploading dataset to S3

files_to_upload = []
dirName = 'data'
for path, subdirs, files in os.walk('./' + dirName):
    path = path.replace("\\","/")
    directory_name = path.replace('./',"")
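    for file in files:
        # pair each local file with an S3 key that mirrors the local folder layout
        files_to_upload.append({
            "filename": os.path.join(path, file),
            "key": directory_name+'/'+file
        })

def upload_to_s3(file):
    my_bucket.upload_file(file['filename'], file['key'])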
#uploading images to s3
process_map(upload_to_s3, files_to_upload, max_workers=workers)

This dataset has product descriptions in German, so you use Amazon Translate to translate each German sentence into English:

with open(filename) as json_file:
    data = json.load(json_file)

#Define translator function
def translate_txt(data):
    results = {}
    results['filename'] = f's3://{bucket}/data/feidegger/fashion/' + data['url'].split("/")[-1]
    results['descriptions'] = []
    translate = boto3.client(service_name='translate', use_ssl=True)
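    for i in data['descriptions']:
        result = translate.translate_text(Text=str(i), 
            SourceLanguageCode="de", TargetLanguageCode="en")
        # keep the English translation of each German description
        results['descriptions'].append(result['TranslatedText'])
    return results

# translate all products in parallel; each result holds the S3 image URI and the English descriptions
results = process_map(translate_txt, data, max_workers=workers)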

Save the sentence-transformers model to the notebook instance:

!pip install transformers[torch]

#Save the model to disk which we will host at sagemaker
from transformers import AutoTokenizer, AutoModel
saved_model_dir = 'transformer'
os.makedirs(saved_model_dir, exist_ok=True)

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/distilbert-base-nli-stsb-mean-tokens")
model = AutoModel.from_pretrained("sentence-transformers/distilbert-base-nli-stsb-mean-tokens") 

# write the tokenizer files and model weights into saved_model_dir for packaging
tokenizer.save_pretrained(saved_model_dir)
model.save_pretrained(saved_model_dir)


Upload the model artifact (model.tar.gz) to Amazon S3 with the following code:

#zip the model in tar.gz format
!cd transformer && tar czvf ../model.tar.gz *

#Upload the model to S3

inputs = sagemaker_session.upload_data(path='model.tar.gz', key_prefix='model')
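The `tar` command in the cell above places the contents of `transformer/` at the root of `model.tar.gz`, which is where SageMaker expects model files after it extracts the archive. If you prefer to stay in Python, a roughly equivalent sketch with the standard `tarfile` module follows; the function name `make_model_archive` is hypothetical, and unlike the shell `*` glob it also includes hidden files.

```python
import os
import tarfile


def make_model_archive(model_dir: str, archive_path: str) -> list:
    """Pack every entry in model_dir into a gzipped tar, using paths relative
    to model_dir so the files sit at the root of the archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in sorted(os.listdir(model_dir)):
            # directories are added recursively; arcname drops the model_dir prefix
            tar.add(os.path.join(model_dir, name), arcname=name)
    # Return the archive's member names so the layout can be checked.
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


# Usage in the notebook (same paths as the shell command):
# make_model_archive("transformer", "model.tar.gz")
```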

Deploy the model into an Amazon SageMaker PyTorch model server using the Amazon SageMaker Python SDK. See the following code:

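import time

from sagemaker.pytorch import PyTorch, PyTorchModel
from sagemaker.predictor import Predictor
from sagemaker.serializers import IdentitySerializer
from sagemaker import get_execution_role

# Predictor that sends the raw query string to the endpoint as text/plain
class StringPredictor(Predictor):
    def __init__(self, endpoint_name, sagemaker_session):
        super(StringPredictor, self).__init__(endpoint_name, sagemaker_session,
                                              serializer=IdentitySerializer(content_type='text/plain'))

pytorch_model = PyTorchModel(model_data = inputs, 
                             role = get_execution_role(),
                             entry_point ='',  # inference script inside source_dir
                             source_dir = './code',
                             py_version = 'py3', 
                             framework_version = '1.7.1',
                             predictor_cls = StringPredictor)

predictor = pytorch_model.deploy(initial_instance_count=1,
                                 instance_type='ml.g4dn.xlarge', 
                                 endpoint_name = f'nlu-search-model-{int(time.time())}')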
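The inference script in `./code` is not reproduced in the post. As an assumption about its shape: in the SageMaker PyTorch serving containers, an inference script can override `model_fn`, `input_fn`, `predict_fn`, and `output_fn`. The sketch below shows only `input_fn` and `output_fn`: how a `text/plain` request from `StringPredictor` could be decoded, and how an embedding could be returned as JSON, which is why the notebook parses responses with `json.loads(predictor.predict(...))`. Model loading and the forward pass are omitted.

```python
import json
from typing import Any, List, Union


def input_fn(request_body: Union[bytes, str], request_content_type: str) -> str:
    """Decode the raw request into the sentence to embed."""
    if request_content_type != "text/plain":
        raise ValueError(f"unsupported content type: {request_content_type}")
    if isinstance(request_body, bytes):
        request_body = request_body.decode("utf-8")
    return request_body


def output_fn(prediction: Any, response_content_type: str) -> str:
    """Serialize the sentence embedding as a JSON list of floats.

    response_content_type is ignored in this sketch; JSON is always returned.
    """
    # Accept NumPy arrays or tensors exposing .tolist(), as well as plain sequences.
    vector: List[float] = prediction.tolist() if hasattr(prediction, "tolist") else list(prediction)
    return json.dumps(vector)
```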

Define a cosine similarity KNN index mapping with the following code (to define a cosine similarity KNN index mapping, your Amazon OpenSearch Service domain must run Elasticsearch 7.7 or later):

# KNN index mapping
knn_index = {
    "settings": {
        "index.knn": True,
        "index.knn.space_type": "cosinesimil",
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "standard",
                    "stopwords": "_english_"
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "zalando_nlu_vector": {
                "type": "knn_vector",
                "dimension": 768
            }
        }
    }
}

Each product has five visual descriptions, so you combine all five descriptions and get one fixed-length sentence embedding. See the following code:

# For each product, we are concatenating all the 
# product descriptions into a single sentence,
# so that we will have one embedding for each product

def concat_desc(result):
    obj = {
        'filename': result['filename'],
    }
    obj['descriptions'] = ' '.join(result['descriptions'])
    return obj

concat_results = map(concat_desc, results)
concat_results = list(concat_results)

Import the sentence embeddings and associated Amazon S3 image URI into the Amazon OpenSearch Service KNN index with the following code. You also load the translated descriptions in full text, so that later you can compare the difference between KNN search and standard match text queries in Amazon OpenSearch Service.

# define a function that imports the feature vector corresponding to each S3 URI into the Elasticsearch KNN index
# This process takes about 10 minutes.

def es_import(concat_result):
    vector = json.loads(predictor.predict(concat_result['descriptions']))
             body={"zalando_nlu_vector": vector,
                   "image": concat_result['filename'],
                   "description": concat_result['descriptions']}
workers = 8 * cpu_count()
process_map(es_import, concat_results, max_workers=workers)
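Indexing one document per request, as `es_import` does, is simple but issues one indexing request per product. An alternative worth considering is the `_bulk` API, whose body is newline-delimited JSON: an action line followed by the document. The sketch below only builds that body; the index name `nlu-search-demo` and the sample document are hypothetical placeholders, and sending the body (for example with your Elasticsearch client's bulk call) is left out.

```python
import json
from typing import Dict, Iterable, List

# Hypothetical index name; use the name of the KNN index you created.
INDEX_NAME = "nlu-search-demo"


def build_bulk_body(docs: Iterable[Dict], index_name: str = INDEX_NAME) -> str:
    """Return an NDJSON _bulk body with one 'index' action per document."""
    lines: List[str] = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


docs = [
    {"zalando_nlu_vector": [0.1] * 768,
     "image": "s3://example-bucket/data/feidegger/fashion/1.jpg",
     "description": "A white wedding dress with lace sleeves."},
]
body = build_bulk_body(docs)
```

Batching documents this way trades a little memory for far fewer round trips when loading thousands of products.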

Building a full-stack KNN search application

Now that you have a working Amazon SageMaker endpoint for extracting text features and a KNN index on Amazon OpenSearch Service, you’re ready to build a real-world, full-stack ML-powered web app. You use an AWS SAM template to deploy a serverless REST API with API Gateway and Lambda. The REST API accepts new search strings, generates the embeddings, and returns similar relevant items to the client. Then you upload to Amazon S3 a front-end website that interacts with your new REST API. The front-end code uses Amplify to integrate with your REST API.
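The REST API's Lambda function has to answer API Gateway in the format it expects. Assuming a Lambda proxy integration (the post does not show the template's integration type), the function returns a dict with `statusCode`, `headers`, and a string `body`. Because the front end is served from a different origin (the S3 website), a CORS header is typically needed. The helper name, the sample result fields, and the allow-all origin below are illustrative assumptions; restrict the origin in a real deployment.

```python
import json
from typing import Any, Dict, List


def api_response(items: List[Dict[str, Any]], status_code: int = 200) -> Dict[str, Any]:
    """Wrap search results in the Lambda proxy-integration response format."""
    return {
        "statusCode": status_code,
        "headers": {
            "Content-Type": "application/json",
            # Let the S3-hosted front end call the API (illustrative: any origin).
            "Access-Control-Allow-Origin": "*",
        },
        "body": json.dumps(items),  # proxy integration expects a string body
    }


# Hypothetical result item: a pre-signed image URL plus a similarity score.
response = api_response([{"image": "https://example.com/presigned-url", "score": 0.93}])
```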

  1. The following cell uploads the CloudFormation (AWS SAM) template that creates the necessary resources, such as Lambda and API Gateway, for the full-stack application, and generates a Quick Create link for it:
s3_resource.Object(bucket, 'backend/template.yaml').upload_file('./backend/template.yaml', ExtraArgs={'ACL':'public-read'})

sam_template_url = f'https://{bucket}'

# Generate the CloudFormation Quick Create Link

print("Click the URL below to create the backend API for NLU search:\n")

The following screenshot shows the output: a pre-generated CloudFormation template link.

  2. Choose the link.

You are sent to the Quick create stack page.

  3. Select the check boxes to acknowledge the creation of IAM resources, IAM resources with custom names, and CAPABILITY_AUTO_EXPAND.
  4. Choose Create stack.

When the stack creation is complete, you see the status CREATE_COMPLETE. You can look on the Resources tab to see all the resources the CloudFormation template created.

  5. After the stack is created, proceed through the cells.

The following cell indicates that your full-stack application, including front-end and backend code, is successfully deployed:

print('Click the URL below:\n')
print(outputs['S3BucketSecureURL'] + '/index.html')

The following screenshot shows the URL output.

  6. Choose the link.

You are sent to the application page, where you can provide your own search text to find products using both the KNN approach and regular full-text search approaches.

  7. When you’re done testing and experimenting with your KNN search application, run the last two cells at the bottom of the notebook:
# Delete the endpoint
predictor.delete_endpoint()

# Empty S3 Contents
training_bucket_resource = s3_resource.Bucket(bucket)
training_bucket_resource.objects.all().delete()

hosting_bucket_resource = s3_resource.Bucket(outputs['s3BucketHostingBucketName'])
hosting_bucket_resource.objects.all().delete()

These cells delete your Amazon SageMaker endpoint and empty your S3 buckets to prepare for cleaning up your resources.

Cleaning up

To delete the rest of your AWS resources, go to the AWS CloudFormation console and delete the nlu-search-api and nlu-search stacks.

Conclusion

In this post, we showed you how to create a KNN-based search application using Amazon SageMaker and the Amazon OpenSearch Service KNN index feature. You used a pre-trained BERT model published by the sentence-transformers project on the HuggingFace Model Hub. You can also fine-tune your BERT model using your own dataset. For more information, see Fine-tuning a PyTorch BERT model and deploying it with Amazon Elastic Inference on Amazon SageMaker.

For more information about the code sample in the post, see the GitHub repo.

About the Authors

Amit Mukherjee is a Sr. Partner Solutions Architect with a focus on data analytics and AI/ML. He works with AWS partners and customers to provide them with architectural guidance for building highly secure and scalable data analytics platforms and adopting machine learning at a large scale.

Laith Al-Saadoon is a Principal Solutions Architect with a focus on data analytics at AWS. He spends his days obsessing over designing customer architectures to process enormous amounts of data at scale. In his free time, he follows the latest in machine learning and artificial intelligence.

Eitan Sela is a Machine Learning Specialist Solutions Architect with Amazon Web Services. He works with AWS customers to provide guidance and technical assistance, helping them build and operate machine learning solutions on AWS. In his spare time, Eitan enjoys jogging and reading the latest machine learning articles.