AWS for Industries

New Guidance for Intelligent Product Substitutions on AWS

In a perfect world retailers would never be out of stock, but unexpected demand can thwart the best intentions. Grocers are out of stock 8.2% of the time on average and as high as 15% for promoted items, putting $7-12B of sales at risk. The next best alternative is to offer a similar product that most likely meets the customer’s needs. By providing intelligent product substitution recommendations to the employee picking orders, we can enhance the customer experience and prevent lost sales.

In 2020, US online grocery sales grew 54%, and over half of those orders had at least one requested item out of stock. When that happens, the item can be skipped, resulting in a lost sale, or the employee picking the order can guess at an alternative, which sometimes doesn’t meet the customer’s needs at all. In either case, the customer experience suffers and the company’s brand is tarnished.

A better approach is to recommend a similar product, such as a different brand, color, flavor, or size. Done intelligently, this preserves a good customer experience and avoids these issues. Creating substitution rules by hand for a huge number of products is inefficient and ineffective. Instead, Amazon Web Services (AWS) has built and published a solution that recommends substitutions: Guidance for Intelligent Product Substitutions on AWS.

Amazon OpenSearch Service provides suggestions for substitutions of out-of-stock products. Product names and descriptions are converted to numerical vectors using a text-embedding algorithm, and then inserted into an OpenSearch Service k-Nearest Neighbor (k-NN) index. When a substitute product is requested, candidate products are first constrained using OpenSearch Service pre-filtering, and then ranked by how close their vectors are to the vector of the query product.


Embeddings are numerical representations of text strings, constructed so that strings with similar meanings have embeddings that are close to one another. There are various embedding algorithms available, but here we have opted to use the all-MiniLM-L6-v2 sentence transformer, available on Hugging Face, due to its high performance on this task. By leveraging this sentence transformer, we can place product titles and descriptions in a numerical vector space where similar products are located near each other.

For example, consider a query product with the title ‘Organic peanut butter’, and alternative candidate products ‘peanuts’, ‘butter’, ‘basic peanut butter’, and ‘cheese’. We embed the titles of these products and insert them into a k-NN index. When we ask for the neighbors of the query product, ranked by distance, we receive the following ranking:

  • Basic peanut butter
  • Peanuts
  • Butter
  • Cheese

This is because ‘Basic peanut butter’ has been embedded as a vector that is close in distance to the embedding of ‘Organic peanut butter’, and the embedding of ‘Cheese’ is the most distant.
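The ranking step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the source code of the Guidance: the three-dimensional vectors are hand-made toy values chosen to mirror the example (real all-MiniLM-L6-v2 embeddings have 384 dimensions), Euclidean distance stands in for whichever distance the index is configured with, and embed_titles only shows how real embeddings could be produced with the separately installed sentence-transformers package.

```python
import math
from typing import Dict, List, Sequence, Tuple


def embed_titles(titles: List[str]) -> List[List[float]]:
    """Embed titles with all-MiniLM-L6-v2 (requires sentence-transformers)."""
    # Imported lazily so the ranking helpers below run without the package.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    return model.encode(titles).tolist()


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_distance(
    query_vec: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (title, distance) pairs, nearest candidate first."""
    scored = [(title, euclidean(query_vec, vec)) for title, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Hand-made toy vectors, NOT real model output; they only illustrate the idea.
TOY_VECTORS = {
    "Organic peanut butter": [0.9, 0.8, 0.1],
    "Basic peanut butter": [0.85, 0.75, 0.05],
    "Peanuts": [0.9, 0.1, 0.1],
    "Butter": [0.1, 0.8, 0.2],
    "Cheese": [0.0, 0.4, 0.9],
}

query_title = "Organic peanut butter"
candidates = {t: v for t, v in TOY_VECTORS.items() if t != query_title}
ranking = [title for title, _ in rank_by_distance(TOY_VECTORS[query_title], candidates)]
print(ranking)  # ['Basic peanut butter', 'Peanuts', 'Butter', 'Cheese']
```

In the deployed solution this nearest-neighbor search is performed by the OpenSearch Service k-NN index rather than by a linear scan like the one above.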

Solution Overview

Consider the following architecture:

Figure 1 – Guidance for Product Substitutions on AWS

The workflow contains the following stages:

  1. A product catalog is uploaded to an Amazon S3 bucket
  2. An AWS Lambda function is triggered that reads from the S3 bucket and inserts the products into Amazon DynamoDB
  3. Products inserted into DynamoDB are streamed to a Lambda function that embeds the textual fields of the products as numeric vectors and indexes the products into the OpenSearch Service cluster
  4. The user queries the Amazon API Gateway endpoint with a query product’s ID
  5. A Lambda function queries the OpenSearch Service cluster for the most similar products that satisfy the specified filters
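To make stage 3 more concrete, here is a sketch of what a k-NN index definition and an indexed product document could look like. The field name embedding, the keyword mappings for the filterable fields, and the choice of the HNSW method on the Lucene engine with L2 distance are all assumptions made for illustration; the deployed Guidance may configure its index differently. The only value taken as fact is the vector dimension: all-MiniLM-L6-v2 produces 384-dimensional embeddings.

```python
from typing import Any, Dict, Sequence

EMBEDDING_DIM = 384  # output size of the all-MiniLM-L6-v2 model


def knn_index_body(vector_field: str = "embedding") -> Dict[str, Any]:
    """Index settings and mappings for a k-NN index (hypothetical layout)."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                vector_field: {
                    "type": "knn_vector",
                    "dimension": EMBEDDING_DIM,
                    # Assumed method; other engines and distances are possible.
                    "method": {"name": "hnsw", "engine": "lucene", "space_type": "l2"},
                },
                # Keyword and numeric fields so that filters can match exactly.
                "id": {"type": "keyword"},
                "brand": {"type": "keyword"},
                "categories": {"type": "keyword"},
                "diet_type": {"type": "keyword"},
                "allergens": {"type": "keyword"},
                "price": {"type": "float"},
            }
        },
    }


def product_document(
    product: Dict[str, Any], vector: Sequence[float], vector_field: str = "embedding"
) -> Dict[str, Any]:
    """Combine a catalog product with its embedding into one index document."""
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(vector)}")
    document = dict(product)
    document[vector_field] = list(vector)
    return document
```

Storing the filterable attributes next to the vector in the same document is what allows the query in stage 5 to filter and rank in a single request.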


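To implement this solution, you need an AWS account with credentials configured in your development environment, plus Node.js and npm, which the deployment commands below rely on.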

Deploying the application

1. Clone the project to your development environment

2. Install NPM dependencies

cd substitutions;

npm ci;

3. AWS CDK Bootstrap your account with the required qualifier

npm run cdk bootstrap -- --toolkit-stack-name CDKToolkit-Substitutions --qualifier subs

4. Deploy

npm run cdk deploy

Once deployed, AWS CDK will output the API endpoint and available methods to the command line. You can also find these in the Outputs section of the corresponding AWS CloudFormation stack in the AWS console.

Using the application

Formatting the data

The application expects the data to be in JSON Lines format. Each product in your catalog should be on its own line in the file.

Each product must have at least the following fields:

  • id (string) – a unique product identifier
  • title (string) – the name of the product, generally including the brand name and product type; for example, ‘MyBrand Super Organic Almond Milk’

Additionally, products can have the following optional reserved fields:

  • description (string) – a description of the product; for example, ‘The best almond milk in town… made with only almonds and water. You will never drink another almond milk again!’
  • brand (string) – the brand of the product, as a categorical string with consistent spelling for each brand in the dataset; for example, ‘MyBrand’.
  • price (float) – the price of the product; for example, 3.50
  • categories (string[]) – the category nesting of the product, with highest-level category at the start, and lowest-level category at the end; for example, [‘Drinks’, ‘Vegan Milk’, ‘Almond Milk’]
  • diet_type (string[]) – a list of specific diets that the product adheres to; for example, [‘vegan’, ‘gluten free’, ‘kosher’]
  • allergens (string[]) – a list of allergens that are present in the product; for example, [‘nuts’]
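As an illustration of the format described above, the following sketch checks products against the listed fields and serializes them as JSON Lines. The helper names and the id value sku-001 are hypothetical, and the type checks are only one reasonable reading of the field list, not the validation the application itself performs.

```python
import json
from typing import Any, Dict, Iterable, List

REQUIRED_FIELDS = {"id": str, "title": str}
OPTIONAL_FIELDS = {
    "description": str,
    "brand": str,
    "price": (int, float),
    "categories": list,
    "diet_type": list,
    "allergens": list,
}


def validate_product(product: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one product (empty if none)."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in product:
            errors.append(f"missing required field: {name}")
        elif not isinstance(product[name], expected):
            errors.append(f"wrong type for field: {name}")
    for name, expected in OPTIONAL_FIELDS.items():
        if name in product and not isinstance(product[name], expected):
            errors.append(f"wrong type for field: {name}")
    return errors


def to_json_lines(products: Iterable[Dict[str, Any]]) -> str:
    """Serialize products as JSON Lines: one JSON object per line."""
    lines = []
    for product in products:
        errors = validate_product(product)
        if errors:
            raise ValueError(f"{product.get('id', '<no id>')}: {'; '.join(errors)}")
        lines.append(json.dumps(product, ensure_ascii=False))
    return "\n".join(lines) + "\n"


example = {
    "id": "sku-001",  # hypothetical identifier
    "title": "MyBrand Super Organic Almond Milk",
    "brand": "MyBrand",
    "price": 3.50,
    "categories": ["Drinks", "Vegan Milk", "Almond Milk"],
    "diet_type": ["vegan", "gluten free", "kosher"],
    "allergens": ["nuts"],
}
catalog_text = to_json_lines([example])
```

Writing catalog_text to a file gives a catalog with a single product on a single line, ready for upload.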

Uploading the data

Once the data is prepared, upload it as one or more files to the deployed S3 bucket named <ACCOUNT_ID>-<REGION>-substitutions-input-bucket.

This will insert the products into a DynamoDB table, and trigger a stream to a Lambda function that will embed the textual properties of the products and insert the vectors into an OpenSearch Service k-NN index.
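The upload can be done from the console or the AWS CLI; as one scripted alternative, a boto3 sketch could look like the following. The bucket naming pattern comes from the text above, while the function names and the choice of object key are assumptions. boto3 must be installed and credentials configured for upload_catalog to work.

```python
import os
from typing import Optional


def input_bucket_name(account_id: str, region: str) -> str:
    """Build the input bucket name from the documented naming pattern."""
    return f"{account_id}-{region}-substitutions-input-bucket"


def upload_catalog(
    path: str, account_id: str, region: str, key: Optional[str] = None
) -> str:
    """Upload one JSON Lines file to the input bucket; returns the bucket name."""
    # Imported lazily so input_bucket_name can be used without boto3 installed.
    import boto3

    bucket = input_bucket_name(account_id, region)
    s3 = boto3.client("s3", region_name=region)
    # Assumption: the file's base name is an acceptable object key.
    s3.upload_file(path, bucket, key or os.path.basename(path))
    return bucket
```

For example, upload_catalog("catalog.jsonl", "123456789012", "eu-west-1") would target the bucket 123456789012-eu-west-1-substitutions-input-bucket.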

The indexing process can take a while. To check its progress, call the /status API method. This returns the number of products currently in the table and the number of products currently indexed in OpenSearch Service. Indexing is complete when these two values are equal.
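A polling loop for this check might be sketched as follows. Because the exact shape of the /status response is not described here, the sketch takes a caller-supplied fetch_counts function that returns the two numbers; that function, the polling interval, and the extra requirement that the table is non-empty are all assumptions.

```python
import time
from typing import Callable, Tuple


def wait_until_indexed(
    fetch_counts: Callable[[], Tuple[int, int]],
    poll_seconds: float = 30.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the table count equals the indexed count.

    fetch_counts is a hypothetical callable that calls /status and returns
    (products_in_table, products_indexed). Returns True once indexing is
    complete, or False if max_polls is reached first.
    """
    for attempt in range(max_polls):
        in_table, indexed = fetch_counts()
        # Assumption: an empty table (0 == 0) means nothing was uploaded yet,
        # so it is not treated as "complete".
        if in_table > 0 and in_table == indexed:
            return True
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    return False
```

Keeping the HTTP call outside the loop means the same function works whatever field names the status response actually uses.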

Requesting substitutions

Once the data is indexed into the OpenSearch Service cluster, we can start requesting substitutions. The /substitutions API requires, at minimum, a product ID supplied as a URL query parameter, like so: /substitutions?id=<PRODUCT_ID>. The result is a list of products ranked by how similar their names and descriptions are to those of the query product.
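A minimal client for this call, using only the Python standard library, could look like the sketch below. The base URL is whatever AWS CDK printed at deployment; the function names are hypothetical, and the sketch assumes the endpoint returns JSON and can be called without extra authentication headers, which may not hold for your deployment.

```python
import json
from typing import Any
from urllib.parse import urlencode
from urllib.request import urlopen


def substitutions_url(api_base: str, product_id: str) -> str:
    """Build the /substitutions URL with the product ID safely URL-encoded."""
    query = urlencode({"id": product_id})
    return f"{api_base.rstrip('/')}/substitutions?{query}"


def fetch_substitutions(api_base: str, product_id: str, timeout: float = 10.0) -> Any:
    """Call the API and return the parsed JSON body (shape not assumed here)."""
    with urlopen(substitutions_url(api_base, product_id), timeout=timeout) as response:
        return json.load(response)
```

Encoding the ID with urlencode matters when product identifiers contain spaces or other reserved characters.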

Substitution results can be further constrained by applying pre-filters to the k-NN search. This solution provides the following filters: category match, price factor, diet type match, brand match, no new allergens.
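To show how such pre-filters can be combined with a k-NN search, here is a sketch that builds an OpenSearch query body using a filter clause inside the knn query, which OpenSearch supports on some k-NN engines. The semantics given to each filter are assumptions, since they are not spelled out above: category match is read as sharing the lowest-level category, price factor as an upper bound of factor times the query price, and no new allergens as excluding any allergen from a caller-supplied known_allergens list that the query product does not already contain. The field names, including embedding, are hypothetical.

```python
from typing import Any, Dict, Optional, Sequence


def build_filter(
    product: Dict[str, Any],
    *,
    category_match: bool = False,
    price_factor: Optional[float] = None,
    diet_type_match: bool = False,
    brand_match: bool = False,
    no_new_allergens: bool = False,
    known_allergens: Sequence[str] = (),
) -> Dict[str, Any]:
    """Translate the enabled filters into an OpenSearch bool filter."""
    must = []
    # Never suggest the query product as its own substitute.
    must_not = [{"term": {"id": product["id"]}}]
    if category_match and product.get("categories"):
        must.append({"term": {"categories": product["categories"][-1]}})
    if price_factor is not None and "price" in product:
        must.append({"range": {"price": {"lte": product["price"] * price_factor}}})
    if diet_type_match:
        must.extend({"term": {"diet_type": d}} for d in product.get("diet_type", []))
    if brand_match and "brand" in product:
        must.append({"term": {"brand": product["brand"]}})
    if no_new_allergens:
        new = sorted(set(known_allergens) - set(product.get("allergens", [])))
        if new:
            must_not.append({"terms": {"allergens": new}})
    return {"bool": {"must": must, "must_not": must_not}}


def knn_query(
    vector: Sequence[float],
    k: int,
    filter_clause: Dict[str, Any],
    vector_field: str = "embedding",
) -> Dict[str, Any]:
    """Build a k-NN query body whose candidates are restricted by the filter."""
    return {
        "size": k,
        "query": {
            "knn": {vector_field: {"vector": list(vector), "k": k, "filter": filter_clause}}
        },
    }
```

Applying the filter as part of the k-NN search, rather than after it, means the k results returned are all valid candidates instead of a top-k list that shrinks once invalid products are removed.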

What’s returned is an ordered list of possible substitutions. For a picker, we suggest providing the top three and allowing them to choose. Further, we suggest tracking the efficacy of the recommendations over time. We have seen retailers determine substitutions using a combination of known, stored substitutions that are proven, and automated substitutions such as those calculated by this solution.
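One way to combine stored and automated substitutions into a short list for the picker is sketched below. Giving proven, stored substitutions priority over automated ones is an assumption about how a retailer might order them, and the function name is hypothetical.

```python
from typing import Iterable, List


def pick_suggestions(
    curated: Iterable[str], automated: Iterable[str], limit: int = 3
) -> List[str]:
    """Merge product IDs, curated first, dropping duplicates, up to limit."""
    seen = set()
    suggestions: List[str] = []
    for product_id in list(curated) + list(automated):
        if len(suggestions) >= limit:
            break
        if product_id not in seen:
            seen.add(product_id)
            suggestions.append(product_id)
    return suggestions


# "a" is a stored substitution; the rest come from the ranked API result.
print(pick_suggestions(["a"], ["b", "a", "c", "d"]))  # ['a', 'b', 'c']
```

Recording which of the offered suggestions the picker actually chooses is a simple starting point for tracking efficacy over time.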


Retailers, especially grocers, that are fulfilling orders and often hit out-of-stock situations will benefit from intelligent product substitution recommendations resulting in faster picking and increased customer satisfaction.

To see the reference architecture and download the source code, visit Product Substitutions Guidance.

Contact an AWS Representative to learn how we can help accelerate your business.

Alexandros Jordan Maragakis

Alexandros Jordan Maragakis is a Solutions Architect at AWS based in the United Kingdom. He’s spent the last 3+ years focused on cloud-based solutions for various industries, but his passion lies in machine learning and frontend development. In his spare time, AJ enjoys music and plays guitar like a pro.