Improving order history search using semantic search with Amazon OpenSearch Service

If you’ve ever shopped on Amazon, you’ve used Your Orders. This feature maintains your complete order history dating back to 1995, so you can track and manage every purchase you’ve made. The order history search feature lets you find your past purchases by entering keywords in the search bar. Beyond just finding items, it provides a straightforward way to repurchase the same or similar items, saving you time and effort.

Various features across Amazon’s shopping experience, such as Rufus and Alexa, use order history search to help you find your past purchases. Therefore, it’s important that order history search can locate your past purchased items as accurately and quickly as possible.

In this post, we show you how the Your Orders team improved order history search by introducing semantic search capabilities on top of our existing lexical search system, using Amazon OpenSearch Service and Amazon SageMaker.

Limitations of lexical search

Order history search uses lexical matching to find items from the entire order history of a customer that match at least one word of the search keywords. For example, if a customer searches for “orange juice,” the system retrieves all orange juice items as well as fresh oranges and other fruit juices the customer had previously ordered. Although lexical matching can provide a high recall of items with terms matching the search keywords precisely, it doesn’t work well for related or generic search keywords. A search for “healthy drinks,” for example, would miss a kombucha or green tea order whose title contains neither word.

Since the launch of Rufus, Amazon’s AI-enabled shopping assistant, a growing number of customers are experiencing a streamlined and richer shopping journey, including searching for their previous purchases with Rufus. Customers can now ask “Show me healthy drinks” without worrying about using lengthy, more precise terms like “kombucha”, “green tea”, and “protein shakes”. This makes the search experience more conversational and intent-based, presenting an opportunity to make item discovery more intuitive. For Rufus to answer order history searches with the same intuitive experience such as “Show me the healthy drinks I bought last year”, the underlying order history data store (“Your Orders”) needs semantic search capability to understand the underlying semantics of search keywords beyond the conventional lexical matching.

Challenges implementing semantic search

Implementing semantic search at our scale presented several technical challenges:

Scale – We needed to enable semantic search across billions of records corresponding to customers’ order history globally.
Zero downtime – We needed to keep the system 100% available while making changes on the backend to introduce semantic search.
Preventing search quality degradation – Semantic search is intended to improve the quality of search results. However, in some cases, it can reduce search quality. For example, if a customer remembers their item name exactly and wants to find only items matching that name, surfacing similar items in addition to the exactly matching items will increase crowding in results and make it harder to find the relevant item. Similarly, semantic search will not work for cases where the customer intends to search by identifier values, like order ID, which lack an inherent semantic meaning. For these scenarios, we use lexical search only.

Solution overview

Semantic search is powered by large language models (LLMs), which are mostly trained on human languages. These models can be adapted to take a piece of text in any language they were trained on and emit an embedding vector of a fixed length, irrespective of the input text length. By design, embedding vectors capture the semantic meaning of input text such that two semantically similar text strings have high cosine similarity computed on their respective embedding vectors. For semantic search on order history, the input texts subject to embedding generation and similarity computation are the customer search phrases and the product text of purchased items.
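To make the similarity computation concrete, the following minimal sketch computes cosine similarity between toy vectors. The three-dimensional vectors and the item names are made up for illustration; real embedding models emit vectors with hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_a * norm_b)


# Toy "embeddings" (hypothetical values, not produced by any real model).
query = [0.9, 0.1, 0.0]          # stands in for "healthy drinks"
green_tea = [0.8, 0.2, 0.1]      # stands in for a semantically close item
phone_charger = [0.0, 0.1, 0.9]  # stands in for an unrelated item

print(cosine_similarity(query, green_tea) > cosine_similarity(query, phone_charger))
```

Vectors that point in nearly the same direction score close to 1, and unrelated vectors score close to 0, which is what lets a ranking by cosine similarity surface semantically related items.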

We divide our solution into two parts:

Improving system scalability and resiliency for handling requests at scale – Before implementing semantic search, we needed to ensure our infrastructure could handle the increased computational load, leading us to adopt a cell-based architecture. This step is not needed for every use case, but systems with very high scale in terms of request or data volume can benefit a lot from its use before implementing a resource-intensive use case like semantic search.
Implementing semantic search – We began by evaluating the available embedding models, using the offline evaluation capabilities of Amazon Bedrock to test different models. After we selected our model, we could establish the infrastructure for generating embedding vectors.

Improving system scalability and resiliency

We used the cell-based architecture design pattern for improving our scalability and resiliency. A cell-based design entails partitioning the system into identical, smaller, self-contained chunks, or cells, which handle only a part of the overall traffic received by the system. The following diagram shows a high-level representation of a cell-based design for order history search.

Cell-based architecture diagram showing customer request routing to Amazon OpenSearch Service domains via hash-based partitioning

Each cell serves a defined subset of our customers. Cells don’t need to communicate with one another to serve a customer request. Each customer is assigned to a cell and each request from that customer is routed to that cell. The OpenSearch Service domain in each cell holds data only for the subset of customers that it is supposed to serve. The number of cells (N) and the distribution of data among those cells depend on the business use case, but the goal is to achieve as even a distribution of data and traffic as possible.

The routing logic can be kept as simple or as sophisticated as the use case requires it to be. The cell assignment values can either be computed at runtime for each request, or they can be computed one time and written to a cache or persistent data store like Amazon DynamoDB, from where cell assignment values can be fetched for subsequent requests. For order history search, the logic was simple and quick enough to be executed at runtime for each request. Looking up cell assignment from a persistent data store is especially useful for cases where there is a risk of some cells becoming “heavier” than others over time. In such cases, it becomes easier to redistribute the heavy cell’s data by simply overriding cell assignment values for specific keys in the data store, instead of having to change the partitioning logic immediately, which might have an impact on data distribution across all the cells.

As the system’s load grows, the number of cells in the system can be increased to handle the additional traffic. Even without increasing the number of cells in the system, we can redistribute current data among the existing N cells by reassigning some keys from one or more heavily populated cells to different lightly populated cells to spread out the load more evenly across all the cells and make more efficient use of the infrastructure.
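The routing options described above can be sketched as follows. This is an illustration under stated assumptions, not the routing logic used in production: the cell count, the choice of SHA-256, and the in-memory override table (standing in for a cache or a DynamoDB table) are all hypothetical.

```python
import hashlib

NUM_CELLS = 8  # hypothetical value of N

# Hypothetical override table. In a real system this could live in a cache
# or a persistent store such as DynamoDB.
CELL_OVERRIDES = {}


def hash_cell(customer_id, num_cells=NUM_CELLS):
    """Compute a cell index at runtime from a stable hash of the customer ID."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_cells


def assign_cell(customer_id, overrides=None, num_cells=NUM_CELLS):
    """Prefer an explicit override; otherwise fall back to the hash."""
    table = CELL_OVERRIDES if overrides is None else overrides
    if customer_id in table:
        return table[customer_id]
    return hash_cell(customer_id, num_cells)
```

Moving a customer out of a heavy cell then amounts to writing one entry into the override table (and migrating that customer's data), while the hash function stays unchanged for everyone else.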

A cell-based architecture also helps make the system more resilient. For example, if we lose one cell, our capacity is diminished only by 1/N, instead of 100%. This arrangement can be improved further by assigning each partitioning key to two or more cells, so that its data is written to each of them. In such cases, loss of a single cell does not result in data loss.
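One way to picture the multi-cell assignment is the sketch below. The placement rule (a hashed primary cell plus its next neighbors) is an assumption made for illustration; the post does not describe how replica cells would be chosen.

```python
import hashlib


def cells_for_key(key, num_cells, replicas=2):
    """Return the distinct cells that should hold data for a partitioning key."""
    if not 1 <= replicas <= num_cells:
        raise ValueError("replicas must be between 1 and num_cells")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    primary = int.from_bytes(digest[:8], "big") % num_cells
    # Hypothetical placement: the primary cell followed by its neighbors.
    return [(primary + i) % num_cells for i in range(replicas)]


def readable_cells(key, num_cells, failed_cells, replicas=2):
    """Return the cells that can still serve the key when some cells are down."""
    return [c for c in cells_for_key(key, num_cells, replicas) if c not in failed_cells]
```

With two replicas, losing the primary cell for a key still leaves one cell that holds its data. A neighbor-based rule does shift a failed cell's whole load onto a single other cell, so a real design might spread replicas differently.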

Implementing semantic search

Implementing semantic search for our order history search required several key decisions and technical steps. We began by evaluating the available embedding models, using the offline evaluation capabilities of Amazon Bedrock to test different models against our specific business domain requirements. This evaluation process helped us identify which model would deliver the best performance for our use case. After we selected our model, we needed to establish the infrastructure for generating embedding vectors. We containerized our embedding model and registered it in Amazon Elastic Container Registry (Amazon ECR), then deployed it using SageMaker inference endpoints to handle the actual vector computation at scale.
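As a rough sketch of what calling such an endpoint can look like, the function below uses the `invoke_endpoint` operation of the SageMaker Runtime client. The JSON request shape (`{"inputs": ...}`), the response key (`"embedding"`), and the endpoint name are assumptions; the actual payload format depends on the model container.

```python
import json


def embed_text(runtime_client, endpoint_name, text):
    """Request an embedding vector for `text` from a SageMaker endpoint.

    `runtime_client` is expected to behave like boto3.client("sagemaker-runtime").
    The request and response JSON shapes are assumptions for this sketch.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"inputs": text}),
    )
    payload = json.loads(response["Body"].read())
    return payload["embedding"]


# Example wiring (requires boto3 and AWS credentials, so it is left commented out):
# import boto3
# client = boto3.client("sagemaker-runtime")
# vector = embed_text(client, "my-embedding-endpoint", "organic green tea, 100 bags")
```

Passing the client in as a parameter keeps the function easy to exercise with a stub in tests.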

For the search infrastructure itself, we chose OpenSearch Service to implement our semantic search capabilities. OpenSearch Service provided both the vector storage we needed and the search algorithms required to deliver relevant results to our users.

One of our biggest challenges was updating our historical data to support semantic search on existing orders. We built a data processing pipeline using AWS Step Functions to orchestrate the workflow and AWS Lambda functions to handle the actual vector generation for our legacy data, so we could provide semantic search for all the records we wanted to.

The following diagram illustrates the high-level architecture.

Architecture diagram showing read-flow and write-flow for semantic search using Amazon OpenSearch Service and Amazon SageMaker embedding vectors

Model evaluation and selection

Order history search uses an embedding model trained on Amazon-specific data. Domain-specific training is critical because the generated embedding vectors must work well for the business context to return quality results.

We used an LLM-as-a-judge methodology with Anthropic’s Claude on Amazon Bedrock to evaluate candidate models. Anthropic’s Claude received prompts containing anonymized item text and search phrases from customer order history, then filtered and ranked items by relevance. These results served as ground truth for comparison.

We evaluated models using standard ranking metrics:

Normalized Discounted Cumulative Gain (NDCG) – Measures ranking quality against ideal order
Mean Reciprocal Rank (MRR) – Considers position of first relevant item
Precision – Rates accuracy of retrieved results
Recall – Rates ability to retrieve all relevant items

This process helped us determine the best model.
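For readers less familiar with these metrics, the following sketch shows one common way to compute them for a single query. The item IDs and relevance grades are made up; MRR is the mean of the reciprocal rank over all evaluated queries.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved items that are relevant."""
    top = ranked[:k]
    return sum(1 for item in top if item in relevant) / len(top) if top else 0.0


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant item, or 0.0 if none is retrieved."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def ndcg_at_k(ranked, gains, k):
    """NDCG@k, where `gains` maps item -> graded relevance (missing means 0)."""
    def dcg(values):
        return sum(g / math.log2(i + 2) for i, g in enumerate(values))

    actual = dcg([gains.get(item, 0.0) for item in ranked[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


# Hypothetical model output and judge labels for one query.
ranked = ["a", "b", "c", "d"]
relevant = {"b", "d"}
gains = {"b": 2.0, "d": 1.0}
print(precision_at_k(ranked, relevant, 2), recall_at_k(ranked, relevant, 2))
print(reciprocal_rank(ranked, relevant), ndcg_at_k(ranked, gains, 4))
```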

Retrieval strategy: Customer-scoped comprehensive search

Order history search has two key requirements:

Search only through the requesting customer’s order history – We don’t want items from one customer’s order history showing up in search results for another customer
Search all of that customer’s history – We don’t want to miss showing an item that would have been relevant for the customer’s search phrase just because the search algorithm missed evaluating it for some reason

Our approach involves using OpenSearch Service to retrieve all items for the customer who issued the search query, calculating relevance scores for each of them against the search phrase, sorting by score, and returning top K results. This provides comprehensive results coverage for each customer.
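The retrieval flow above can be sketched in memory as follows. In the system described in this post, the filtering and scoring happen inside OpenSearch Service; this standalone version, with its hypothetical `OrderItem` type and toy vectors, only illustrates the logic.

```python
import math
from dataclasses import dataclass


@dataclass
class OrderItem:
    customer_id: str
    title: str
    embedding: list


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_order_history(items, customer_id, query_embedding, k):
    """Score every item of one customer against the query and return the top k."""
    # 1. Restrict to the requesting customer's items only.
    own = [item for item in items if item.customer_id == customer_id]
    # 2. Score all of them (exhaustive, so no relevant item is skipped).
    scored = [(_cosine(query_embedding, item.embedding), item) for item in own]
    # 3. Sort by score, highest first, and keep the top k.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:k]]


items = [
    OrderItem("c1", "green tea", [0.8, 0.2, 0.1]),
    OrderItem("c1", "phone charger", [0.0, 0.1, 0.9]),
    OrderItem("c2", "kombucha", [0.9, 0.1, 0.0]),
]
results = search_order_history(items, "c1", [0.9, 0.1, 0.0], k=2)
print([item.title for item in results])
```

Even though the other customer's item is the closest match to the query vector, it never appears in the results, because filtering by customer happens before any scoring.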

Vector storage with OpenSearch Service

We used two OpenSearch Service features for efficient vector storage and search:

knn_vector datatype – Built-in support for storing embedding vectors. Existing domains can add this field type without reindexing, enabling exact kNN search across all records. We didn’t need approximate kNN because the number of records for most customers was small enough for exact kNN to scale.
Scripted scoring – Painless scripts compute vector similarity server-side, reducing client complexity and maintaining low latency.
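As an illustration of how these two features fit together, the sketch below builds a mapping fragment and an exact k-NN request body as Python dictionaries. The field names (`item_embedding`, `customer_id`) and the dimension are hypothetical, and the query reflects our reading of the OpenSearch k-NN Painless extensions rather than the exact query used by order history search.

```python
import json

EMBEDDING_DIM = 384  # hypothetical; must match the embedding model's output size


def vector_field_mapping(field="item_embedding", dimension=EMBEDDING_DIM):
    """Mapping fragment that adds a knn_vector field to an index."""
    return {"properties": {field: {"type": "knn_vector", "dimension": dimension}}}


def exact_knn_query(customer_id, query_vector, k, field="item_embedding"):
    """Exact k-NN over one customer's documents using a Painless scoring script."""
    return {
        "size": k,
        "query": {
            "script_score": {
                # Pre-filter: only this customer's documents are scored.
                "query": {"term": {"customer_id": customer_id}},
                "script": {
                    "lang": "painless",
                    # Adding 1.0 keeps scores non-negative, as script_score requires.
                    "source": "1.0 + cosineSimilarity(params.query_value, doc[params.field])",
                    "params": {"field": field, "query_value": query_vector},
                },
            }
        },
    }


print(json.dumps(exact_knn_query("customer-123", [0.1, 0.2, 0.3], k=10), indent=2))
```

Because the term filter narrows the candidate set to a single customer before scoring, the script runs over a small number of documents for most customers.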

Hybrid search

Hybrid search refers to combining the results of lexical and semantic search to benefit from the strengths of each. The hybrid query capabilities of OpenSearch Service simplify implementing hybrid search by letting clients specify both types of queries in a single request. OpenSearch Service runs both queries in parallel, merges their results, normalizes the relevance scores of the sub-queries, and sorts results by the provided sort order (relevance score by default) before returning them to clients.

This gives clients the best of both types of searches. For example, there are certain scenarios where the search phrase doesn’t make much sense semantically, like when customers search by their orderId values. Semantic search is not designed for such cases; these are best served using keyword matching.

The hybrid search functionality helped save implementation effort and potential latency increase for order history search.
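To show why normalization is needed before merging, the following conceptual sketch applies min-max normalization to each sub-query's scores and combines them with a weighted mean. It is not the OpenSearch Service implementation; the scores, item names, and equal weights are assumptions for illustration.

```python
def min_max_normalize(scores):
    """Rescale one sub-query's scores to the range [0, 1]."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - low) / (high - low) for doc, s in scores.items()}


def combine(lexical, semantic, weights=(0.5, 0.5)):
    """Merge two score maps after normalization; a missing score counts as 0."""
    lex = min_max_normalize(lexical)
    sem = min_max_normalize(semantic)
    combined = {
        doc: weights[0] * lex.get(doc, 0.0) + weights[1] * sem.get(doc, 0.0)
        for doc in set(lex) | set(sem)
    }
    # Highest combined score first; ties broken by name for a stable order.
    return sorted(combined.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical raw scores on very different scales.
lexical_scores = {"orange juice": 7.2, "fresh oranges": 3.1}
semantic_scores = {"orange juice": 1.9, "kombucha": 1.7, "green tea": 1.6}
print(combine(lexical_scores, semantic_scores))
```

Without normalization, the lexical scores would dominate simply because they are on a larger scale, regardless of how relevant the semantic matches are.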

Updating historical data

After the infrastructure has been set up, newly ingested records are persisted with their embedding vectors, so semantic search works on those records right away. However, customers typically search for products they purchased earlier. Therefore, the system might not improve the customer experience much until the older records are updated to include the relevant embeddings. The approach to populate this data depends on the scale of the problem at hand.
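At small scale, a backfill can be as simple as the batch loop sketched below. The record shape and the `embed_batch` and `write_batch` callables are hypothetical placeholders for an embedding endpoint and a bulk update; at the scale described in this post, the work was orchestrated with AWS Step Functions and AWS Lambda instead.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def backfill(records, embed_batch, write_batch, batch_size=100):
    """Add embeddings to records that lack one; return how many were updated.

    `records` is an iterable of dicts with "id" and "title" keys (assumed shape).
    `embed_batch` maps a list of titles to a list of vectors.
    `write_batch` persists a list of {"id", "embedding"} updates.
    """
    updated = 0
    # Skipping already-embedded records makes reruns after a failure safe.
    pending = (record for record in records if "embedding" not in record)
    for chunk in batched(pending, batch_size):
        vectors = embed_batch([record["title"] for record in chunk])
        write_batch(
            [{"id": record["id"], "embedding": vector} for record, vector in zip(chunk, vectors)]
        )
        updated += len(chunk)
    return updated
```

Batching keeps the number of calls to the embedding endpoint and the data store low, and the batch size is the main knob for trading throughput against load on the live system.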

Releasing the change to minimize potential customer impact

Our final step was to release the change to clients in a manner such that the impact of any potential problems is as small as possible. There are multiple ways to do that, including:

Implementing semantic search in a manner such that any transient issues in the semantic search flow make the logic fall back to lexical-only search, instead of failing the request completely. Even if semantic search doesn’t execute, the system should still be able to return results of lexical search to the client, instead of empty results.
Gating the change such that the default behavior remains lexical-only search, and the semantic or hybrid flow executes only for requests in which the client passes an additional flag.
Keeping the new flow behind a feature flag during the initial period such that it could be turned off completely if some critical problem is detected.
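The three safeguards above can be combined in a single request handler, as in the following sketch. The function and parameter names are hypothetical, and the two search callables stand in for the real lexical and hybrid flows.

```python
def search(query, lexical_search, hybrid_search, *, client_opt_in=False, feature_enabled=True):
    """Run hybrid search only when allowed, falling back to lexical on failure.

    `client_opt_in` models the per-request flag passed by the client.
    `feature_enabled` models a service-wide feature flag (kill switch).
    """
    # Default behavior stays lexical-only unless both gates are open.
    if not (client_opt_in and feature_enabled):
        return lexical_search(query)
    try:
        return hybrid_search(query)
    except Exception:
        # A real service would log the error and emit a metric here.
        # The request still succeeds with lexical results instead of failing.
        return lexical_search(query)
```

Catching a broad exception is a deliberate choice for this sketch: any failure in the new flow, such as a timeout from the embedding endpoint, degrades to the previous behavior rather than surfacing an error to the client.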

Examples of improved customer experience

The following are some examples of customer interactions with Rufus that required Rufus to query the respective customer’s order history to answer their question and give them the required pieces of information.

The following screenshots show how semantic search picks up wooden spoons for a “sustainable utensils” query and different kinds of chargers for a chargers query, including a wall connector whose title description doesn’t contain the keyword “charger.”

Two side-by-side screenshots demonstrating semantic search results for sustainable utensils and chargers in an e-commerce interface.

The following screenshots show how semantic search picks up relevant results even though the title description doesn’t include the queried keywords.

Two side-by-side screenshots demonstrating semantic search results for healthy snacks and kids educational items in an e-commerce order interface.

The semantic search feature of order history search helped Rufus fetch these items and show them to customers. Before semantic search, Rufus wasn’t able to show any results to customers for such queries.

Business impact

Our solution resulted in the following key business impacts:

Customer experience improvements – The solution achieved a 10% improvement in query recall, increasing the percentage of searches that return relevant results. It also reduced customer service contacts for issues related to locating past orders.
Partner integration success – The solution strengthened natural language processing capabilities for Alexa and Rufus, enhancing their ability to interpret order history queries. It also reduced the need for reranking and postprocessing by partner teams. We improved query success rate by 20%, meaning more customer searches now return at least one relevant item. We also observed a 48% increase in result coverage, with semantic search consistently surfacing additional relevant matches that lexical search would have missed.

Conclusion

In this post, we showed you how we evolved Amazon order history search to support semantic search capabilities. This transition involved using cutting-edge AI technology while working within existing infrastructure limitations to develop solutions that avoided disruption and maintained SLAs during the feature upgrade. The implementation also involved backfilling, where billions of documents were processed at rates multiple times higher than normal ingestion to compute embedding vectors for previously purchased items. This operation required careful engineering and took advantage of the resilience OpenSearch Service offers even under extreme load.

Beyond the immediate implementation, this foundation enables continued innovation in search technology. The embedding vectors framework can incorporate improved models as they become available, and the architecture supports expansion into new capabilities such as personalization and multi-modal search.

You can get started with exact k-NN search today following the instructions in Exact k-NN search. If you’re looking for a managed solution for your OpenSearch cluster, check out Amazon OpenSearch Service.

AWS Big Data Blog