AWS Database Blog
Full-text, exact-match, range, and hybrid search on Amazon ElastiCache
Amazon ElastiCache now supports real-time full-text, exact-match, numeric range, and hybrid search directly in your cache, without a separate search service. Applications can search terabytes of data with latency as low as microseconds and throughput up to millions of search operations per second for workloads that demand low-latency, scalable search across dynamic data. These new search capabilities provide developers the flexibility to query data already stored in ElastiCache by attributes beyond simple key-value lookups.
Exact-match search retrieves documents by matching precise values across text, tag, and numeric attributes, such as product names, categories, user IDs, or order numbers. Numeric range search filters documents by attributes such as price thresholds, date ranges, or transaction amounts. In addition to exact match, full-text search operates on text attributes with prefix matching for type-ahead suggestions, fuzzy matching for typo tolerance, and proximity matching for multi-term searches. You can combine these search types with vector similarity in a single hybrid query, capturing both precise terms and semantic intent to deliver more relevant results than either method alone. For vector workloads, ElastiCache for Valkey delivers the lowest latency vector search with the highest throughput and best price-performance at 95%+ recall among popular vector databases on AWS.
These search capabilities are available in ElastiCache version 9.0 for Valkey, alongside server-side aggregations for real-time analytics and reporting (see Announcing aggregations on ElastiCache). ElastiCache version 9.0 for Valkey also introduces hash field expiration for fine-grained TTL control on individual fields and up to 40% higher pipelined throughput. For the full release details, see Announcing Valkey 9.0 for ElastiCache.
In this post, we walk through the new search capabilities, show how they work together, and build a search and recommendation engine from scratch.
Power real-time search across your applications
We have heard from customers that as their applications scale, their search workflows need to preserve the low-latency experience users expect while supporting the throughput their businesses require. For example, payments platforms, streaming platforms, and online retailers all store millions of documents in ElastiCache and need to retrieve data by metadata attributes at microsecond latency. Additionally, customers tell us that as their workloads evolve, they need rich search queries to support new use cases on data already stored in ElastiCache. For example, applications often store user and session context, such as device type, session state, and user activity, in ElastiCache to deliver low-latency experiences. As workloads evolve, customers want to use that same data to power recommendation systems, which requires searching across these attributes.
ElastiCache now provides a range of methods to search and retrieve data with latency as low as microseconds at throughput of millions of queries per second (QPS). Data becomes searchable as soon as writes complete, so applications always query the most current results. These capabilities power use cases such as catalog discovery, recommendation engines, agentic memory, real-time leaderboards, and session lookups.
Catalog discovery: Online retailers and streaming platforms build search experiences that help their customers discover items across large catalogs. These platforms can combine text search on product names and descriptions with filters on brand, category, price, and rating in a single query to provide a faceted browsing experience. Prefix matching powers type-ahead search that loads suggestions as users type, delivering results in microseconds so the experience feels instant. You can make the search experience more robust with typo-tolerant search powered by fuzzy matching, which handles misspellings automatically. Fuzzy matching is more computationally expensive than exact matching, so running it on an in-memory search engine like ElastiCache keeps the experience fast and responsive.
Recommendation engines: As catalogs grow to millions of items, users expect digital platforms to provide personalized browsing experiences that surface relevant content and products quickly. Modern recommendation systems encode users and items as vector embeddings. These systems retrieve recommendation candidates from across millions of items through vector search combined with filters on names, descriptions, category, availability, and price range. Hybrid search supports this by combining text, tag, and numeric filters with vector similarity in a single query, so retrieved candidates are both semantically relevant and satisfy business constraints. A product page can show “similar items” by filtering to the same category and price band, then ranking by embedding similarity. You can extend this to personalized recommendations by building a user embedding from interaction history (using techniques such as mean pooling of viewed item embeddings, attention-based models, or sequential models) and passing it as the vector query to rank results by learned preferences.
Agentic memory: Agent memory lets agents learn from past interactions to improve response relevance without replaying full conversation history, reducing token costs. Agent memory systems store and retrieve memories by scope attributes (user, agent, session) and semantic relevance to the current interaction. With hybrid search, these systems combine scope and text filters with vector similarity in a single query. Agent memory sits on the live conversation path, demanding read-after-write visibility so newly stored facts are immediately retrievable and requires high concurrent reads and writes to retrieve and consolidate new memories. ElastiCache indexes memories synchronously on write, leverages multithreading, and delivers the highest throughput among popular vector databases on AWS at latency as low as microseconds. For a step-by-step implementation with ElastiCache and Mem0, see Build persistent memory for agentic AI applications with Mem0 Open Source, ElastiCache for Valkey, and Amazon Neptune Analytics.
ElastiCache for Valkey is a good fit when you want to build a self-managed memory layer or when you need a low-latency, customizable in-memory store. If you prefer a fully managed approach, you can use Amazon Bedrock AgentCore Memory to handle memory for you.
Financial applications and leaderboards: Trading platforms and gaming applications store documents with numeric attributes such as transaction amounts, timestamps, risk scores, and player rankings that they need to retrieve at low latency. Numeric range queries on ElastiCache support fast lookups across these attributes, filtering by time windows, amount thresholds, or score bands. Gaming applications can maintain real-time leaderboards that reflect score updates immediately and support range queries like “top 100 players in my region.”
User and session management: Applications across industries store structured attributes such as session IDs, device types, and user handles within a cache for session management. These applications write session data to the cache as users log in and update it throughout the session lifecycle, requiring fast writes with immediate searchability. ElastiCache indexes updates synchronously, so searches against session attributes reflect the latest state without delay. Exact match search locates active sessions and entitlements by precise identifiers across millions of documents at sub-millisecond latency.
Building a search and recommendation engine with ElastiCache
To demonstrate these search types together, we build a search and recommendation engine for AnyCompany, an e-commerce platform that sells millions of products across electronics, beauty, and home goods. AnyCompany wants a search experience where shoppers can find products by keyword, narrow results with filters like brand and price range, and discover related items through similarity. AnyCompany stores its product catalog of over a million items in ElastiCache as hash-backed documents (derived from the Amazon ESCI dataset with real titles, descriptions, and brands for this example). The following code builds five query patterns on this data: type-ahead search, full-text matching, typo-tolerant matching, filtered browsing, and similar product recommendations.
Prerequisites
The examples in this post use Python with the valkey-py client library. To follow along, you need the following (estimated time: 30 minutes):
- An AWS account and the AWS Command Line Interface (AWS CLI)
- An AWS Identity and Access Management (IAM) role with permissions to create ElastiCache replication groups
- An Amazon EC2 instance in the same VPC as your Amazon ElastiCache cluster (or any application that can connect to Amazon ElastiCache)
- Python 3.9 or later and valkey-py version 6.1.1 or later (pip install valkey)
The complete sample code for this post is available in the ElastiCache samples GitHub repository.
Set up an ElastiCache for Valkey cluster
You can create an ElastiCache cluster for search with the AWS Management Console or the AWS CLI. The following example uses the CLI. Search is available for ElastiCache version 9.0 for Valkey or later.
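The following is a minimal sketch of the create command. The replication group ID, subnet group name, and node type are hypothetical; adjust them, along with Region and networking options, to your environment.

```shell
# Hypothetical names (anycompany-search, my-subnet-group); substitute your own.
aws elasticache create-replication-group \
    --replication-group-id anycompany-search \
    --replication-group-description "Valkey 9.0 search demo" \
    --engine valkey \
    --engine-version 9.0 \
    --cache-node-type cache.r7g.large \
    --num-node-groups 1 \
    --replicas-per-node-group 0 \
    --cache-subnet-group-name my-subnet-group
```

Wait for the replication group status to become available before connecting, and note the primary endpoint from the describe output.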
Create an index and load the data
Create an index called products_vec_index over the product data to make it searchable. The title and description attributes are indexed as full-text searchable fields that support keyword, prefix, and fuzzy matching. Brand and color are indexed as exact-match tags for filtered browsing. Price, rating, and stock are indexed as sortable numeric attributes for range queries and sorting. The embedding attribute is indexed as a vector for semantic similarity search and recommendations.
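A sketch of the index definition follows, assuming the RediSearch-compatible FT.CREATE syntax used by Valkey search; the `product:` key prefix and the ELASTICACHE_HOST environment-variable convention are assumptions for this post. The command is built as a list first so you can inspect it, and only runs when a cluster endpoint is configured.

```python
import os

# FT.CREATE arguments for the index described above (RediSearch-compatible dialect).
create_index = [
    "FT.CREATE", "products_vec_index",
    "ON", "HASH", "PREFIX", "1", "product:",   # index hashes keyed product:*
    "SCHEMA",
    "title", "TEXT",                            # full-text searchable
    "description", "TEXT",
    "brand", "TAG",                             # exact-match filters
    "color", "TAG",
    "price", "NUMERIC", "SORTABLE",             # range queries and sorting
    "rating", "NUMERIC", "SORTABLE",
    "stock", "NUMERIC", "SORTABLE",
    "embedding", "VECTOR", "HNSW", "6",         # 64-dim float32 vectors
    "TYPE", "FLOAT32", "DIM", "64", "DISTANCE_METRIC", "COSINE",
]

if os.environ.get("ELASTICACHE_HOST"):  # run only against a real cluster
    import valkey  # pip install valkey
    client = valkey.Valkey(host=os.environ["ELASTICACHE_HOST"], port=6379)
    client.execute_command(*create_index)
```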
Populate the ElastiCache store with the product dataset. The dataset is a 137,000-product subset of the 1.3 million-product catalog, with titles, descriptions, brands, and pre-computed 64-dimensional embeddings derived from the Amazon ESCI Shopping Queries dataset. Clone the sample repository and run the loading script:
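Each product is stored as a hash whose fields match the index schema. The following sketch shows the document shape with one illustrative record (the key, field values, and embedding are made up, not taken from the dataset); the embedding is packed as little-endian float32 bytes so the vector index can read it.

```python
import os
import struct

def to_blob(vec):
    """Pack a 64-dimensional embedding as little-endian float32 bytes."""
    return struct.pack("<64f", *vec)

# One illustrative document; the loading script writes ~137k of these.
product_id = "product:B0EXAMPLE1"  # hypothetical key under the indexed prefix
fields = {
    "title": "Wireless Noise Cancelling Headphones",
    "description": "Over-ear Bluetooth headphones with 30-hour battery",
    "brand": "AnyBrand",
    "color": "black",
    "price": "129.99",
    "rating": "4.5",
    "stock": "42",
    "embedding": to_blob([0.01 * i for i in range(64)]),  # placeholder vector
}

if os.environ.get("ELASTICACHE_HOST"):  # run only against a real cluster
    import valkey
    client = valkey.Valkey(host=os.environ["ELASTICACHE_HOST"], port=6379)
    client.hset(product_id, mapping=fields)  # indexed synchronously on write
```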
Type-ahead search
With the index and data in place, AnyCompany can build its search engine using FT.SEARCH queries that run against the index and return matching documents. As the user types in the search bar, the application sends prefix queries to show suggestions in real time.
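A prefix query for this pattern might look like the following sketch, assuming RediSearch-compatible FT.SEARCH syntax; the partial term "wirel" is illustrative. The trailing `*` makes it a prefix match, so it returns titles beginning with the typed characters.

```python
import os

# Type-ahead suggestion query: titles starting with "wirel" (e.g. "wireless").
query = [
    "FT.SEARCH", "products_vec_index",
    "@title:wirel*",        # prefix match on the title attribute
    "RETURN", "1", "title",  # suggestions only need the title
    "LIMIT", "0", "5",       # a handful of suggestions per keystroke
]

if os.environ.get("ELASTICACHE_HOST"):  # run only against a real cluster
    import valkey
    client = valkey.Valkey(host=os.environ["ELASTICACHE_HOST"], port=6379)
    print(client.execute_command(*query))
```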
Phrase matching
When the user presses enter, the application runs a full-text search across title and description. SLOP controls how far apart the terms can be and still match, so results with terms closer together rank higher:
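A sketch of such a query follows, again assuming RediSearch-compatible syntax; the search terms are illustrative. A query without a field prefix runs against all TEXT attributes (here, title and description), and SLOP 2 allows up to two intervening terms between the matched words.

```python
import os

# Multi-term full-text query across the TEXT attributes, with proximity control.
query = [
    "FT.SEARCH", "products_vec_index",
    "(noise cancelling headphones)",  # matches title or description
    "SLOP", "2",                      # terms may be up to 2 words apart
    "RETURN", "1", "title",
    "LIMIT", "0", "10",
]

if os.environ.get("ELASTICACHE_HOST"):  # run only against a real cluster
    import valkey
    client = valkey.Valkey(host=os.environ["ELASTICACHE_HOST"], port=6379)
    print(client.execute_command(*query))
```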
Typo-tolerant matching
If the query returns no results, the application retries with fuzzy matching to correct typos. Fuzzy matching is more expensive because it computes edit distances, so it works best as a fallback rather than the default:
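The fallback can be sketched as follows, assuming RediSearch-style fuzzy syntax where `%term%` matches terms within edit distance 1; "hedphones" is a deliberate misspelling for illustration.

```python
import os

# Exact query and its fuzzy fallback for a misspelled term.
exact = ["FT.SEARCH", "products_vec_index", "@title:hedphones", "LIMIT", "0", "10"]
fuzzy = ["FT.SEARCH", "products_vec_index", "@title:%hedphones%", "LIMIT", "0", "10"]

def search_with_fallback(client):
    """Try the exact query first; retry with fuzzy matching only on zero hits."""
    reply = client.execute_command(*exact)
    if reply and reply[0] == 0:  # first element of the reply is the match count
        reply = client.execute_command(*fuzzy)
    return reply

if os.environ.get("ELASTICACHE_HOST"):  # run only against a real cluster
    import valkey
    client = valkey.Valkey(host=os.environ["ELASTICACHE_HOST"], port=6379)
    print(search_with_fallback(client))
```

Running fuzzy matching only as a fallback keeps the common path on the cheaper exact query.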
Filtered browsing
When a shopper searches for a product and applies filters, the application combines text search with tag and numeric filters in a single query:
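The combined query might look like the following sketch (filter values are hypothetical): a text match on the title, an exact-match tag filter on brand, and a numeric price band, with results sorted by rating.

```python
import os

# Text + tag + numeric filters in one faceted-browsing query.
query = [
    "FT.SEARCH", "products_vec_index",
    "@title:headphones @brand:{AnyBrand} @price:[50 200]",
    "SORTBY", "rating", "DESC",  # highest-rated first
    "LIMIT", "0", "10",
]

if os.environ.get("ELASTICACHE_HOST"):  # run only against a real cluster
    import valkey
    client = valkey.Valkey(host=os.environ["ELASTICACHE_HOST"], port=6379)
    print(client.execute_command(*query))
```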
Similar product recommendations
To power “similar products” recommendations, AnyCompany uses hybrid search with text filters to narrow results to the relevant product type, and vector search to rank the filtered results by similarity to the product being viewed:
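A hybrid query can be sketched as follows, assuming the RediSearch-compatible KNN syntax with query dialect 2; the filter values are hypothetical and the embedding is a placeholder standing in for the viewed product's vector. The pre-filter narrows candidates before KNN ranks them by cosine similarity.

```python
import os
import struct

def to_blob(vec):
    """Pack an embedding as little-endian float32 bytes for the vector index."""
    return struct.pack(f"<{len(vec)}f", *vec)

# Placeholder for the viewed product's 64-dim embedding (real ones come from the catalog).
viewed = [0.01 * i for i in range(64)]

# Pre-filter by tag and price band, then rank the survivors by vector similarity.
query = [
    "FT.SEARCH", "products_vec_index",
    "(@brand:{AnyBrand} @price:[50 200])=>[KNN 5 @embedding $vec AS score]",
    "PARAMS", "2", "vec", to_blob(viewed),
    "SORTBY", "score",      # nearest neighbors first
    "DIALECT", "2",         # required for the => KNN syntax
    "LIMIT", "0", "5",
]

if os.environ.get("ELASTICACHE_HOST"):  # run only against a real cluster
    import valkey
    client = valkey.Valkey(host=os.environ["ELASTICACHE_HOST"], port=6379)
    print(client.execute_command(*query))
```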
You can extend this pattern to personalize results using embedding-based retrieval. Build a user embedding from a shopper’s interaction history using techniques such as mean pooling of viewed item embeddings, attention-based models, or sequence models. Pass the user embedding as the vector query instead of a single product embedding, and KNN scoring ranks results by the shopper’s learned preferences across the filtered set.
Performance under the hood
We measured latency and throughput for text and numeric query types with valkey-benchmark, on an ElastiCache for Valkey cluster with one shard containing a single cache.r7g.2xlarge node and no replicas. The dataset contains approximately 1 GB of data: 1.3 million product documents with text, tag, numeric, and vector attributes, as described in the example above.
| Query Type | P50 (ms), 1 client | P99 (ms), 1 client | QPS, 300 clients |
| --- | --- | --- | --- |
| Text search (exact-match) | 0.135 | 0.255 | 60,000 |
| Prefix matching (type-ahead search) | 0.135 | 0.279 | 57,692 |
| Numeric range (filters on stock/rating) | 0.175 | 0.199 | 24,087 |
| Hybrid query – text + numeric range (faceted browsing) | 0.135 | 0.295 | 52,632 |
For vector search latency and throughput benchmarks, see Announcing vector search for Amazon ElastiCache. The example above tests performance on a single cache.r7g.2xlarge node. You can scale read throughput by adding replicas (up to 5 per shard) and shards to reach millions of QPS. Each replica carries its own index and can serve searches independently, though replica reads are eventually consistent. If low latency is your priority over data capacity, use single-slot indexes to keep all indexed data on one shard and avoid fan-out overhead entirely. You can add shards to increase the memory capacity without any client code changes.
ElastiCache indexes each write synchronously, before acknowledging it, so any search issued after the write completes returns the updated data, providing read-after-write consistency. This behavior holds even for multi-key transactions and Lua scripts. Because indexing runs across multiple threads, ElastiCache maintains high search performance even for high write throughput workloads.
Clean up
If you created an ElastiCache cluster for this walkthrough and no longer need it, delete the cluster using the following AWS CLI command to avoid incurring future charges:
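The following sketch assumes the hypothetical replication group ID used earlier in this post; substitute your own.

```shell
# Replace anycompany-search with your replication group ID.
aws elasticache delete-replication-group \
    --replication-group-id anycompany-search \
    --no-retain-primary-cluster
```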
Conclusion
In this post, we walked through full-text, exact-match, numeric range, and hybrid search on ElastiCache. We covered use cases for these search types and showed how to build a search and recommendation system. Full-text, exact-match, numeric range, and hybrid search are available in all commercial AWS Regions, AWS GovCloud (US) Regions, and China Regions, for node-based clusters running ElastiCache version 9.0 for Valkey at no additional cost. Valkey is a permissively licensed, vendor-neutral, open source alternative to Redis and the recommended engine on ElastiCache. To get started, create a new Valkey 9.0 or later cluster or upgrade an existing cluster using the AWS Management Console, AWS SDK, or AWS CLI. To learn more, visit the ElastiCache documentation. For questions and feedback, visit AWS re:Post for ElastiCache.
About the authors
Our sincere thanks to Allen Samuels for his vision, guidance, and hands-on contributions throughout the project.