AWS Database Blog

Full-text, exact-match, range, and hybrid search on Amazon ElastiCache

Amazon ElastiCache now supports real-time full-text, exact-match, numeric range, and hybrid search directly in your cache, without a separate search service. Applications can search terabytes of data with latency as low as microseconds and throughput up to millions of search operations per second for workloads that demand low-latency, scalable search across dynamic data. These new search capabilities provide developers the flexibility to query data already stored in ElastiCache by attributes beyond simple key-value lookups.

Exact-match search retrieves documents by matching precise values across text, tag, and numeric attributes, such as product names, categories, user IDs, or order numbers. Numeric range search filters documents by attributes such as price thresholds, date ranges, or transaction amounts. In addition to exact match, full-text search operates on text attributes with prefix matching for type-ahead suggestions, fuzzy matching for typo tolerance, and proximity matching for multi-term searches. You can combine these search types with vector similarity in a single hybrid query, capturing both precise terms and semantic intent to deliver more relevant results than either method alone. For vector workloads, ElastiCache for Valkey delivers the lowest latency vector search with the highest throughput and best price-performance at 95%+ recall among popular vector databases on AWS.

These search capabilities are available in ElastiCache version 9.0 for Valkey, alongside server-side aggregations for real-time analytics and reporting (see Announcing aggregations on ElastiCache). ElastiCache version 9.0 for Valkey also introduces hash field expiration for fine-grained TTL control on individual fields and up to 40% higher pipelined throughput. For the full release details, see Announcing Valkey 9.0 for ElastiCache.

In this post, we walk through the new search capabilities, show how they work together, and build a search and recommendation engine from scratch.

Power real-time search across your applications

We have heard from customers that as their applications scale, their search workflows need to preserve the low-latency experience users expect while supporting the throughput their businesses require. For example, payments platforms, streaming platforms, and online retailers all store millions of documents in ElastiCache and need to retrieve data by metadata attributes at microsecond latency. Additionally, customers tell us that as their workloads evolve, they need rich search queries to support new use cases on data already stored in ElastiCache. For example, applications often store user and session context, such as device type, session state, and user activity, in ElastiCache to deliver low-latency experiences. As workloads evolve, customers want to use that same data to power recommendation systems, which requires searching across these attributes.

ElastiCache now provides a range of methods to search and retrieve data with latency as low as microseconds at throughput of millions of queries per second (QPS). Data becomes searchable as soon as writes complete, so applications always query the most current results. These capabilities power use cases such as catalog discovery, recommendation engines, agentic memory, real-time leaderboards, and session lookups.

Catalog discovery: Online retailers and streaming platforms build search experiences that help their customers discover items across large catalogs. These platforms can combine text search on product names and descriptions with filters on brand, category, price, and rating in a single query to provide a faceted browsing experience. Prefix matching powers type-ahead search that loads suggestions as users type, delivering results in microseconds so the experience feels instant. You can make the search experience more robust with typo-tolerant search powered by fuzzy matching, which handles misspellings automatically. Fuzzy matching is more computationally expensive than exact matching, so running it on an in-memory search engine like ElastiCache keeps the experience fast and responsive.

Recommendation engines: As catalogs grow to millions of items, users expect digital platforms to provide personalized browsing experiences that surface relevant content and products quickly. Modern recommendation systems encode users and items as vector embeddings. These systems retrieve recommendation candidates from across millions of items through vector search combined with filters on names, descriptions, category, availability, and price range. Hybrid search supports this by combining text, tag, and numeric filters with vector similarity in a single query, so retrieved candidates are both semantically relevant and satisfy business constraints. A product page can show “similar items” by filtering to the same category and price band, then ranking by embedding similarity. You can extend this to personalized recommendations by building a user embedding from interaction history (using techniques such as mean pooling of viewed item embeddings, attention-based models, or sequential models) and passing it as the vector query to rank results by learned preferences.

Agentic memory: Agent memory lets agents learn from past interactions to improve response relevance without replaying full conversation history, reducing token costs. Agent memory systems store and retrieve memories by scope attributes (user, agent, session) and by semantic relevance to the current interaction. With hybrid search, these systems combine scope and text filters with vector similarity in a single query. Agent memory sits on the live conversation path, demanding read-after-write visibility (newly stored facts are immediately retrievable) and high concurrent read and write throughput to retrieve and consolidate memories. ElastiCache indexes memories synchronously on write, uses multithreading, and delivers the highest throughput among popular vector databases on AWS at latency as low as microseconds. For a step-by-step implementation with ElastiCache and Mem0, see Build persistent memory for agentic AI applications with Mem0 Open Source, ElastiCache for Valkey, and Amazon Neptune Analytics.
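As a sketch of this pattern, the following helper builds a hybrid query that scopes memories to one user before ranking by vector similarity. The index name (memories_idx) and field names (user_id, embedding, memory_text) are hypothetical, not part of any specific memory framework:

```python
def scoped_memory_query(user_id: str, k: int = 5) -> str:
    # Exact-match tag filter narrows to one user's memories;
    # KNN then ranks the filtered set by semantic relevance
    return f"@user_id:{{{user_id}}} =>[KNN {k} @embedding $vec AS score]"

# Usage with a connected client (mirrors the hybrid query shown later in this post):
# results = client.ft("memories_idx").search(
#     Query(scoped_memory_query("user-123"))
#         .return_fields("memory_text", "score").dialect(2),
#     query_params={"vec": interaction_embedding_bytes})
```

Because the scope filter runs before the KNN stage, only that user's memories are candidates for similarity ranking.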

ElastiCache for Valkey is a good fit when you want to build a self-managed memory layer or when you need a low-latency, customizable in-memory store. If you prefer a fully managed approach, you can use Amazon Bedrock AgentCore Memory to handle memory for you.

Financial applications and leaderboards: Trading platforms and gaming applications store documents with numeric attributes such as transaction amounts, timestamps, risk scores, and player rankings that they need to retrieve at low latency. Numeric range queries on ElastiCache support fast lookups across these attributes, filtering by time windows, amount thresholds, or score bands. Gaming applications can maintain real-time leaderboards that reflect score updates immediately and support range queries like “top 100 players in my region.”
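A leaderboard query like "top 100 players in my region" can be expressed as a tag filter plus a sort on a numeric attribute. This sketch builds the raw FT.SEARCH arguments; the index name (players_idx) and field names (region, score) are hypothetical:

```python
def top_players_args(region: str, limit: int = 100) -> list:
    # Exact-match region filter, then sort descending on the numeric score field
    return ["FT.SEARCH", "players_idx", f"@region:{{{region}}}",
            "SORTBY", "score", "DESC",
            "LIMIT", "0", str(limit)]

# Usage with a connected client:
# results = client.execute_command(*top_players_args("us-east"))
```

Because score is indexed as a sortable numeric attribute, updates are reflected in the ordering as soon as each write completes.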

User and session management: Applications across industries store structured attributes such as session IDs, device types, and user handles within a cache for session management. These applications write session data to the cache as users log in and update it throughout the session lifecycle, requiring fast writes with immediate searchability. ElastiCache indexes updates synchronously, so searches against session attributes reflect the latest state without delay. Exact match search locates active sessions and entitlements by precise identifiers across millions of documents at sub-millisecond latency.
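A session lookup reduces to an exact-match tag query, optionally narrowed by additional attributes. A minimal sketch, assuming hypothetical session_id and device_type tag fields in a sessions index:

```python
def active_session_query(session_id: str, device: str = "") -> str:
    # Exact-match on the session identifier; tags match whole values, not tokens
    q = f"@session_id:{{{session_id}}}"
    if device:
        # Optionally narrow to a device type in the same query
        q += f" @device_type:{{{device}}}"
    return q

# Usage with a connected client and a hypothetical "sessions_idx" index:
# results = client.ft("sessions_idx").search(Query(active_session_query("sess-8f2a", "mobile")))
```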

Building a search and recommendation engine with ElastiCache

To demonstrate these search types together, we build a search and recommendation engine for AnyCompany, an e-commerce platform that sells millions of products across electronics, beauty, and home goods. AnyCompany wants a search experience where shoppers can find products by keyword, narrow results with filters like brand and price range, and discover related items through similarity. AnyCompany stores its product catalog of over a million items in ElastiCache as hash-backed documents (derived from the Amazon ESCI dataset with real titles, descriptions, and brands for this example). The following code builds five query patterns on this data: type-ahead search, full-text matching, typo-tolerant matching, filtered browsing, and similar product recommendations.

Prerequisites

The examples in this post use Python with the valkey-py client library. To follow along, you need an AWS account, the AWS CLI configured, a Python environment with valkey-py installed, and an ElastiCache cluster running version 9.0 for Valkey or later (estimated time: 30 minutes).

The complete sample code for this post is available in the ElastiCache samples GitHub repository.

Set up an ElastiCache for Valkey cluster

You can create an ElastiCache cluster for search with the AWS Management Console or the AWS CLI. The following example uses the CLI. Search is available for ElastiCache version 9.0 for Valkey or later.

aws elasticache create-replication-group \
--replication-group-id AnyCompany-cache \
--replication-group-description "AnyCompany Valkey cluster" \
--engine valkey \
--engine-version 9.0 \
--transit-encryption-enabled \
--cache-node-type cache.r7g.large \
--replicas-per-node-group 0
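Cluster creation takes a few minutes. Before connecting, you can check the status with describe-replication-groups, or block until the cluster is ready with the AWS CLI waiter:

```shell
# Check the current status ("creating" until provisioning completes)
aws elasticache describe-replication-groups \
    --replication-group-id AnyCompany-cache \
    --query "ReplicationGroups[0].Status" --output text

# Or block until the replication group is available
aws elasticache wait replication-group-available \
    --replication-group-id AnyCompany-cache
```

Once the status is available, use the cluster's endpoint from the console or describe-replication-groups output in the Python examples that follow.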

Create an index and load the data

Create an index called products_vec_index over the product data to make it searchable. Title and description are indexed as full-text searchable attributes that support keyword, prefix, and fuzzy matching. Brand and color are indexed as exact-match tags for filtered browsing. Price, rating, and stock are indexed as sortable numeric attributes for range queries and sorting. embedding is indexed as a vector attribute for semantic similarity search and recommendations.

import valkey
from valkey.commands.search.field import TextField, TagField, NumericField, VectorField
from valkey.commands.search.indexDefinition import IndexDefinition, IndexType

# <Input required>: Insert your ElastiCache cluster's endpoint
VALKEY_HOST = "placeholder_cluster.cnxa6h.clustercfg.use1.cache.amazonaws.com"

client = valkey.Valkey(host=VALKEY_HOST, port=6379, decode_responses=False, ssl=True,
    ssl_cert_reqs="required")

# Drop any existing index so the script is safe to re-run
try:
    client.execute_command("FT.DROPINDEX", "products_vec_index")
except valkey.exceptions.ResponseError:
    pass

# Create the search index with text, tag, numeric, and vector fields

client.ft("products_vec_index").create_index(
    fields=[
        TextField("title"),
        TextField("description"),
        TagField("brand", separator=","),
        TagField("color", separator=","),
        NumericField("price"),
        NumericField("rating"),
        NumericField("stock"),
        VectorField("embedding", "FLAT", {
            "TYPE": "FLOAT32",
            "DIM": 64,
            "DISTANCE_METRIC": "COSINE"})],
    definition=IndexDefinition(prefix=["pv:"], index_type=IndexType.HASH))
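Documents matching the pv: prefix are plain Valkey hashes, with the vector stored as raw FLOAT32 bytes. Before running the full loader, you can sanity-check the index with a single hand-built document; the key and values below are illustrative, not from the dataset:

```python
import struct

def pack_embedding(vec) -> bytes:
    # The vector field expects little-endian FLOAT32 bytes (DIM * 4 bytes total)
    return struct.pack(f"<{len(vec)}f", *vec)

doc = {
    "title": "Sample Wireless Headphones",
    "description": "Over-ear, noise isolating",
    "brand": "AnyBrand",
    "color": "black",
    "price": 79.99,
    "rating": 4.4,
    "stock": 12,
    "embedding": pack_embedding([0.01] * 64),  # matches DIM 64 in the index
}
# The "pv:" key prefix matches the index definition, so the hash is indexed on write:
# client.hset("pv:SAMPLE001", mapping=doc)
```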

Populate the ElastiCache store with the product dataset. The dataset is a 137,000-product subset of the 1.3 million-product Amazon ESCI Shopping Queries dataset, with titles, descriptions, brands, and pre-computed 64-dimensional embeddings. Clone the sample repository and run the loading script:

git clone https://github.com/aws-samples/amazon-elasticache-samples.git
cd amazon-elasticache-samples/blogs/elasticache-valkey/fts-benchmark

# <Input required>: update VALKEY_HOST variable with your cluster's endpoint and run:
python load_products_blog.py

Type-ahead search

With the index and data in place, AnyCompany can build its search engine using FT.SEARCH queries that run against the index and return matching documents. As the user types in the search bar, the application sends prefix queries to show suggestions in real time.

from valkey.commands.search.query import Query

results = client.ft("products_vec_index").search(
    Query("wire*").return_fields("title").paging(0, 5))

# User has typed "wire" - prefix match shows suggestions
# Output:
# [{'title': 'xyz Kids Wireless Headphones'},
# ...
# ...
#  {'title': 'Santas Wire Christmas Lighting Storage Bag'}]

Phrase matching

When the user presses enter, the application runs a full-text search across title and description. SLOP controls how far apart the terms can be and still match, so results with terms closer together rank higher:

# User submits "wireless headphones"
# SLOP 2 allows up to 2 words between terms
results = client.ft("products_vec_index").search(
    Query("wireless headphones") 
    .slop(2) 
    .return_fields("title", "brand", "price").paging(0, 5))

# Output:
# [{'title': 'xyz Studio3 Wireless Headphones - Gray (Renewed)', 
#    'brand': 'xyz', 'price': '1928.28'},
#  ...
#  {'title': 'xyz TUNE 220TWS - True Wireless in-Ear Headphone - Blue', 
#   'brand': 'xyz', 'price': '1121.23'}]

Typo-tolerant matching

If the query returns no results, the application retries with fuzzy matching to correct typos. Fuzzy matching is more expensive because it computes edit distances, so it works best as a fallback rather than the default:

# Retry with fuzzy matching for "wireles headphoens"
results = client.ft("products_vec_index").search(
    Query("%wireles% %headphoens%")
    .return_fields("title", "brand", "price").paging(0, 5))

# Output:
# [{'title': 'xyz Comfort 35 Wireless Headphones, Noise Cancelling - Silver (Renewed)',
#   'brand': 'xyz', 'price': '1811.75'},
#  ...
#  ...
#  {'title': 'xyz SoundSport Wireless Headphones, Black + Charging Case', 
#   'brand': 'xyz', 'price': '568.47'}]
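The fallback pattern above (exact search first, fuzzy only on empty results) can be captured in a small helper that rewrites each term with the %term% fuzzy syntax; a minimal sketch:

```python
def fuzzify(query: str) -> str:
    # Wrap each term in % markers for Levenshtein-distance-1 fuzzy matching
    return " ".join(f"%{term}%" for term in query.split())

# Fallback flow with a connected client:
# results = client.ft("products_vec_index").search(Query(user_query))
# if results.total == 0:
#     results = client.ft("products_vec_index").search(Query(fuzzify(user_query)))
```

Keeping fuzzy matching as a fallback limits the extra edit-distance computation to the queries that actually need it.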

Filtered browsing

When a shopper searches for a product and applies filters, the application combines text search with tag and numeric filters in a single query:

# User searches "headphones" and filters by price $50-$150, rating 4.0+
results = client.ft("products_vec_index").search(
    Query("@title:headphones @price:[50 150] @rating:[4.0 5.0]")
    .return_fields("title", "brand", "price", "rating")
    .paging(0, 5))


# Output:
# [{'title': 'xyz WH1000XM3 Bluetooth Wireless Noise Canceling Headphones', 
#   'brand': 'xyz', 'price': '102.29', 'rating': '4.8'},
#  ...
#  ...
#  {'title': 'Bluetooth Earbuds xyz SoundLink .. in Ear Headphones', 
#   'brand': 'xyz', 'price': '125.45', 'rating': '4.5'}]

Similar product recommendations

To power “similar products” recommendations, AnyCompany uses hybrid search with text filters to narrow results to the relevant product type, and vector search to rank the filtered results by similarity to the product being viewed:

# Get the embedding of the product the user is currently viewing
# for example - "Kids Headphones with Microphone 2 Pack"
product_embedding = client.hget("pv:B0825SSTMN", "embedding") 

# Hybrid: text pre-filter "headphones" + vector KNN for similarity ranking
results = client.ft("products_vec_index").search(
    Query("@title:headphones =>[KNN 5 @embedding $vec AS score]")
    .return_fields("title", "brand", "price", "score")
    .dialect(2),
    query_params={"vec": product_embedding})

# Output:
# [{'title': 'xyz I35 Kid Headphones with Microphone Volume Limited ...', 
#   'brand': 'xyz', 'price': '155.06', 'score': '0.293'},
#  ...
#  ...
#  {'title': 'Kids Headphones with Pouch, xyz Wired ...', 
#   'brand': 'xyz', 'price': '957.95', 'score': '0.351'}]

You can extend this pattern to personalize results using embedding-based retrieval. Build a user embedding from a shopper’s interaction history using techniques such as mean pooling of viewed item embeddings, attention-based models, or sequence models. Pass the user embedding as the vector query instead of a single product embedding, and KNN scoring ranks results by the shopper’s learned preferences across the filtered set.
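A minimal sketch of the mean-pooling approach: unpack the FLOAT32 embedding blobs of recently viewed items, average them element-wise, and re-pack the result as the query vector. The helper name and the recently_viewed_keys variable are illustrative:

```python
import struct

DIM = 64  # matches the DIM in the index definition

def mean_pool(embedding_blobs) -> bytes:
    # Unpack each item's FLOAT32 blob, average per dimension, re-pack for the query
    vectors = [struct.unpack(f"<{DIM}f", blob) for blob in embedding_blobs]
    pooled = [sum(col) / len(vectors) for col in zip(*vectors)]
    return struct.pack(f"<{DIM}f", *pooled)

# Usage with a connected client:
# blobs = [client.hget(key, "embedding") for key in recently_viewed_keys]
# results = client.ft("products_vec_index").search(
#     Query("@title:headphones =>[KNN 5 @embedding $vec AS score]")
#         .return_fields("title", "score").dialect(2),
#     query_params={"vec": mean_pool(blobs)})
```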

Performance under the hood

We measured latency and throughput for text and numeric query types on an ElastiCache for Valkey cluster with one shard, containing a single cache.r7g.2xlarge node, without replicas. The dataset contains approximately 1 GB of data with 1.3 million product documents along with text, tag, numeric, and vector attributes as described in the example above. We measured latency and throughput using valkey-benchmark.

Query Type | P50 (ms), 1 client | P99 (ms), 1 client | QPS, 300 clients
Text search (exact-match) | 0.135 | 0.255 | 60,000
Prefix matching (type-ahead search) | 0.135 | 0.279 | 57,692
Numeric range (filters on stock/rating) | 0.175 | 0.199 | 24,087
Hybrid query, text + numeric range (faceted browsing) | 0.135 | 0.295 | 52,632

For vector search latency and throughput benchmarks, see Announcing vector search for Amazon ElastiCache. The example above tests performance on a single cache.r7g.2xlarge node. You can scale read throughput by adding replicas (up to 5 per shard) and shards to reach millions of QPS. Each replica carries its own index and can serve searches independently, though replica reads are eventually consistent. If low latency is your priority over data capacity, use single-slot indexes to keep all indexed data on one shard and avoid fan-out overhead entirely. You can add shards to increase the memory capacity without any client code changes.

ElastiCache indexes data changes in real time: the engine indexes each write before acknowledging it, so any subsequent search returns the updated data, providing read-after-write consistency. This behavior holds even for multi-key transactions and Lua scripts. Because indexing runs across multiple threads in Valkey's multithreaded engine, ElastiCache maintains high search performance even under high write throughput.

Clean up

If you created an ElastiCache cluster for this walkthrough and no longer need it, delete the cluster using the following AWS CLI command to avoid incurring future charges:

aws elasticache delete-replication-group --replication-group-id AnyCompany-cache

Conclusion

In this post, we walked through full-text, exact-match, numeric range, and hybrid search on ElastiCache. We covered use cases for these search types and showed how to build a search and recommendation system. Full-text, exact-match, numeric range, and hybrid search are available in all commercial AWS Regions, AWS GovCloud (US) Regions, and China Regions, for node-based clusters running ElastiCache version 9.0 for Valkey at no additional cost. Valkey is the most permissive open source and vendor-neutral alternative to Redis and the recommended engine on ElastiCache. To get started, create a new Valkey 9.0 or later cluster or upgrade an existing cluster using the AWS Management Console, AWS SDK, or AWS CLI. To learn more, visit the ElastiCache documentation. For questions and feedback, visit AWS re:Post for ElastiCache.


About the authors

Chaitanya Nuthalapati


Chaitanya is a Senior Technical Product Manager in AWS In-Memory Database Services, focused on Amazon ElastiCache for Valkey. Previously, he built solutions with generative AI, machine learning, and graph networks. Off the clock, Chaitanya is busy collecting hobbies, which currently include tennis, skateboarding, and paddle-boarding.

Karthik Subbarao


Karthik is a Senior Software Engineer at Amazon ElastiCache and an active contributor to the open-source Valkey project. He is passionate about distributed systems, databases, Rust, and, in general, innovating through software development / technology.

Ian Childress


Ian is a Software Development Manager at AWS, where he leads the team building Valkey modules and integrations, including full-text search infrastructure. Outside of work, Ian plays hockey and is a relentless tinkerer who writes high-performance systems in Go. Come summer, he trades the ice for the lake, wake surfing with his family every weekend.

Eran Balan


Eran is a Specialist Solutions Architect at AWS, focused on in-memory databases and Amazon ElastiCache. He works with customers across EMEA to design caching architectures, optimize performance, and navigate migrations — from Redis OSS to Valkey and beyond. Off the clock, Eran enjoys catching a good musical or theatre show, hiking, and open-water swimming.

Our sincere thanks to Allen Samuels for his vision, guidance, and hands-on contributions throughout the project.