Overview
Developers building customer-facing, large-scale search, Retrieval-Augmented Generation (RAG), and recommendation systems face a core challenge: retrieving and operationalizing data in real time. Data is fragmented across formats, including PDFs, free text, and semi-structured sources. This makes it difficult to unify, index, and serve data efficiently to applications and end users. Without the right infrastructure, applications become slow, brittle, and costly to scale. Vespa addresses this by unifying structured, unstructured, vector, and tensor data in a single system, enabling efficient, real-time retrieval and ranking at scale.
The Vespa AI search platform is built for real-time retrieval, ranking, and inference on AWS, powering customer-facing applications including search, RAG, recommendations, and personalization. It unifies structured, unstructured, vector, and tensor data to deliver fast, accurate, and highly relevant results at millisecond latency. Vespa is purpose-built for customer-facing experiences where latency, relevance, and scale directly impact engagement, conversion, and revenue.
By combining full-text search, vector search, and machine-learned ranking within a single query pipeline, Vespa delivers consistent, high-quality results across every user interaction. Its tensor-based ranking architecture lets applications evaluate multiple signals simultaneously, including semantic meaning, behavioral data, and real-time context, so results continuously adapt to user intent and business priorities. Ranking and inference run directly within the engine, eliminating external pipelines and enabling real-time updates to content, models, and business signals.
Running on AWS, Vespa delivers elastic scalability, high availability, and fully managed infrastructure through Vespa Cloud. Automated provisioning, scaling, monitoring, and upgrades reduce operational overhead while supporting high-throughput, low-latency workloads. Vespa is trusted in production by organizations including Perplexity, Spotify, and Yahoo to power large-scale, real-time search, recommendation, and AI applications. Developers use Vespa to build responsive, intelligent applications that enhance the customer experience, improve conversion rates, and drive measurable business outcomes.
Highlights
- Real-time performance and efficiency: Reduce latency and network overhead with co-located data and computation, enabling fast, resource-efficient retrieval at any scale.
- Relevance with hybrid search and ML ranking: Deliver accurate, contextual results using hybrid search and distributed machine-learned ranking across structured, unstructured, and vector data.
- Elastic scalability on AWS: Scale clusters up or down in real time while maintaining low latency, high throughput, and consistent uptime for production workloads.
Pricing
| Dimension | Description | Cost/unit |
|---|---|---|
| Vespa Units | Vespa Units consumed | $0.01 |
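As a rough illustration of the usage-based pricing above, a charge is the number of Vespa Units consumed multiplied by the unit price. The sketch below uses a hypothetical monthly consumption figure; how Vespa Units map to actual resources is not stated on this page.

```python
from decimal import Decimal

# Unit price taken from the pricing table above.
PRICE_PER_VESPA_UNIT = Decimal("0.01")


def estimate_cost(units_consumed: int) -> Decimal:
    """Return the charge in USD for a given number of Vespa Units."""
    if units_consumed < 0:
        raise ValueError("units_consumed must be non-negative")
    return PRICE_PER_VESPA_UNIT * units_consumed


# Hypothetical example: 250,000 units consumed in one month.
print(estimate_cost(250_000))  # 2500.00
```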
Vendor refund policy
See the Vespa Cloud Terms of Service.
Delivery details
Software as a Service (SaaS)
SaaS delivers cloud-based software applications directly to customers over the internet. You can access these applications through a subscription model. You will pay recurring monthly usage fees through your AWS bill, while AWS handles deployment and infrastructure management, ensuring scalability, reliability, and seamless integration with other AWS services.
Support
Vendor support
See https://cloud.vespa.ai/support for support details.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.

Customer reviews
POC has delivered semantic and hybrid search that focuses on architecture and query flexibility
What is our primary use case?
I have been using Vespa at a POC level, and we ran it for a little under a month.
The main purpose of using Vespa was to run a POC to introduce semantic search into the company, and we reviewed and researched various solutions available on the market. As a result, Vespa seemed interesting. The company was using Elasticsearch, and we were interested in how it compared to Elasticsearch in terms of infrastructure architecture, usage purpose, and handling embeddings within the engine in the way we wanted, so we decided to try Vespa.
For the query part, we focused on creating queries using embeddings with the E5 Multilingual Embedder on top of the existing lexical search of product titles and product details for semantic search. We wrote the queries with the purpose of providing hybrid search.
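A hybrid query of this kind could look roughly like the following request body for Vespa's query API. This is a minimal sketch, not the reviewer's actual query: the `embedding` field, the query tensor name `q`, and the `hybrid` rank profile are hypothetical names, and it assumes an embedder is configured in the application so that `embed(@query)` resolves.

```python
import json


def build_hybrid_query(user_text: str, hits: int = 10, target_hits: int = 100) -> dict:
    """Build a JSON body for a lexical + nearest-neighbor query.

    Assumed names (not from the review): the `embedding` tensor field,
    the query tensor `q`, and the `hybrid` rank profile.
    """
    yql = (
        "select * from sources * where userQuery() or "
        f"({{targetHits:{target_hits}}}nearestNeighbor(embedding, q))"
    )
    return {
        "yql": yql,
        "query": user_text,                 # lexical part, consumed by userQuery()
        "input.query(q)": "embed(@query)",  # semantic part, embedded by the engine
        "ranking.profile": "hybrid",        # hypothetical rank profile name
        "hits": hits,
    }


print(json.dumps(build_hybrid_query("wireless headphones"), indent=2))
```

The body would be sent as a POST to the application's `/search/` endpoint; the request itself is omitted to keep the sketch free of network dependencies.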
My company deployed Vespa in a cloud environment.
What is most valuable?
The most outstanding characteristic of Vespa is an architecture that lets you focus on implementing features. The automatic management of sharding is excellent, and the flexibility of the server cluster and infrastructure architecture is outstanding. Indexing performance improved greatly when we used HTTP/2.
Because this was a pilot, I did not expect organizational changes. What I did expect is that, had Vespa been adopted, the UI and planning features of the service on the actual business platform could have been offered differently, to the benefit of customer usability.
What needs improvement?
Vespa would be better if it provided a monitoring dashboard, along with related supplementary tools for administration.
The most challenging part of adopting Vespa was the lack of pages where we could easily find advanced deployment strategies, usage guidance, query strategies, and related information.
I felt that the official documentation itself was good. From a learning perspective, there might have been a shortage of examples that would make it easier to approach.
For how long have I used the solution?
I have been working in my current field for around eight years.
What do I think about the stability of the solution?
I would evaluate Vespa's stability as very high.
What do I think about the scalability of the solution?
I did not have direct experience with Vespa's scalability or large-scale data processing performance, but based on practical experience, I indirectly confirmed its potential when applied to actual services.
I confirmed that Vespa has a structure that allows cluster scale-out, so we were very satisfied with its scalability and large-scale data handling.
Which solution did I use previously and why did I switch?
Before adopting Vespa, we were considering other solutions like Qdrant and Milvus, and we decided to evaluate Vespa because we felt those two solutions were somewhat lacking in terms of scalability.
What other advice do I have?
Vespa had a positive impact on our company, but we do not have any cases of actual production application.
I hope that Vespa will surpass Elasticsearch in the global AI embedder and embedding market, and in the AI DB and vector DB market. I gave this review a rating of eight out of ten.
Vector search has improved e‑commerce relevance but setup and learning curve still need work
What is our primary use case?
My main use case for Vespa is implementing it as the back-end search engine for an e-commerce site, where we have about six million products, or six million SKUs, that we are selling. I implemented Vespa as an alternative for Elasticsearch.
Using Vespa for the e-commerce site involved utilizing it as the backend search engine to replace Elasticsearch, which we felt was not doing us justice. The very first thing I did was convince my CIO to try out Vespa. We did a quick proof of concept, engaged with the right people through the Vespa Slack channels, and then we did the actual implementation, including A/B testing it against the previously running fully optimized Elasticsearch pipeline.
What is most valuable?
One of the best features that Vespa offers is natively handling embeddings or vectors, along with its capability for really fast searches. The powerful DSL provided by Vespa allows you to define search calculations, which I used extensively over the period of one year.
Vector search is definitely the biggest selling point for Vespa. Even though Elasticsearch has vector search capabilities, it is not as powerful as what Vespa offers. Since Vespa natively supports sparse embeddings, that was an advantage, but in the end, I opted for dense embeddings using Google's Gemma embeddings, as I worked on retraining that model on our dataset.
Vespa's DSL includes many built-in functions, which is quite powerful. Even without embedding features, the DSL shines. However, indexing takes considerable time: initial runs took almost a whole day to create the embeddings and push them into Vespa, and it required adequate resources to run. Another issue was its inability to handle synonyms in the same manner as Elasticsearch.
What needs improvement?
Vespa definitely had its own set of challenges. It was really hard to get into initially, especially when I started implementing it in 2024 along with one junior employee, and the lack of documentation made it difficult. I aimed for an implementation with ColBERT, a sparse embedding mechanism, which I believed would fit well for e-commerce. We went through iterations during A/B testing because the initial set did not work as expected, which extended the process to about one and a half years.
Vespa has a considerable learning curve, making it challenging for most people to get into, and it is also expensive, which can deter startups or those with smaller budgets from using it. Community support was decent, and we turned to it for clarifications. However, substantial improvements in documentation are necessary, especially more examples for handling DSL effectively. Having a runtime testing feature would greatly facilitate quick iterations.
For how long have I used the solution?
I have been using Vespa for about a year and a half.
What do I think about the stability of the solution?
In terms of stability and scalability, Vespa performed well. While it took some attempts to stabilize, I managed to scale effectively with the traffic we experienced and the servers we operated.
How are customer service and support?
The customer support I received was pretty good, mainly through interactions in the Slack community, where I typically got responses within hours or by the next day, leading me to rate them an eight or maybe even nine.
Which solution did I use previously and why did I switch?
Before choosing Vespa, I explored a few other search engine solutions, starting with Orama, a Node.js and TypeScript-based search engine that struggled to handle six million SKUs, and then Typesense, which aimed for instant searches but failed to accommodate the numerous attributes I needed for sparse data. That led me to Vespa, which met my expectations.
How was the initial setup?
The setup cost is definitely huge, and pricing is also steep. In terms of licensing, it seems generous for those who do not want to engage with Vespa's hosted services.
What about the implementation team?
I have very little experience with Vespa's governance and security, but I found it generally robust, despite lacking extensive engagement in that area. We deploy Vespa on AWS, which qualifies it as a public cloud solution.
We did not purchase Vespa through the AWS Marketplace. Instead, we deployed the free version of Vespa that was available.
What was our ROI?
I would not say we saw a return on investment, since we had not fully deployed Vespa into production. Only two people worked on it for approximately a year and a half, so it did not require a large team. However, we spent about two thousand to three thousand dollars per month on AWS while using Vespa, compared to around one thousand to one thousand five hundred dollars per month for Elasticsearch, although we saw some slight improvements in key metrics before stopping the A/B test.
What other advice do I have?
I would rate Vespa a six, as it is a powerful tool with great potential in terms of search engine capabilities, but the steep learning curve and initial setup costs are significant downsides.
I chose six because of the steep learning curve and the substantial initial costs involved with setting up Vespa. If it were feasible for people with limited budgets, even as low as fifty dollars a month, it would be more appealing.
While conducting A/B testing, Vespa seemed to be performing slightly better than Elasticsearch, especially in search relevancy within live production systems, and its performance was decent. Comparing raw Elasticsearch text-based search against Vespa's vector-based and text-based search, we were already recommending Vespa to several peer companies.
During A/B testing, looking at conversion rates, search-to-basket ratios, and add-to-basket ratios showed improvement until we shut it down. It took several iterations to get the results, particularly after switching to Embedding Gemma, emphasizing that the quality of embedding used heavily influenced the outcome.
Nothing else comes to mind regarding improvements needed for Vespa.
I would not suggest Vespa unless you are an enterprise due to the steep learning curve and significant infrastructure costs involved. My overall rating for Vespa is six out of ten.
Hybrid search has improved document retrieval and now supports high-volume conversational queries
What is our primary use case?
My full name is Shubhank and I am serving in Redblink Technologies in Mohali where I have been doing integration work related to AI. I implemented RAG.
The previous year, we were using Qdrant as a vector store and creating many collections there. Our company discussed internally and decided to move to Vespa. This was about six or eight months ago. We are using Vespa in our RAG pipeline.
We have implemented a RAG pipeline where we have document retrieval. Users can chat with their documents. We are breaking down our documents into meaningful chunks using LangChain4j and feeding that directly into Vespa as a vector store. Later, while the user chats or starts a chat with the document, we can retrieve according to the user's prompt.
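The chunk-and-feed step can be sketched as follows. This is an illustration under stated assumptions, not the reviewer's pipeline: it swaps in a plain fixed-size character window for LangChain4j's splitters, and the namespace `docs`, document type `chunk`, and field names are hypothetical. It only builds the path and JSON body for Vespa's `/document/v1` API; the HTTP call and the embedding step are omitted.

```python
from urllib.parse import quote


def chunk_text(text: str, max_chars: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size character windows."""
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in [0, max_chars)")
    if not text:
        return []
    step = max_chars - overlap
    # Stop early enough that the last window is not pure overlap.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + max_chars] for i in range(0, last_start, step)]


def feed_operations(doc_id: str, text: str, namespace: str = "docs", doctype: str = "chunk"):
    """Yield (path, body) pairs for /document/v1, one per chunk.

    The namespace, document type, and field names are hypothetical.
    """
    for n, chunk in enumerate(chunk_text(text)):
        chunk_id = quote(f"{doc_id}-{n}", safe="")
        path = f"/document/v1/{namespace}/{doctype}/docid/{chunk_id}"
        yield path, {"fields": {"doc_id": doc_id, "chunk_no": n, "text": chunk}}


for path, body in feed_operations("report 1", "a" * 1000):
    print(path, len(body["fields"]["text"]))
```

Each pair would be sent as a POST to the Vespa endpoint; storing the source document id on every chunk makes it possible to group or filter results per document later.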
We have more such use cases. We have a client, CPA Pilot, with many text documents, so we chunk those documents directly. The documents are very large, so in Qdrant the collections were almost full. Vespa has no system of collections, which also helped us. We use self-hosted Vespa for that particular client, chunking the long documents using LangChain4j and sending them to Vespa for storage. During retrieval, we get good results with proper relevance scores based on the user's query.
What is most valuable?
Earlier we used Qdrant, which offers only vector embeddings and search on that basis. Vespa provided us a highly scalable and more reliable platform. Vespa also provides both BM25 text search and embedding search. The main reason we moved to Vespa is hybrid search.
We have explored BM25 and hybrid search. We have implemented direct search and also embedding search by creating embeddings and storing them in Vespa. We can also use direct text search based on the user's query. This way we have implemented hybrid search and get the user's response.
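The review does not say how the text and embedding results are combined. One common option is reciprocal rank fusion, sketched below in plain Python as an illustration of the idea rather than of this deployment; the document ids and the constant `k = 60` are illustrative choices.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked id lists; each hit contributes 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["doc3", "doc1", "doc7"]    # hypothetical text-search order
vector_hits = ["doc1", "doc5", "doc3"]  # hypothetical embedding-search order
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
# ['doc1', 'doc3', 'doc5', 'doc7']
```

Documents that appear in both lists rise to the top, which is the behavior hybrid search is usually chosen for.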
It works very well because we have many documents, plus a single document is also very long. Vespa is very good with retrieval and high-volume queries.
Earlier we used Qdrant and now we are moving to Vespa. In Qdrant, there is a concept of a collection: for every assistant, we create a new collection, and every document in that assistant goes to that particular collection. Vespa has no concept of a collection, so we have to separate the data by assistant ourselves. That makes it unique. We were used to having a single collection for each specific assistant. With Vespa, we have everything in one place and separate it by assistant and by the environment we are using.
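Separating data by assistant instead of by collection can be expressed as a filter in the query. The sketch below builds such a YQL string; the field names `assistant_id`, `environment`, and `embedding` and the query tensor `q` are hypothetical, and the query tensor would still have to be supplied with the request.

```python
import re

# Only allow simple identifiers so values cannot break out of the YQL string.
_SAFE_ID = re.compile(r"[A-Za-z0-9_-]+")


def build_assistant_query(assistant_id: str, environment: str, target_hits: int = 10) -> str:
    """Return YQL restricting nearest-neighbor search to one assistant and environment."""
    for value in (assistant_id, environment):
        if not _SAFE_ID.fullmatch(value):
            raise ValueError(f"unsafe identifier: {value!r}")
    return (
        "select * from sources * where "
        f'assistant_id contains "{assistant_id}" and '
        f'environment contains "{environment}" and '
        f"({{targetHits:{target_hits}}}nearestNeighbor(embedding, q))"
    )


print(build_assistant_query("assistant-42", "prod"))
```

All assistants then share one document type, and the filter plays the role that a per-assistant collection played before.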
What needs improvement?
We want Vespa to implement some UI features so that we can visualize how our data goes and what embeddings it stores. The main thing Vespa has to implement is the UI. Right now, we are hitting the API and getting the results in Postman.
One more improvement we want is for Vespa to offer suggestions. When you store a document in the vector store, Vespa could suggest what information to store for that particular document. I would also suggest implementing some features related to agentic AI. We have a couple of agents, so Vespa could decide which agent is best suited to a given user's query. That would be helpful.
For how long have I used the solution?
I have been working here since last year.
What do I think about the stability of the solution?
We have not specifically calculated the metrics, but until now we do not feel any issues with Vespa.
What do I think about the scalability of the solution?
Vespa is stable and it is also scalable. We have many documents, and a single document also has a lot of content inside it. Vespa stores it all in a well-optimized way.
How are customer service and support?
I do not have that much involvement with the customer support. I have raised some questions on Slack for the Vespa community and received responses in 24 hours. I have discussed my concerns and questions. The community support is very good.
Which solution did I use previously and why did I switch?
We were initially trying to use Pinecone, but after a lot of discussion and research, we decided to go with Vespa.
How was the initial setup?
AWS provides us with more analytics for Vespa, such as how it is performing on the servers. Setup is easy because the documentation is thorough.
We are using self-hosted Vespa in our AWS servers.
The setup process is fine. It helps us save money and we got very good responses from the users.
What's my experience with pricing, setup cost, and licensing?
The cost part is not at a high point or at a low point. It is somewhere in the middle. That also helps us to sustain it.
What other advice do I have?
For anyone who wants to use a vector store, they should do research on their end, and if nothing comes up after discussion and research, I recommend using Vespa because they have good reliability. The main thing is the speed. The retrieval speed is very good. I recommend Vespa for systems to get integrated with.
Vespa is very good and it improves our product, and we got more clients. We got very good results and very good relevance. This mainly depends on how you design the Vespa document schemas. The document schema design determines how relevant your results are and how retrieval is done. The feedback on how Vespa responds is good, and it is also fast. We are using Amazon Web Services (AWS), and it is easy because the documentation is thorough. I give Vespa a rating of nine out of ten.
Advanced ranking has improved candidate matching and now simplifies end‑to‑end hiring workflows
What is our primary use case?
I use Vespa as a vector database for ranking and matching. I have jobs and candidates indexed in Vespa, which is a vector database. When I have a job and need to get the first 50 candidates who match that job, a normal vector search would not retrieve the first 50 because I may need to filter or rank based on some features and fields. Vespa helps by allowing me to first select the first 200, and then within the 200, I rank the first 50 based on certain criteria.
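The select-200-then-rank-50 pattern described above is a form of two-phase ranking. The following plain-Python sketch illustrates the logic only; it is not how Vespa implements it internally, and the candidate fields and scoring functions are hypothetical.

```python
import heapq
from typing import Callable


def two_phase_rank(
    candidates: list[dict],
    cheap_score: Callable[[dict], float],
    expensive_score: Callable[[dict], float],
    rerank_count: int = 200,
    keep: int = 50,
) -> list[dict]:
    """Shortlist by a cheap score, then rerank the shortlist by a costlier one."""
    shortlist = heapq.nlargest(rerank_count, candidates, key=cheap_score)
    return heapq.nlargest(keep, shortlist, key=expensive_score)


# Hypothetical candidates: a vector similarity plus one structured feature.
candidates = [{"id": i, "similarity": i / 1000, "years": i % 7} for i in range(1000)]
top = two_phase_rank(
    candidates,
    cheap_score=lambda c: c["similarity"],
    expensive_score=lambda c: 0.5 * c["similarity"] + 0.1 * c["years"],
)
print(len(top), top[0]["id"])
```

Only the shortlist pays for the more expensive scoring function, which is what keeps the second stage affordable as the candidate pool grows.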
What is most valuable?
The best feature to me is the LTR feature, the ranking feature to be specific. With most other vector databases, you perform the query and then apply your own LTR logic outside the database, but with Vespa, both operations can be done within the database, which reduces latency and bottlenecks. It amounts to having your filter and sort operations within the database itself. I also appreciate the filtering that Vespa offers.
The results have been better with Vespa. The matching has been far better with Vespa. Before, I was using other solutions such as PGVector and Pinecone and Weaviate, but they were harder to integrate, meaning more work for the developer. With Vespa, the integration on the development side has been easier.
I cannot say that the improved results are directly tied to Vespa alone because many iterations have been made, and this includes the general architecture. The fact that Vespa does its own indexing, and I can just receive a string of text, index it and store it as a vector, is remarkable.
What needs improvement?
The integration is actually a pain. If something could be done to make it easier, I would really appreciate it. The reason I am saying this is that if I have a migration script that I need to run in the database, it is difficult to run migration scripts on Vespa because each time I run a migration script, I have to restart Vespa. This makes the CI process a little bit difficult.
A UI would be nice to have. It is not something I thought of earlier, but now that I am thinking of it, I believe a UI would be beneficial, similar to how Neo4j offers it. Of course, I have not looked into it extensively, so I do not know if it offers a UI because for my use case, I did not need the UI.
The documentation could also be improved, although the documentation was quite easy to follow for me. I do not know if it is a skill issue or not. For beginners, for someone just getting into vector databases and they just discover Vespa, I believe it will be somewhat harder for them. If the documentation can be made more beginner-friendly, it would be better.
Both the migration-script friction and the amount of resources Vespa requires are significant.
The embedding is good. I have actually used it; the only problem is that if I need some more context passed into the embedding, then with Vespa, it is difficult, meaning I have to pass the text through an external LLM to extract the context and then pass it into Vespa for embedding. If there is a way I can improve the embedding context, to pass some context into Vespa's embedding so that I can just pass a string and let it handle the embedding by itself, that would be beneficial.
I noticed Vespa only requires deployment within the environment. If I have an internal network, then since Vespa does not support usernames and passwords, it is difficult to control what level of access a user has to data. This raises some questions regarding integrity. If two different services are using the same Vespa instance, then data protection is at risk. If Vespa introduced usernames and passwords, this would solve many things. With this structure, it also means that Vespa cannot be exposed to the public. If I want to buy more resources from a different vendor and run my services there, exposing Vespa to the public means additional setup, such as IP mapping or something similar. If the IPs are changing, then I run into a different problem. Therefore, usernames and passwords, although basic, would really help, or at least roles.
For how long have I used the solution?
I have used Vespa for roughly two years.
What do I think about the scalability of the solution?
I have yet to experience scalability issues. I do not know how it handles traffic, but I will have an answer for that soon. I also have not tried scaling it, so I do not know how it scales. These are answers I have yet to find.
As I mentioned, I have not tried scaling it and I do not know what problems I might run into while scaling Vespa. I still need some more time, maybe as I get more users. At the moment, it is not a bottleneck for us; the one instance of Vespa is working well. Perhaps soon I will scale it and can have a better answer for this.
Which solution did I use previously and why did I switch?
I was using PGVector before switching to Vespa. The reason that made me switch to Vespa is because I needed more functionality, such as the ranking feature. There were some other options on the table; Weaviate was one of them.
What other advice do I have?
Up to now, I am still in the building phase. I have not gone commercial with my product, and so I cannot give a relevant answer about that. I am still trying out Vespa to see if it actually meets my business need. I would tell others that the product is actually good if they have some resources on their side because it is resource-intensive. It actually requires someone who knows what they are doing to reap most of the benefits out of Vespa because you do not have to implement most of the features in the code layer; you can just do it at the database layer. I would rate this product an 8 out of 10.
Powerful backend for vector and hybrid search with many bells and whistles.
The Vespa search backend itself provided a good match to our requirements of near-real-time hybrid search, combining nearest neighbor embedding search with attribute filters, in a distributed and highly scalable way. Our target installation comprised >12TB of memory across 24 hosts and held O(1B) vector embeddings.
Native extensions can only be written in Java which, without a native Java toolchain at our company, proved too challenging to pursue. The documentation is vast but could be better organized and have more contextual examples in places.
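For a sense of scale, the installation figures in this review can be turned into a back-of-the-envelope calculation. The review does not state the embedding dimension or element type, so the 384-dimension float32 vectors below are purely assumed, as is reading O(1B) as exactly one billion vectors and 1 TB as 10^12 bytes.

```python
def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Memory needed for the raw vector values alone, excluding any index."""
    return num_vectors * dims * bytes_per_value


TOTAL_MEMORY_BYTES = 12 * 10**12  # ">12TB" from the review, taken as a lower bound
HOSTS = 24                        # from the review
NUM_VECTORS = 10**9               # "O(1B)" read as exactly one billion (assumption)

per_host_gb = TOTAL_MEMORY_BYTES / HOSTS / 10**9
budget_per_vector = TOTAL_MEMORY_BYTES // NUM_VECTORS
raw_tb = raw_vector_bytes(NUM_VECTORS, dims=384) / 10**12  # 384 dims is assumed

print(per_host_gb)        # 500.0
print(budget_per_vector)  # 12000
print(raw_tb)             # 1.536
```

Under these assumptions each host holds at least 500 GB, and the cluster has a budget of roughly 12 KB of memory per embedding; the review does not break down how that memory was actually used.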