
Overview
Cohere's Rerank endpoint enables you to significantly improve search quality by augmenting traditional keyword-based search systems with a semantic reranking system that contextualizes the meaning of a user's query beyond keyword relevance.
Cohere's Rerank delivers much higher-quality results than embedding-based search alone, and it requires adding only a single line of code to your application.
Cohere Rerank 3 Nimble is optimized for speed: it performs about 3x faster than our Rerank 3 model. For our most accurate reranker model, please see Cohere Rerank 3 Model - Multilingual.
The endpoint supports documents and queries written in over 100 languages.
Highlights
- Cohere's Rerank endpoint can be applied to both keyword-based search systems and vector search systems. When using a keyword-based search engine, such as Elasticsearch or OpenSearch, the Rerank endpoint can be added to the end of an existing search workflow, allowing users to incorporate semantic relevance into their keyword search system without changing their existing infrastructure. This is an easy, low-complexity way to improve search results by introducing semantic search technology into a user's stack.
- This endpoint is powered by our large language model, which computes a relevance score between the query and each of the initial search results. Compared to embedding-based semantic search, it yields better search results, especially for complex and domain-specific queries.
- Rerank supports JSON objects as documents where users can specify at query time the fields (keys) that semantic search should be applied over.
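The single-line integration described above can be sketched as follows. The reordering helper is illustrative, and the API key and model name in the commented call are placeholders, not values from this listing:

```python
# Minimal sketch of layering Rerank over keyword-search hits. The helper below
# reorders the original results using the indices a reranker returns; the
# Cohere SDK call is shown in comments (API key and model name are placeholders).
def rerank_order(documents, ranked_indices):
    """Reorder search hits by the index order returned from the reranker."""
    return [documents[i] for i in ranked_indices]

# With the Cohere Python SDK, the integration is a single call:
#   import cohere
#   co = cohere.Client("YOUR_API_KEY")                 # placeholder key
#   resp = co.rerank(model="rerank-english-v3.0",      # model name may differ
#                    query=query, documents=hits, top_n=3)
#   reranked = rerank_order(hits, [r.index for r in resp.results])

hits = ["intro to pricing", "rerank endpoint guide", "embedding overview"]
print(rerank_order(hits, [1, 2, 0]))
```

The existing keyword search engine stays untouched; only its top results pass through the reranker before being shown to the user.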
Details
Pricing
Free trial
Dimension | Description | Cost/host/hour |
|---|---|---|
ml.g5.2xlarge Inference (Batch), recommended | Model inference on the ml.g5.2xlarge instance type, batch mode | $9.16 |
ml.g5.xlarge Inference (Real-Time), recommended | Model inference on the ml.g5.xlarge instance type, real-time mode | $8.50 |
ml.g5.2xlarge Inference (Real-Time) | Model inference on the ml.g5.2xlarge instance type, real-time mode | $9.16 |
Vendor refund policy
No refunds.
Legal
Vendor terms and conditions
Content disclaimer
Delivery details
Amazon SageMaker model
An Amazon SageMaker model package is a pre-trained machine learning model ready to use without additional training. Use the model package to create a model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
Version release notes
Minor fixes and performance improvements.
Additional details
Inputs
- Summary
The model accepts JSON requests specifying the input objects to be reranked. The user can choose which keys are reranked by adjusting the `rank_fields` parameter; alternatively, the user can simply send a list of texts to be reranked.
- Input MIME type
- application/json
Input data descriptions
The following table describes supported input data fields for real-time inference and batch transform.
Field name | Description | Constraints | Required |
|---|---|---|---|
query | The search query | Type: FreeText | Yes |
documents | A list of document objects or strings to rerank. If a document object is provided, its `text` field is required unless specific fields to rerank over are given via `rank_fields`; all other fields are preserved in the response | Type: FreeText | Yes |
top_n | The number of most relevant documents or indices to return; defaults to the length of `documents` | Type: Integer; Minimum: 0 | No |
return_documents | If false, returns results without the document text: the API returns a list of {index, relevance_score}, where index refers to the list passed in the request. If true, returns results with the document text: the API returns an ordered list of {index, text, relevance_score}, where index and text refer to the list passed in the request | Type: Categorical; Default value: FALSE; Allowed values: TRUE, FALSE | No |
max_chunks_per_doc | The maximum number of chunks to produce internally from a document | Type: Integer; Minimum: 0; Maximum: 10 | No |
rank_fields | If document objects were sent, the fields to rerank over | Type: FreeText | No |
Resources
Vendor resources
Support
Vendor support
Contact us at support+aws@cohere.com
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel staffed 24x7x365 with experienced technical support engineers. The service helps customers of all sizes and technical abilities successfully utilize the products and features provided by Amazon Web Services.
Customer reviews
Controlled text generation has supported secure workflows and governed data privacy
What is our primary use case?
We adopted Cohere primarily for their command model to support enterprise-grade text generation and NLP workflows.
There was a use case for one of our customers where they required automated text generation and summarization of long documents and draft creation for internal content, so we used Cohere's command model with AWS Bedrock.
For another customer, there was a similar use case but they also wanted semantic search and RAG, and instruction-based responses for chat and workflow automation were required, so we used Cohere's command model for that.
What is most valuable?
Cohere's command model is particularly useful in scenarios where consistent, controlled output matters more than creative responses; I think the command model fits better in those cases. We also found it well suited for structured enterprise tasks such as policy drafting, knowledge extraction, and generating standardized text for operational workflows.
It struck a good balance between fluency and predictability, which helps our team and is valuable for our business-critical applications.
One of the major benefits I saw since implementing Cohere was data isolation and governance.
Consistent output quality, strong instruction following, and excellent embedding performance for retrieval tasks have benefited our organization. It is also offered through Amazon Bedrock, and it is enterprise-friendly, with deployment options such as VPC deployment and data isolation that helped significantly.
Data privacy was a major concern because we operate from Asia-Pacific, and there is strong governance for data privacy in our country, so data privacy is the major compliance that helped us here.
What needs improvement?
Cohere could improve in areas where the command model is not as creative as some larger LLMs available in the market, which is expected but noticeable in open-ended generative tasks.
Reporting and analytics in the dashboard could be more detailed and fine-tuned, which would enhance the experience.
Fine-tuning could be simplified to support broader teams without deep ML expertise.
As I have already suggested, the model could be more creative, and improved reporting and analytics in the dashboard would help teams without machine-learning expertise reach their end goals faster.
For how long have I used the solution?
We have been using Cohere for around one year.
What do I think about the stability of the solution?
Cohere is stable.
What do I think about the scalability of the solution?
The scalability and performance are quite good.
How are customer service and support?
We have not reached out to customer support yet, but once we encounter an issue and need to raise a ticket, we will provide feedback.
How would you rate customer service and support?
Negative
What was our ROI?
Cohere helped us with all three aspects: money is saved, time is saved, and we needed fewer resources to meet our end goals.
What's my experience with pricing, setup cost, and licensing?
Compared to models available in the market, Cohere's pricing, setup cost, and licensing are better.
Which other solutions did I evaluate?
We have tried multiple models, but we found that Cohere's command was a better fit for our needs.
We explored models from Anthropic and AWS native models such as AWS Titan Text before choosing Cohere.
What other advice do I have?
Cohere offers great customization options.
If governance, consistency, and data privacy are priorities, Cohere meets our organization's requirements well.
I recommend that anyone, especially in environments where governance, consistency, and data privacy are priorities, should choose Cohere, particularly the command model for teams looking for a controlled enterprise-safe alternative for text generation, summarization, and instruction automation.
Currently, we have used Cohere from the AWS Bedrock offering only, but since AWS has changed their third-party model availability from partner accounts, in the future, we are going to be a reseller for Cohere.
The documentation and learning resources were very helpful.
Our overall review rating for Cohere is 8 out of 10.
Fast document processing has improved tender workflows but documentation still needs work
What is our primary use case?
My main use case for Cohere is for LLM and chatbot development.
I use Cohere to fill in fields about documents, specifically tenders; I work with docx documents for a private company.
What is most valuable?
The best features Cohere offers are its speed and overall quality.
Speed has helped me in my day-to-day work, and I really notice the difference because it responds very quickly to LLM requests.
Cohere has positively impacted my organization because I use it with Oracle services in an enterprise setting, which helped me offer clients a unique place to develop and use LLMs.
What needs improvement?
I am uncertain about how Cohere can be improved.
The documentation and support could be improved, as there is limited documentation available on the web.
What do I think about the stability of the solution?
Cohere is stable.
What do I think about the scalability of the solution?
I am uncertain about Cohere's scalability.
How are customer service and support?
I am uncertain about customer support.
Which solution did I use previously and why did I switch?
I used GPT-4 before Cohere, and it is great; it was the option I evaluated before choosing Cohere.
What was our ROI?
I am uncertain if I have seen a return on investment or any relevant metrics such as time saved, money saved, or fewer employees needed.
What's my experience with pricing, setup cost, and licensing?
My experience with pricing, setup cost, and licensing is that it is expensive to use all Oracle services.
What other advice do I have?
I do not want to add anything else about the features, including anything about accuracy or ease of use.
I do not have specific advice to give to others looking into using Cohere. I gave this review a rating of 6.
Reranking has boosted retrieval quality and has improved performance in my information systems
What is our primary use case?
My main use case for Cohere is Retrieval Augmented Generation.
A specific example of how I use Retrieval Augmented Generation with Cohere is for information retrieval systems.
What is most valuable?
The best feature Cohere offers is the Reranking model.
What stands out for me about the Reranking model is that it improved performance in my work.
Cohere positively impacted my organization by improving the performance of my RAG system.
I noticed a 10% improvement in my RAG system after using Cohere.
What needs improvement?
Cohere is good enough as it is, though I think it can still be improved.
For how long have I used the solution?
I have been using Cohere for two years.
What do I think about the stability of the solution?
Cohere is stable.
What do I think about the scalability of the solution?
The scalability of Cohere is good.
How are customer service and support?
The customer support for Cohere is good.
How would you rate customer service and support?
Negative
What was our ROI?
I have not seen metrics for return on investment, and I have no metrics to share.
What's my experience with pricing, setup cost, and licensing?
My experience with pricing, setup cost, and licensing for Cohere is good.
What other advice do I have?
My advice to others looking into using Cohere is to try it.
My company does not have a business relationship with this vendor other than being a customer.
I gave this review a rating of 8.
Have improved project workflows using faster response times and reduced data embedding costs
What is our primary use case?
I have used Cohere in a RAG use case where I had to vectorize some data. I used multiple models in RAG to find a better model that could give superior results. I was trying to find a cloud-hosted model, and Cohere's Embed English v3.0 is a cloud-hosted model that took less time to embed the textual data. When I was trying to get the similarity search after embedding that data, Cohere provided much better results.
Let's suppose I had to embed 100 documents at a time. Most other models, including all-MiniLM-L6-v2, took more time when I was trying to embed using that model. When I tried Cohere, it was much faster. I would say it was more than 50 to 60% faster than those models. It was even somewhat faster than text-embedding-3, which is from OpenAI. So Cohere helped to reduce the development time and embedding times.
What is most valuable?
I believe Cohere offers excellent features, especially the cloud-hosted model and the API calls. The number of API calls I can make per minute is very good. The latency is great: when I send a request to the Cohere model, it responds very quickly. The best part was the free tier, because most models do not provide one.
Regarding benefits, Cohere is less costly than other models; OpenAI and Google embedding models charge much more in comparison. Regarding training data, Cohere's Embed English v3.0 has been trained on more English data than other models, including OpenAI's, which gives my organization an extra benefit.
What needs improvement?
One thing Cohere can improve relates to the distances I see when running similarity search. When I embed textual data with Cohere, I have to apply an extra NumPy step after embedding. With OpenAI's embedding models, I do not need that extra step, and they return lower distances than my Cohere results: I sometimes got distances of approximately 0.005 with OpenAI, whereas with Cohere I was getting around 0.5 or more. I think that can be improved, though it may have been due to my configuration or the way I was using it; I am not exactly sure.
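The extra step described here, normalizing raw embedding vectors before comparing them, can be sketched in a few lines. This is a generic illustration, not the reviewer's exact code (plain Python for portability; with NumPy the norms would come from `np.linalg.norm`):

```python
import math

def cosine_distance(a, b):
    """Cosine distance between two embedding vectors, with explicit
    L2 normalization (the 'extra step' unnormalized embeddings need)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # identical direction -> 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 1.0
```

Because the vectors are normalized first, the distance depends only on direction; raw magnitudes (which differ between embedding providers) no longer skew the comparison.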
For how long have I used the solution?
I have been using Cohere for the last seven or eight months.
What do I think about the scalability of the solution?
The scalability was very good because of the response time. Even though I do not need that much processing at a time, I have had a good experience with Cohere so far.
Which solution did I use previously and why did I switch?
Previously, I was using all-MiniLM-L6-v2 and switched to Cohere because all-MiniLM-L6-v2 needed to be deployed and processed locally, and even though it was open source, I was not satisfied with the results I was getting. That is why I switched to Cohere.
What was our ROI?
I can highlight two benefits. First, Cohere charges less than OpenAI, so it saves cost. Second, the timing is significant: Cohere's Embed English model took less time to embed than OpenAI's text-embedding-ada-002 model, so it also saves time.
Which other solutions did I evaluate?
I have evaluated Cohere's Embed English v3.0 alongside OpenAI's text-embedding-3 models, and I also evaluated some models from Hugging Face.
What other advice do I have?
Cohere provides a free tier, and any developer who is starting their journey can use Cohere for RAG use cases. They can utilize the model benefits. After using Cohere, I got distances after the similarity search that were much lower compared to other vectorization and embedding models. The only model that performed better than Cohere was OpenAI's text-embedding-3-large. It was good, but Cohere was the second-best performing model in my use case.
I think Cohere's use cases are excellent, and I would suggest Cohere to others because of its lower response time and the time it saves in the process. It is also cheaper than other models. I would give this review a rating of eight out of ten.
Has improved customer interaction speeds and supports flexible model switching
What is our primary use case?
My main use case for Cohere is to use a Cohere embedded model to create our own vector databases and check conversations.
A specific example of how I use Cohere's embedding model for our vector databases and conversation checking involves pipelines that take customer approvals, convert that information into vectors, save it in our own systems, and also store small vectors on customer devices for use during custom customer requests.
My use case involves indexing and saving small portions of information.
What is most valuable?
In my experience, Cohere offers reliable embedding models for customers who do not want to use standard OpenAI models.
I find that the choice of embedding models is limited, and Cohere is available on Azure, which makes it a good alternative for customers who prefer not to use OpenAI.
Cohere has positively impacted my organization by helping our customers work more efficiently when creating requests, and the embedding results are of very high quality.
What needs improvement?
I believe Cohere can be improved technically by providing more feedback, logs, and metrics for embedding requests, as it currently appears to be a black box without any understanding of quality. Quality can only be understood after using it with customer requests, and during the embedding process, measurable metrics are not visible.
There are no particularly unique features distinguishing Cohere from other solutions.
For how long have I used the solution?
I have been using Cohere for approximately nine to ten months.
What do I think about the stability of the solution?
Cohere is stable in my experience.
What do I think about the scalability of the solution?
The scalability of Cohere showed that after sending a large amount of information and embeddings, it became slower, though we do not use any special solution for scaling.
How are customer service and support?
I have not interacted with Cohere's support team. However, I contacted Azure about the slowness, and we decided to use smaller chunks of information during the embedding process.
How would you rate customer service and support?
Neutral
Which solution did I use previously and why did I switch?
I previously used embedding models from OpenAI. I switched to Cohere because customers wanted to use something other than OpenAI models.
How was the initial setup?
I did not purchase Cohere through the Azure Marketplace. I deployed unmanaged models and shared models.
What was our ROI?
I do not have relevant metrics about the return on investment from using Cohere yet because the customer's application is still at an early stage and has not been released. However, I understand that it is performing well, and we plan to continue with it.
What's my experience with pricing, setup cost, and licensing?
My experience with pricing, setup cost, and licensing indicates that it does not require a special license, and the prices are competitive compared to competitors.
Which other solutions did I evaluate?
I did not evaluate other options before choosing Cohere. I looked at prices, and since we used Azure cloud, it did not provide many models for selection. Only OpenAI and Cohere were available for embedding.
What other advice do I have?
For others looking into using Cohere, I advise that it is a good model for people who want to be agnostic when using models and creating something flexible to switch from one model to another. I would rate this product an eight out of ten.