
Overview
Cohere's Rerank endpoint significantly improves search quality by augmenting traditional keyword-based search systems with a semantic reranking system that contextualizes the meaning of a user's query beyond keyword relevance. Rerank delivers much higher-quality results than embedding-based search alone, and adding it requires only a single line of code in your application.
Highlights
- Cohere's Rerank endpoint can be applied to both keyword-based search systems and vector search systems. When using a keyword-based search engine, such as Elasticsearch or OpenSearch, the Rerank endpoint can be added at the end of an existing search workflow, allowing users to incorporate semantic relevance into their keyword search system without changing their existing infrastructure. This is a simple, low-complexity way to improve search results by introducing semantic search technology into a user's stack.
- This endpoint is powered by our large language model, which computes a relevance score between the query and each of the initial search results. Compared to embedding-based semantic search, it yields better search results, especially for complex and domain-specific queries.
- Semantic Search, Ranking, Reranking, Text Embeddings
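As a minimal sketch of the workflow described above, the snippet below packages hits from an existing keyword search engine into the JSON request body that the Rerank endpoint expects (per the input schema later on this page). The helper name is illustrative, not part of any SDK.

```python
import json

def build_rerank_request(query, hits, top_n=None, return_documents=True):
    """Package existing keyword-search hits (e.g. from Elasticsearch or
    OpenSearch) into the JSON body the Rerank endpoint expects."""
    body = {
        "query": query,
        "documents": [{"text": h} for h in hits],
        "return_documents": return_documents,
    }
    if top_n is not None:
        body["top_n"] = top_n
    return json.dumps(body)

# Hits as they might come back from a keyword search engine:
hits = [
    "Carson City is the capital city of the American state of Nevada.",
    "Washington, D.C. is the capital of the United States.",
]
req = build_rerank_request("What is the capital of the United States?", hits, top_n=1)
```

Because the request is plain JSON, this step slots into any existing retrieval pipeline without infrastructure changes.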
Details
Pricing
Free trial
| Dimension | Description | Cost/host/hour |
|---|---|---|
| ml.g5.2xlarge Inference (Batch), Recommended | Model inference on the ml.g5.2xlarge instance type, batch mode | $6.16 |
| ml.g5.xlarge Inference (Real-Time), Recommended | Model inference on the ml.g5.xlarge instance type, real-time mode | $5.71 |
| ml.g4dn.12xlarge Inference (Batch) | Model inference on the ml.g4dn.12xlarge instance type, batch mode | $19.80 |
| ml.g4dn.2xlarge Inference (Batch) | Model inference on the ml.g4dn.2xlarge instance type, batch mode | $3.81 |
| ml.p3.2xlarge Inference (Real-Time) | Model inference on the ml.p3.2xlarge instance type, real-time mode | $15.49 |
| ml.g5.2xlarge Inference (Real-Time) | Model inference on the ml.g5.2xlarge instance type, real-time mode | $6.16 |
| ml.g4dn.xlarge Inference (Real-Time) | Model inference on the ml.g4dn.xlarge instance type, real-time mode | $2.98 |
| ml.g4dn.2xlarge Inference (Real-Time) | Model inference on the ml.g4dn.2xlarge instance type, real-time mode | $3.81 |
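For budgeting, the per-host-hour rates above can be projected to a monthly figure for an always-on real-time host. The sketch below uses an assumed ~730 hours per month and covers only the rates listed in this table; any charges billed separately by AWS are not included.

```python
# Rough monthly cost for one always-on real-time inference host,
# using per-host-hour rates from the pricing table above.
HOURS_PER_MONTH = 730  # assumption: ~365 * 24 / 12

rates = {
    "ml.g5.xlarge": 5.71,    # recommended real-time instance
    "ml.g4dn.xlarge": 2.98,  # cheapest real-time instance listed
}

monthly = {inst: rate * HOURS_PER_MONTH for inst, rate in rates.items()}
```

Batch-mode dimensions are billed the same way per host-hour, so the same arithmetic applies for the duration of a batch transform job.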
Vendor refund policy
No refunds.
Delivery details
Amazon SageMaker model
An Amazon SageMaker model package is a pre-trained machine learning model ready to use without additional training. Use the model package to create a model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
Version release notes
Among many bug fixes and improvements, the latest release includes the following feature:
- Chunked context: chunked context is now enabled for all models. It allows batch processing between the context and generation phases, thereby balancing the computational and memory cost of each phase and increasing throughput.
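To illustrate the chunking concept, the sketch below splits a long document into fixed-size pieces with a cap, mirroring the idea behind the `max_chunks_per_doc` request field described later on this page. The service's actual chunking is internal and token-based; this character-based version is only an illustration.

```python
def chunk_text(text, chunk_size=512, max_chunks=10):
    """Illustrative fixed-size character chunking with a cap. The real
    endpoint chunks documents internally; max_chunks mirrors the
    max_chunks_per_doc request field (default 10)."""
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    return chunks[:max_chunks]  # e.g. a 6000-char doc yields 12 chunks, capped to 10
```

Documents longer than `chunk_size * max_chunks` are effectively truncated by the cap, which is worth keeping in mind when reranking very long documents.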
Additional details
Inputs
- Summary
The model accepts JSON requests that specify the input texts to be reranked.
{ "documents": [ {"text":"Carson City is the capital city of the American state of Nevada. "}, {"text" : "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean."}, {"text" : "Washington, D.C. is the capital of the United States. "}, ], "query": "What is the capital of the United States?", "top_n": 2, "return_documents": true }
- Input MIME type
- application/json
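A request like the one above can be sent to a deployed SageMaker endpoint with the AWS SDK for Python (boto3). The endpoint name below is a placeholder for whatever name you chose at deployment time.

```python
import json

def rerank_via_sagemaker(payload, endpoint_name):
    """Send a rerank request to a deployed SageMaker real-time endpoint."""
    import boto3  # AWS SDK for Python
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",  # matches the input MIME type above
        Body=json.dumps(payload),
    )
    return json.loads(response["Body"].read())

payload = {
    "documents": [
        {"text": "Carson City is the capital city of the American state of Nevada."},
        {"text": "Washington, D.C. is the capital of the United States."},
    ],
    "query": "What is the capital of the United States?",
    "top_n": 1,
    "return_documents": True,
}
# result = rerank_via_sagemaker(payload, "my-cohere-rerank-endpoint")  # placeholder name
```

Batch transform accepts the same JSON bodies as input files instead of per-request payloads.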
Input data descriptions
The following table describes supported input data fields for real-time inference and batch transform.
| Field name | Description | Constraints | Required |
|---|---|---|---|
| query | The search query. | Type: FreeText | Yes |
| documents | A list of document objects or strings to rerank. If a document object is provided, the text field is required; all other fields are preserved in the response. | Type: FreeText; Limitations: list of texts | Yes |
| top_n | The number of most relevant documents or indices to return; defaults to the length of documents. | Type: Integer; Minimum: 1; Default value: len(documents) | No |
| return_documents | If false, results are returned without the document text: the API returns a list of {index, relevance score}, where index refers to the list passed in the request. If true, results include the document text: the API returns an ordered list of {index, text, relevance score}, where index + text refer to the list passed in the request. | Type: Categorical; Allowed values: true, false; Default value: false | No |
| max_chunks_per_doc | The maximum number of chunks to produce internally from a document. | Type: Integer; Minimum: 0; Default value: 10 | No |
Resources
Vendor resources
Support
Vendor support
Contact us at support+aws@cohere.com
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.
Customer reviews
Improved document chatbot has delivered accurate answers and builds stronger client trust
What is our primary use case?
My main use case for Cohere is that it's a good embedding model. I have used it with Titan, but Cohere came out better.
A specific example of how I've used Cohere for embeddings is when I was working with one of our clients where we were establishing a chatbot that can help us go through 31 PDFs. For embedding, we used Cohere and Titan, and Cohere was a superior product.
I have integrated Cohere into that chatbot project using SageMaker, and it was an easy API call that I used.
What is most valuable?
In my opinion, the best features Cohere offers are the embedding flexibility and the way the LLM reacted to the embeddings from Cohere. I used OpenSearch to integrate and store all the embeddings, and I used Titan as well to store embeddings in OpenSearch, but the results with Cohere were much better.
The flexibility I mentioned is evident because when we were using Titan, it was hallucinating a lot and not giving proper answers because I felt the embedding was poor. When we used it with Cohere, the embeddings were better and the chatbot with the LLM that used the embeddings from Cohere answered in a better way.
Cohere has positively impacted my organization as the project was a success. Clients were really happy with the results, and we received more business from them.
What needs improvement?
Cohere can be improved by having more integrations beyond its current offerings with Amazon. Integrations with Databricks, Azure