
Overview
Cohere's Rerank endpoint enables you to significantly improve search quality by augmenting traditional keyword-based search systems with a semantic reranking system that contextualizes the meaning of a user's query beyond keyword relevance.
Cohere Rerank delivers much higher quality results than embedding-based search alone, and it requires adding only a single line of code to your application.
Cohere Rerank 3 Nimble is optimized for speed. It performs about 3x faster than our Rerank 3 model. For our most accurate reranker model, please see Cohere Rerank 3 Model - Multilingual.
The endpoint supports documents and queries written in over 100 languages.
Highlights
- Cohere's Rerank endpoint can be applied to both keyword-based search systems and vector search systems. When using a keyword-based search engine, like Elasticsearch or OpenSearch, the Rerank endpoint can be added to the end of an existing search workflow and will allow users to incorporate semantic relevance into their keyword search system without changing their existing infrastructure. This is an easy and low-complexity method of improving search results by introducing semantic search technology into a user’s stack.
- This endpoint is powered by our large language model, which computes a relevance score between the query and each of the initial search results. Compared to embedding-based semantic search, it yields better search results, especially for complex and domain-specific queries.
- Rerank supports JSON objects as documents where users can specify at query time the fields (keys) that semantic search should be applied over.
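For illustration, a request that reranks JSON document objects might look like the following sketch; the field names and values are hypothetical, and the full input schema is documented under Additional details below.

```python
# Illustrative request body: documents are JSON objects, and rank_fields
# tells the reranker which keys semantic search should be applied over.
payload = {
    "query": "What is the capital of the United States?",
    "documents": [
        {"title": "Facts about Carson City", "text": "Carson City is the capital city of Nevada."},
        {"title": "The Capital of the USA", "text": "Washington, D.C. is the capital of the United States."},
    ],
    "rank_fields": ["title", "text"],  # keys to rerank over, chosen at query time
    "top_n": 1,
}
```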
Details
Pricing
Free trial
| Dimension | Description | Cost/host/hour |
|---|---|---|
| ml.g5.2xlarge Inference (Batch), recommended | Model inference on the ml.g5.2xlarge instance type, batch mode | $9.16 |
| ml.g5.xlarge Inference (Real-Time), recommended | Model inference on the ml.g5.xlarge instance type, real-time mode | $8.50 |
| ml.g5.2xlarge Inference (Real-Time) | Model inference on the ml.g5.2xlarge instance type, real-time mode | $9.16 |
Vendor refund policy
No refunds.
Delivery details
Amazon SageMaker model
An Amazon SageMaker model package is a pre-trained machine learning model ready to use without additional training. Use the model package to create a model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
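As a minimal deployment sketch using the SageMaker Python SDK: the model package ARN, role, and endpoint name below are placeholders, and the instance type follows the recommended real-time dimension from the pricing table above.

```python
import sagemaker
from sagemaker import ModelPackage

session = sagemaker.Session()
role = sagemaker.get_execution_role()  # assumes this runs with a SageMaker execution role

# Placeholder ARN: substitute the model package ARN shown for your region after subscribing.
model = ModelPackage(
    role=role,
    model_package_arn="arn:aws:sagemaker:<region>:<account>:model-package/<package-name>",
    sagemaker_session=session,
)

# ml.g5.xlarge is the recommended real-time instance type from the pricing table above.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",
    endpoint_name="cohere-rerank-3-nimble",  # hypothetical endpoint name
)
```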
Version release notes
Minor fixes and performance improvements.
Additional details
Inputs
- Summary
The model accepts JSON requests that specify the objects to be reranked. The user can choose which keys to rerank over by adjusting the rank_fields parameter; alternatively, the user can simply send a list of texts to be reranked.
- Input MIME type
- application/json
Input data descriptions
The following table describes supported input data fields for real-time inference and batch transform.
| Field name | Description | Constraints | Required |
|---|---|---|---|
| query | The search query. | Type: FreeText | Yes |
| documents | A list of document objects or strings to rerank. If document objects are provided, a text field is required unless the fields to rerank over are given in rank_fields; all other fields are preserved in the response. | Type: FreeText | Yes |
| top_n | The number of most relevant documents or indices to return; defaults to the length of the documents list. | Type: Integer; Minimum: 0 | No |
| return_documents | If false, results are returned without the document text: the API returns a list of {index, relevance score}, where index refers to the list passed in the request. If true, results include the document text: the API returns an ordered list of {index, text, relevance score}, where index and text refer to the list passed in the request. | Default value: FALSE; Type: Categorical; Allowed values: TRUE, FALSE | No |
| max_chunks_per_doc | The maximum number of chunks to produce internally from a document. | Type: Integer; Minimum: 0; Maximum: 10 | No |
| rank_fields | If you send document objects, the fields to rerank over. | Default value: []; Type: FreeText | No |
Support
Vendor support
Contact us at support+aws@cohere.com
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel staffed 24x7x365 with experienced technical support engineers. The service helps customers of all sizes and technical abilities successfully use the products and features provided by Amazon Web Services.