    Cohere Rerank 2 Model - English

    Sold by: Cohere 
    Deployed on AWS
    Free Trial
    Rerank returns a list of documents sorted by their semantic similarity to the query.

    Overview

    Cohere's Rerank endpoint lets you significantly improve search quality by augmenting traditional keyword-based search systems with a semantic reranking system that understands the meaning of a user's query beyond keyword relevance. Rerank delivers much higher-quality results than embedding-based search alone, and it requires adding only a single line of code to your application.
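
    As a rough illustration of that claim, the sketch below adds one Rerank call after an existing retrieval step. It is only a sketch: it assumes the model is already running behind a SageMaker endpoint, the endpoint name is a placeholder, and the request body follows the JSON format documented under Inputs further down this page.

    import json
    import boto3

    runtime = boto3.client("sagemaker-runtime")

    def rerank(query, documents, top_n=3):
        # Build a request in the documented JSON format and send it to the endpoint.
        payload = {
            "query": query,
            "documents": [{"text": d} for d in documents],
            "top_n": top_n,
            "return_documents": True,
        }
        response = runtime.invoke_endpoint(
            EndpointName="cohere-rerank-english",  # placeholder endpoint name
            ContentType="application/json",
            Body=json.dumps(payload),
        )
        return json.loads(response["Body"].read())

    # "candidates" would normally come from your existing search system.
    candidates = [
        "Carson City is the capital city of the American state of Nevada.",
        "Washington, D.C. is the capital of the United States.",
    ]
    print(rerank("What is the capital of the United States?", candidates, top_n=1))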

    Highlights

    • Cohere's Rerank endpoint can be applied to both keyword-based search systems and vector search systems. When using a keyword-based search engine, such as Elasticsearch or OpenSearch, the Rerank endpoint can be added to the end of an existing search workflow, letting users incorporate semantic relevance into their keyword search system without changing their existing infrastructure. This is an easy, low-complexity way to improve search results by introducing semantic search technology into a user's stack (see the sketch after this list).
    • This endpoint is powered by our large language model, which computes a relevance score between the query and each of the initial search results. Compared to embedding-based semantic search, it yields better search results, especially for complex and domain-specific queries.
    • Semantic Search, Ranking, Reranking, Text Embeddings
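
    The sketch below illustrates the keyword-search case from the first highlight: take the top BM25 hits from an existing OpenSearch index and pass them through the rerank() helper shown above. The index name "docs", the "body" field, and the shape of the rerank response ("results" entries carrying an "index") are assumptions for illustration, not part of this listing.

    from opensearchpy import OpenSearch  # existing keyword-search client

    os_client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

    def keyword_search_with_rerank(query, k=50, top_n=5):
        # 1) Existing keyword retrieval, unchanged.
        resp = os_client.search(
            index="docs",  # hypothetical index
            body={"query": {"match": {"body": query}}, "size": k},
        )
        candidates = [hit["_source"]["body"] for hit in resp["hits"]["hits"]]
        # 2) The added semantic step: rerank the keyword candidates.
        reranked = rerank(query, candidates, top_n=top_n)
        # The response is assumed to contain a relevance-ordered "results" list of indices.
        return [candidates[r["index"]] for r in reranked.get("results", [])]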

    Details

    Sold by: Cohere
    Delivery method: Amazon SageMaker model
    Deployed on AWS

    Pricing

    Free trial

    Try this product free for 7 days according to the free trial terms set by the vendor.

    Cohere Rerank 2 Model - English

    Pricing is based on actual usage, with charges varying according to how much you consume. Subscriptions have no end date and may be canceled any time.
    Additional AWS infrastructure costs may apply. Use the AWS Pricing Calculator to estimate your infrastructure costs.

    Usage costs (8)

    Dimension                                Description                                                             Cost/host/hour
    ml.g5.2xlarge Inference (Batch)          Model inference on the ml.g5.2xlarge instance type, batch mode         $6.16   (Recommended)
    ml.g5.xlarge Inference (Real-Time)       Model inference on the ml.g5.xlarge instance type, real-time mode      $5.71   (Recommended)
    ml.g4dn.12xlarge Inference (Batch)       Model inference on the ml.g4dn.12xlarge instance type, batch mode      $19.80
    ml.g4dn.2xlarge Inference (Batch)        Model inference on the ml.g4dn.2xlarge instance type, batch mode       $3.81
    ml.p3.2xlarge Inference (Real-Time)      Model inference on the ml.p3.2xlarge instance type, real-time mode     $15.49
    ml.g5.2xlarge Inference (Real-Time)      Model inference on the ml.g5.2xlarge instance type, real-time mode     $6.16
    ml.g4dn.xlarge Inference (Real-Time)     Model inference on the ml.g4dn.xlarge instance type, real-time mode    $2.98
    ml.g4dn.2xlarge Inference (Real-Time)    Model inference on the ml.g4dn.2xlarge instance type, real-time mode   $3.81
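
    For rough budgeting only, here is a minimal sketch of the software cost of keeping one real-time endpoint running continuously at these rates, assuming roughly 730 hours in a month and excluding the separate AWS infrastructure charges.

    # Approximate monthly software cost for one host running continuously.
    hourly_software_cost = 5.71   # ml.g5.xlarge Inference (Real-Time), $/host/hour
    hours_per_month = 730         # assumption: average hours in a month
    print(f"~${hourly_software_cost * hours_per_month:,.2f} per host per month")  # ~$4,168.30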

    Vendor refund policy

    No refunds.

    Legal

    Vendor terms and conditions

    Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA).

    Content disclaimer

    Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

    Usage information

    Delivery details

    Amazon SageMaker model

    An Amazon SageMaker model package is a pre-trained machine learning model ready to use without additional training. Use the model package to create a model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.

    Deploy the model on Amazon SageMaker AI using the following options:

    • Deploy the model as an API endpoint for your applications. When you send data to the endpoint, SageMaker processes it and returns the results in the API response. The endpoint runs continuously until you delete it. You're billed for software and SageMaker infrastructure costs while the endpoint runs. AWS Marketplace models don't support Amazon SageMaker Asynchronous Inference. For more information, see Deploy models for real-time inference.
    • Deploy the model to process batches of data stored in Amazon Simple Storage Service (Amazon S3). SageMaker runs the job, processes your data, and returns the results to Amazon S3. When the job completes, SageMaker stops the model. You're billed for software and SageMaker infrastructure costs only during the batch job; duration depends on your model, instance type, and dataset size. AWS Marketplace models don't support Amazon SageMaker Asynchronous Inference. For more information, see Batch transform for inference with Amazon SageMaker AI.
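
    A hedged sketch of both options with the SageMaker Python SDK follows. The model-package ARN is a placeholder that you would copy from your Marketplace subscription, the instance types are taken from the pricing table above, and the S3 paths and endpoint name are assumptions.

    import sagemaker
    from sagemaker import ModelPackage

    session = sagemaker.Session()
    role = sagemaker.get_execution_role()  # or pass an IAM role ARN when running outside SageMaker

    model = ModelPackage(
        role=role,
        model_package_arn="arn:aws:sagemaker:us-east-1:111122223333:model-package/placeholder",  # placeholder
        sagemaker_session=session,
    )

    # Option 1: real-time endpoint (billed while the endpoint is running).
    predictor = model.deploy(
        initial_instance_count=1,
        instance_type="ml.g5.xlarge",
        endpoint_name="cohere-rerank-english",  # placeholder endpoint name
    )

    # Option 2: batch transform over JSON files stored in S3 (billed only for the job's duration).
    transformer = model.transformer(
        instance_count=1,
        instance_type="ml.g5.2xlarge",
        output_path="s3://your-bucket/rerank-output/",  # placeholder bucket
    )
    transformer.transform(
        data="s3://your-bucket/rerank-input/",  # placeholder bucket
        content_type="application/json",
    )
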
    Version release notes

    Among many bug fixes and improvements, the latest release includes the following feature:

    • Chunked context: enabled for all models. Chunked context allows batch processing between the context and generation phases, balancing the computational and memory cost of each phase and increasing throughput.

    Additional details

    Inputs

    Summary

    The model accepts a JSON request that specifies the query and the input texts to be reranked.

    { "documents": [ {"text":"Carson City is the capital city of the American state of Nevada. "}, {"text" : "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean."}, {"text" : "Washington, D.C. is the capital of the United States. "}, ], "query": "What is the capital of the United States?", "top_n": 2, "return_documents": true }

    Input MIME type: application/json
    Sample input: https://github.com/cohere-ai/cohere-aws/blob/main/examples/rerank_v2_samples/rerank_english_v2_input.json

    Input data descriptions

    The following fields are supported for real-time inference and batch transform.

    query (required)
        The search query.
        Type: FreeText.

    documents (required)
        A list of document objects or strings to rerank. If a document object is provided, its text field is required; all other fields are preserved in the response.
        Type: FreeText. Limitations: list of text.

    top_n (optional)
        The number of most relevant documents or indices to return. Defaults to the length of documents.
        Type: Integer. Minimum: 1. Default: len(documents).

    return_documents (optional)
        If false, the API returns results without the document text: an ordered list of {index, relevance score}, where index refers to the list passed in the request. If true, the API returns results with the document text: an ordered list of {index, text, relevance score}, where index and text refer to the list passed in the request.
        Type: Categorical. Allowed values: true, false. Default: false.

    max_chunks_per_doc (optional)
        The maximum number of chunks to produce internally from a document.
        Type: Integer. Minimum: 0. Default: 10.
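
    As a small illustration of these fields, the sketch below builds a request body that sets every optional field and maps the top result back to the original document list. The exact response key names ("results", "index") are assumptions based on Cohere's Rerank API; verify them against the sample files linked above.

    import json

    request = {
        "query": "What is the capital of the United States?",
        "documents": [
            {"text": "Carson City is the capital city of the American state of Nevada."},
            {"text": "Washington, D.C. is the capital of the United States."},
        ],
        "top_n": 1,                 # optional; defaults to len(documents)
        "return_documents": False,  # optional; defaults to false
        "max_chunks_per_doc": 10,   # optional; defaults to 10
    }
    body = json.dumps(request)      # send with ContentType "application/json"

    def best_match(response_json, request):
        # Map the top-ranked result back to the document list sent in the request.
        results = response_json.get("results", [])
        if not results:
            return None
        top = results[0]  # results are assumed to be ordered by relevance
        # With return_documents=false only the index comes back, so look the text up locally.
        return request["documents"][top["index"]]["text"]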

    Support

    Vendor support

    AWS infrastructure support

    AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.

    Customer reviews

    Ratings and reviews

    4 out of 5 stars, 1 rating (100% 4-star; 0% for all other star levels)
    1 AWS review | 3 external reviews
    Star ratings include only reviews from verified AWS customers. External reviews can also include a star rating, but star ratings from external reviews are not averaged in with the AWS customer star ratings.
    Rustam Sharipov

    Has improved customer interaction speeds and supports flexible model switching

    Reviewed on Oct 31, 2025
    Review provided by PeerSpot

    What is our primary use case?

    My main use case for Cohere is to use a Cohere embedded model to create our own vector databases and check conversations.

    A specific example of how I use Cohere's embedding model for our vector databases or conversation checking involves abilities that take customer approvals and convert that information into vectors. I save this information in our own systems and also store small vectors on customer devices to use during custom customer requests.

    My use case involves indexing and saving small portions of information.

    What is most valuable?

    In my experience, Cohere offers reliable embedding models for customers who do not want to use standard OpenAI models.

    I find that the choice of embedding models is limited, and Cohere is available on Azure, which makes it a good alternative for customers who prefer not to use OpenAI.

    Cohere has positively impacted my organization by helping our customers work more efficiently when creating requests, and the embedding results are of very high quality.

    What needs improvement?

    I believe Cohere can be improved technically by providing more feedback, logs, and metrics for embedding requests, as it currently appears to be a black box without any understanding of quality. Quality can only be understood after using it with customer requests, and during the embedding process, measurable metrics are not visible.

    There are no particularly unique features distinguishing Cohere from other solutions.

    For how long have I used the solution?

    I have been using Cohere for approximately nine to ten months.

    What do I think about the stability of the solution?

    Cohere is stable in my experience.

    What do I think about the scalability of the solution?

    Regarding the scalability of Cohere, after sending a large amount of information and embeddings it became slower, though we do not use any special solution for scaling.

    How are customer service and support?

    I have not interacted with Cohere's support team. However, I contacted Azure about the slowness, and we decided to use smaller chunks of information during the embedding process.

    How would you rate customer service and support?

    Neutral

    Which solution did I use previously and why did I switch?

    I previously used embedding models from OpenAI. I switched to Cohere because customers wanted to use something other than OpenAI models.

    How was the initial setup?

    I did not purchase Cohere through the Azure Marketplace. I deployed unmanaged models and shared models.

    What was our ROI?

    I do not have relevant metrics about the return on investment from using Cohere yet because the customer's application is still at an early stage and has not been released. However, I understand that it is performing well, and we plan to continue with it.

    What's my experience with pricing, setup cost, and licensing?

    My experience with pricing, setup cost, and licensing indicates that it does not require a special license, and the prices are competitive.

    Which other solutions did I evaluate?

    I did not evaluate other options before choosing Cohere. I looked at prices, and since we used Azure cloud, it did not provide many models for selection. Only OpenAI and Cohere were available for embedding.

    What other advice do I have?

    For others looking into using Cohere, I advise that it is a good model for people who want to be agnostic when using models and creating something flexible to switch from one model to another. I would rate this product an eight out of ten.

    Daniel Pan

    Has built key functionality for AI workflows in enterprise applications

    Reviewed on Oct 09, 2025
    Review from a verified AWS customer

    What is our primary use case?

    We founded this company two and a half years ago; since the middle of 2022 we foresaw the trend toward generative AI and large language models, so my startup works on developing generative AI applications for our clients, including enterprises and a few other startups across America and Canada.

    I started using Cohere when we first got information from the community about their reranking models almost one and a half years ago.

    In some clients' projects, we were required to introduce a reranking model into the RAG (retrieval-augmented generation) flow. In this flow, we provide different UI components that users can select and drag and drop into their flow to enhance their RAG pipeline. That's where we introduced Cohere models as one of the providers for reranking.

    How has it helped my organization?

    Cohere's reranking model helped us complete this request.

    What is most valuable?

    From our data, I can tell that at least 15% of end users were actively using reranking to enhance their RAG pipeline because we have the UI to indicate that reranking is recommended as it can enhance the quality of the retrieval.

    For clarification, I want to describe this data more clearly. As mentioned, 15% of end users chose to enable this module based on the fact that we have the pricing tier with an extra cost for their API call.

    In general, I'm satisfied with the speed, and I can confirm this because we have log files that track all conversations, and we see that the reranking step takes relatively little time within the whole chat flow. Regarding quality, it's hard to tell because we don't have a benchmark. In our enterprise applications, we are trying to build up evaluation pipelines, do A/B testing, and other analysis, but it's not a conventional computer science application, so it's very hard to build up evaluation pipelines with objective criteria. It's challenging for us to draw a conclusion about quality, but the speed is good.

    A direct benefit of using Cohere's reranking model is that we can tell clients we have this module rather than missing this piece, as reranking is a very important component that companies discuss to enhance RAG quality.

    Although it's not impacting our business model, I'm pushing for the evaluation system because it can expand our business scope. We want to sell our system to clients, and while they may not be aware of evaluation initially, it's beneficial to have. Once we have these systems, we can showcase to end users that employing such a reranking system improves quality. We need proof to convince ourselves that after implementing reranking, we get better quality.

    What needs improvement?

    It would be better to have a dashboard for users to showcase how reranking helps improve quality. When end users choose the service, they want to see the actual output. The evaluation part is challenging for recent large language model applications but remains very important.

    If Cohere could provide a dashboard where we can employ an LLM as a judge to check quality before and after reranking, that would be helpful. We could either have another large language model evaluate this part or allow UAT users to manually check with humans in the middle. As an enterprise provider, we want such features because when chatting with clients, we can demonstrate that employing Cohere's reranking model significantly improves results compared to not using it.

    Documentation is not a major blocking issue for us as we are sophisticated software engineers. Integration and the API provided for reranking models are not complicated, so we can easily handle that. The documentation is good. The major point is to prove the value through evaluation. We need a sophisticated solution to showcase visibly to our clients and engineering team to convince them that using this model creates improvements.

    For how long have I used the solution?

    I started using Cohere when we first got information from the community about their reranking models almost one and a half years ago.

    What do I think about the stability of the solution?

    That's only what we need in our product currently. I will communicate when we have other requirements.

    We haven't had any issues to escalate to Cohere's support because reranking is an optional feature in our product, and we haven't seen any significant issues so far.

    What do I think about the scalability of the solution?

    We don't observe many scaling problems because it's an enterprise application. There are a few hundred people using this. The concurrent user rate is not significant, which might be why we don't see many scaling issues so far.

    How are customer service and support?

    We haven't had any issues to escalate to Cohere's support because reranking is an optional feature in our product, and we haven't seen any significant issues so far.

    How would you rate customer service and support?

    Neutral

    Which solution did I use previously and why did I switch?

    For reranking, Cohere was our only solution.

    How was the initial setup?

    I'm more focused on the speed and overall quality of the model itself and of the chat flow as a whole solution. That's why I'm not in a position to comment on the price and setup cost, as DevOps works on that piece.

    What was our ROI?

    It is hard to estimate the overall ROI, but if you look at the ROI for the reranking feature alone, it is a positive number.

    What's my experience with pricing, setup cost, and licensing?

    I'm not in a position to answer that question because I was not the one who deployed the model, but we see the model name as an ARN, so it most likely comes from Bedrock.

    Which other solutions did I evaluate?

    For reranking, Cohere is the only solution we have used so far.

    What other advice do I have?

    As a feature developer, I'm more focused on the speed and overall quality of the model itself and of the chat flow as a whole solution. That's why I'm not in a position to comment on the price and setup cost, as DevOps works on that piece. My rating for this solution is 8 out of 10.

    Gokul Anil

    Has streamlined test creation and analysis while needing better semantic accuracy for specific domain knowledge

    Reviewed on Oct 08, 2025
    Review provided by PeerSpot

    What is our primary use case?

    I am working on test automation, specifically an intelligent test automation framework. Based on the existing framework, which is handled in TypeScript and Selenium, I used Cohere intelligence to create new tests based on the test data and test cases that we provide. It will read through all the test cases in natural language, process them, analyze the internal working of our existing framework, and create the artifacts, test data, and test source based on the existing framework.

    Currently, we are using Cohere APIs. First, I used the chat in the application itself to identify how it works by providing RAG sources, including PDF and text files. After confirming it worked fine, we moved to find an API, and we are using that API to handle all these tasks. The APIs are very functional for all our current use cases, mainly the intelligent test automation.

    What is most valuable?

    Cohere is very useful because I have been in scenarios where code was written with multiple reusable concepts containing many functionalities covered as different functions, but without descriptions of what particular functions were doing. We used Cohere intelligence and its knowledge of Oracle ERP PPM, and it was able to read through all the TypeScript code and create descriptions intelligently, which were almost 90% correct when reviewed.

    It was very useful because we had 500-plus reusables, and it was able to analyze all of them and put them into a catalog. This makes it very easy to find and use the catalog to determine whether existing functionality is already implemented, preventing redundant implementations.

    When it creates a new test, it creates it almost 70 to 80% correctly without errors. The time savings are significant - what previously took one or two days can now be completed in two to three hours maximum. We can complete many more tests in a day or sprint with Cohere's help.

    Along with test automation, we handle analysis tasks, and now we have more time for better analysis. We are planning to implement test analysis capabilities as well. Once you receive the requirements and test cases, you can directly use them as input, and it will generate all artifacts and test data.

    What needs improvement?

    When performing similarity matching between text descriptions and the catalog descriptions created using Cohere, the matching could be improved. Because it does not have an extensive understanding of Oracle functionalities in ERP, it sometimes gives wrong results or the confidence score is lower than desired. Improving that understanding would provide better matches.

    When working with Cohere and providing large data sets, there was some hallucination, though it mostly works fine without many issues.

    For how long have I used the solution?

    I have been using Cohere for almost seven to eight months.

    What do I think about the stability of the solution?

    I have not faced any downtime or related issues. It works fine.

    Which solution did I use previously and why did I switch?

    I used Llama but it was not giving results comparable to what I get from Cohere when comparing the two solutions. We only had these two options at that time, and we chose Cohere over Llama.

    How was the initial setup?

    The setup was pretty smooth. I was able to find things easily. The documentation was readily available on the internet, and I was able to find and integrate everything without any issues. I subscribed to emails about new model updates, which allowed me to stay current. Oracle has now wrapped it inside their own AI, and we are using the latest version of Cohere as our chosen model.

    What about the implementation team?

    I started with the public version and then they wrapped it inside Oracle's system. I believe it is private, only accessible to Oracle employees with proper authentication and sign-in details. The pricing and setup were handled by the organization, so I am not aware of those aspects.

    Which other solutions did I evaluate?

    We only had two options at that time: Llama and Cohere. After trying both, we chose Cohere over Llama.

    What other advice do I have?

    Try it and use it. If you find it worthy, then implement it. I have shared all my experiences with you. My rating for Cohere is 7 out of 10.

    CollinsOmondi

    Support team is available and answers all questions, and it is also free, which is good for personal projects

    Reviewed on Jul 02, 2024
    Review provided by PeerSpot

    What is our primary use case?

    I use it for a personal project, a Discord bot for my Discord server. I haven't used it that much, but so far it's amazing. I like the support team. They are very good.

    How has it helped my organization?

    Everything is definitely intuitive, and whenever you have an issue, it's very easy to reach out to them on Discord. They're very active, so I'm not really complaining about having issues.

    What is most valuable?

    The very first thing that I really like about it is the support team. They're really available on Discord, and they answer all of your questions.

    I think it's free for personal projects unless you want to go to production. I haven't really used it that much, but the features that I have used so far, I have no issues with them.

    What needs improvement?

    Cohere has text generation. I think it is mainly focused on AI search. If there was a way to combine the searches with images, I think it would be nice to include that.

    For how long have I used the solution?

    I've recently started exploring Cohere. It has been a few months now, two to three months.

    What do I think about the stability of the solution?

    I'll rate the stability a six out of ten since I haven't been using it much. I haven't really seen any issues.

    What do I think about the scalability of the solution?

    I will rate the scalability a seven out of ten because I haven't explored it all the way.

    How are customer service and support?

    The customer service and support are very good.

    How would you rate customer service and support?

    Positive

    How was the initial setup?

    It's very easy. You just need an API key, and all the configurations are there. It's very easy to start.

    If you have worked with something that requires API keys, you should be good to go. I don't think you need a lot of experience.

    What's my experience with pricing, setup cost, and licensing?

    Cohere has a free tier. You can use the API in development mode, so you can just use it for free. But if you go to production, you will have to pay.

    I would advise someone to consider carefully whether they really need it, because it can be expensive.

    What other advice do I have?

    Overall, I would rate it a seven out of ten.

    I would recommend it to others because it is very promising, so it would be worth the time. Others should try it.
