
Overview
Rerankers are neural networks that predict relevance scores between a query and a set of documents and rank the documents by those scores. They are used to refine search results in semantic search/retrieval systems and retrieval-augmented generation (RAG). rerank-2 is a cutting-edge reranker optimized for quality, improving retrieval accuracy on top of OpenAI v3 large embeddings by an average of 13.89%, 2.3x the improvement attained by the latest Cohere reranker (English v3). rerank-2 is also natively multilingual, beating Cohere multilingual v3 by 8.83% on 51 datasets across 31 languages. It supports a 16K-token combined context length for a query-document pair, with up to 4K tokens for the query. Latency is 1.5 s for 25K tokens, and throughput is 60M tokens per hour at $0.05 per 1M tokens on an ml.g6.xlarge instance. Learn more about rerank-2 here: https://blog.voyageai.com/2024/09/30/rerank-2/
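In practice, a reranker runs as a second stage after an initial retrieval step: a retriever returns candidate documents, and the reranker re-scores and reorders them for the query. The sketch below illustrates that pattern; it assumes access to the voyageai Python client and a VOYAGE_API_KEY environment variable, and the exact invocation interface will differ if you deploy rerank-2 through SageMaker instead.

# Minimal sketch: rerank candidate documents for a query with rerank-2.
# Assumes the voyageai Python client and a VOYAGE_API_KEY environment variable;
# a SageMaker endpoint deployment would use a different invocation interface.
import os

import voyageai

vo = voyageai.Client(api_key=os.environ["VOYAGE_API_KEY"])

query = "When should I use a reranker?"
documents = [
    "Rerankers refine the candidate list returned by a first-stage retriever.",
    "Embedding models map text to dense vectors for similarity search.",
    "RAG pipelines pass retrieved context to a language model for generation.",
]

# Score each query-document pair and return the top_k documents by relevance.
reranking = vo.rerank(query, documents, model="rerank-2", top_k=2)

for result in reranking.results:
    # Each result carries the original index, the document text, and its score.
    print(f"{result.relevance_score:.3f}  [{result.index}] {result.document}")

A common design is to over-retrieve with a fast embedding-based search (for example, the top 100 candidates) and let the reranker pick the handful of documents that are actually passed to the language model or shown to the user.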
Highlights
- Optimized for quality, improving retrieval accuracy on top of OpenAI v3 large embeddings by an average of 13.89%, 2.3x the improvement attained by the latest Cohere reranker (English v3).
- Natively multilingual, beating Cohere multilingual v3 by 8.83% on 51 datasets across 31 languages.
- 16K-token combined context length for a query-document pair, with up to 4K tokens for the query; well-suited for applications on long documents.
- Latency for 25K tokens is 1.5 s (1 GPU), 415 ms (4 GPUs), and 245 ms (8 GPUs); we recommend using multiple GPUs to reduce latency. The supported 12xlarge and 24xlarge instances come with 4 GPUs each, while the 48xlarge instances are equipped with 8 GPUs.
- Throughput of 60M tokens per hour at $0.05 per 1M tokens on an ml.g6.xlarge.