From our data, I can tell that at least 15% of end users were actively using reranking to enhance their RAG pipelines; our UI indicates that reranking is recommended because it can improve retrieval quality.
To describe this data more precisely: that 15% of end users explicitly chose to enable the module, which we can measure because it sits in a pricing tier that adds an extra cost to their API calls.
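To make concrete what that module does in the pipeline, here is a minimal sketch of the reranking step using Cohere's Python SDK. The query, candidate passages, and model name are illustrative assumptions, not details from our production system.

```python
import cohere

co = cohere.ClientV2(api_key="YOUR_API_KEY")  # or rely on the CO_API_KEY env var

query = "How do I rotate my API keys?"
# Candidate passages from the first-stage retriever (illustrative placeholders).
documents = [
    "API keys can be rotated from the account settings page.",
    "Our SDK supports Python, TypeScript, and Go.",
    "Key rotation invalidates the old key after a grace period.",
]

# Rerank the candidates against the query; only the top results
# are passed on to the LLM as context.
response = co.rerank(
    model="rerank-v3.5",  # model name is an assumption; use whichever tier you enabled
    query=query,
    documents=documents,
    top_n=2,
)

for result in response.results:
    print(result.index, round(result.relevance_score, 3), documents[result.index])
```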
In general, I'm satisfied with the speed, and I can confirm this because we have log fields tracking all conversations, and we see that the reranking step adds relatively little latency to the overall chat flow. Regarding quality, it's hard to tell because we don't have a benchmark. In our enterprise applications we are trying to build evaluation pipelines, run A/B testing, and do other analysis, but this isn't a conventional computer science application, so it's very hard to build evaluation pipelines with objective criteria. It's challenging for us to draw a conclusion about quality, but the speed is good.
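For the latency claim, here is a minimal sketch of how a per-step timing field could be attached to conversation logs; the logger setup and the "rerank_ms" field name are hypothetical, not our actual schema.

```python
import json
import logging
import time

logger = logging.getLogger("chat_flow")
logging.basicConfig(level=logging.INFO)

def timed_rerank(co, query, documents, top_n=3):
    """Wrap the rerank call and record its latency as a structured log field."""
    start = time.perf_counter()
    response = co.rerank(model="rerank-v3.5", query=query,
                         documents=documents, top_n=top_n)
    elapsed_ms = (time.perf_counter() - start) * 1000
    # One log line per conversation turn; "rerank_ms" is a hypothetical field name.
    logger.info(json.dumps({"step": "rerank", "rerank_ms": round(elapsed_ms, 1)}))
    return response
```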
A direct benefit of using Cohere's reranking model is that we can tell clients we have this module rather than missing the piece entirely; reranking is a component companies frequently discuss as a way to enhance RAG quality.
Although it doesn't directly impact our business model, I'm pushing for the evaluation system because it can expand our business scope. We want to sell our system to clients, and while they may not be aware of evaluation initially, it's beneficial to have. Once we have these systems, we can show end users that employing such a reranking system improves quality. We need proof to convince ourselves that after implementing reranking, we get better quality.
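As an illustration of the kind of proof I have in mind, here is a minimal sketch of an A/B comparison on a small hand-labeled set, assuming each query is paired with the ID of the passage that should be retrieved; the hit-rate metric and the helper names are hypothetical.

```python
def hit_rate_at_k(labeled_queries, retrieve, rerank=None, k=3):
    """Fraction of queries whose gold passage appears in the top-k results.

    labeled_queries: list of (query, gold_doc_id) pairs (a small hand-labeled set).
    retrieve: first-stage retriever, query -> ordered list of (doc_id, text).
    rerank: optional reranker, (query, candidates) -> reordered candidates.
    """
    hits = 0
    for query, gold_id in labeled_queries:
        candidates = retrieve(query)
        if rerank is not None:
            candidates = rerank(query, candidates)
        top_ids = [doc_id for doc_id, _ in candidates[:k]]
        hits += gold_id in top_ids
    return hits / len(labeled_queries)

# A/B comparison: same retriever, with and without the rerank step.
# baseline = hit_rate_at_k(labeled, retrieve)
# with_rerank = hit_rate_at_k(labeled, retrieve, rerank=my_cohere_rerank)
# print(f"hit@3 baseline={baseline:.2f} with_rerank={with_rerank:.2f}")
```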