We founded this company two and a half years ago. In mid-2022 we anticipated the rise of generative AI and large language models, so my startup develops generative AI applications for our clients, including enterprises and a few other startups across the United States and Canada.
I started using Cohere about a year and a half ago, when we first heard about their reranking models from the community.
In some client projects, we were required to introduce a reranking model into the RAG (retrieval-augmented generation) flow. In this flow, users select components from the UI and drag and drop them into their pipeline to enhance their RAG setup. That's where we introduced Cohere models as one of the reranking providers.
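To make the rerank step concrete, here is a minimal, purely illustrative sketch of where reranking slots into a RAG pipeline. The function names and the keyword-overlap scorer are hypothetical stand-ins for the actual Cohere rerank call, used only so the flow is runnable end to end:

```python
# Hypothetical sketch of the optional rerank step in a RAG pipeline.
# In production this step would call a rerank model (e.g. Cohere's);
# here a trivial keyword-overlap scorer stands in for it.

def overlap_score(query: str, doc: str) -> float:
    """Placeholder relevance scorer (stand-in for the rerank model)."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)

def rerank(query: str, documents: list[str], top_n: int = 3, scorer=overlap_score):
    """Score every retrieved candidate and keep the top_n most relevant."""
    scored = sorted(documents, key=lambda d: scorer(query, d), reverse=True)
    return scored[:top_n]

# Candidates as they might come back from first-stage vector retrieval.
docs = [
    "Reranking reorders retrieved passages by relevance.",
    "Our pricing tiers include an optional rerank add-on.",
    "Vector search returns the initial candidate passages.",
]
top = rerank("how does reranking reorder passages", docs, top_n=2)
```

In the drag-and-drop UI described above, enabling the rerank component simply inserts this kind of rescoring stage between retrieval and generation.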
Cohere's reranking model helped us meet this requirement.
From our data, I can tell that at least 15% of end users actively use reranking to enhance their RAG pipeline, because the UI indicates that reranking is recommended as it can improve retrieval quality.
To clarify that figure: the 15% of end users chose to enable this module even though our pricing tier charges an extra cost for the API call.
In general, I'm satisfied with the speed, and I can confirm this because we keep logs that track all conversations, and we see that the reranking loop takes relatively little time within the whole chat flow. Regarding quality, it's hard to tell because we don't have a benchmark. In our enterprise applications, we are trying to build evaluation pipelines, do A/B testing, and run other analyses, but this is not a conventional computer science application, so it's very hard to build evaluation pipelines with objective criteria. It's challenging for us to draw conclusions about quality, but the speed is good.
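As a rough illustration of the latency observation above (not our production logging, and all names are hypothetical), per-step timing can be pulled out of conversation logs to check what fraction of the whole chat flow the rerank step consumes:

```python
# Illustrative sketch: compute the share of total flow time spent in one
# named pipeline step, from a per-conversation log of step durations.

def step_latency_share(log: list[dict], step: str) -> float:
    """Fraction of total flow time spent in the given step."""
    total = sum(entry["ms"] for entry in log)
    step_ms = sum(entry["ms"] for entry in log if entry["step"] == step)
    return step_ms / total if total else 0.0

# Hypothetical timings for one chat turn.
conversation_log = [
    {"step": "retrieve", "ms": 120},
    {"step": "rerank", "ms": 40},
    {"step": "generate", "ms": 840},
]
share = step_latency_share(conversation_log, "rerank")  # 40 / 1000 = 0.04
```

With numbers like these, reranking accounts for only a few percent of end-to-end latency, which matches the observation that it costs relatively little time in the chat flow.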
A direct benefit of using Cohere's reranking model is that we can tell clients we offer this module rather than missing the piece, since reranking is widely discussed as an important component for improving RAG quality.
Although it doesn't impact our business model, I'm pushing for the evaluation system because it can expand our business scope. We want to sell our system to clients, and while they may not be aware of evaluation initially, it's beneficial to have. Once we have these systems, we can show end users that employing such a reranking system improves quality. We also need proof to convince ourselves that implementing reranking actually yields better quality.
It would be better to have a dashboard that shows users how reranking improves quality. When end users choose the service, they want to see the actual output. Evaluation is challenging for today's large language model applications, but it remains very important.
If Cohere could provide a dashboard where we can employ an LLM as a judge to check quality before and after reranking, that would be helpful. We could either have another large language model evaluate this part or allow UAT users to check manually, with a human in the loop. As an enterprise provider, we want such features because when talking with clients, we can demonstrate that employing Cohere's reranking model significantly improves results compared to not using it.
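The before/after comparison such a dashboard would surface could be sketched as below. This is entirely hypothetical: `judge` is a placeholder for an actual LLM-as-judge call, and the length-based heuristic exists only so the harness runs without an API:

```python
# Hedged sketch of a before/after evaluation harness for reranking.
# judge() stands in for an LLM-as-judge call (hypothetical); the
# length heuristic is a dummy so the comparison is runnable.

def judge(query: str, answer: str) -> float:
    """Stand-in for an LLM judge scoring an answer from 0 to 1."""
    # Dummy heuristic: longer answers score higher, capped at 1.0.
    return min(len(answer.split()) / 50, 1.0)

def compare(query: str, answer_without_rerank: str, answer_with_rerank: str) -> dict:
    """Score both answers and report the delta a dashboard would plot."""
    before = judge(query, answer_without_rerank)
    after = judge(query, answer_with_rerank)
    return {"before": before, "after": after, "delta": after - before}

report = compare(
    "What does reranking change?",
    "It changes the order.",
    "Reranking rescores the retrieved passages so the most relevant ones reach the generator first.",
)
```

Aggregating these per-query deltas over a test set is what would let us show clients, with evidence rather than assertion, that enabling reranking improves answer quality.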
Documentation is not a major blocker for us, as we are experienced software engineers. Integration and the API provided for the reranking models are not complicated, so we can easily handle that. The documentation is good. The main point is to prove the value through evaluation: we need a solid solution that visibly demonstrates to our clients and engineering team that using this model creates an improvement.
That's all we need in our product currently; I will reach out when we have other requirements.
We haven't had any issues to escalate to Cohere's support because reranking is an optional feature in our product, and we haven't seen any significant issues so far.
We don't observe many scaling problems because it's an enterprise application with only a few hundred users. The concurrent user rate is not significant, which may be why we haven't seen scaling issues so far.
It's hard to estimate the overall ROI, but if you look at the ROI for the reranking feature specifically, it's a positive number.
I'm not in a position to answer that question because I was not the one who deployed the model, but I believe so because we see the model name as an ARN, so it most likely comes from Amazon Bedrock.
For reranking, Cohere is the only solution we have used so far.
As a feature developer, I'm focused more on the speed and overall quality of the model itself and on the chat flow as a whole solution, so I'm not in a position to comment on the price and setup cost; our DevOps team handles that piece. My rating for this solution is 8 out of 10.