Artificial Intelligence

Author: Rishiraj Chandra

Evaluation Workflow

Evaluate Amazon Bedrock Agents with Ragas and LLM-as-a-judge

In this post, we introduced the Open Source Bedrock Agent Evaluation framework, a Langfuse-integrated solution that streamlines the agent development process. We demonstrated how this evaluation framework can be integrated with pharmaceutical research agents. We used it to evaluate agent performance against biomarker questions and sent traces to Langfuse to view evaluation metrics across question types.
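
As a rough illustration of what viewing evaluation metrics across question types can look like, here is a minimal sketch that groups per-question scores by question type and averages them. It does not call the framework, Ragas, or Langfuse. The record layout, the `average_scores_by_type` helper, the question-type labels, and the score values are all assumptions made up for this example, not output from the post.

```python
from collections import defaultdict
from statistics import mean


def average_scores_by_type(records):
    """Average each metric per question type.

    `records` is a hypothetical structure: one dict per evaluated question,
    with a "question_type" label and a "scores" dict of metric name -> float.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for record in records:
        for metric, score in record["scores"].items():
            grouped[record["question_type"]][metric].append(score)
    # Collapse each list of scores into its mean.
    return {
        question_type: {metric: mean(values) for metric, values in metrics.items()}
        for question_type, metrics in grouped.items()
    }


if __name__ == "__main__":
    # Placeholder values for illustration only, not real evaluation results.
    sample_records = [
        {"question_type": "rag", "scores": {"faithfulness": 0.5}},
        {"question_type": "rag", "scores": {"faithfulness": 1.0}},
        {"question_type": "text_to_sql", "scores": {"sql_correctness": 1.0}},
    ]
    for question_type, metrics in average_scores_by_type(sample_records).items():
        print(question_type, metrics)
```

In practice the per-question scores would come from the evaluation run and the grouping would typically happen in the observability dashboard; the sketch only shows the aggregation idea.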