Amazon Bedrock AgentCore Evaluations is now generally available
Amazon Bedrock AgentCore Evaluations is now generally available, providing automated quality assessment for AI agents. Evaluations lets developers monitor agent quality by continuously evaluating production traffic, validate changes through testing workflows, and measure agent performance against defined expectations. AgentCore Evaluations offers two evaluation types. Online evaluation continuously monitors agent performance in production by sampling and scoring live traces. On-demand evaluation enables teams to test agents programmatically, supporting regression testing in CI/CD pipelines and interactive development workflows, as sketched below.
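For example, an on-demand evaluation could be triggered from a CI/CD step using the AWS SDK for Python (boto3). The sketch below is illustrative only: the operation names (`start_evaluation_job`, `get_evaluation_job`), their parameters, the evaluator names, and the response fields are assumptions for illustration, not the confirmed AgentCore Evaluations API; consult the documentation for the actual calls.

```python
# Hypothetical sketch: running an on-demand evaluation as a CI/CD gate.
# NOTE: the operations, parameters, and response fields below are
# assumptions for illustration; see the AgentCore Evaluations
# documentation for the real API contract.
import sys
import time

import boto3

client = boto3.client("bedrock-agentcore-control")

# Kick off an evaluation of recorded traces against built-in evaluators
# (evaluator names, agent ID, and dataset location are hypothetical).
job = client.start_evaluation_job(                    # assumed operation
    agentId="my-agent-id",
    evaluators=["TaskCompletion", "ToolUsage"],
    dataset={"s3Uri": "s3://my-bucket/eval-traces/"},
)

# Poll until the job finishes, then gate the pipeline on the scores.
while True:
    status = client.get_evaluation_job(jobId=job["jobId"])  # assumed operation
    if status["status"] in ("COMPLETED", "FAILED"):
        break
    time.sleep(15)

scores = status.get("scores", {})
if status["status"] == "FAILED" or scores.get("TaskCompletion", 0.0) < 0.8:
    sys.exit(1)  # fail the CI stage if agent quality regresses
```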
Teams can evaluate agents using 13 built-in evaluators covering response quality, safety, task completion, and tool usage. Developers can also use ground truth data to measure agent performance against expectations, including reference answers for response validation, behavioral assertions for session-level goals, and expected tool execution sequences. For domain-specific requirements, teams can configure custom evaluators using their choice of prompt and model for LLM-based evaluation, or implement custom logic in Python or JavaScript through Lambda-hosted functions for code-based evaluation, as sketched below. Evaluations integrates with AgentCore Observability for unified monitoring and real-time alerts.
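To illustrate the code-based option, here is a minimal sketch of what a Lambda-hosted custom evaluator might look like in Python. The handler signature is the standard Lambda entry point, but the incoming event fields (`agentResponse`, `toolCalls`) and the returned payload shape (`score`, `explanation`) are assumptions for illustration; the actual evaluator contract is defined in the AgentCore Evaluations documentation.

```python
# Hypothetical sketch of a Lambda-hosted, code-based custom evaluator.
# NOTE: the event fields consumed and the payload returned here are
# assumptions for illustration, not the confirmed evaluator contract.

def lambda_handler(event, context):
    """Score a single agent trace with domain-specific logic."""
    # Assumed event fields: the agent's final response text and the
    # tool calls recorded in the trace.
    response = event.get("agentResponse", "")
    tool_calls = event.get("toolCalls", [])

    # Example domain rules: the agent must cite a policy ID, and it
    # must answer within a budget of three tool calls.
    cites_policy = "policy-" in response.lower()
    within_budget = len(tool_calls) <= 3

    score = (0.5 if cites_policy else 0.0) + (0.5 if within_budget else 0.0)

    # Assumed return shape: a numeric score plus a human-readable reason.
    return {
        "score": score,
        "explanation": (
            f"cites_policy={cites_policy}, "
            f"tool_calls={len(tool_calls)} (budget 3)"
        ),
    }
```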
AgentCore Evaluations is available in nine AWS Regions: US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Mumbai), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), and Europe (Ireland).
Learn more about Amazon Bedrock AgentCore Evaluations in the documentation, and get started with the AgentCore Starter Toolkit.