Installation instructions
Prerequisites
- AWS credentials with Bedrock model access
- uv installed
- Claude Code, Cursor, Kiro, VS Code, or any MCP-compatible IDE
Install
Pick your IDE and paste / click.
Claude Code — one CLI command:
claude mcp add eval -s user -- uvx --from llm-evaluation-system eval-mcp
Cursor — one-click deeplink: Install eval-mcp in Cursor
Kiro — add to ~/.kiro/settings/mcp.json:
{ "mcpServers": { "eval": { "command": "uvx", "args": ["--from", "llm-evaluation-system", "eval-mcp"] } } }
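If `~/.kiro/settings/mcp.json` already lists other MCP servers, it is safer to merge the `eval` entry in than to overwrite the file by hand. This is an illustrative sketch, not part of the official installer; the path and JSON shape are taken from the snippet above:

```python
import json
from pathlib import Path

# Path Kiro reads MCP settings from (per the instructions above).
CONFIG_PATH = Path.home() / ".kiro" / "settings" / "mcp.json"

def add_eval_server(config_path: Path = CONFIG_PATH) -> dict:
    """Merge the `eval` server entry into mcp.json,
    preserving any servers already configured."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text())
    servers = config.setdefault("mcpServers", {})
    servers["eval"] = {
        "command": "uvx",
        "args": ["--from", "llm-evaluation-system", "eval-mcp"],
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2))
    return config
```

Running it twice is harmless: the `eval` key is simply rewritten with the same value.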
Codex CLI — add to ~/.codex/config.toml, then restart Codex:
[mcp_servers.eval]
command = "uvx"
args = ["--from", "llm-evaluation-system", "eval-mcp"]
VS Code (with GitHub Copilot MCP) — one CLI command:
code --add-mcp '{"name":"eval","command":"uvx","args":["--from","llm-evaluation-system","eval-mcp"]}'
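The single-quoted JSON argument is easy to mangle when scripting this install. One way to generate it safely is to build the server definition as a dict and let `json.dumps` and `shlex.quote` handle serialization and shell escaping; a minimal sketch, using the same server definition as the command above:

```python
import json
import shlex

# Server definition from the VS Code command above.
server = {
    "name": "eval",
    "command": "uvx",
    "args": ["--from", "llm-evaluation-system", "eval-mcp"],
}

# json.dumps guarantees valid JSON; shlex.quote handles shell escaping.
cmd = f"code --add-mcp {shlex.quote(json.dumps(server))}"
print(cmd)
```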
Using a coding agent to install? Point it at INSTALL.md — it handles the config edit and asks about optional S3 team sharing.
Upgrading
uvx caches the resolved version per package. To pull newer releases, invalidate the cache:
uv cache clean llm-evaluation-system
Restart your IDE afterward. The next launch resolves and caches the newest published version.
Use
Ask your AI assistant to evaluate agents, models, or prompts — using a dataset you provide or one generated from your documents or context:
- "Evaluate my agent at ./my_agent.py"
- "Compare Claude Sonnet vs Nova Pro on this dataset"
- "Test these three prompt templates against my golden QA set"
- "Generate a dataset from this PDF and run an eval"
The agent picks the right mode, auto-generates whatever is missing (dataset, judge, criteria), runs the evaluation, opens the results viewer in your browser, and hands you a PDF report.