
Agent Evaluation

Three items in the graph are tagged with Agent Evaluation.


🐢 Open-source evaluation and testing library for LLM agents.

Open-source benchmark for browser AI agents on daily tasks.

Regression testing for AI agents: snapshot behavior, diff tool calls, and catch regressions in CI. Works with LangGraph, CrewAI, OpenAI, and Anthropic.
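The snapshot-and-diff idea behind tool-call regression testing can be sketched in a few lines. This is a minimal illustration, not the API of the tool listed above. It assumes each agent run is recorded as an ordered list of tool calls, each a dict with hypothetical `name` and `args` keys. The helper names `to_snapshot`, `diff_tool_calls`, and `assert_no_regression` are also hypothetical. A stored baseline is compared step by step against a fresh run, and any change fails the check.

```python
import json
from typing import Any

# Hypothetical record format (an assumption, not any library's schema):
# each tool call is {"name": str, "args": dict}.
ToolCall = dict[str, Any]


def to_snapshot(calls: list[ToolCall]) -> str:
    """Serialize a run's tool calls with sorted keys so snapshots diff cleanly."""
    return json.dumps(calls, sort_keys=True, indent=2)


def from_snapshot(text: str) -> list[ToolCall]:
    """Load a previously stored snapshot back into tool-call records."""
    return json.loads(text)


def diff_tool_calls(baseline: list[ToolCall], current: list[ToolCall]) -> list[str]:
    """Compare two runs step by step and describe every difference."""
    diffs: list[str] = []
    for step in range(max(len(baseline), len(current))):
        old = baseline[step] if step < len(baseline) else None
        new = current[step] if step < len(current) else None
        if old is None:
            diffs.append(f"step {step}: unexpected call {new['name']}")
        elif new is None:
            diffs.append(f"step {step}: missing call {old['name']}")
        elif old["name"] != new["name"]:
            diffs.append(f"step {step}: tool changed {old['name']} -> {new['name']}")
        elif old["args"] != new["args"]:
            diffs.append(
                f"step {step}: args changed for {old['name']}: "
                f"{old['args']} -> {new['args']}"
            )
    return diffs


def assert_no_regression(snapshot_text: str, current: list[ToolCall]) -> None:
    """Raise AssertionError (so a CI test fails) if the run drifted from the snapshot."""
    diffs = diff_tool_calls(from_snapshot(snapshot_text), current)
    if diffs:
        raise AssertionError("Tool-call regression:\n" + "\n".join(diffs))
```

In CI, `assert_no_regression` could be called from an ordinary test function. The baseline snapshot would be committed alongside the tests and regenerated deliberately whenever a behavior change is intended. Comparing by position is the simplest choice. A real tool may use smarter alignment, for example to tolerate reordered but equivalent calls.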