Agent Evaluation
3 items across the graph — tagged with Agent Evaluation.
From the graph · 3
Giskard-AI/giskard-oss
→ 🐢 Open-Source Evaluation & Testing library for LLM Agents
TIGER-AI-Lab/ClawBench
→ Open-source benchmark for browser AI agents on daily tasks.
hidai25/eval-view
→ Regression testing for AI agents. Snapshot behavior, diff tool calls, catch regressions in CI. Works with LangGraph, CrewAI, OpenAI, Anthropic.
