Evaluation Framework
3 items across the graph — tagged with Evaluation Framework.
From the graph · 3
repo
promptfoo/promptfoo
Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, DeepSeek, and more. Simpl…
Kiln-AI/Kiln
Build, Evaluate, and Optimize AI Systems. Includes evals, RAG, agents, fine-tuning, synthetic data generation, dataset management, MCP, and more.
EuroEval/EuroEval
The robust European language model benchmark.
