repo · GitHub · Trust 82 · Primary · Published 15h ago · Live · 15h ago
langwatch/langwatch
The platform for LLM evaluations and AI agent testing
Lineage graph
Paper → model → repo connections mined from source citations (Tier-1 exact match).
Covers
Implements
Related across the graph
news: LangChain Engineer Introduces Harbor for Complex AI Agent Evaluation - TechGig
paper: AgenticSTS: A Bounded-Memory Testbed for Long-Horizon LLM Agents
paper: PACE: A Proxy for Agentic Capability Evaluation
paper: AutoTrainess: Teaching Language Models to Improve Language Models Autonomously
paper: SWE-INTERACT: Reimagining SWE Benchmarks as User-Driven Long-Horizon Coding Sessions
