news | Hugging Face | Trust 88 · Lab | Published 3d ago | Live · yesterday
ScarfBench: Benchmarking AI Agents for Enterprise Java Framework Migration
Covers
paper | SWE-INTERACT: Reimagining SWE Benchmarks as User-Driven Long-Horizon Coding Sessions
paper | MirrorCode: AI can rebuild entire programs from behavior alone
paper | Govern the Repository, Not the Agent: Measuring Ecosystem-Level Risk in AI-Native Software
repo | Fabiojvv/ai-cortex-hub
paper | TraceLab: Characterizing Coding Agent Workloads for LLM Serving
Covers (incoming)
paper | AxDafny: Agentic Verified Code Generation in Dafny
paper | Can Agents Generalize to the Open World? Unveiling the Fragility of Static Training in Tool Use
paper | Are Performance-Optimization Benchmarks Reliably Measuring Coding Agents?
repo | hammadhaqqani/awesome-devops-ai
repo | opensandbox-group/OpenSandbox
repo | samuelhm/42Jobs
repo | rocketride-org/rocketride-server
repo | DashAISoftware/dashAI
repo | langchain-ai/langgraphjs
repo | TimefoldAI/timefold-solver
repo | potpie-ai/potpie
repo | spring-projects/spring-ai
repo | spiceai/spiceai
repo | vercel/ai
repo | bytechefhq/bytechef
repo | zenml-io/zenml
repo | run-llama/ParseBench
