paper · arXiv · Trust 82 · Primary · Published yesterday · Live · 7h ago

TestEvo-Bench: An Executable and Live Benchmark for Test and Code Co-Evolution

Software tests and code evolve together: a code change should be followed by new or updated tests that record the new software behavior. Yet existing test generation and update benchmarks often isolate the test from the code change, and rely on static metadata that does not verify whether a test is executable or semantically tied to the code change. This makes it difficult to evaluate whether a test automation agent understands how a code change should propagate into the test suite. We introduce TestEvo-Bench, a benchmark of test and code co-evolution tasks mined from software repositories,

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Why these links exist

Linked via arXiv author Jiale Amber Wang →
TestEvo-Bench: An Executable and Live Benchmark for Test and Code Co-Evolution
Linked via arXiv author Kaiyuan Wang →
TestEvo-Bench: An Executable and Live Benchmark for Test and Code Co-Evolution
Linked via arXiv author Pengyu Nie →
TestEvo-Bench: An Executable and Live Benchmark for Test and Code Co-Evolution

Covers

news ScarfBench: Benchmarking AI Agents for Enterprise Java Framework Migration
news Introducing GeneBench-Pro
news REAP: Automatic Curation of Coding Agent Benchmarks from Interactive Production Usage [R]

Implements

repo potpie-ai/potpie
repo Giskard-AI/giskard-oss

authored (incoming)

person Jiale Amber Wang
person Kaiyuan Wang
person Pengyu Nie

Related across the graph

repo Giskard-AI/giskard-oss
person Kaiyuan Wang
repo potpie-ai/potpie
person Jiale Amber Wang
person Pengyu Nie
news ScarfBench: Benchmarking AI Agents for Enterprise Java Framework Migration
news Introducing GeneBench-Pro
news REAP: Automatic Curation of Coding Agent Benchmarks from Interactive Production Usage [R]

Topics

cs.AI