paper · arXiv · Trust 82 · Primary · Published 4d ago · Live · 3d ago

MirrorCode: AI can rebuild entire programs from behavior alone

AI models are rapidly improving at autonomous coding, as shown by benchmark progress and one-off demonstrations such as AI implementing a C compiler. However, existing coding benchmarks tend to focus on shorter tasks, and one-off demonstrations are hard to compare systematically because they often have some human guidance, and are not standardized or repeated across models. To address these challenges, we introduce MirrorCode, a long-horizon coding benchmark based on reimplementing entire software projects. In MirrorCode, AI agents must replicate the functionalities of an existing program, wit
