paper · arXiv · Trust 82 · Primary · Published 4d ago · Live · 3d ago

MirrorCode: AI can rebuild entire programs from behavior alone

AI models are rapidly improving at autonomous coding, as shown by benchmark progress and one-off demonstrations such as AI implementing a C compiler. However, existing coding benchmarks tend to focus on shorter tasks, and one-off demonstrations are hard to compare systematically because they often involve some human guidance and are not standardized or repeated across models. To address these challenges, we introduce MirrorCode, a long-horizon coding benchmark based on reimplementing entire software projects. In MirrorCode, AI agents must replicate the functionality of an existing program, wit

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Covers

news · Ornith-1.0: self-improving open-source models for agentic coding
news · Monitor and debug generative AI inference with SageMaker detailed metrics and Insights dashboard on CloudWatch
news · Reflections on Software Engineering in the Age of AI

Covers (incoming)

news · ScarfBench: Benchmarking AI Agents for Enterprise Java Framework Migration

Implements (incoming)

repo · potpie-ai/potpie
repo · NeuralInverse/neuralinverse

Related across the graph

news · Ornith-1.0: self-improving open-source models for agentic coding
repo · NeuralInverse/neuralinverse
repo · potpie-ai/potpie
news · Monitor and debug generative AI inference with SageMaker detailed metrics and Insights dashboard on CloudWatch
news · ScarfBench: Benchmarking AI Agents for Enterprise Java Framework Migration
news · Reflections on Software Engineering in the Age of AI

Topics

cs.AI