Paper · arXiv · Trust 82 · Primary · Published 4d ago · Live · 3d ago
MirrorCode: AI can rebuild entire programs from behavior alone
AI models are rapidly improving at autonomous coding, as shown by benchmark progress and one-off demonstrations such as AI implementing a C compiler. However, existing coding benchmarks tend to focus on shorter tasks, and one-off demonstrations are hard to compare systematically because they often have some human guidance, and are not standardized or repeated across models. To address these challenges, we introduce MirrorCode, a long-horizon coding benchmark based on reimplementing entire software projects. In MirrorCode, AI agents must replicate the functionalities of an existing program, wit
Lineage graph
Paper → model → repo connections mined from source citations (Tier-1 exact match).
Related across the graph
news · Ornith-1.0: self-improving open-source models for agentic coding
repo · NeuralInverse/neuralinverse
repo · potpie-ai/potpie
news · Monitor and debug generative AI inference with SageMaker detailed metrics and Insights dashboard on CloudWatch
news · ScarfBench: Benchmarking AI Agents for Enterprise Java Framework Migration
news · Reflections on Software Engineering in the Age of AI
