Paper · arXiv · Trust 82 · Primary · Published yesterday · Live · 2m ago

DecompRL: Solving Harder Problems by Learning Modular Code Generation

How can Large Language Models (LLMs) solve problems they currently cannot? Repeated sampling scales test-time compute but GPU cost grows linearly with attempts, while reinforcement learning (RL) with verifiable rewards improves single-attempt accuracy at the expense of sample diversity. Both strategies ultimately fail when the base policy has near-zero probability of producing a correct solution: no amount of sampling or gradient signal can overcome a search space that is simply too large. We take a different approach: rather than sampling harder, we make the task easier by decomposing problem

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Why these links exist

  • Linked via arXiv author Juliette Decugis

  • Linked via arXiv author Fabian Gloeckle

  • Linked via arXiv author Francis Bach

  • Linked via arXiv author Taco Cohen

  • Linked via arXiv author Gabriel Synnaeve

    DecompRL: Solving Harder Problems by Learning Modular Code Generation
