DecompRL: Solving Harder Problems by Learning Modular Code Generation
How can Large Language Models (LLMs) solve problems they currently cannot? Repeated sampling scales test-time compute, but GPU cost grows linearly with the number of attempts, while reinforcement learning (RL) with verifiable rewards improves single-attempt accuracy at the expense of sample diversity. Both strategies ultimately fail when the base policy has near-zero probability of producing a correct solution: no amount of sampling or gradient signal can overcome a search space that is simply too large. We take a different approach: rather than sampling harder, we make the task easier by decomposing problem
Lineage graph
Paper → model → repo connections mined from source citations (Tier-1 exact match).
Why these links exist
- Linked via arXiv author Juliette Decugis → DecompRL: Solving Harder Problems by Learning Modular Code Generation
- Linked via arXiv author Fabian Gloeckle → DecompRL: Solving Harder Problems by Learning Modular Code Generation
- Linked via arXiv author Francis Bach → DecompRL: Solving Harder Problems by Learning Modular Code Generation
- Linked via arXiv author Taco Cohen → DecompRL: Solving Harder Problems by Learning Modular Code Generation
- Linked via arXiv author Gabriel Synnaeve → DecompRL: Solving Harder Problems by Learning Modular Code Generation
