paper · arXiv · Trust 82 · Primary · Published yesterday · Live · 2m ago

DecompRL: Solving Harder Problems by Learning Modular Code Generation

How can Large Language Models (LLMs) solve problems they currently cannot? Repeated sampling scales test-time compute, but GPU cost grows linearly with the number of attempts, while reinforcement learning (RL) with verifiable rewards improves single-attempt accuracy at the expense of sample diversity. Both strategies ultimately fail when the base policy has near-zero probability of producing a correct solution: no amount of sampling or gradient signal can overcome a search space that is simply too large. We take a different approach: rather than sampling harder, we make the task easier by decomposing problem
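The sampling argument in the abstract can be made concrete with a small sketch. It assumes each attempt succeeds independently with the same probability p, which is a simplification and not something the paper states; the function names `pass_at_k` and `attempts_needed` are hypothetical and chosen only for illustration. Under that assumption, the chance that at least one of k attempts is correct is 1 - (1 - p)^k, so the number of attempts needed grows roughly like 1/p as p approaches zero.

```python
import math


def pass_at_k(p: float, k: int) -> float:
    """Probability that at least one of k independent attempts is correct,
    assuming every attempt succeeds with the same probability p."""
    return 1.0 - (1.0 - p) ** k


def attempts_needed(p: float, target: float) -> int:
    """Smallest k such that pass_at_k(p, k) >= target.

    Assumes 0 < p < 1 and 0 < target < 1 (illustrative helper only).
    """
    # Solve 1 - (1 - p)^k >= target for k; log1p keeps precision for tiny p.
    return math.ceil(math.log1p(-target) / math.log1p(-p))


if __name__ == "__main__":
    # Cost is linear in k, so the printed k is also the relative GPU cost.
    for p in (1e-1, 1e-3, 1e-6):
        k = attempts_needed(p, target=0.5)
        print(f"p={p:g}: about {k} attempts for a 50% chance of one success")
```

Since cost scales linearly with the number of attempts, a per-attempt success rate that is a thousand times smaller needs roughly a thousand times more compute for the same chance of success under this model.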

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Why these links exist

Linked via arXiv author Juliette Decugis → DecompRL: Solving Harder Problems by Learning Modular Code Generation
Linked via arXiv author Fabian Gloeckle → DecompRL: Solving Harder Problems by Learning Modular Code Generation
Linked via arXiv author Francis Bach → DecompRL: Solving Harder Problems by Learning Modular Code Generation
Linked via arXiv author Taco Cohen → DecompRL: Solving Harder Problems by Learning Modular Code Generation
Linked via arXiv author Gabriel Synnaeve → DecompRL: Solving Harder Problems by Learning Modular Code Generation

Implements

repo: rllm-org/rllm · repo: chrisliu298/awesome-llm-unlearning

Covers

news: IEEE Rolls Out Large Language Models Virtual Training Course

authored (incoming)

person: Juliette Decugis · person: Fabian Gloeckle · person: Francis Bach · person: Taco Cohen · person: Gabriel Synnaeve

Related across the graph

person: Juliette Decugis · repo: chrisliu298/awesome-llm-unlearning · person: Fabian Gloeckle · person: Gabriel Synnaeve · person: Taco Cohen · person: Francis Bach · repo: rllm-org/rllm · news: IEEE Rolls Out Large Language Models Virtual Training Course

Topics

cs.LG