Paper · arXiv · Trust 82 · Primary · Published 2d ago · Live · 20h ago

QuasiMoTTo: Quasi-Monte Carlo Test-Time Scaling

Scaling inference compute, by generating many parallel attempts per problem, is a costly but reliable lever for improving language model capabilities. By default these attempts are generated independently, wasting inference compute on redundant solutions. This waste seems unavoidable. After all, independence is what makes parallel sampling trivial to scale. However, this tradeoff is not fundamental: there is a rich design space of samplers that generate correlated but exact samples entirely in parallel. We explore this design space as an avenue for improving sample efficiency in scaling infere
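As a rough illustration of what "correlated but exact" sampling can mean, the sketch below uses a textbook quasi-Monte Carlo device, a randomly shifted lattice (Cranley-Patterson rotation), to drive inverse-CDF draws from one categorical distribution. This is not the paper's sampler, which the abstract does not specify. The function names and the toy probabilities are assumptions made for the example. Each uniform is marginally Uniform[0, 1), so each individual draw follows the target distribution exactly, while the draws are jointly stratified and can be computed independently once the single shift is shared.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def iid_uniforms(n, rng):
    """Baseline: n independent Uniform[0, 1) draws."""
    return [rng.random() for _ in range(n)]


def shifted_lattice_uniforms(n, rng):
    """n correlated uniforms from one shared random shift.

    Each point is marginally Uniform[0, 1), but together the points
    are evenly spaced, so each width-1/n bin receives exactly one.
    """
    shift = rng.random()
    return [(shift + i / n) % 1.0 for i in range(n)]


def sample_categorical(probs, u):
    """Inverse-CDF lookup: map a uniform u to a category index."""
    cdf = list(accumulate(probs))
    # Clamp guards against floating-point round-off in the last CDF entry.
    return min(bisect_right(cdf, u), len(probs) - 1)


def draw_counts(probs, uniforms):
    """Count how often each category is drawn for a list of uniforms."""
    draws = [sample_categorical(probs, u) for u in uniforms]
    return [draws.count(k) for k in range(len(probs))]


if __name__ == "__main__":
    toy_probs = [0.5, 0.3, 0.2]  # hypothetical distribution over 3 outcomes
    rng = random.Random(0)
    print("iid counts:    ", draw_counts(toy_probs, iid_uniforms(10, rng)))
    print("lattice counts:", draw_counts(toy_probs, shifted_lattice_uniforms(10, rng)))
```

With independent uniforms, the category counts fluctuate from run to run. With the shifted lattice, each count stays within one of n times its probability, which is the kind of redundancy reduction the abstract points to. Extending this idea to full autoregressive generations is a separate problem that this sketch does not address.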

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Why these links exist

Linked via arXiv author Michael Y. Li → QuasiMoTTo: Quasi-Monte Carlo Test-Time Scaling
Linked via arXiv author Anthony Zhan → QuasiMoTTo: Quasi-Monte Carlo Test-Time Scaling
Linked via arXiv author Kanishk Gandhi → QuasiMoTTo: Quasi-Monte Carlo Test-Time Scaling
Linked via arXiv author Noah D. Goodman → QuasiMoTTo: Quasi-Monte Carlo Test-Time Scaling
Linked via arXiv author Emily B. Fox → QuasiMoTTo: Quasi-Monte Carlo Test-Time Scaling

Covers

news: Hardware startup unveils inference accelerator
news: Adaptive Parallel Reasoning: The Next Paradigm in Efficient Inference Scaling

authored (incoming)

person: Michael Y. Li
person: Anthony Zhan
person: Kanishk Gandhi
person: Noah D. Goodman
person: Emily B. Fox

Related across the graph

person: Kanishk Gandhi
person: Anthony Zhan
person: Emily B. Fox
person: Noah D. Goodman
news: Hardware startup unveils inference accelerator
person: Michael Y. Li
news: Adaptive Parallel Reasoning: The Next Paradigm in Efficient Inference Scaling

Topics

cs.CL