paper · arXiv · Trust 82 · Primary · Published 4d ago · Live 3d ago

When Is a Draft Accepted? A Theory of Acceptance in Speculative Decoding

Speculative decoding accelerates language model inference by using a fast drafter to propose candidate tokens that are then verified by a larger target model. Existing theory largely studies the stochastic, distribution-preserving setting, where the goal is to sample exactly from the target distribution. In contrast, many practical systems use greedy decoding, relaxed acceptance rules, or tree-based candidate sets, where success is governed by local ranking and threshold events rather than exact distributional equality. We develop a theory for these regimes. We identify that many common accept…
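To make the contrast in the abstract concrete, here is a minimal sketch of three acceptance rules, assuming toy "models" that map a token prefix to a probability list over a tiny vocabulary. It is not the paper's formalism (the abstract above is truncated): `greedy_accept` accepts a draft token only if it equals the target's argmax (a ranking event), `threshold_accept` is one hypothetical relaxed rule that accepts when the target probability clears a threshold `tau` (a threshold event), and `stochastic_accept_one` is the standard distribution-preserving test, accepting with probability min(1, p/q). All function names and the toy target are illustrative assumptions.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical model interface: prefix of token ids -> probability list over the vocab.
Dist = List[float]
Model = Callable[[Sequence[int]], Dist]


def argmax(p: Dist) -> int:
    # Ties resolve to the lowest index, because max() returns the first maximum.
    return max(range(len(p)), key=lambda i: p[i])


def greedy_accept(prefix: Sequence[int], draft: Sequence[int], target: Model) -> int:
    """Greedy verification: accept draft tokens while each one equals the target's
    argmax given the accepted context. Returns the number of accepted tokens.
    Real systems score all draft positions in one batched target pass; the loop
    here is sequential only for clarity."""
    ctx = list(prefix)
    for n, tok in enumerate(draft):
        if tok != argmax(target(ctx)):
            return n
        ctx.append(tok)
    return len(draft)


def threshold_accept(prefix: Sequence[int], draft: Sequence[int],
                     target: Model, tau: float) -> int:
    """Illustrative relaxed rule (an assumption, not the paper's): accept a draft
    token if the target assigns it probability >= tau."""
    ctx = list(prefix)
    for n, tok in enumerate(draft):
        if target(ctx)[tok] < tau:
            return n
        ctx.append(tok)
    return len(draft)


def stochastic_accept_one(p: Dist, q: Dist, tok: int, rng: random.Random) -> bool:
    """Distribution-preserving test for one token drafted from q: accept with
    probability min(1, p[tok] / q[tok]). (On rejection, speculative sampling
    resamples from the residual distribution; that step is omitted here.)"""
    return rng.random() < min(1.0, p[tok] / q[tok])


def toy_target(ctx: Sequence[int]) -> Dist:
    # Hypothetical target over 4 tokens: strongly prefers (last token + 1) mod 4.
    last = ctx[-1] if ctx else 0
    p = [0.1, 0.1, 0.1, 0.1]
    p[(last + 1) % 4] = 0.7
    return p


if __name__ == "__main__":
    draft = [1, 2, 0]  # the target's argmax after [0, 1, 2] is 3, so token 0 fails greedy
    print(greedy_accept([0], draft, toy_target))               # → 2
    print(threshold_accept([0], draft, toy_target, tau=0.05))  # → 3
```

The usage lines show the point the abstract makes: the same draft gets two accepted tokens under greedy verification but all three under a loose threshold, because acceptance depends on ranking or threshold events rather than on matching the target distribution.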

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Implements

repo: minimal-diffusion-lm

Implements (incoming)

repo: sgl-project/SpecForge

Related across the graph

repo: minimal-diffusion-lm
repo: sgl-project/SpecForge

Topics

cs.CL