Paper · arXiv · Trust 82 · Primary · Published 4d ago · Live · 3d ago
When Is a Draft Accepted? A Theory of Acceptance in Speculative Decoding
Speculative decoding accelerates language model inference by using a fast drafter to propose candidate tokens that are then verified by a larger target model. Existing theory largely studies the stochastic, distribution-preserving setting, where the goal is to exactly sample from the target distribution. In contrast, many practical systems use greedy decoding, relaxed acceptance rules, or tree-based candidate sets, where success is governed by local ranking and threshold events rather than exact distributional equality. We develop a theory for these regimes. We identify that many common accept
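To make the "local ranking and threshold events" concrete, here is a minimal sketch of a greedy and a relaxed acceptance check over a single linear draft. It is an illustration under stated assumptions, not the paper's formulation: the function names, the `top_k` and `min_prob` parameters, and the choice to require both conditions at once are all hypothetical, and the target model's per-position probabilities are assumed to be given as plain lists.

```python
from typing import Sequence


def rank_of(probs: Sequence[float], token: int) -> int:
    """0-based rank of `token`: how many tokens have strictly higher probability."""
    return sum(1 for p in probs if p > probs[token])


def accepted_prefix_len(
    draft: Sequence[int],
    target_probs: Sequence[Sequence[float]],
    top_k: int = 1,
    min_prob: float = 0.0,
) -> int:
    """Count how many leading draft tokens the target accepts.

    Hypothetical rule: a draft token is accepted when it is within the
    target's top_k tokens (a ranking event) AND its target probability is
    at least min_prob (a threshold event). With top_k=1 and min_prob=0.0
    this reduces to greedy verification: the draft token must be the
    target's argmax (ties aside). Verification stops at the first rejection.
    """
    accepted = 0
    for token, probs in zip(draft, target_probs):
        in_top_k = rank_of(probs, token) < top_k
        above_threshold = probs[token] >= min_prob
        if not (in_top_k and above_threshold):
            break
        accepted += 1
    return accepted


# Toy target distributions over a 3-token vocabulary, one per draft position.
target_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.5, 0.4, 0.1],
]
draft = [0, 2, 0]

greedy_len = accepted_prefix_len(draft, target_probs)
relaxed_len = accepted_prefix_len(draft, target_probs, top_k=2)
strict_relaxed_len = accepted_prefix_len(draft, target_probs, top_k=2, min_prob=0.35)
print(greedy_len, relaxed_len, strict_relaxed_len)
```

In this toy case the second draft token is the target's second choice, so greedy verification stops after one token while the top-2 rule accepts the whole draft. Adding a probability floor rejects that token again, which shows how acceptance depends only on per-position rank and threshold events rather than on matching the full target distribution.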
