arXiv paper · Published 3d ago
BlockPilot: Instance-Adaptive Policy Learning for Diffusion-based Speculative Decoding
Speculative decoding accelerates inference by using a lightweight draft model to generate candidate tokens, which the target model then verifies in parallel, enabling lossless acceleration. Recently, diffusion-based speculative decoding has further improved parallelism by generating multiple tokens per forward pass via block-level diffusion, achieving state-of-the-art (SOTA) performance. However, existing methods adopt a fixed inference block size and assume a uniform optimal decoding strategy across all inputs. In this paper, we show that this assumption is suboptimal, as the optimal block s
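To make the draft-then-verify idea and the role of a fixed block size concrete, here is a minimal sketch of generic greedy speculative decoding. It is not BlockPilot's method: `toy_target`, `toy_draft_block`, `greedy_decode`, `speculative_decode`, and the `block_size` parameter are all hypothetical, and the drafter is a toy stand-in that loops token by token, whereas a diffusion drafter would emit the whole block in a single forward pass. The sketch shows why verification is lossless: every emitted token is exactly what the target model would have produced greedily.

```python
from typing import Callable, List, Tuple

# Hypothetical toy "models": each maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]
DraftBlock = Callable[[List[int], int], List[int]]


def toy_target(prefix: List[int]) -> int:
    """Stand-in target model: a fixed deterministic next-token rule."""
    return (3 * prefix[-1] + 1) % 10


def toy_draft_block(prefix: List[int], block_size: int) -> List[int]:
    """Stand-in drafter that proposes block_size tokens at once.

    A diffusion drafter would emit the whole block in one forward pass; this
    toy loops for simplicity and deliberately errs at every position index
    that is a multiple of 4, so some proposals get rejected.
    """
    block, ctx = [], list(prefix)
    for _ in range(block_size):
        tok = toy_target(ctx)
        if len(ctx) % 4 == 0:
            tok = (tok + 1) % 10  # injected draft error
        block.append(tok)
        ctx.append(tok)
    return block


def greedy_decode(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: target-only greedy decoding, one target call per token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_decode(target: NextToken, draft_block: DraftBlock,
                       prompt: List[int], max_new: int,
                       block_size: int) -> Tuple[List[int], int]:
    """Greedy draft-then-verify loop; returns (tokens, number of target passes)."""
    out = list(prompt)
    target_passes = 0
    while len(out) - len(prompt) < max_new:
        proposal = draft_block(out, block_size)
        # A real target model scores every drafted position in ONE parallel
        # forward pass; here we call it per position but count a single pass.
        target_passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # replace first mismatch with target's token
                break
            accepted.append(tok)  # draft agrees with target: keep it
        else:
            accepted.append(target(out + accepted))  # whole block accepted: bonus token
        out.extend(accepted)
    return out[: len(prompt) + max_new], target_passes


if __name__ == "__main__":
    prompt = [1, 2]
    reference = greedy_decode(toy_target, prompt, 20)
    for k in (1, 2, 4, 8):
        out, passes = speculative_decode(toy_target, toy_draft_block, prompt, 20, k)
        assert out == reference  # lossless: same tokens as target-only decoding
        print(f"block_size={k}: {passes} target passes for 20 new tokens")
```

This toy counts only target passes. A real system also pays for draft compute and for verifying positions that end up rejected, so the block size trades longer accepted runs against wasted work, and that balance is what a per-input choice of block size would tune.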
