paper · arXiv · Trust 82 · Primary · Published 5d ago · Live · 3d ago
Multi-Block Diffusion Language Models
Block Diffusion Language Models (BD-LMs) improve diffusion-based text generation with KV caching and flexible-length generation. A natural next step is to extend them from Single-Block Diffusion (SingleBD) to Multi-Block Diffusion (MultiBD), where a running-set of consecutive blocks is decoded concurrently for inter-block parallelism. However, existing BD-LMs are mostly trained under teacher forcing, where the model observes only one noisy block conditioned on a clean prefix. While the recent diffusion forcing strategy introduces visibility among multiple noisy blocks, its training st
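To make the SingleBD versus MultiBD contrast concrete, here is a toy cost model in Python. It is not the paper's algorithm: the function name `count_forward_passes`, the fixed number of denoising steps per block, and the lockstep admission rule are all assumptions chosen for illustration. It only counts how many model calls are needed when a running-set of consecutive blocks advances together, and it ignores output quality and the higher cost of each call.

```python
from collections import deque


def count_forward_passes(num_blocks, steps_per_block, running_set_size):
    """Count model calls in a toy block-diffusion decoder (hypothetical model).

    Assumptions (not taken from the paper): every block needs exactly
    `steps_per_block` denoising steps, and one forward pass advances every
    block currently in the running-set by one step.
    """
    if min(num_blocks, steps_per_block, running_set_size) < 1:
        raise ValueError("all arguments must be >= 1")

    pending = deque(range(num_blocks))  # blocks not yet started, in order
    running = {}                        # block index -> remaining steps
    passes = 0

    while pending or running:
        # Admit the next consecutive blocks until the running-set is full.
        while pending and len(running) < running_set_size:
            running[pending.popleft()] = steps_per_block

        # One forward pass denoises every block in the running-set once.
        passes += 1
        for block in list(running):
            running[block] -= 1
            if running[block] == 0:
                del running[block]  # block is fully denoised and committed

    return passes


# A running-set of size 1 corresponds to SingleBD in this toy model.
single = count_forward_passes(num_blocks=8, steps_per_block=4, running_set_size=1)
multi = count_forward_passes(num_blocks=8, steps_per_block=4, running_set_size=4)
print(single, multi)
```

Under these assumptions the pass count falls from `num_blocks * steps_per_block` to roughly that figure divided by the running-set size. That reduction is the inter-block parallelism the abstract refers to. Whether a real model keeps its quality when several noisy blocks are visible to each other is the training question the abstract raises, and this sketch does not address it.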
Related across the graph
news · DiffusionGemma: 4x faster text generation
repo · minimal-diffusion-lm
news · Learning Unmasking Policies for Diffusion Language Models - Apple Machine Learning Research
news · New Server Hopes to Break Through AI’s “Memory Wall”
news · What if context compression is a diffusion noise function? Proposal + honest results from untrained-model experiments [R]
