paper · arXiv · Trust 82 · Primary · Published 5d ago · Live · 3d ago

Multi-Block Diffusion Language Models

Block Diffusion Language Models (BD-LMs) improve diffusion-based text generation with KV caching and flexible-length generation. A natural next step is to extend them from Single-Block Diffusion (SingleBD) to Multi-Block Diffusion (MultiBD), where a running-set of consecutive blocks is decoded concurrently for inter-block parallelism. However, existing BD-LMs are mostly trained under teacher forcing, where the model observes only one noisy block conditioned on a clean prefix. While the recent diffusion forcing strategy introduces visibility among multiple noisy blocks, its training st

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Implements

repo · minimal-diffusion-lm

Covers

news · DiffusionGemma: 4x faster text generation
news · What if context compression is a diffusion noise function? Proposal + honest results from untrained-model experiments [R]
news · New Server Hopes to Break Through AI’s “Memory Wall”

Covers (incoming)

news · Learning Unmasking Policies for Diffusion Language Models - Apple Machine Learning Research

Related across the graph

news · DiffusionGemma: 4x faster text generation
repo · minimal-diffusion-lm
news · Learning Unmasking Policies for Diffusion Language Models - Apple Machine Learning Research
news · New Server Hopes to Break Through AI’s “Memory Wall”
news · What if context compression is a diffusion noise function? Proposal + honest results from untrained-model experiments [R]

Topics

cs.CL