paper · arXiv · Trust 82 · Primary · Published 6d ago · Live · 3d ago

Masked Diffusion Decoding as $x$-Prediction Flow

Masked diffusion language models (MDLMs) generate text by iteratively unmasking tokens, but their standard decoder reduces each step to a binary action: a position is either committed to a single token or left fully masked, with no representation of partial belief in between. This all-or-nothing regime discards rich predictive information and forces premature, irrevocable commitments, leading to poor performance under a limited decoding budget. In this paper, we reinterpret mask prediction as clean-state prediction ($x$-prediction) and show that it can be used to induce a continuous flow in in
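The all-or-nothing step described above can be illustrated with a minimal sketch of the baseline decoder only, not the paper's method. The source states just that each position is either committed to one token or left fully masked. The confidence-based top-k selection, the `MASK` sentinel, the `unmask_step` name, and the hand-written `logits` (standing in for a model call) are assumptions made for illustration.

```python
import math

MASK = -1  # hypothetical sentinel id for a masked position


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def unmask_step(tokens, logits, k):
    """One all-or-nothing decoding step (assumed top-k confidence rule).

    tokens: list of token ids, MASK where still masked.
    logits: per-position vocabulary logits (stand-in for a model call).
    k: number of masked positions to commit in this step.
    """
    candidates = []
    for pos, tok in enumerate(tokens):
        if tok != MASK:
            continue  # committed positions are never revisited
        probs = softmax(logits[pos])
        best = max(range(len(probs)), key=probs.__getitem__)
        candidates.append((probs[best], pos, best))

    # Commit the k most confident positions. Every other masked position
    # stays fully masked, so its predicted distribution is thrown away.
    candidates.sort(reverse=True)
    out = list(tokens)
    for _, pos, best in candidates[:k]:
        out[pos] = best
    return out


tokens = [MASK, 7, MASK, MASK]
logits = [
    [0.1, 2.0, 0.3],  # fairly confident in token 1
    [0.0, 0.0, 0.0],  # already committed, ignored
    [0.5, 0.4, 0.6],  # near-uniform belief
    [4.0, 0.0, 0.0],  # very confident in token 0
]
result = unmask_step(tokens, logits, k=2)
print(result)  # [1, 7, -1, 0]
```

Position 2 holds a near-uniform belief over three tokens, yet after the step it is simply `MASK` again: this is the partial belief that a binary commit-or-mask decoder cannot represent.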

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Implements

repo · minimal-diffusion-lm

Covers

news · What if context compression is a diffusion noise function? Proposal + honest results from untrained-model experiments [R]

Covers (incoming)

news · Learning Unmasking Policies for Diffusion Language Models - Apple Machine Learning Research

Related across the graph

repo · minimal-diffusion-lm

news · Learning Unmasking Policies for Diffusion Language Models - Apple Machine Learning Research

news · What if context compression is a diffusion noise function? Proposal + honest results from untrained-model experiments [R]

Topics

cs.CL