Angestrom
news · Reddit r/MachineLearning

What if context compression is a diffusion noise function? Proposal + honest results from untrained-model experiments [R]

I'm proposing a way to handle contexts longer than a model's context window by treating semantic compression as the noise function of a diffusion-like process. Instead of denoising masked tokens into coherent text (as DiffusionGemma or Nemotron-Diffusion do for generation), the model reads the source document in multiple passes at decreasing compression levels, heavy summary first and verbatim last, all the while iteratively refining an "integration
