news · Reddit r/MachineLearning
What if context compression is a diffusion noise function? Proposal + honest results from untrained-model experiments [R]
I'm proposing a way to handle contexts longer than a model's context window by treating semantic compression as the noise function of a diffusion-like process. Instead of denoising masked tokens into coherent text (as DiffusionGemma or Nemotron-Diffusion do for generation), the model reads the source document in multiple passes at decreasing compression levels, heavy summary first and verbatim last, while iteratively refining an "integration
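
To make the multi-pass idea concrete, here is a rough toy sketch of a coarse-to-fine reading loop. Everything in it is an assumption for illustration and is not taken from the post: `toy_compress` fakes semantic compression by subsampling words, `toy_refine` fakes the model update by accumulating vocabulary into a running state, and the `keep_schedule` values are arbitrary. A real version would presumably use a learned summarizer and a learned state update.

```python
from typing import List, Set, Tuple


def toy_compress(document: str, keep: float) -> str:
    """Hypothetical stand-in for semantic compression.

    Keeps roughly a `keep` fraction of the words, evenly spaced.
    keep >= 1.0 returns the document verbatim (no compression).
    """
    words = document.split()
    if not words or keep >= 1.0:
        return document
    n = max(1, int(len(words) * keep))
    step = len(words) / n
    return " ".join(words[int(i * step)] for i in range(n))


def toy_refine(state: Set[str], view: str) -> Set[str]:
    """Hypothetical stand-in for the refinement step.

    Here the running state is just the set of words seen so far;
    each pass can only add detail to it.
    """
    return state | set(view.lower().split())


def coarse_to_fine(
    document: str, keep_schedule: Tuple[float, ...] = (0.1, 0.5, 1.0)
) -> Tuple[Set[str], List[int]]:
    """Read the document in several passes, heaviest compression first.

    Returns the final state and the state size after each pass.
    """
    state: Set[str] = set()
    sizes: List[int] = []
    for keep in keep_schedule:  # decreasing compression, verbatim last
        view = toy_compress(document, keep)
        state = toy_refine(state, view)
        sizes.append(len(state))
    return state, sizes


if __name__ == "__main__":
    doc = "the quick brown fox jumps over the lazy dog again"
    final_state, sizes = coarse_to_fine(doc)
    print(sizes)
```

The only point of the sketch is the loop shape: the same document is revisited at each level, and the state carried between passes is refined rather than rebuilt.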