paper · arXiv

Sparse attention at million-token context

A linear-cost attention variant that holds quality past a million tokens.
