
aivrar/multi-turboquant

Unified KV cache compression for LLM inference: TurboQuant, IsoQuant, PlanarQuant, and TriAttention. 10 methods, GPU-validated, with a multi-GPU planner. Compresses the KV cache 5-80x so you can run bigger models, longer contexts, and more agents on your GPU.
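The 5-80x figure is easiest to read as memory. Below is a back-of-envelope sketch, assuming a standard multi-head attention cache (keys and values stored for every layer, KV head, and token), an fp16 baseline of 16 bits per value, and no quantization metadata overhead. A compression ratio `r` is modeled as `16 / r` bits per stored value. The model shape (32 layers, 32 KV heads, head dim 128, roughly Llama-2-7B) and the names `KVConfig`, `kv_cache_bytes`, and `max_context` are hypothetical illustrations, not part of multi-turboquant's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KVConfig:
    """Shape of the KV cache (hypothetical helper; values below are illustrative)."""
    num_layers: int
    num_kv_heads: int
    head_dim: int


def kv_cache_bytes(cfg: KVConfig, seq_len: int, batch: int, bits_per_value: float) -> float:
    """Bytes needed to hold K and V for every layer, KV head, and token.

    The leading factor 2 counts keys and values separately. Scales, codebooks,
    and other quantization metadata are ignored, so real compressed caches
    will be somewhat larger than this estimate.
    """
    values = 2 * cfg.num_layers * cfg.num_kv_heads * cfg.head_dim * seq_len * batch
    return values * bits_per_value / 8


def max_context(cfg: KVConfig, budget_bytes: int, batch: int, bits_per_value: float) -> int:
    """Longest sequence whose KV cache fits in a given memory budget."""
    per_token = kv_cache_bytes(cfg, 1, batch, bits_per_value)
    return int(budget_bytes // per_token)


if __name__ == "__main__":
    GiB = 1024**3
    cfg = KVConfig(num_layers=32, num_kv_heads=32, head_dim=128)  # assumed 7B-class shape

    print(f"fp16, 4096 tokens: {kv_cache_bytes(cfg, 4096, 1, 16) / GiB:.2f} GiB")  # 2.00 GiB
    for ratio in (5, 80):
        bits = 16 / ratio  # 3.2 bits/value at 5x, 0.2 bits/value (sub-1-bit) at 80x
        size = kv_cache_bytes(cfg, 4096, 1, bits) / GiB
        tokens = max_context(cfg, 8 * GiB, 1, bits)
        print(f"{ratio}x: {size:.3f} GiB for 4096 tokens, ~{tokens} tokens in 8 GiB")
```

The same 8 GiB that holds about 16k fp16 tokens for this shape scales linearly with the compression ratio, which is why the upper end of the range implies sub-1-bit storage per value.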

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Implements

Paper: GSRQ: Gain-Shape Residual Quantization for Sub-1-bit KV Cache

Covers

News: Going from single GPU to dual GPU is nice but not in the way I expected

