repo · GitHub · Trust 82 · Primary · Published 14h ago · Live · 7h ago
aivrar/multi-turboquant
Unified KV cache compression for LLM inference: TurboQuant, IsoQuant, PlanarQuant, TriAttention. 10 methods, GPU-validated, with a multi-GPU planner. Compresses the KV cache 5-80x so you can run bigger models, longer contexts, and more agents on your GPU.
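To put the 5-80x figure in context, here is a rough back-of-the-envelope sketch of KV cache size before and after compression. It does not use this repository's API. The function names and the model shape (32 layers, 8 KV heads, head dimension 128, 32,768-token context, fp16) are hypothetical, chosen only for illustration, and the estimate ignores any metadata overhead (scales, codebooks) a real compression method would add.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch_size=1):
    """Uncompressed KV cache size in bytes for a standard transformer.

    The leading factor of 2 accounts for storing both keys and values.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_value * batch_size)


def compressed_kv_cache_bytes(baseline_bytes, ratio):
    """Idealized size after compressing by `ratio` (overheads ignored)."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return baseline_bytes / ratio


# Hypothetical model shape, for illustration only.
baseline = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128,
                          seq_len=32768, bytes_per_value=2)

GIB = 1024 ** 3
print(f"baseline: {baseline / GIB:.2f} GiB")
for ratio in (5, 80):  # the two ends of the range quoted above
    size = compressed_kv_cache_bytes(baseline, ratio)
    print(f"{ratio}x: {size / GIB:.3f} GiB")
```

Under these assumptions the uncompressed cache is 4 GiB per sequence, so the quoted range would correspond to roughly 0.8 GiB at 5x and about 51 MiB at 80x. The memory freed is what allows a longer context or more concurrent sequences on the same GPU.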
Lineage graph
Paper → model → repo connections mined from source citations (Tier-1 exact match).
