Topic

Kv Cache

7 items across the graph — tagged with Kv Cache.

From the graph · 7

LMCache/LMCache

LMCache: Supercharge Your LLM with the Fastest KV Cache Layer

Zefan-Cai/KVCache-Factory

Unified KV Cache Compression Methods for Auto-Regressive Models

openinfer-project/openinfer

Pure Rust + CUDA LLM inference engine — no PyTorch, OpenAI-compatible, serves Qwen3 to Kimi-K2

novitalabs/pegaflow

High-performance KV cache storage for LLM inference — GPU offloading, SSD caching, and cross-node sharing via RDMA. Works with vLLM and SGLang.

manjunathshiva/turboquant-mlx

Extreme weight + KV cache compression for LLMs on Apple Silicon (MLX implementation of Google's TurboQuant)

aivrar/vllm-windows-build

Native Windows build of vLLM 0.24.0 - no WSL, no Docker. Python 3.13 + CUDA 12.8 + PyTorch 2.11 cu128 for RTX 30/40/50-series, pre-built wheel, Windows patchset…

aivrar/multi-turboquant

Unified KV cache compression for LLM inference — TurboQuant, IsoQuant, PlanarQuant, TriAttention. 10 methods, GPU-validated, multi-GPU planner. Compress KV cach…

Related topics

kv-cache 7 llm 7 inference 4 cuda 4 gpu 3 kv-cache-compression 2 vllm 2 mlx 1 flash-attention 1 awq 1 deep-learning 1 quantization 1

Search Kv Cache →All topics →