Topic

Cache

17 items across the graph are tagged with Cache.

From the graph · 17

repo
LMCache/LMCache

LMCache: Supercharge Your LLM with the Fastest KV Cache Layer

repo
kvcache-ai/Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

repo
morphik-org/morphik-core

The most accurate document search and store for building AI apps

repo
uccl-project/uccl

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

repo
Zefan-Cai/KVCache-Factory

Unified KV Cache Compression Methods for Auto-Regressive Models

repo
Zefan-Cai/R-KV

[NeurIPS 2025] R-KV: Redundancy-aware KV Cache Compression for Reasoning Models

repo
jjang-ai/vmlx

vMLX - JANGTQ Uber Compressed MLX Models - L2 Disk Cache (survives restart) + L1 Paged (super fast ttft) + Hybrid SSM Scheduler + Cont Batching + etc!

repo
openinfer-project/openinfer

Pure Rust + CUDA LLM inference engine — no PyTorch, OpenAI-compatible, serves Qwen3 to Kimi-K2

repo
redis/redis-vl-python

Redis Vector Library (RedisVL) -- the AI-native Python client for Redis.

repo
jaylfc/taOS

Self-hosted AI agent OS. Your memory, chat, agents, and files stay on hardware you own, offline by default, cloud by choice. Offline AI memory (taOSmd), self-ho…

repo
ModelEngine-Group/unified-cache-management

Persist and reuse KV Cache to speed up your LLM.

repo
deven96/ahnlich

Suite of tools containing an in-memory vector datastore and AI proxy

repo
novitalabs/pegaflow

High-performance KV cache storage for LLM inference — GPU offloading, SSD caching, and cross-node sharing via RDMA. Works with vLLM and SGLang.

repo
manjunathshiva/turboquant-mlx

Extreme weight + KV cache compression for LLMs on Apple Silicon (MLX implementation of Google's TurboQuant)

repo
Rohit-Dnath/RAMen

RAMen is a fast in-memory data store like Redis, but built for AI: drop-in Redis protocol, native vector search, semantic caching, and a built-in MCP server for…

repo
aivrar/vllm-windows-build

Native Windows build of vLLM 0.24.0 - no WSL, no Docker. Python 3.13 + CUDA 12.8 + PyTorch 2.11 cu128 for RTX 30/40/50-series, pre-built wheel, Windows patchset…

repo
aivrar/multi-turboquant

Unified KV cache compression for LLM inference — TurboQuant, IsoQuant, PlanarQuant, TriAttention. 10 methods, GPU-validated, multi-GPU planner. Compress KV cach…
