Topic

Cache

17 items across the graph — tagged with Cache.

From the graph · 17

repo

LMCache/LMCache

LMCache: Supercharge Your LLM with the Fastest KV Cache Layer

repo

kvcache-ai/Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

repo

morphik-org/morphik-core

The most accurate document search and store for building AI apps

repo

uccl-project/uccl

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

repo

Zefan-Cai/KVCache-Factory

Unified KV Cache Compression Methods for Auto-Regressive Models

repo

Zefan-Cai/R-KV

[NeurIPS 2025] R-KV: Redundancy-aware KV Cache Compression for Reasoning Models

repo

jjang-ai/vmlx

vMLX - JANGTQ Uber Compressed MLX Models - L2 Disk Cache (survives restart) + L1 Paged (super fast ttft) + Hybrid SSM Scheduler + Cont Batching + etc!

repo

openinfer-project/openinfer

Pure Rust + CUDA LLM inference engine — no PyTorch, OpenAI-compatible, serves Qwen3 to Kimi-K2

repo

redis/redis-vl-python

Redis Vector Library (RedisVL) -- the AI-native Python client for Redis.

repo

jaylfc/taOS

Self-hosted AI agent OS. Your memory, chat, agents, and files stay on hardware you own, offline by default, cloud by choice. Offline AI memory (taOSmd), self-ho…

repo

ModelEngine-Group/unified-cache-management

Persist and reuse KV Cache to speed up your LLM.

repo

deven96/ahnlich

Suite of tools containing an in-memory vector datastore and AI proxy

repo

novitalabs/pegaflow

High-performance KV cache storage for LLM inference — GPU offloading, SSD caching, and cross-node sharing via RDMA. Works with vLLM and SGLang.

repo

manjunathshiva/turboquant-mlx

Extreme weight + KV cache compression for LLMs on Apple Silicon (MLX implementation of Google's TurboQuant)

repo

Rohit-Dnath/RAMen

RAMen is a fast in-memory data store like Redis, but built for AI: drop-in Redis protocol, native vector search, semantic caching, and a built-in MCP server for…

repo

aivrar/vllm-windows-build

Native Windows build of vLLM 0.24.0 - no WSL, no Docker. Python 3.13 + CUDA 12.8 + PyTorch 2.11 cu128 for RTX 30/40/50-series, pre-built wheel, Windows patchset…

repo

aivrar/multi-turboquant

Unified KV cache compression for LLM inference — TurboQuant, IsoQuant, PlanarQuant, TriAttention. 10 methods, GPU-validated, multi-GPU planner. Compress KV cach…



Related topics