Topic

Compression

14 items across the graph — tagged with Compression.

From the graph · 14

Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.

→repo

Zefan-Cai/KVCache-Factory

Unified KV Cache Compression Methods for Auto-Regressive Models

→repo

Tencent/AngelSlim

Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.

→repo

jjang-ai/vmlx

vMLX - JANGTQ Uber Compressed MLX Models - L2 Disk Cache (survives restart) + L1 Paged (super fast ttft) + Hybrid SSM Scheduler + Cont Batching + etc!

→repo

manojmallick/sigmap

97% token reduction for AI coding sessions — zero deps, 31 languages, MCP server

→repo

juyterman1000/entroly

Cut your Claude / OpenAI / Gemini bill 70–95% on AI coding. Local proxy that compresses context, keeps provider caches hot, and verifies LLM output ($0 hallucin…

→repo

gglucass/headroom-desktop

Unlock 2x more Claude Code and Codex usage

→repo

Context-Engine-AI/Context-Engine

Context-Engine MCP - Agentic Context Compression Suite

→repo

trvon/yams

Persistent memory for LLMs and apps. Content-addressed storage with dedupe, compression, full-text and vector search.

→repo

Picovoice/picollm

On-device LLM Inference Powered by X-Bit Quantization

→repo

diillson/chatcli

ChatCLI is a command-line application that uses language models such as OpenAI, GoogleAI, and others for interactive conversations in the terminal. Supports co…

→repo

peremartra/Rearchitecting-LLMs

Official code for the Manning book on structural LLM optimization: depth/width pruning, knowledge distillation, and attention optimization, runnable on free Col…

→repo

aivrar/vllm-windows-build

Native Windows build of vLLM 0.24.0 - no WSL, no Docker. Python 3.13 + CUDA 12.8 + PyTorch 2.11 cu128 for RTX 30/40/50-series, pre-built wheel, Windows patchset…

→repo

aivrar/multi-turboquant

Unified KV cache compression for LLM inference — TurboQuant, IsoQuant, PlanarQuant, TriAttention. 10 methods, GPU-validated, multi-GPU planner. Compress KV cach…

→

