Topic

Compression

14 items across the graph are tagged with Compression.

From the graph · 14

repo
headroomlabs-ai/headroom

Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.

repo
Zefan-Cai/KVCache-Factory

Unified KV Cache Compression Methods for Auto-Regressive Models

repo
Tencent/AngelSlim

Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.

repo
jjang-ai/vmlx

vMLX - JANGTQ Uber Compressed MLX Models - L2 Disk Cache (survives restart) + L1 Paged (super fast ttft) + Hybrid SSM Scheduler + Cont Batching + etc!

repo
manojmallick/sigmap

97% token reduction for AI coding sessions — zero deps, 31 languages, MCP server

repo
juyterman1000/entroly

Cut your Claude / OpenAI / Gemini bill 70–95% on AI coding. Local proxy that compresses context, keeps provider caches hot, and verifies LLM output ($0 hallucin…

repo
gglucass/headroom-desktop

Unlock 2x more Claude Code and Codex usage

repo
Context-Engine-AI/Context-Engine

Context-Engine MCP - Agentic Context Compression Suite

repo
trvon/yams

Persistent memory for LLMs and apps. Content-addressed storage with dedupe, compression, full-text and vector search.

repo
Picovoice/picollm

On-device LLM Inference Powered by X-Bit Quantization

repo
diillson/chatcli

ChatCLI is a command-line application that uses language models such as OpenAI, GoogleAI, and others for interactive conversations in the terminal. Supports co…

repo
peremartra/Rearchitecting-LLMs

Official code for the Manning book on structural LLM optimization: depth/width pruning, knowledge distillation, and attention optimization, runnable on free Col…

repo
aivrar/vllm-windows-build

Native Windows build of vLLM 0.24.0 - no WSL, no Docker. Python 3.13 + CUDA 12.8 + PyTorch 2.11 cu128 for RTX 30/40/50-series, pre-built wheel, Windows patchset…

repo
aivrar/multi-turboquant

Unified KV cache compression for LLM inference — TurboQuant, IsoQuant, PlanarQuant, TriAttention. 10 methods, GPU-validated, multi-GPU planner. Compress KV cach…

Related topics