Reddit r/LocalLLaMA · Published 2d ago

I mapped which local LLMs actually fit each RAM tier, 8 to 128GB (open dataset)

I kept answering the same question from friends ("I've got a 16GB MacBook / a 3060, what can I actually run?") and got tired of guessing, so I started a spreadsheet. It grew into a real dataset, so I put it on GitHub under CC BY for anyone to use or fix. The rule of thumb I landed on: at Q4_K_M a model needs roughly 0.6GB of memory per billion params, and the model itself should take no more than about 70% of your RAM/VRAM so the OS, the context and its KV cache still have room.
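Here's a minimal sketch of that rule of thumb as code, in case you want to check a machine that isn't in the sheet. Only the two constants (0.6GB per billion params at Q4_K_M, 70% of memory) come from the post. The function names and the tier list are hypothetical, and it deliberately ignores context length, so treat the output as a rough ceiling, not a guarantee.

```python
# Rule of thumb from the post: Q4_K_M weights take ~0.6 GB per billion
# parameters, and the model should fit in ~70% of RAM/VRAM so the OS,
# context and KV cache keep the remaining ~30%.
GB_PER_BILLION_Q4_K_M = 0.6
USABLE_FRACTION = 0.7


def model_size_gb(params_billion: float) -> float:
    """Approximate memory for Q4_K_M weights, in GB."""
    return params_billion * GB_PER_BILLION_Q4_K_M


def fits(params_billion: float, memory_gb: float) -> bool:
    """True if the quantized weights stay within ~70% of the given memory."""
    return model_size_gb(params_billion) <= memory_gb * USABLE_FRACTION


def max_params_billion(memory_gb: float) -> float:
    """Largest parameter count (in billions) that passes the 70% rule."""
    return memory_gb * USABLE_FRACTION / GB_PER_BILLION_Q4_K_M


if __name__ == "__main__":
    # Hypothetical tiers between 8 and 128 GB, not necessarily the dataset's rows.
    for tier in (8, 16, 24, 32, 64, 128):
        print(f"{tier:>4} GB -> up to ~{max_params_billion(tier):.0f}B params at Q4_K_M")
```

For the 16GB MacBook case: the budget is 16 × 0.7 = 11.2GB, so a 14B model (~8.4GB) fits, while a 32B model (~19.2GB) doesn't.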

Covers

tutorial · Evaluate a model properly

Covers (incoming)

paper · GSRQ: Gain-Shape Residual Quantization for Sub-1-bit KV Cache
repo · NVIDIA-NeMo/Curator
repo · jmaczan/tiny-vllm
repo · LMCache/LMCache
repo · mlhher/late-cli
repo · luziyao1995/vllm
repo · jundot/omlx
repo · john-rocky/apple-silicon-llm-bench
repo · novitalabs/pegaflow
repo · ModelEngine-Group/unified-cache-management
repo · manjunathshiva/turboquant-mlx
repo · trvon/yams
repo · Andyyyy64/whichllm
