Reddit · r/LocalLLaMA · Community · Published 2d ago
I mapped which local LLMs actually fit each RAM tier, 8 to 128GB (open dataset)
I kept answering the same question from friends ("I've got a 16GB MacBook / a 3060, what can I actually run?") and got tired of guessing, so I started a spreadsheet. It grew into a real dataset, so I put it on GitHub under CC BY for anyone to use or fix. The rule of thumb I landed on: at Q4_K_M a model needs roughly 0.6GB of memory per billion params, and the model itself should take up no more than about 70% of your RAM/VRAM so the OS, context and KV cache still have room.
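That rule of thumb is just arithmetic, so here's a minimal Python sketch of it. It assumes the 0.6GB-per-billion figure for Q4_K_M and the 70% budget from above. The function names and the list of RAM tiers are illustrative and not taken from the dataset. It only estimates weight memory and doesn't model how much context or KV cache a given setup actually uses.

```python
# Rule of thumb: Q4_K_M weights take roughly 0.6 GB per billion parameters,
# and the model should use at most ~70% of RAM/VRAM (rest is left for the OS,
# context and KV cache). Both constants are the post's estimates, not measurements.
GB_PER_BILLION_Q4_K_M = 0.6
USABLE_FRACTION = 0.7


def model_size_gb(params_billion: float) -> float:
    """Approximate memory for a model's Q4_K_M weights, in GB."""
    return params_billion * GB_PER_BILLION_Q4_K_M


def fits(params_billion: float, ram_gb: float) -> bool:
    """True if the estimated weights stay within ~70% of the given memory."""
    return model_size_gb(params_billion) <= ram_gb * USABLE_FRACTION


def max_params_billion(ram_gb: float) -> float:
    """Largest parameter count (in billions) that stays within the budget."""
    return ram_gb * USABLE_FRACTION / GB_PER_BILLION_Q4_K_M


if __name__ == "__main__":
    # Hypothetical example tiers spanning the 8-128GB range covered by the dataset.
    for ram in (8, 16, 24, 32, 48, 64, 96, 128):
        print(f"{ram:>4} GB -> up to ~{max_params_billion(ram):.0f}B params at Q4_K_M")
```

For a 16GB machine the budget is 16 × 0.7 = 11.2GB, which works out to about 18.7B parameters at Q4_K_M. So `fits(14, 16)` is true (about 8.4GB) and `fits(32, 16)` is false (about 19.2GB).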
Covers (incoming)
paper · GSRQ: Gain-Shape Residual Quantization for Sub-1-bit KV Cache
repo · NVIDIA-NeMo/Curator
repo · jmaczan/tiny-vllm
repo · LMCache/LMCache
repo · mlhher/late-cli
repo · luziyao1995/vllm
repo · jundot/omlx
repo · john-rocky/apple-silicon-llm-bench
repo · novitalabs/pegaflow
repo · ModelEngine-Group/unified-cache-management
repo · manjunathshiva/turboquant-mlx
repo · trvon/yams
repo · Andyyyy64/whichllm
