Open Source · Reddit r/LocalLLaMA · Community · Published 7/1/2026

What's the best tps I can get with Qwen3.5 122B on 32GB VRAM + 64GB RAM?

My attempt at running Qwen3.5 122B on my 5090 (32GB VRAM) + 64GB RAM is really bleak. I'm getting a speed that starts at 6 tps and ends at ~20 tps. Can I improve this further?


Why it matters

This story from Reddit r/LocalLLaMA is relevant to the Open Source branch of the AI ecosystem and may affect models, products, or research direction.

Technical breakdown

My attempt at running Qwen3.5 122B on my 5090 (32GB VRAM) + 64GB RAM is really bleak. I'm getting a speed that starts at 6 tps and ends at ~20 tps. Can I improve this further?

    build/bin/llama-server \
      -m ~/myp/models/unsloth/qwen3.5/Q5_K_S/Qwen3.5-122B-A10B-Q5_K_S-00001-of-00003.gguf \
      --temp 0.6 \
      --top_p 0.95 \
      --top_k 20 \
      --min_p 0.0 \
      --repeat-penalty 1.0 \
      --presence-penalty 0.0 \
      -c 100000
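
To put the hardware numbers from the post in perspective, here is a rough back-of-the-envelope sketch in Python of how much of the quantized model has to live outside the GPU. The 122B parameter count, 32GB of VRAM and 64GB of RAM come from the post. The figure of about 5.5 bits per weight for Q5_K_S is an assumption (the real average depends on the tensor mix of the specific GGUF), and the function names are made up for illustration. The estimate also ignores the KV cache for the 100000-token context and any runtime overhead, so the real headroom is smaller than what this prints.

```python
# Rough memory budget for a quantized model split across VRAM and system RAM.
# All sizes are decimal gigabytes (1 GB = 1e9 bytes).

def weights_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB."""
    return n_params_billion * bits_per_weight / 8.0


def min_spill_gb(model_gb: float, vram_gb: float) -> float:
    """Weights that cannot fit in VRAM even if VRAM held nothing else."""
    return max(0.0, model_gb - vram_gb)


TOTAL_PARAMS_B = 122.0  # from the post: Qwen3.5 122B
ASSUMED_BPW = 5.5       # ASSUMPTION: approximate average for Q5_K_S
VRAM_GB = 32.0          # from the post: 5090
RAM_GB = 64.0           # from the post

model_gb = weights_gb(TOTAL_PARAMS_B, ASSUMED_BPW)
spill_gb = min_spill_gb(model_gb, VRAM_GB)
headroom_gb = VRAM_GB + RAM_GB - model_gb  # before KV cache and overhead

print(f"estimated weights:        {model_gb:.1f} GB")
print(f"must sit in system RAM:   {spill_gb:.1f} GB or more")
print(f"combined headroom left:   {headroom_gb:.1f} GB")
```

Under these assumptions, well over half of the weights end up in system RAM, so generation speed is likely bounded by RAM bandwidth rather than by the GPU. The "A10B" in the file name suggests a mixture-of-experts model with roughly 10B active parameters per token, which would explain why the setup is usable at all despite the spill.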

Business impact

Watch for product launches, funding moves, or policy shifts tied to this headline.