What's the best tps I can get with Qwen3.5 122B on 32GB VRAM + 64GB RAM?
Why it matters
This story from Reddit r/LocalLLaMA is relevant to the Open Source branch of the AI ecosystem and may affect models, products, or research direction.
Technical breakdown
My attempt at running Qwen3.5 122B on my 5090 (32GB VRAM) + 64GB RAM is really bleak. I'm getting a speed that starts at 6 tps and ends at ~20 tps. Can I improve this further?

build/bin/llama-server \
  -m ~/myp/models/unsloth/qwen3.5/Q5_K_S/Qwen3.5-122B-A10B-Q5_K_S-00001-of-00003.gguf \
  --temp 0.6 \
  --top_p 0.95 \
  --top_k 20 \
  --min_p 0.0 \
  --repeat-penalty 1.0 \
  --presence-penalty 0.0 \
  -c 100000
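For context on the numbers above, here is a rough back-of-the-envelope memory budget for this setup. It is only a sketch: the 5.5 bits-per-weight figure for Q5_K_S is an assumption rather than the measured size of the GGUF files, and the helper names `weights_gb` and `budget` are made up for illustration.

```python
# Rough memory budget for the setup in the post. The bits-per-weight figure
# is an assumption (about 5.5 for Q5_K_S), not a measured file size.

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in decimal gigabytes."""
    return params_billion * bits_per_weight / 8


def budget(params_billion: float, bits_per_weight: float,
           vram_gb: float, ram_gb: float) -> tuple[float, float, float]:
    """Return (weights, weights that cannot fit in VRAM, total headroom) in GB."""
    w = weights_gb(params_billion, bits_per_weight)
    spill = max(0.0, w - vram_gb)        # weights that must live in system RAM
    headroom = vram_gb + ram_gb - w      # what is left for KV cache, OS, buffers
    return w, spill, headroom


w, spill, headroom = budget(122, 5.5, vram_gb=32, ram_gb=64)
print(f"weights ~{w:.1f} GB, at least {spill:.1f} GB in system RAM, "
      f"~{headroom:.1f} GB left for everything else")
```

Under that assumption the weights come to roughly 84 GB, so at least about 52 GB of them have to sit in system RAM, and only about 12 GB of the combined 96 GB remains for the KV cache at -c 100000, the OS, and compute buffers. The real split is likely less favorable, since the GPU also has to hold part of that overhead.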
