Reddit r/LocalLLaMA · Published 2d ago

What's the best tps I can get with Qwen3.5 122B on 32GB VRAM + 64GB RAM?

My attempt at running Qwen3.5 122B on my 5090 (32GB VRAM) with 64GB of system RAM is going badly. The speed starts at 6 tps and ends at around 20 tps. Can I improve this further? Here is my command:

    build/bin/llama-server \
      -m ~/myp/models/unsloth/qwen3.5/Q5_K_S/Qwen3.5-122B-A10B-Q5_K_S-00001-of-00003.gguf \
      --temp 0.6 \
      --top_p 0.95 \
      --top_k 20 \
      --min_p 0.0 \
      --repeat-penalty 1.0 \
      --presence-penalty 0.0 \
      -c 100000 \
      -t 16 \
      -ngl 99 \
      --flash-attn on \
      --host 0.0.0.0
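Much of the question comes down to how much of the Q5_K_S weights can sit in 32GB of VRAM and how much has to live in system RAM. A rough back-of-the-envelope check, sketched in shell: it assumes about 5.5 bits per weight for Q5_K_S (an approximation, not a figure from the post), takes the parameter count from the model name, and ignores the KV cache for `-c 100000` and compute buffers, so the real footprint is larger. The function names `weights_gib` and `spill_gib` are hypothetical helpers for illustration.

```shell
#!/usr/bin/env bash
# Rough memory-budget check for a GGUF model split between GPU VRAM and system RAM.
# ASSUMPTIONS: ~5.5 bits per weight for Q5_K_S; KV cache and compute buffers ignored.
set -euo pipefail

# Estimated weight size in GiB: params (billions) * bits-per-weight / 8 bytes.
weights_gib() {
  awk -v p="$1" -v bpw="$2" 'BEGIN { printf "%.1f", p * 1e9 * bpw / 8 / 1024^3 }'
}

# GiB of weights that do not fit in the given VRAM (GiB) and must stay in system RAM.
spill_gib() {
  awk -v w="$1" -v vram="$2" 'BEGIN { s = w - vram; if (s < 0) s = 0; printf "%.1f", s }'
}

weights=$(weights_gib 122 5.5)
spill=$(spill_gib "$weights" 32)
echo "weights ~${weights} GiB, ~${spill} GiB beyond 32 GiB of VRAM"
```

Under these assumptions roughly 78 GiB of weights are involved and about 46 GiB of them cannot fit on the GPU, so a large share of the model is served from system RAM. Running `du -ch` on the three `.gguf` shards gives a firmer number than the bits-per-weight guess. Note that the `A10B` in the model name indicates only about 10B parameters are active per token, which is why a model this large can still generate at usable speeds when partly in system RAM.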
