Help optimizing llama.cpp + Qwen 27B on RTX PRO 6000 Blackwell for coding agents
Our company recently acquired a workstation with an RTX PRO 6000 Blackwell , and we're experimenting with local LLMs to reduce part of our Claude token usage. Right now we’re running Qwen3.6 27B MTP Q8_K_XL with llama.cpp on Windows 11 . I've been using both Claude Opus and Sonne
Why it matters
This story from Reddit r/LocalLLaMA is relevant to the Open Source branch of the AI ecosystem and may affect models, products, or research direction.
Technical breakdown
Our company recently acquired a workstation with an RTX PRO 6000 Blackwell , and we're experimenting with local LLMs to reduce part of our Claude token usage. Right now we’re running Qwen3.6 27B MTP Q8_K_XL with llama.cpp on Windows 11 . I've been using both Claude Opus and Sonnet for a while, and my impression is that this model feels somewhat comparable to Sonnet, but a bit weaker
Business impact
Watch for product launches, funding moves, or policy shifts tied to this headline.
