Qwen 3.6 27B Speculative Decoding Bench: Pushing ~100 TPS on a single RTX 3090
Why it matters
This story from Reddit r/LocalLLaMA is relevant to the Open Source branch of the AI ecosystem and may affect models, products, or research direction.
Technical breakdown
First of all, a huge thank you to the r/LocalLLaMA community and the 3090 club. This benchmark started from your shared recipes... These are my findings on my hardware (Xeon E5-2666v3, 64GB RAM, single RTX 3090 24GB) comparing 5 engines (3 llama.cpp forks + mainline + Lucebox) across two quantizations of the same model. I've used the bench script from
