Read original ↗
EnrichedOpen SourceReddit r/LocalLLaMACommunityLive · 3d agoPublished 6/30/2026

Qwen 3.6 27B Speculative Decoding Bench: Pushing ~100 TPS on a single RTX 3090

First of all, a huge thank you to the r/LocalLLaMA community and the 3090 club. This benchmark started from your shared recipes... These are my findings on my hardware (Xeon E5-2666v3, 64GB RAM, single RTX 3090 24GB) comparing 5 engines (3 llama.cpp forks + mainline + Lucebox) ac

View in news graph →

Why it matters

This story from Reddit r/LocalLLaMA is relevant to the Open Source branch of the AI ecosystem and may affect models, products, or research direction.

Technical breakdown

First of all, a huge thank you to the r/LocalLLaMA community and the 3090 club. This benchmark started from your shared recipes... These are my findings on my hardware (Xeon E5-2666v3, 64GB RAM, single RTX 3090 24GB) comparing 5 engines (3 llama.cpp forks + mainline + Lucebox) across two quantizations of the same model. I've used the bench script from

Business impact

Watch for product launches, funding moves, or policy shifts tied to this headline.