Open Source · Reddit r/LocalLLaMA · Community · Published 6/29/2026

Slow performance Unsloth Gemma 12B Q8

Why it matters

This story from Reddit r/LocalLLaMA is relevant to the Open Source branch of the AI ecosystem and may affect models, products, or research direction.

Technical breakdown

I recently replaced GPT-OSS 20B Q4 with Gemma 4 12B Q8, but I went from roughly 70 t/s to 10 t/s. Am I doing something wrong? In the current session I am trying a Q5 model, with no change in performance measured against the Q8.

[Service]
Type=simple
User=root
WorkingDirectory=/root/llama.cpp
ExecStart=/root/llama.cpp/build/bin/llama-server \
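One possible way to reason about the gap is a back-of-envelope estimate, sketched below. It assumes decoding is limited by memory bandwidth, so that tokens per second is roughly bandwidth divided by the bytes of weights read per token. GPT-OSS 20B is a mixture-of-experts model with about 3.6B active parameters per token; the 12B model is treated here as dense, which is an assumption. The bandwidth figure and the bits-per-weight values are illustrative placeholders, not measurements from the poster's machine, and `est_decode_tps` is a hypothetical helper name.

```python
def est_decode_tps(active_params_b: float, bits_per_weight: float,
                   bandwidth_gb_s: float) -> float:
    """Rough upper bound on decode tokens/s if memory bandwidth is the limit.

    active_params_b: parameters touched per token, in billions
    bits_per_weight: approximate storage cost of the quantization
    bandwidth_gb_s:  memory bandwidth in GB/s (assumed, not measured)
    """
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Illustrative assumptions only; substitute figures for the actual hardware.
BANDWIDTH_GB_S = 400.0

moe_q4 = est_decode_tps(3.6, 4.5, BANDWIDTH_GB_S)     # GPT-OSS 20B, ~3.6B active
dense_q8 = est_decode_tps(12.0, 8.5, BANDWIDTH_GB_S)  # 12B dense, ~8.5 bits/weight
dense_q5 = est_decode_tps(12.0, 5.5, BANDWIDTH_GB_S)  # 12B dense, ~5.5 bits/weight

print(f"MoE Q4 bound:   {moe_q4:.0f} t/s")
print(f"Dense Q8 bound: {dense_q8:.0f} t/s")
print(f"Dense Q5 bound: {dense_q5:.0f} t/s")
print(f"MoE Q4 vs dense Q8: {moe_q4 / dense_q8:.1f}x")
```

Under these assumptions, a several-fold drop when moving from the MoE model to a dense 12B model would not be surprising on its own. The same model also predicts that Q5 should run faster than Q8, which the poster does not observe, so some other bottleneck may be involved, for example part of the model running on the CPU. The ExecStart line is truncated in the post, so the flags actually passed to llama-server are not known.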

Business impact

Watch for product launches, funding moves, or policy shifts tied to this headline.