Slow performance Unsloth Gemma 12B Q8
Why it matters
This post from Reddit's r/LocalLLaMA is about running open-weight models locally. A user reports that switching from GPT-OSS 20B to Gemma 4 12B under llama.cpp cut generation speed by roughly a factor of seven.
Technical breakdown
I recently replaced GPT-OSS 20B Q4 with Gemma 4 12B Q8, but generation speed dropped from roughly 70 t/s to 10 t/s. Am I doing something wrong? In the current session I am trying a Q5 model, and I measure no change in performance compared with the Q8.
[Service]
Type=simple
User=root
WorkingDirectory=/root/llama.cpp
ExecStart=/root/llama.cpp/build/bin/llama-server \
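A common first check for a speed drop like this is a back-of-envelope ceiling. When token generation is limited by memory bandwidth, each token reads every active weight roughly once, so tokens/s cannot exceed bandwidth divided by the bytes of active weights. The sketch below is illustrative only. The 400 GB/s bandwidth, the bits-per-weight figures, and the ~3.6B active parameters assumed for GPT-OSS 20B (a mixture-of-experts model) are all assumptions, not values taken from the post. It also ignores KV-cache traffic, compute limits, and any CPU/GPU layer split.

```python
# Rough, memory-bandwidth-bound ceiling on decode speed for a local LLM.
# Assumption: each generated token streams every *active* weight once, so
#   tokens/s <= bandwidth / bytes_of_active_weights.
# KV-cache reads, compute limits and CPU/GPU offload are NOT modeled.


def decode_ceiling_tps(active_params_billion: float,
                       bits_per_weight: float,
                       bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s when decoding is limited by memory bandwidth."""
    if active_params_billion <= 0 or bits_per_weight <= 0 or bandwidth_gb_s <= 0:
        raise ValueError("all inputs must be positive")
    # Gigabytes of weights read per generated token.
    gb_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_per_token


if __name__ == "__main__":
    BANDWIDTH_GB_S = 400.0  # hypothetical GPU memory bandwidth
    # Hypothetical (label, active params in billions, bits per weight) rows.
    scenarios = [
        ("MoE, ~3.6B active @ ~4.25 bpw (assumed GPT-OSS 20B)", 3.6, 4.25),
        ("dense 12B @ ~8.5 bpw (Q8-class)", 12.0, 8.5),
        ("dense 12B @ ~5.5 bpw (Q5-class)", 12.0, 5.5),
    ]
    for label, params, bpw in scenarios:
        tps = decode_ceiling_tps(params, bpw, BANDWIDTH_GB_S)
        print(f"{label}: <= {tps:.1f} t/s")
```

Under these assumptions, a dense 12B model reads several times more weight data per token than a model with only a few billion active parameters. That alone can account for a large gap. The same model also predicts that Q5 should be noticeably faster than Q8. A measurement where the two run at the same speed suggests the bottleneck lies somewhere this sketch does not capture.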
Business impact
Little direct business impact. This is a single user's troubleshooting question about local inference speed, not a product, funding, or policy announcement.
