Reddit r/LocalLLaMA · Published 4d ago

Slow performance with Unsloth Gemma 12B Q8

I recently replaced GPT-OSS 20B Q4 with Gemma 4 12B Q8, but I went from roughly 70 t/s to 10 t/s. Am I doing something wrong? In the current session I am trying a Q5 model, with no change in performance measured against the Q8.

[Service]
Type=simple
User=root
WorkingDirectory=/root/llama.cpp
ExecStart=/root/llama.cpp/build/bin/llama-server \