Reddit r/LocalLLaMA · Published 4d ago
Going from single GPU to dual GPU is nice but not in the way I expected
I was expecting that when doubling my VRAM from 24 GB to 2x24 GB I'd use higher quants with more context, and thus get smarter LLMs, but that's not what ended up happening. At least for coding, I found that the difference in quality from, say, qwen 27B UD-Q4-XL to a Q6 or Q8 is rather small. Instead, the way I'm taking advantage of my extra power is parallelism. Instead of getting a smarter LLM, I am using qwen 27B with
