News · Reddit r/LocalLLaMA · Trust 58 · Community · Published 4d ago

Going from single GPU to dual GPU is nice but not in the way I expected

I was expecting that doubling my VRAM from 24 GB to 2x24 GB would let me run higher quants with more context, and thus get smarter LLMs, but that's not what ended up happening. At least for coding, I found that the difference in quality from, say, Qwen 27B UD-Q4-XL to a Q6 or Q8 is rather small. Instead, the way I'm taking advantage of the extra power is parallelism. Instead of getting a smarter LLM, I am using Qwen 27B with

Covers (incoming)

Paper: GPU Parallelization Strategies for Forward and Backward Propagation in Shallow Neural Networks: A CUDA-Based Comparative Study
Repo: NexusGPU/tensor-fusion
