News · Reddit r/LocalLLaMA · Trust 58 · Community · Published 3d ago · Live · 2d ago

Devs - you have 64 GB of VRAM - which model do you use for coding?

I've currently settled on an Unsloth version of the Qwen 3.5 122B-A10B model (UD-IQ4_NL). With a 100k bf16 context window, I only had to offload a few layers to CPU/RAM, and it runs at around 30 tok/sec, which is fine for me. I've spent hours testing many models, and I am currently deeply impressed with this one. I also use the Qwen 3.6 models (both), depending on need, but I think this biggun' is about to become my daily driver. Curious to know what others wi
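For anyone wondering why a few layers end up on the CPU, here is a rough back-of-the-envelope sketch of the memory budget. It is only an estimate under stated assumptions: the 4.5 bits per weight is the nominal figure for IQ4_NL (Unsloth's UD quants mix quant types, so the real file will differ), and the layer count, KV head count, and head dimension are hypothetical placeholders, not the real architecture of this model. It also ignores compute buffers and any non-standard attention layers.

```python
def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 n_ctx: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV cache size in GiB (one K and one V tensor per layer)."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem / 2**30


# Figures taken from the post.
N_PARAMS = 122e9   # 122B total parameters
N_CTX = 100_000    # ~100k context
VRAM_GIB = 64      # treating "64gb" as 64 GiB
KV_BYTES = 2       # bf16 cache = 2 bytes per element

# ASSUMPTIONS, not from the post: replace with values from the GGUF metadata.
BITS_PER_WEIGHT = 4.5  # nominal IQ4_NL; UD quants mix types
N_LAYERS = 48          # hypothetical
N_KV_HEADS = 4         # hypothetical
HEAD_DIM = 128         # hypothetical

weights = weights_gib(N_PARAMS, BITS_PER_WEIGHT)
kv = kv_cache_gib(N_LAYERS, N_KV_HEADS, HEAD_DIM, N_CTX, KV_BYTES)
total = weights + kv
overflow = max(0.0, total - VRAM_GIB)  # what has to live in CPU/RAM

print(f"weights ~{weights:.1f} GiB, KV cache ~{kv:.1f} GiB")
print(f"total ~{total:.1f} GiB, overflow to CPU/RAM ~{overflow:.1f} GiB")
```

With these placeholder numbers the weights alone nearly fill the card, so the context cache is what pushes some layers out to system RAM. Plug in the actual file size and architecture values to get a usable estimate.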
