Open Source · Reddit r/LocalLLaMA · Community · Published 6/30/2026

Devs - you have 64gb of VRAM - which model do you use for coding?



Why it matters

This r/LocalLLaMA thread collects first-hand reports on which open-weight models developers actually run for coding on a 64 GB VRAM budget, including the quantization, context length, and throughput they settled on.

Technical breakdown

I've currently settled on an Unsloth quant of the Qwen 3.5 122B-A10B model (UD-IQ4_NL). With a 100k context window in bf16, I only had to offload a few layers to CPU/RAM, and it runs at around 30 tok/sec, which is fine for me. I've tested many models over hours of testing, but I'm currently deeply impressed with this one. I also use the Qwen 3.6 models (both) depending on need, but I think this biggun' is about to be
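Why only "a few layers" end up in CPU/RAM comes down to simple arithmetic: quantized weight size plus KV cache versus available VRAM. The sketch below is a back-of-the-envelope estimate under loudly stated assumptions. The 122B total parameter count comes from the model name; 4.5 bits per weight is the nominal rate of the IQ4_NL format, but Unsloth's dynamic ("UD") quants mix tensor types, so the real file size will differ. The layer count, KV-head count, and head dimension are hypothetical placeholders, not the model's published configuration, and the KV formula assumes standard attention in every layer, which may not match this architecture. The function names are illustrative only.

```python
import math

GIB = 1024 ** 3


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in bytes."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_ctx: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """KV cache size assuming standard attention in every layer.

    Factor 2 = one K and one V vector per token, per layer, per KV head.
    bytes_per_elem=2 corresponds to a bf16 (or fp16) cache.
    """
    return 2 * n_ctx * n_layers * n_kv_heads * head_dim * bytes_per_elem


def layers_to_offload(vram: float, weights: float, kv: float,
                      n_layers: int, reserve: float) -> int:
    """Rough number of layers to move to CPU/RAM so the rest fits in VRAM.

    Simplification: every layer holds an equal share of weights and KV cache,
    and offloading a layer frees that whole share. `reserve` is a guess for
    compute buffers and other runtime overhead.
    """
    overflow = weights + kv + reserve - vram
    if overflow <= 0:
        return 0
    per_layer = (weights + kv) / n_layers
    return min(n_layers, math.ceil(overflow / per_layer))


if __name__ == "__main__":
    # 122B total parameters ("122b" in the model name); 4.5 bits/weight is
    # the nominal IQ4_NL rate, not the measured size of the UD quant.
    w = weight_bytes(122e9, 4.5)
    # HYPOTHETICAL architecture values, chosen only for illustration.
    kv = kv_cache_bytes(n_ctx=100_000, n_layers=48, n_kv_heads=4, head_dim=128)
    n = layers_to_offload(vram=64 * GIB, weights=w, kv=kv,
                          n_layers=48, reserve=1.5 * GIB)
    print(f"weights {w / GIB:.1f} GiB, KV cache {kv / GIB:.1f} GiB, "
          f"offload ~{n} of 48 layers")
```

With these placeholder numbers the nominal weights alone come to roughly 64 GiB, so a 100k bf16 context pushes the total past the card and the estimate lands at 7 of 48 layers offloaded; the figure is illustrative, not a claim about the poster's setup. Offloading whole layers is a coarse model. For a mixture-of-experts model like this one (about 10B active out of 122B total, per the "a10b" suffix), each token touches only a fraction of the weights, which is one plausible reason partial CPU offload still leaves throughput usable.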

Business impact

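For teams weighing local deployment, the thread is a practitioner data point: a quantized ~122B mixture-of-experts model can serve coding work at around 30 tok/sec on 64 GB of VRAM with only a few layers offloaded to system RAM.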