Reddit r/LocalLLaMA · Published 4d ago
High-quality GLM-5.2 Quant on 4x DGX Spark - Guide, Results, and Comps
I got GLM-5.2 NVFP4 running on four DGX Sparks at 128K context. This is still a niche, hacky setup, but it is now a real serving point rather than just a proof of life.

Objective: a high-quality 4-bit quant running on 4x Spark.

Model: https://huggingface.co/Mapika/GLM-5.2-NVFP4

TL;DR: 128K context at fp8_ds_mla, ~15-16 tps decode at c0, falling to ~13 tps decode at long
