High-quality GLM-5.2 Quant on 4x DGX Spark - Guide, Results, and Comps
I got GLM-5.2 NVFP4 running on four DGX Sparks at 128K context. This is still a niche, hacky setup, but it is now a real serving point rather than just a proof of life. Objective: a high-quality 4-bit quant running on 4x DGX Spark. Model: https://huggingface.co/Mapika/GLM-5.2-NVFP4
Why it matters
This story from Reddit r/LocalLLaMA is relevant to the Open Source branch of the AI ecosystem and may affect models, products, or research direction.
Technical breakdown
TL;DR: 128K context at fp8_ds_mla, ~15-16 tps at c0 decode, falling to about 13 tps decode at long
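To put the quoted decode rates in more tangible terms, here is a rough sketch that converts tokens per second into per-token latency and an estimated generation time. The function names and the 1,000-token reply length are illustrative assumptions, not figures from the post, and the estimate ignores prefill time, which the excerpt does not report.

```python
def ms_per_token(tokens_per_second: float) -> float:
    """Average decode latency per token, in milliseconds."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return 1000.0 / tokens_per_second


def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Decode-only time to emit num_tokens (prefill is not included)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Decode rates quoted in the post (tps = tokens per second).
quoted_rates = {
    "c0 decode, low end": 15.0,
    "c0 decode, high end": 16.0,
    "decode at long (figure truncated in the source)": 13.0,
}

REPLY_TOKENS = 1000  # hypothetical reply length, chosen only for illustration

for label, tps in quoted_rates.items():
    latency = ms_per_token(tps)
    total = seconds_to_generate(REPLY_TOKENS, tps)
    print(f"{label}: {latency:.1f} ms/token, "
          f"about {total:.0f} s for {REPLY_TOKENS} tokens")
```

At these rates a 1,000-token reply takes on the order of a minute of decode time, which is the practical meaning of the drop from ~15-16 tps to about 13 tps.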
Business impact
Watch for product launches, funding moves, or policy shifts tied to this headline.
