Reddit r/LocalLLaMA · Community · Published yesterday
Rebuilding Gemma 4 31b... better... As 26b...
Sooo... I decided, screw it: I'm going to rebuild Gemma 4 31b. I really like the model. The current plan is to rebuild the SWA (sliding-window attention) layers. I'm currently running all the proper ablation tests to figure out which SWA layer gets removed. Gemma runs 5 SWA layers at 1024 tokens each, then a global layer, to make up a "block". Layer 3 is consistently the weakest and will likely get removed. From there I am going to rescale the attention of SWA acro
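
For anyone trying to picture the layout described above, here is a rough Python sketch. It is not the actual ablation code and it does not touch real model weights. The function names are made up for illustration, reading "Layer 3" as the third SWA layer inside a block (1-based) is an assumption, and the window rule shown is just one common convention for causal sliding-window attention.

```python
# Hypothetical sketch: all names here are illustrative, not from any Gemma code.
SWA_PER_BLOCK = 5   # from the post: 5 sliding-window layers per block
SWA_WINDOW = 1024   # from the post: each SWA layer covers 1024 tokens


def build_block(swa_per_block=SWA_PER_BLOCK):
    """One block: N sliding-window layers followed by one global layer."""
    return ["swa"] * swa_per_block + ["global"]


def drop_swa_layer(block, position):
    """Return a copy of the block without the SWA layer at a 1-based position."""
    if not 1 <= position <= len(block) or block[position - 1] != "swa":
        raise ValueError("position does not point at an SWA layer")
    return block[: position - 1] + block[position:]


def can_attend(query_pos, key_pos, window=SWA_WINDOW):
    """Causal window rule (assumed convention): a query sees itself
    and the window - 1 tokens before it, nothing after it."""
    return 0 <= query_pos - key_pos < window


block = build_block()
# Assumption: "Layer 3" means the third SWA layer in the block, counted from 1.
pruned = drop_swa_layer(block, position=3)
```

With these assumptions, `block` has six entries and `pruned` has five, so each block goes from 5 SWA + 1 global to 4 SWA + 1 global. How that maps to the final parameter count depends on details the post does not give.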
