Rebuilding Gemma 4 31b... better... As 26b...
Why it matters
This story from Reddit r/LocalLLaMA is relevant to the Open Source branch of the AI ecosystem and may affect models, products, or research direction.
Technical breakdown
Sooo... I decided screw it. I'm going to rebuild Gemma 4 31b. I really like the model. The current plan is to rebuild the SWA (sliding-window attention) layers. I'm currently running ablation tests to figure out which SWA layer gets removed. Gemma runs 5 SWA layers at 1024 tokens each, then a global layer, for each "block". Layer 3 is consistently the weakest and will likely get removed. From there I am going t
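To make the layout and the ablation idea concrete, here is a minimal Python sketch. It is not the poster's actual code. The helper names (`block_layout`, `swa_mask`, `weakest_swa_position`), the two-block toy model, and the `loss_delta` numbers are all hypothetical, and it assumes "Layer 3" means the third SWA layer within each block, which the post does not spell out.

```python
from statistics import mean

SWA_PER_BLOCK = 5   # from the post: 5 sliding-window layers per block
WINDOW = 1024       # from the post: 1024-token window per SWA layer


def block_layout(num_blocks, swa_per_block=SWA_PER_BLOCK):
    """Return (layer_index, kind, position_in_block) for each layer."""
    layout = []
    idx = 0
    for _ in range(num_blocks):
        for pos in range(1, swa_per_block + 1):
            layout.append((idx, "swa", pos))
            idx += 1
        # One global-attention layer closes each block.
        layout.append((idx, "global", None))
        idx += 1
    return layout


def swa_mask(seq_len, window=WINDOW):
    """Causal sliding-window mask: query i sees keys j with i - window < j <= i."""
    return [[j <= i and i - j < window for j in range(seq_len)]
            for i in range(seq_len)]


def weakest_swa_position(layout, loss_delta):
    """Pick the in-block SWA position whose removal raises loss the least on average."""
    by_pos = {}
    for idx, kind, pos in layout:
        if kind == "swa":
            by_pos.setdefault(pos, []).append(loss_delta[idx])
    return min(by_pos, key=lambda p: mean(by_pos[p]))


# Toy model with 2 blocks (the real block count is not given in the post).
layout = block_layout(num_blocks=2)

# Placeholder values, NOT measurements: loss increase when each SWA layer is
# ablated. Chosen so position 3 comes out lowest, mirroring the post's claim.
loss_delta = {0: 0.30, 1: 0.25, 2: 0.05, 3: 0.20, 4: 0.28,
              6: 0.31, 7: 0.22, 8: 0.07, 9: 0.24, 10: 0.27}

weakest = weakest_swa_position(layout, loss_delta)
print("weakest SWA position in block:", weakest)
```

Averaging the per-position deltas across blocks is just one way to decide what "consistently the weakest" means; a real run would get the deltas from evaluating the model with each layer skipped.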
Business impact
Watch for product launches, funding moves, or policy shifts tied to this headline.
