news · Reddit r/MachineLearning · Trust 72 · Community · Published 5d ago · Live · 5d ago

I shrank a transformer until every number fitted on the screen and made the weights editable [R]

I've been teaching myself how LLMs actually work, not at the API level, but down to the matrix multiplications. To force myself to really understand the forward pass, I first built a complete transformer by hand in a spreadsheet from embeddings through to the loss. Then I turned the forward pass into a web page so it's easier to share. It's a full transformer (single attention head, single block) shrunk to the smallest size where every single number still
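For anyone who wants to follow the same path in code rather than in cells, here is a minimal sketch of a single-head, single-block forward pass from embeddings through to the loss. It is not the code behind the spreadsheet or the web page. The sizes (`V`, `T`, `D`, `H`), the parameter names, the causal mask, the ReLU feed-forward layer, and the random weights are all assumptions made for illustration, and layer normalization is left out to keep every step visible.

```python
import math
import random

def matmul(a, b):
    # Plain nested-list matrix product: (n x m) @ (m x p) -> (n x p).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def add(a, b):
    # Element-wise sum, used for the residual connections.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def softmax(row):
    # Subtract the row max for numerical stability; exp(-inf) evaluates to 0.0.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def forward(tokens, targets, p):
    d = len(p["embed"][0])
    # 1. Token embedding plus learned positional embedding.
    x = [[e + q for e, q in zip(p["embed"][t], p["pos"][i])]
         for i, t in enumerate(tokens)]
    # 2. Single attention head: queries, keys, values.
    q, k, v = matmul(x, p["wq"]), matmul(x, p["wk"]), matmul(x, p["wv"])
    scores = matmul(q, transpose(k))
    # 3. Scale by sqrt(d) and apply a causal mask (position i sees j <= i only).
    masked = [[s / math.sqrt(d) if j <= i else float("-inf")
               for j, s in enumerate(row)] for i, row in enumerate(scores)]
    attn = [softmax(row) for row in masked]
    # 4. Weighted sum of values, output projection, residual connection.
    x = add(x, matmul(matmul(attn, v), p["wo"]))
    # 5. Feed-forward layer with ReLU, then a second residual connection.
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, p["w1"])]
    x = add(x, matmul(hidden, p["w2"]))
    # 6. Unembed to vocabulary logits and take the mean cross-entropy loss.
    probs = [softmax(row) for row in matmul(x, p["unembed"])]
    loss = -sum(math.log(pr[t]) for pr, t in zip(probs, targets)) / len(targets)
    return attn, probs, loss

# Hypothetical toy sizes: vocab 4, sequence length 3, model width 2, hidden width 4.
V, T, D, H = 4, 3, 2, 4
rng = random.Random(0)

def rand(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

params = {
    "embed": rand(V, D), "pos": rand(T, D),
    "wq": rand(D, D), "wk": rand(D, D), "wv": rand(D, D), "wo": rand(D, D),
    "w1": rand(D, H), "w2": rand(H, D), "unembed": rand(D, V),
}

# Predict the next token at each position: inputs 0,1,2 -> targets 1,2,3.
attn, probs, loss = forward([0, 1, 2], [1, 2, 3], params)
print("attention weights:", attn)
print("loss:", loss)
```

A useful sanity check when editing weights by hand: if every weight is set to zero, the logits are all equal, the output distribution is uniform, and the loss equals ln(V).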


Covers

repo: engineering87/llm-atlas · tutorial: Build your first transformer from scratch

Covers (incoming)

paper: Post-Training Pruning for Diffusion Transformers · repo: teilomillet/retrain

Related across the graph

repo: teilomillet/retrain · repo: engineering87/llm-atlas · tutorial: Build your first transformer from scratch · paper: Post-Training Pruning for Diffusion Transformers