Angestrom

Self-attention from scratch

The mechanism behind every Transformer, in ~20 lines of NumPy — see exactly how query/key/value vectors turn into attention weights.
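The original lab code is not included on this page, so what follows is a minimal sketch of single-head scaled dot-product self-attention in NumPy, not the lab's own implementation. The function names (`softmax`, `self_attention`), the random projection matrices `W_q`, `W_k`, `W_v`, and the toy sizes are all assumptions chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max along the axis before exponentiating, for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention(X, W_q, W_k, W_v):
    # Project each token embedding into query, key, and value vectors.
    Q = X @ W_q
    K = X @ W_k
    V = X @ W_v

    # Score every query against every key, scaled by sqrt(d_k).
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)

    # Each row of `weights` is a probability distribution over the tokens.
    weights = softmax(scores, axis=-1)

    # Each output vector is a weighted average of the value vectors.
    return weights @ V, weights


# Toy input (hypothetical sizes): 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
W_q = rng.normal(size=(d_model, d_k))
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))

output, weights = self_attention(X, W_q, W_k, W_v)
print(weights.shape)  # (4, 4)
print(output.shape)   # (4, 8)
```

Row i of `weights` shows how much token i attends to every token in the sequence, itself included, and each row sums to 1. In a trained model the projection matrices are learned; here they are random, so the weights only demonstrate the mechanics.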
