The core mechanism behind every Transformer, scaled dot-product attention, in ~20 lines of NumPy: see exactly how query and key vectors produce the attention weights that then mix the value vectors.
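Below is a minimal sketch of that mechanism. It assumes single-head, unbatched inputs and the standard formula softmax(QKᵀ/√d_k)·V. The helper names (`softmax`, `scaled_dot_product_attention`) are hypothetical, and the projection matrices `W_q`, `W_k`, `W_v` are random stand-ins for learned weights, so this is an illustration rather than the walkthrough's own code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the per-row max for numerical stability before exponentiating
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v). Returns (output, weights)."""
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)            # shape (n_q, n_k)
    if mask is not None:
        # Positions where mask is False get -inf, so softmax gives them zero weight
        # (assumes each row keeps at least one True entry)
        scores = np.where(mask, scores, -np.inf)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ V, weights                # shapes (n_q, d_v), (n_q, n_k)

# Usage: 4 tokens with model dimension 8 (hypothetical sizes)
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
# Hypothetical random projections standing in for learned W_q, W_k, W_v
W_q, W_k, W_v = (rng.standard_normal((8, 8)) for _ in range(3))
Q, K, V = X @ W_q, X @ W_k, X @ W_v

out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.shape)  # (4, 8) (4, 4)

# Causal (decoder-style) mask: token i may attend only to tokens 0..i
causal = np.tril(np.ones((4, 4), dtype=bool))
out_causal, w_causal = scaled_dot_product_attention(Q, K, V, mask=causal)
```

Dividing by √d_k keeps the dot products from growing with the key dimension; without it, large scores would push the softmax toward nearly one-hot weights. Each row of `w` is a probability distribution over keys, and each output row is the corresponding weighted average of the value vectors.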