Angestrom
repo · GitHub

nano-rlhf

A from-scratch RLHF training loop in one file.
