glossary_term · Angestrom
RLHF
Reinforcement learning from human feedback — tuning a model toward preferred answers.