glossary_term · Angestrom

RLHF

Reinforcement learning from human feedback: fine-tuning a model toward the answers that human raters prefer.
