Topic cluster · 2 items

alignment

paper

Constitutional methods for alignment

Training models to critique and revise their own outputs against principles.

glossary_term

RLHF

Reinforcement learning from human feedback: fine-tuning a model toward the responses that human raters prefer.
