Topic cluster · 2 items
alignment
paper
Constitutional methods for alignment
Training models to critique and revise their own outputs against a written set of principles.
glossary term
RLHF
Reinforcement learning from human feedback: fine-tuning a model toward answers that human raters prefer, usually through a reward model trained on their comparisons.