Topic cluster · 4 items

rl

Self-rewarding agents that retrace failures

Agents that attribute their own errors and retrace to repair multi-step reasoning.

Long-horizon credit assignment in RL

A method for propagating reward across very long action sequences.

nano-rlhf

A from-scratch RLHF training loop in one file.

RLHF

Reinforcement learning from human feedback: fine-tuning a model toward the answers human raters prefer.

Related topics

agents (2) · alignment (1)