paper · arXiv · Trust 82 · Primary · Published 3d ago · Live · 2d ago

Freeform Preference Learning for Robotic Manipulation

Reward design remains a central bottleneck for autonomous robot policy improvement, especially in long-horizon manipulation tasks where sparse success labels provide too little signal and binary preferences collapse many competing notions of quality into one ambiguous signal. We introduce Freeform Preference Learning (FPL), a method for learning robot policies from freeform human preferences. Rather than asking annotators which of two trajectories is better overall, FPL lets them define natural-language preference axes, such as speed, safety, quality of placement, or carefulness, and provide p

Covers

news · Direct Preference Optimization Beyond Chatbots

Topics

cs.AI