Paper · arXiv · Trust 82 · Primary · Published 2d ago · Live · 21h ago

Language-Critique Imitation Learning from Suboptimal Demonstrations

Prior work on imitation learning from suboptimal demonstrations typically relies on compressed supervision signals such as confidence estimates, discriminator scores, or importance weights. These scalar signals are inherently limited, as they cannot explicitly express intermediate reasoning about task progress, failure modes, or corrective actions. We propose a language-critique framework for imitation learning from suboptimal demonstrations that instead leverages natural language as a structured supervision signal, avoiding the collapse of expressive feedback into scalars. Our method first co

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Why these links exist

Linked via arXiv author Chih-Han Yang → Language-Critique Imitation Learning from Suboptimal Demonstrations
Linked via arXiv author Dai-Jie Wu → Language-Critique Imitation Learning from Suboptimal Demonstrations
Linked via arXiv author Yun-Ping Huang → Language-Critique Imitation Learning from Suboptimal Demonstrations
Linked via arXiv author Ping-Chun Hsieh → Language-Critique Imitation Learning from Suboptimal Demonstrations
Linked via arXiv author Kenneth Marino → Language-Critique Imitation Learning from Suboptimal Demonstrations
Linked via arXiv author Shao-Hua Sun → Language-Critique Imitation Learning from Suboptimal Demonstrations

authored (incoming)

Chih-Han Yang, Dai-Jie Wu, Yun-Ping Huang, Ping-Chun Hsieh, Kenneth Marino, Shao-Hua Sun

Topics

cs.AI