Paper · arXiv · Trust 82 · Primary · Published 2d ago · Live · 21h ago

Language-Critique Imitation Learning from Suboptimal Demonstrations

Prior work on imitation learning from suboptimal demonstrations typically relies on compressed supervision signals such as confidence estimates, discriminator scores, or importance weights. These scalar signals are inherently limited, as they cannot explicitly express intermediate reasoning about task progress, failure modes, or corrective actions. We propose a language-critique framework for imitation learning from suboptimal demonstrations that instead leverages natural language as a structured supervision signal, avoiding the collapse of expressive feedback into scalars. Our method first co

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Why these links exist

  • Linked via arXiv author: Chih-Han Yang

    Language-Critique Imitation Learning from Suboptimal Demonstrations

  • Linked via arXiv author: Dai-Jie Wu

    Language-Critique Imitation Learning from Suboptimal Demonstrations

  • Linked via arXiv author: Yun-Ping Huang

    Language-Critique Imitation Learning from Suboptimal Demonstrations

  • Linked via arXiv author: Ping-Chun Hsieh

    Language-Critique Imitation Learning from Suboptimal Demonstrations

  • Linked via arXiv author: Kenneth Marino

    Language-Critique Imitation Learning from Suboptimal Demonstrations

  • Linked via arXiv author: Shao-Hua Sun

    Language-Critique Imitation Learning from Suboptimal Demonstrations
