Language-Critique Imitation Learning from Suboptimal Demonstrations
Prior work on imitation learning from suboptimal demonstrations typically relies on compressed supervision signals such as confidence estimates, discriminator scores, or importance weights. These scalar signals are inherently limited, as they cannot explicitly express intermediate reasoning about task progress, failure modes, or corrective actions. We propose a language-critique framework for imitation learning from suboptimal demonstrations that instead leverages natural language as a structured supervision signal, avoiding the collapse of expressive feedback into scalars. Our method first co
Lineage graph
Paper → model → repo connections mined from source citations (Tier-1 exact match).
Why these links exist
- Linked via arXiv author Chih-Han Yang → Language-Critique Imitation Learning from Suboptimal Demonstrations
- Linked via arXiv author Dai-Jie Wu → Language-Critique Imitation Learning from Suboptimal Demonstrations
- Linked via arXiv author Yun-Ping Huang → Language-Critique Imitation Learning from Suboptimal Demonstrations
- Linked via arXiv author Ping-Chun Hsieh → Language-Critique Imitation Learning from Suboptimal Demonstrations
- Linked via arXiv author Kenneth Marino → Language-Critique Imitation Learning from Suboptimal Demonstrations
- Linked via arXiv author Shao-Hua Sun → Language-Critique Imitation Learning from Suboptimal Demonstrations
