Paper · arXiv · Trust 82 · Primary · Published yesterday · Live · 9h ago

DemoPSD: Disagreement-Modulated Policy Self-Distillation

On-policy self-distillation (OPSD) has emerged as a practical method for training large language models (LLMs) to reason: a single model acts as both the teacher and the student, with different levels of information access. However, recent studies have found that the teacher's dense token-level supervision, which is conditioned on privileged information, can cause overfitting to in-domain patterns, suppress exploration, and hurt cross-domain generalization. It also introduces a more fundamental issue: *privileged information leakage*, where the student encodes answer-dependent shortcuts that …
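The abstract describes the OPSD setup only in prose. The sketch below illustrates the generic idea, not the DemoPSD method itself (whose details are cut off above). It assumes that the same model has already scored one student-sampled response twice: once with only the question in context (student) and once with the question plus privileged information such as a reference solution (teacher). It also assumes a per-token reverse KL divergence, KL(student || teacher), as the dense supervision signal. The function names `opsd_loss`, `token_reverse_kl`, and `log_softmax` are hypothetical illustrations.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_reverse_kl(student_logits: Sequence[float],
                     teacher_logits: Sequence[float]) -> float:
    """KL(student || teacher) at a single position, summed over the vocabulary."""
    log_p = log_softmax(student_logits)
    log_q = log_softmax(teacher_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def opsd_loss(student_logits_seq: Sequence[Sequence[float]],
              teacher_logits_seq: Sequence[Sequence[float]]) -> float:
    """Mean per-token reverse KL over one student-sampled response.

    Assumption (illustrative only): student_logits_seq[t] comes from the model
    conditioned on the question alone, and teacher_logits_seq[t] comes from the
    SAME model conditioned on the question plus privileged information, both
    evaluated on the tokens the student sampled. In a real trainer the teacher
    side would be treated as a fixed target (no gradient).
    """
    if len(student_logits_seq) != len(teacher_logits_seq):
        raise ValueError("student and teacher must score the same tokens")
    if not student_logits_seq:
        raise ValueError("response must contain at least one token")
    kls = [token_reverse_kl(s, t)
           for s, t in zip(student_logits_seq, teacher_logits_seq)]
    return sum(kls) / len(kls)
```

For example, a student that is uniform over two tokens, `[0.0, 0.0]`, against a teacher with logits `[0.0, math.log(3)]` (probabilities 0.25 and 0.75) gives a loss of 0.5·ln(4/3). Because the teacher sees the privileged context at every position, this signal is dense, which is also why answer-dependent information can leak into the student.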

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Why these links exist

Linked via arXiv authors Yunhe Li, Hao Shi, Wenhao Liu, Mengzhe Ruan, Hanxu Hou, Zhongxiang Dai, Shuang Qiu, and Linqi Song → DemoPSD: Disagreement-Modulated Policy Self-Distillation

Implements

Repo: chrisliu298/awesome-on-policy-distillation · Repo: nick7nlp/Awesome-LLM-On-Policy-Distillation · Repo: chrisliu298/awesome-llm-unlearning

Covers

News: IEEE Rolls Out Large Language Models Virtual Training Course

Authored (incoming)

Person: Yunhe Li · Person: Hao Shi · Person: Wenhao Liu · Person: Mengzhe Ruan · Person: Hanxu Hou · Person: Zhongxiang Dai · Person: Shuang Qiu · Person: Linqi Song

Topics

cs.AI