Paper · arXiv · Trust 82 · Primary · Published 4d ago · Live · 3d ago

DOPD: Dual On-policy Distillation

On-policy distillation (OPD) offers superior capacity transfer by supervising student-sampled trajectories with dense token-level signals. To furnish high-quality supervision sources and thereby raise the performance frontier of distillation, an intuitive direction is to inject privileged information into either the teacher or the student. However, this additional input induces a potential failure mode we dub privilege illusion: a pattern that conflates the transferable capability gap that students are meant to close with the information asymmetry gap that can only be mimicked but never replica
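As a rough illustration of what "dense token-level signals" on student-sampled trajectories can look like, the sketch below computes a per-token reverse KL between student and teacher distributions. This is an assumption made for illustration only: the abstract does not state the objective DOPD uses, reverse KL is simply one common choice for on-policy distillation, and the function names and toy logits are hypothetical.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def per_token_reverse_kl(student_logits, teacher_logits):
    # KL(student || teacher) at every position: one loss value per token,
    # which is what makes the supervision "dense".
    s = log_softmax(student_logits)
    t = log_softmax(teacher_logits)
    return (np.exp(s) * (s - t)).sum(axis=-1)

def sample_from_student(student_logits, rng):
    # "On-policy": the tokens come from the student's own distribution.
    # Toy simplification: positions are sampled independently here, whereas
    # a real model would decode autoregressively.
    probs = np.exp(log_softmax(student_logits))
    return np.array([rng.choice(len(p), p=p) for p in probs])

# Toy setup (assumed shapes): 4 positions, vocabulary of 5 tokens.
rng = np.random.default_rng(0)
student_logits = rng.normal(size=(4, 5))
teacher_logits = rng.normal(size=(4, 5))

trajectory = sample_from_student(student_logits, rng)
kl = per_token_reverse_kl(student_logits, teacher_logits)
loss = kl.mean()  # scalar a student update would minimize
```

In this toy setting the teacher scores the same positions the student visited; giving the teacher extra privileged input would change `teacher_logits` without changing what the student can observe, which is the asymmetry the abstract warns about.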

Topics

cs.AI