On-Policy Distillation
Training Students on Their Own Mistakes
This slide deck introduces On-Policy Distillation (OPD): a family of distillation methods where the student learns from trajectories, states, prompts, or mistakes that arise from its own policy rather than only imitating static teacher data.
It walks through the motivation, definition, taxonomy, white-box and black-box settings, on-policy self-distillation, OPD-RL hybrids, speculative-decoding distillation, practical recipes, failure modes, cost tradeoffs, and open problems.
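To make the on-policy idea concrete, here is a minimal sketch of how a distillation loss can be evaluated on prefixes the student itself visits, rather than on a fixed teacher dataset. Everything in it is an illustrative assumption and not taken from the slides: the toy 3-token vocabulary, the `toy_teacher` and `toy_student` policies, the helper names, and the choice of per-token reverse KL as the divergence. It only evaluates the loss; a real training loop would also backpropagate through the student.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_kl(student_probs, teacher_probs):
    # KL(student || teacher): penalizes student mass where the teacher has little.
    return sum(
        p * math.log(p / q)
        for p, q in zip(student_probs, teacher_probs)
        if p > 0
    )


def sample_rollout(policy, length, rng):
    # The on-policy step: tokens are drawn from the student's own distribution.
    prefix = []
    for _ in range(length):
        probs = policy(tuple(prefix))
        token = rng.choices(range(len(probs)), weights=probs)[0]
        prefix.append(token)
    return prefix


def on_policy_distill_loss(student, teacher, length, num_rollouts, rng):
    # Average per-token reverse KL over prefixes visited by the student.
    total, count = 0.0, 0
    for _ in range(num_rollouts):
        rollout = sample_rollout(student, length, rng)
        for t in range(length):
            prefix = tuple(rollout[:t])
            total += reverse_kl(student(prefix), teacher(prefix))
            count += 1
    return total / count


# Hypothetical toy policies over a 3-token vocabulary (assumptions for illustration).
def toy_teacher(prefix):
    # Prefers the token that follows the previous one, cyclically.
    last = prefix[-1] if prefix else 0
    return softmax([2.0 if i == (last + 1) % 3 else 0.0 for i in range(3)])


def toy_student(prefix):
    # An untrained student: uniform over the vocabulary.
    return softmax([0.0, 0.0, 0.0])


rng = random.Random(0)
loss = on_policy_distill_loss(toy_student, toy_teacher, length=4, num_rollouts=8, rng=rng)
print(f"on-policy reverse-KL loss: {loss:.4f}")
```

The contrast with off-policy distillation lies in `sample_rollout`: if the prefixes came from the teacher or from a static dataset instead, the student would never be scored on the states its own mistakes lead to.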
Reference
@misc{dong2026opd,
author = {Dong, Peijie},
title = {On-Policy Distillation: Training Students on Their Own Mistakes},
year = {2026},
month = may,
day = {22},
howpublished = {\url{https://pprp.github.io/tech/opd/}},
url = {https://pprp.github.io/tech/opd/},
urldate = {2026-05-22},
note = {Blog post with PDF slides. Accessed: 2026-05-22},
language = {English}
}