paper · arXiv

Speculative decoding with draft models

Accelerating generation by drafting tokens with a small model.
