Topic cluster · 1 item

inference

paper

Speculative decoding with draft models

Accelerating generation by drafting tokens with a small model and verifying them with the larger target model.

Related topics