Reddit r/LocalLLaMA · Published 6d ago

Does quantizing change the MTP draft rate?

Speculative decoding speeds up LLM generation by using a small "drafter" mo