news · Reddit r/LocalLLaMA · Trust 58 · Community · Published 8d ago · Live · 7d ago

[Research] JetSpec: Speculative Decoding with Parallel Tree Drafting Enables up to 9.64x Lossless LLM Inference Speedup with more than 1000TPS

Open Source Reddit r/LocalLLaMA

Covers

paper · Speculative decoding with draft models

Covers (incoming)

paper · Depth Exploration for LLM Decoding
paper · BlockPilot: Instance-Adaptive Policy Learning for Diffusion-based Speculative Decoding
repo · sgl-project/SpecForge

Related across the graph

paper · BlockPilot: Instance-Adaptive Policy Learning for Diffusion-based Speculative Decoding
repo · sgl-project/SpecForge
paper · Depth Exploration for LLM Decoding
paper · Speculative decoding with draft models