Reddit r/LocalLLaMA · Published 2d ago

[audio.cpp] VibeVoice 1.5B released — 90-min podcast in 22.95 min, 4.08x real-time, 2.86x faster than Python without quantization. Native C++/ggml

I’m the author of audio.cpp, a C++/ggml runtime for local audio models. I just added VibeVoice 1.5B support and wanted to share the benchmark, because long-form multi-speaker TTS is a good stress test for local inference runtimes.

Result on RTX 5090:

VibeVoice 1.5B
Audio length: 5615.73s / 93.60 min
Wall time: 1376.84s / 22.95 min
RTF: 0.245
Speed: 4.08x faster than real time
Python baseline: 92.66 min audio in 6
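
For anyone checking the arithmetic: the RTF and speed figures follow directly from the audio length and wall time quoted above. This is a minimal sketch, not part of audio.cpp; the function names `real_time_factor` and `speedup_over_real_time` are made up for illustration, and RTF is assumed to use the usual definition of wall time divided by audio duration.

```python
def real_time_factor(audio_seconds: float, wall_seconds: float) -> float:
    """RTF = wall time / audio duration; below 1.0 means faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return wall_seconds / audio_seconds


def speedup_over_real_time(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio produced per second of wall time (the inverse of RTF)."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


# Figures quoted in the post for VibeVoice 1.5B on an RTX 5090.
audio_s = 5615.73
wall_s = 1376.84

rtf = real_time_factor(audio_s, wall_s)
speed = speedup_over_real_time(audio_s, wall_s)

print(f"Audio length: {audio_s / 60:.2f} min")
print(f"Wall time:    {wall_s / 60:.2f} min")
print(f"RTF:          {rtf:.3f}")
print(f"Speed:        {speed:.2f}x real time")
```

The two numbers are reciprocals of each other, so an RTF of 0.245 and a speed of 4.08x are the same measurement stated two ways.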