[audio.cpp] VibeVoice 1.5B released — 90-min podcast in 22.95 min, 4.08x real-time, 2.86x faster than Python without quantization. Native C++/ggml
I’m the author of audio.cpp, a C++/ggml runtime for local audio models. I just added VibeVoice 1.5B support and wanted to share the benchmark because long-form multi-speaker TTS is a good stress test for local inference runtimes. Result on RTX 5090: VibeVoice 1.5B Audio length: 5
Why it matters
This story from Reddit r/LocalLLaMA is relevant to the Open Source branch of the AI ecosystem and may affect models, products, or research direction.
Technical breakdown
I’m the author of audio.cpp, a C++/ggml runtime for local audio models. I just added VibeVoice 1.5B support and wanted to share the benchmark because long-form multi-speaker TTS is a good stress test for local inference runtimes. Result on RTX 5090: VibeVoice 1.5B Audio length: 5615.73s / 93.60 min Wall time: 1376.84s / 22.95 min RTF: 0.245 Speed: 4.08x faster than real time Python baseline: 92.66
Business impact
Watch for product launches, funding moves, or policy shifts tied to this headline.
