Paper · arXiv · Trust 82 · Primary · Published yesterday · Live · 19h ago

Towards a Phonology-Informed Evaluation of Multilingual TTS

Neural TTS systems can sound natural across languages, but naturalness does not guarantee the preservation of sound contrasts that distinguish words from their grammatical forms. Standard metrics like MOS do not test for this. We propose a classifier-based framework that audits TTS output against language-specific phonological patterns using human speech as a benchmark. Testing Assamese advanced tongue root (ATR) vowel harmony with Meta's MMS TTS, we show that a classifier trained on human speech transfers to synthesized speech with minimal loss. The faithfulness audit reveals that [+ATR] mid
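The train-on-human, test-on-synthetic idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the excerpt does not specify the paper's classifier, features, or data, so this uses a nearest-centroid classifier over toy (F1, F2) formant values drawn from made-up class centers, and `transfer_loss` is a hypothetical name for the accuracy gap between held-out human speech and TTS output.

```python
import random
from statistics import mean

def make_tokens(rng, centers, n, spread):
    """Draw n toy (F1, F2) points per class around the given class centers."""
    data = []
    for label, (f1, f2) in centers.items():
        for _ in range(n):
            point = (rng.gauss(f1, spread), rng.gauss(f2, spread))
            data.append((point, label))
    return data

def fit_centroids(data):
    """Compute the per-class mean of the training points."""
    centroids = {}
    for label in {lab for _, lab in data}:
        pts = [x for x, lab in data if lab == label]
        centroids[label] = (mean(p[0] for p in pts), mean(p[1] for p in pts))
    return centroids

def predict(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda lab: (x[0] - centroids[lab][0]) ** 2
        + (x[1] - centroids[lab][1]) ** 2,
    )

def accuracy(centroids, data):
    """Fraction of tokens whose predicted label matches the true label."""
    return mean(1.0 if predict(centroids, x) == lab else 0.0 for x, lab in data)

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical class centers in Hz; these are NOT measurements from the paper.
HUMAN_CENTERS = {"+ATR": (400.0, 2000.0), "-ATR": (550.0, 1800.0)}
# Toy "synthesized" speech: same classes, slightly shifted centers.
TTS_CENTERS = {"+ATR": (420.0, 1980.0), "-ATR": (540.0, 1810.0)}

human_train = make_tokens(rng, HUMAN_CENTERS, 200, 40.0)
human_test = make_tokens(rng, HUMAN_CENTERS, 100, 40.0)
tts_test = make_tokens(rng, TTS_CENTERS, 100, 40.0)

# Train on human speech only, then score both held-out human and TTS tokens.
centroids = fit_centroids(human_train)
acc_human = accuracy(centroids, human_test)
acc_tts = accuracy(centroids, tts_test)
transfer_loss = acc_human - acc_tts  # small gap = classifier transfers well

print(f"human accuracy: {acc_human:.3f}")
print(f"TTS accuracy:   {acc_tts:.3f}")
print(f"transfer loss:  {transfer_loss:.3f}")
```

In this framing, a small gap suggests the human-trained classifier can be used as an auditor of synthesized speech, and its per-class predictions on TTS tokens then indicate whether the intended contrast was realized. A real audit would replace the toy points with acoustic features extracted from recordings.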
