Read original ↗
paperarXivTrust 82 · PrimaryPublished 7d agoLive · 4d ago

HPRO: Hierarchical Progressive Reward Optimization via Preference Extraction for Emotional Text-to-Speech

Recently, Large Language Model (LLM)-based Text-to-Speech (TTS) models have achieved remarkable naturalness. However, the standard Supervised Fine-Tuning paradigm often converges to statistically averaged prosody, limiting emotional expressiveness. While preference-driven optimization offers a promising alternative, existing approaches suffer from two structural mismatches: information conflict, where content and emotion in a shared latent space produce conflicting gradients, leading to reward hacking and semantic degradation; and scale gap, where sparse sentence-level rewards struggle to guid

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Implements

Related across the graph

Topics