paper · arXiv · Trust 82 · Primary · Published 2d ago · Live · yesterday

A Geometric Perspective on Composable Emotion Steering in Text-to-Speech Models

While prior work has explored emotion control in hybrid text-to-speech systems, the geometric properties of their component modules, and the implications of those properties for steerability, remain poorly understood. We present the first comparative study of speech language model (SLM) and conditional flow-matching (CFM) modules as activation steering sites for mixed-emotion speech synthesis. We first characterize emotion representations using linear probing and local intrinsic dimensionality (LID), and then evaluate single-site and joint steering for mixed-emotion synthesis. Our results show that SLM offers a clean, lo

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Has model

model: Whisper-Lite

Related across the graph

model: Whisper-Lite

Topics

cs.LG