paper · arXiv · Trust 82 · Primary · Published 2d ago · Live · yesterday
A Geometric Perspective on Composable Emotion Steering in Text-to-Speech Models
While prior work has explored emotion control in hybrid text-to-speech systems, the geometric properties of their constituent modules, and the implications of those properties for steerability, remain poorly understood. We present the first comparative study of speech language model (SLM) and conditional flow-matching (CFM) modules as activation steering sites for mixed-emotion speech synthesis. We first characterize emotion representations using linear probing and local intrinsic dimensionality (LID), and then evaluate single-site and joint steering for mixed-emotion synthesis. Our results show that SLM offers a clean, lo
