Using embeddings to predict spoken word duration and pitch in Mandarin monosyllabic words
Time-normalized f0 contours of Mandarin words in conversational speech have been shown to be predictable in part from their contextualized embeddings (CEs). The present study investigates whether CEs also predict spoken word duration for 7470 tokens of Mandarin monosyllabic CV words extracted from a Mandarin corpus of spontaneous speech. We show that CEs predict duration above chance level, not only at the type level but also at the level of individual tokens, as indicated by the results obtained with the type-wise and token-wise permutation baselines. We also show that the
Authors (per arXiv): Xiaoyun Jin, Mirjam Ernestus, R. Harald Baayen
