paper · arXiv · Trust 82 · Primary · Published yesterday · Live · 20h ago

Reasoning LLM Improves Speaker Recognition in Long-form TV Dramas

Long-form TV dramas present a formidable challenge for comprehensive video understanding, where deciphering complex storylines often relies on speaker recognition, the task of accurately attributing each spoken utterance to its respective character. In this paper, we advance this field through two primary contributions. (1) We introduce DramaSR-532K, a large-scale benchmark comprising 532K annotated dialogue lines across more than 900 unique characters, in which speaker recognition requires integrating auditory, linguistic, and visual cues. (2) We propose DramaS

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Why these links exist

  • Linked via arXiv author: Yuxuan Li
  • Linked via arXiv author: Lingxi Xie
  • Linked via arXiv author: Xinyue Huo
  • Linked via arXiv author: Jihao Qiu
  • Linked via arXiv author: Jiacheng Shao
  • Linked via arXiv author: Pengfei Chen
  • Linked via arXiv author: Jiannan Ge
  • Linked via arXiv author: Kaiwen Duan
  • Linked via arXiv author: Qi Tian
