paper · arXiv · Trust 82 · Primary · Published 3d ago · Live · 2d ago

Seeing Is Not Sharing: Some Vision-Language Models Overestimate Common Ground in Asymmetric Dialogue

In collaborative dialogue, shared perception does not guarantee shared interpretation. Mutual understanding must be established through interaction. We investigate whether vision-language models (VLMs) can distinguish what could be shared from what has been shared between dialogue participants through grounding. We formulate this as an interpretation-matching task on 13,077 annotated reference expressions from HCRC MapTask dialogues, and evaluate VLMs under systematically controlled manipulations of dialogue context and map-information access. Our results show that providing authentic map imag
