Paper · arXiv · Trust 82 · Primary · Published yesterday · Live · 8h ago

Show Me Examples: Inferring Visual Concepts from Image Sets

Vision-language models (VLMs) can follow complex textual instructions, yet they struggle to reason from purely visual context. In particular, current models fail to infer shared concepts from sets of example images and apply them to new inputs. We introduce Visual Concept Inference from Sets (VICIS), a task that evaluates this capability. Given a small context set of images sharing a concept and a query image, the model must generate new images that preserve the context-defined concept while remaining consistent with the query. We show that state-of-the-art VLMs perform poorly on this task, of

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Why these links exist

  • Linked via arXiv author Nick Stracke
  • Linked via arXiv author Kolja Bauer
  • Linked via arXiv author Stefan Andreas Baumann
  • Linked via arXiv author Miguel Angel Bautista
  • Linked via arXiv author Josh Susskind
  • Linked via arXiv author Björn Ommer
