Paper · arXiv · Trust 82 · Primary · Published yesterday · Live · 8h ago

Show Me Examples: Inferring Visual Concepts from Image Sets

Vision-language models (VLMs) can follow complex textual instructions, yet they struggle to reason from purely visual context. In particular, current models fail to infer shared concepts from sets of example images and apply them to new inputs. We introduce Visual Concept Inference from Sets (VICIS), a task that evaluates this capability. Given a small context set of images sharing a concept and a query image, the model must generate new images that preserve the context-defined concept while remaining consistent with the query. We show that state-of-the-art VLMs perform poorly on this task, of

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Why these links exist

Linked via arXiv author Nick Stracke → Show Me Examples: Inferring Visual Concepts from Image Sets
Linked via arXiv author Kolja Bauer → Show Me Examples: Inferring Visual Concepts from Image Sets
Linked via arXiv author Stefan Andreas Baumann → Show Me Examples: Inferring Visual Concepts from Image Sets
Linked via arXiv author Miguel Angel Bautista → Show Me Examples: Inferring Visual Concepts from Image Sets
Linked via arXiv author Josh Susskind → Show Me Examples: Inferring Visual Concepts from Image Sets
Linked via arXiv author Björn Ommer → Show Me Examples: Inferring Visual Concepts from Image Sets

Implements

repo vlm-starter

Has model

model VioletVision-3B

authored (incoming)

person Nick Stracke · person Kolja Bauer · person Stefan Andreas Baumann · person Miguel Angel Bautista · person Josh Susskind · person Björn Ommer

Related across the graph

person Stefan Andreas Baumann · person Björn Ommer · model VioletVision-3B · person Miguel Angel Bautista · person Nick Stracke · person Kolja Bauer · person Josh Susskind · repo vlm-starter

Topics

cs.CV