Show Me Examples: Inferring Visual Concepts from Image Sets
Vision-language models (VLMs) can follow complex textual instructions, yet they struggle to reason from purely visual context. In particular, current models fail to infer shared concepts from sets of example images and apply them to new inputs. We introduce Visual Concept Inference from Sets (VICIS), a task that evaluates this capability. Given a small context set of images sharing a concept and a query image, the model must generate new images that preserve the context-defined concept while remaining consistent with the query. We show that state-of-the-art VLMs perform poorly on this task, of
Authors: Nick Stracke, Kolja Bauer, Stefan Andreas Baumann, Miguel Angel Bautista, Josh Susskind, Björn Ommer
