Paper · arXiv · Trust 82 · Primary · Published 4d ago · Live 3d ago

Dynamo: Dynamic Skill-Tool Evolution for Vision-Language Agents

Improving vision-language models (VLMs) on visual reasoning typically requires retraining or hand-designed prompts and tools. We present Dynamo, a training-free framework that adapts a frozen VLM without any weight updates. On a small labeled training subset, the agent inspects its own correct and incorrect attempts and evolves two complementary capabilities: reusable reasoning skills for cognitive bottlenecks, and executable visual tools for perceptual ones. Each generated tool is paired with a skill that specifies when to invoke it, and both capability types accumulate in a persistent library…
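The abstract gives no implementation details, so the following is only a hypothetical Python sketch of the bookkeeping it describes. It is not Dynamo's code. Every name here is an assumption: `Skill`, `Attempt`, `Library`, `evolve`, and the `"cognitive"`/`"perceptual"` failure label. The sketch shows three things. Failed attempts become either a reasoning skill or a visual tool. Each tool is registered together with a skill that states when to invoke it. Both kinds of capability accumulate in a single persistent store. For simplicity it mines only failures, although the abstract says correct attempts are inspected too. Skill text and tool code are placeholders where a VLM would generate them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Skill:
    """Hypothetical reusable reasoning instruction."""
    name: str
    guidance: str                    # natural-language advice or invocation rule
    tool_name: Optional[str] = None  # set when this skill governs a tool


@dataclass
class Attempt:
    """Hypothetical record of one frozen-VLM attempt on a labeled example."""
    question: str
    correct: bool
    bottleneck: str  # assumed label: "cognitive" or "perceptual"


@dataclass
class Library:
    """Persistent store in which skills and tools accumulate."""
    skills: Dict[str, Skill] = field(default_factory=dict)
    tools: Dict[str, Callable] = field(default_factory=dict)

    def add_skill(self, skill: Skill) -> None:
        self.skills[skill.name] = skill

    def add_tool(self, name: str, fn: Callable, when_to_use: str) -> None:
        # Every tool is stored together with a skill saying when to call it.
        self.tools[name] = fn
        self.add_skill(Skill(name=f"use_{name}", guidance=when_to_use, tool_name=name))


def evolve(library: Library, attempts: List[Attempt]) -> Library:
    """One evolution round: turn failed attempts into new capabilities.

    Simplification: only failures are mined here. In a real system a VLM
    would write the skill text and the tool code; both are placeholders.
    """
    failures = (a for a in attempts if not a.correct)
    for i, attempt in enumerate(failures):
        if attempt.bottleneck == "perceptual":
            # Placeholder identity tool; a real one might crop or zoom the image.
            library.add_tool(
                f"tool_{i}",
                lambda image: image,
                when_to_use=f"Invoke for questions like: {attempt.question}",
            )
        else:
            library.add_skill(
                Skill(name=f"skill_{i}",
                      guidance=f"Reason step by step about: {attempt.question}")
            )
    return library


lib = evolve(Library(), [
    Attempt("How many red cubes are there?", correct=False, bottleneck="perceptual"),
    Attempt("Which object is left of the cup?", correct=False, bottleneck="cognitive"),
    Attempt("What color is the sky?", correct=True, bottleneck="cognitive"),
])
print(sorted(lib.skills))  # ['skill_1', 'use_tool_0']
print(sorted(lib.tools))   # ['tool_0']
```

Storing tools and skills in one library lets the invocation rule for a tool travel with it. An agent that reads only the skills can still discover when each tool applies.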
