paper · arXiv · Trust 82 · Primary · Published 4d ago · Live · 3d ago
Dynamo: Dynamic Skill-Tool Evolution for Vision-Language Agents
Improving vision-language models (VLMs) on visual reasoning typically requires retraining or hand-designed prompts and tools. We present Dynamo, a training-free framework that adapts a frozen VLM without any weight updates. On a small labeled training subset, the agent inspects its own correct and incorrect attempts and evolves two complementary capabilities: reusable reasoning skills for cognitive bottlenecks, and executable visual tools for perceptual ones. Each generated tool is paired with a skill that specifies when to invoke it, and both capability types accumulate in a persistent librar
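One way to picture the pairing described in the abstract (every generated tool comes with a skill that says when to invoke it, and both accumulate in one library) is the minimal sketch below. It is not the authors' implementation: `Skill`, `CapabilityLibrary`, `add_tool`, `unpaired_tools`, and the toy `crop` tool are all hypothetical names introduced for illustration, and the evolution loop that inspects correct and incorrect attempts is not modelled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Skill:
    """A reusable piece of reasoning guidance (hypothetical structure)."""
    name: str
    guidance: str
    # Set only when the skill describes when to invoke a specific tool.
    tool_name: Optional[str] = None


@dataclass
class CapabilityLibrary:
    """Holds skills and tools together (hypothetical structure)."""
    skills: Dict[str, Skill] = field(default_factory=dict)
    tools: Dict[str, Callable[..., object]] = field(default_factory=dict)

    def add_skill(self, skill: Skill) -> None:
        # A skill may only reference a tool that is already registered.
        if skill.tool_name is not None and skill.tool_name not in self.tools:
            raise KeyError(f"unknown tool: {skill.tool_name}")
        self.skills[skill.name] = skill

    def add_tool(self, name: str, fn: Callable[..., object], when_to_use: str) -> Skill:
        # Registering a tool always creates its paired "when to invoke" skill.
        self.tools[name] = fn
        skill = Skill(name=f"use_{name}", guidance=when_to_use, tool_name=name)
        self.skills[skill.name] = skill
        return skill

    def unpaired_tools(self) -> List[str]:
        # Tools that no skill refers to; empty if only add_tool is used.
        paired = {s.tool_name for s in self.skills.values() if s.tool_name}
        return sorted(set(self.tools) - paired)


def crop(image: List[List[int]], top: int, left: int, height: int, width: int) -> List[List[int]]:
    """Toy visual tool: crop a nested-list 'image' (assumption for illustration)."""
    return [row[left:left + width] for row in image[top:top + height]]


library = CapabilityLibrary()
library.add_skill(Skill("count_systematically", "Enumerate objects row by row before answering."))
library.add_tool("crop", crop, "Crop to the region of interest when the target is small.")
```

In this sketch the pairing is enforced structurally: `add_tool` is the only way to register a tool, so a tool cannot enter the library without its invocation skill. Persistence across runs (for example, serializing skill text and tool source to disk) is left out.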
Lineage graph
Paper → model → repo connections mined from source citations (Tier-1 exact match).
Edge types: Implements, Has model, Covers, Related to, Covers (incoming), Implements (incoming)
Related across the graph
news: Into the Omniverse: Three Workflows for Improving Vision AI Agent Accuracy With Synthetic Data and Fine-Tuning
repo: digiteinfotech/kairon
repo: roboflow/supervision
company: Northwind AI
repo: willyfh/visualtorch
model: VioletVision-3B
model: AgentCore-8B
news: IEEE Rolls Out Large Language Models Virtual Training Course
news: How Outpost VFX Uses AWS to Accelerate AI Model Training for Visual Effects
repo: roboflow/inference
repo: vlm-starter
