paper · arXiv · Trust 82 · Primary · Published 4d ago · Live · 3d ago

Dynamo: Dynamic Skill-Tool Evolution for Vision-Language Agents

Improving vision-language models (VLMs) on visual reasoning typically requires retraining or hand-designed prompts and tools. We present Dynamo, a training-free framework that adapts a frozen VLM without any weight updates. On a small labeled training subset, the agent inspects its own correct and incorrect attempts and evolves two complementary capabilities: reusable reasoning skills for cognitive bottlenecks, and executable visual tools for perceptual ones. Each generated tool is paired with a skill that specifies when to invoke it, and both capability types accumulate in a persistent librar
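The pairing described in the abstract, where each executable tool is stored alongside a skill that says when to invoke it, can be illustrated with a small data structure. This is only a sketch under stated assumptions: the names `Skill`, `Library`, `add_tool`, `unpaired_tools`, and the toy `crop` tool are hypothetical and do not come from the paper, and saving the library to disk is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Skill:
    """A reusable reasoning hint. `tool_name` is set when the skill gates a tool."""
    name: str
    guidance: str
    tool_name: Optional[str] = None


@dataclass
class Library:
    """Hypothetical in-memory library that accumulates skills and tools."""
    skills: Dict[str, Skill] = field(default_factory=dict)
    tools: Dict[str, Callable[..., object]] = field(default_factory=dict)

    def add_skill(self, skill: Skill) -> None:
        # Reasoning-only skills carry no tool; tool-gating skills must
        # reference a tool that is already registered.
        if skill.tool_name is not None and skill.tool_name not in self.tools:
            raise KeyError(f"unknown tool: {skill.tool_name}")
        self.skills[skill.name] = skill

    def add_tool(self, name: str, fn: Callable[..., object], when_to_use: str) -> Skill:
        # Register the tool and its paired "when to invoke" skill together,
        # so a tool never enters the library without usage guidance.
        self.tools[name] = fn
        skill = Skill(name=f"use_{name}", guidance=when_to_use, tool_name=name)
        self.skills[skill.name] = skill
        return skill

    def unpaired_tools(self) -> List[str]:
        # Tools that no skill refers to (should stay empty with add_tool).
        paired = {s.tool_name for s in self.skills.values() if s.tool_name}
        return sorted(set(self.tools) - paired)


def crop(image: List[List[int]], top: int, left: int, height: int, width: int) -> List[List[int]]:
    """Toy visual tool: crop a nested-list 'image' to a rectangle."""
    return [row[left:left + width] for row in image[top:top + height]]


lib = Library()
lib.add_skill(Skill("count_step_by_step", "Enumerate objects one at a time before answering."))
lib.add_tool("crop", crop, "Use when the relevant detail is small relative to the image.")
```

Registering the tool and its skill in one call is a design choice of this sketch: it keeps the library in a state where every tool has invocation guidance, which mirrors the pairing the abstract describes without claiming to reproduce the authors' implementation.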

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Implements

repo: vlm-starter

Has model

model: AgentCore-8B
model: VioletVision-3B

Covers

news: IEEE Rolls Out Large Language Models Virtual Training Course

Related to

company: Northwind AI

Covers (incoming)

news: Into the Omniverse: Three Workflows for Improving Vision AI Agent Accuracy With Synthetic Data and Fine-Tuning
news: How Outpost VFX Uses AWS to Accelerate AI Model Training for Visual Effects

Implements (incoming)

repo: roboflow/inference
repo: digiteinfotech/kairon
repo: roboflow/supervision
repo: willyfh/visualtorch

Related across the graph

news: Into the Omniverse: Three Workflows for Improving Vision AI Agent Accuracy With Synthetic Data and Fine-Tuning
repo: digiteinfotech/kairon
repo: roboflow/supervision
company: Northwind AI
repo: willyfh/visualtorch
model: VioletVision-3B
model: AgentCore-8B
news: IEEE Rolls Out Large Language Models Virtual Training Course
news: How Outpost VFX Uses AWS to Accelerate AI Model Training for Visual Effects
repo: roboflow/inference
repo: vlm-starter

Topics