paper · arXiv · Trust 82 · Primary · Published 4d ago · Live · 3d ago

Before Thinking, Learn to Decide: Proactive Routing for Efficient Visual Reasoning

Large multimodal models have achieved strong reasoning on complex visual tasks, but their inference efficiency is often limited by long chains of thought. A promising solution is to pair a small draft model with a large target model and run them cooperatively: a routing signal sends each query to either the draft or the target model according to its difficulty, balancing efficiency and accuracy. The remaining bottleneck is establishing a reliable query-difficulty signal in multimodal settings. Existing approaches designed for language models either rely on p
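The draft/target routing idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Router` class, the `toy_difficulty` scorer (a word-count heuristic), the stand-in model functions, and the 0.5 threshold are all hypothetical. The abstract as shown is cut off before it describes the paper's actual difficulty signal, so this is not the authors' method.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Router:
    """Hypothetical threshold router between a small and a large model."""
    difficulty: Callable[[str], float]  # assumed to return a score in [0, 1]
    draft: Callable[[str], str]         # stand-in for the small draft model
    target: Callable[[str], str]        # stand-in for the large target model
    threshold: float = 0.5              # illustrative cut-off, not from the paper

    def answer(self, query: str) -> Tuple[str, str]:
        """Return (model_name, response), deciding before any reasoning is run."""
        score = self.difficulty(query)
        if score < self.threshold:
            return "draft", self.draft(query)
        return "target", self.target(query)


def toy_difficulty(query: str) -> float:
    # Placeholder heuristic: longer queries count as harder, capped at 1.0.
    return min(len(query.split()) / 20, 1.0)


router = Router(
    difficulty=toy_difficulty,
    draft=lambda q: "short answer",
    target=lambda q: "long chain-of-thought answer",
)

easy = router.answer("What color is the car?")
hard = router.answer(
    "Given the chart, estimate the year-over-year growth rate and explain the trend"
)
```

With this toy scorer the five-word query falls below the threshold and goes to the draft model, while the longer query is sent to the target model. A real system would replace `toy_difficulty` with a learned signal that also takes the image into account.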

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Covers

news: New benchmark exposes reasoning gaps in top models

Has model

model: VioletVision-3B

Covers (incoming)

news: Anyone looking into the new MARS2 Workshop/Competition @ ECCV 2026? I saw Tec-do posting it. [D]

Related across the graph

model: VioletVision-3B

news: Anyone looking into the new MARS2 Workshop/Competition @ ECCV 2026? I saw Tec-do posting it. [D]

news: New benchmark exposes reasoning gaps in top models

Topics

cs.CL