paper · arXiv · Trust 82 · Primary · Published 3d ago · Live · 2d ago

CoLT: Teaching Multi-Modal Models to Think with Chain of Latent Thoughts

Chain-of-thought (CoT) reasoning has enabled multi-modal large language models (MLLMs) to tackle complex visual reasoning tasks by generating explicit intermediate reasoning steps in natural language. However, this text-based reasoning paradigm is inherently slow at inference time, where reasoning traces can run to thousands of tokens, and it is fundamentally constrained by the expressiveness of natural language. In this paper, we propose CoLT (Chain of Latent Thoughts), a novel framework that teaches multi-modal models to reason through a chain of latent thought representations instead of verbose text tokens, which can per

Topics

cs.CV