paper · arXiv · Trust 82 · Primary · Published 4d ago · Live · 3d ago
Does Verbose Chain-of-Thought Really Help? In-Distribution Evidence that Content, Not Length, Matters
Chain-of-thought (CoT) prompting improves LLM reasoning, but the source of the gain is contested: do the intermediate steps help because they carry useful semantic content, or because conditioning on more tokens buys extra computation before the model commits to an answer? We bring two lines of evidence to bear. First, in distribution: we repeatedly sample each model on the same question and pair a shorter and a longer trace drawn from its own natural generations that follow the same reasoning plan, so nothing is rewritten and both traces are genuinely in-distribution. Across 25 models the extra tokens leave accuracy
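The in-distribution pairing described above can be illustrated with a minimal sketch. It assumes each sampled trace already carries a reasoning-plan label and a correctness flag; how the paper actually decides that two traces "follow the same reasoning plan" is not stated in this excerpt, so `plan_id`, `Sample`, `build_pairs`, and `accuracy_gap` are all hypothetical names. The gap statistic is a plain difference of mean accuracies, not necessarily the analysis the authors use.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sample:
    question_id: str
    plan_id: str      # hypothetical: label for the reasoning plan the trace follows
    num_tokens: int   # length of the model's own, unedited generation
    correct: bool     # whether the final answer was right


def build_pairs(samples):
    """Pair the shortest and longest trace within each (question, plan) group.

    Groups with a single trace, or with no length difference, are skipped,
    so every pair differs in length but not in plan.
    """
    groups = defaultdict(list)
    for s in samples:
        groups[(s.question_id, s.plan_id)].append(s)

    pairs = []
    for key in sorted(groups):  # deterministic order
        group = groups[key]
        if len(group) < 2:
            continue
        shorter = min(group, key=lambda s: s.num_tokens)
        longer = max(group, key=lambda s: s.num_tokens)
        if longer.num_tokens > shorter.num_tokens:
            pairs.append((shorter, longer))
    return pairs


def accuracy_gap(pairs):
    """Mean accuracy of the longer traces minus that of the shorter ones."""
    if not pairs:
        return 0.0
    short_acc = sum(s.correct for s, _ in pairs) / len(pairs)
    long_acc = sum(l.correct for _, l in pairs) / len(pairs)
    return long_acc - short_acc
```

Holding the plan fixed within a pair is what lets any accuracy difference be attributed to length rather than to different reasoning content; a gap near zero across many pairs would be consistent with extra tokens alone not helping.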
