paperarXivTrust 82 · PrimaryPublished 5d agoLive · 3d ago

The Complexity Ceiling Benchmark: A Multi-Domain Evaluation of Sequential Reasoning Under Depth Scaling

We introduce the Complexity Ceiling Benchmark (CCB), a controlled evaluation of how language-model reasoning decays as the number of required sequential steps grows. CCB fixes the semantic content of a task and varies only its depth N in {5,...,50} across three structurally distinct regimes: grounded spatial state-tracking, abstract symbolic pointer manipulation, and transitive relational inference. Across 6,000 trials over five frontier and open-weight LLMs we find a consistent pattern of geometric per-step decay with widely separated domain ceilings: on the first two regimes the strongest mo

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Covers

newsNew benchmark exposes reasoning gaps in top models newsAdaptive Parallel Reasoning: The Next Paradigm in Efficient Inference Scaling newsNew Server Hopes to Break Through AI’s “Memory Wall”

Has model

modelRetrace-1.5B

Related across the graph

newsNew Server Hopes to Break Through AI’s “Memory Wall”modelRetrace-1.5B newsNew benchmark exposes reasoning gaps in top models newsAdaptive Parallel Reasoning: The Next Paradigm in Efficient Inference Scaling

Topics

cs.AI