Paper · arXiv · Trust 82 · Primary · Published yesterday · Live · 7h ago

Reasoning effort, not tool access, buys first-try reliability in agentic code generation: an observational study

Agentic coding assistants are increasingly given extra capabilities, such as browser-based testing tools and design-oriented system prompts, on the assumption that more capability yields better software. This study tested that assumption directly. Ninety independent agent runs built the same application, a real-time retrospective board, from one detailed specification, and each run was scored on a fixed 14-criterion functional rubric (42-point maximum) and a visual quality review. The runs spanned several model generations, two agent harnesses, two reasoning-effort levels, a testing tool, and two design o

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Why these links exist

Linked via arXiv author: Achint Mehta