arXiv paper · Published 3d ago
Radial Suppression Accelerates Algorithmic Generalization: A Geometric Analysis of Delayed Generalization
Why do neural networks memorize algorithmic training data long before they generalize? We present a geometric case study demonstrating that, on tasks where generalization requires discovering structured low-dimensional circuits, the memorization-generalization delay is driven by radial inflation of hidden representations under cross-entropy optimization. We formalize a radial-angular decomposition of activation-space dynamics and derive three testable propositions: (i) that penalizing radial inflation induces anisotropic, data-dependent weight regularization; (ii) that it suppresses radial gra
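The radial-angular decomposition in the abstract can be illustrated with a small sketch. The abstract does not give the exact form of the penalty or the decomposition the authors use. The helper names below (`radial_angular`, `split_gradient`, `radial_penalty`, `cross_entropy`), the squared-norm form of the penalty, and the coefficient `lam` are our assumptions, not the paper's method. The sketch writes a hidden vector as h = r·u with r = ‖h‖ and u = h/‖h‖, and splits a gradient into a radial part (along u) and an angular part (orthogonal to u). It also shows why cross-entropy rewards radial growth: scaling up the logits of a correctly classified example monotonically lowers its loss.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def radial_angular(h):
    """Split hidden vector h into radius r = ||h|| and unit direction u = h / r."""
    r = norm(h)
    if r == 0.0:
        raise ValueError("the zero vector has no direction")
    return r, [x / r for x in h]


def split_gradient(g, h):
    """Decompose a gradient g at point h into radial and angular parts.

    The radial part is the projection of g onto u = h / ||h||; the angular
    part is the remainder, which is orthogonal to u.
    """
    _, u = radial_angular(h)
    coef = sum(gi * ui for gi, ui in zip(g, u))  # component of g along u
    radial = [coef * ui for ui in u]
    angular = [gi - ri for gi, ri in zip(g, radial)]
    return radial, angular


def radial_penalty(batch, lam=1e-2):
    """Hypothetical radial penalty: lam times the mean squared norm of the batch.

    Its gradient with respect to each h is proportional to h itself, i.e. it
    acts purely along the radial direction and leaves the angular part alone.
    """
    return lam * sum(norm(h) ** 2 for h in batch) / len(batch)


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


if __name__ == "__main__":
    # Class 0 already has the largest logit, so the example is classified correctly.
    logits = [2.0, 0.5, -1.0]
    for scale in (1.0, 2.0, 4.0):
        scaled = [scale * z for z in logits]
        print(scale, cross_entropy(scaled, target=0))
```

Running the demo prints a loss that shrinks as `scale` grows, even though the predicted class never changes. This illustrates the kind of pressure toward radial inflation that a penalty such as `radial_penalty` is meant to counteract.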
