paper · arXiv · Trust 82 · Primary · Published 3d ago · Live · 2d ago

Review Residuals: Update-Conditioned Residual Gating for Transformers

Residual connections add every sublayer's proposed update with a fixed coefficient of one; the network never evaluates whether an update is reliable before committing it. Drawing on the human-factors principle of independent verification, we introduce Review Residuals, which scale each update by a learned, input-dependent gate conditioned on both the current state and the proposed update: h_l = h_{l-1} + r_l * u_l with r_l = sigmoid(W[RMSNorm(h_{l-1}), RMSNorm(u_l)]). Conditioning the gate on the update is the property that distinguishes it from prior gated and scaled residuals. We report two
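As a rough illustration of the update rule stated in the abstract, the following is a minimal sketch in plain Python, not the authors' implementation. Several details are assumptions that the abstract does not settle: the gate is taken to be per-channel (so the hypothetical weight matrix W has shape d x 2d), RMSNorm is applied without a learned gain, and no bias term is used. The function names are illustrative only.

```python
import math


def rms_norm(x, eps=1e-6):
    """RMSNorm without a learned gain (an assumption; the abstract does not say)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def review_residual(h_prev, u, W):
    """Compute h_l = h_{l-1} + r_l * u_l with
    r_l = sigmoid(W [RMSNorm(h_{l-1}), RMSNorm(u_l)]).

    h_prev: current state h_{l-1}, a list of d floats
    u:      proposed sublayer update u_l, a list of d floats
    W:      gate weights, d rows of 2d floats (per-channel gate is an assumption)
    Returns (h_l, r_l).
    """
    # Concatenate the normalized state and the normalized update (length 2d).
    feats = rms_norm(h_prev) + rms_norm(u)
    # One gate value per channel, each strictly between 0 and 1.
    gate = [sigmoid(sum(w * f for w, f in zip(row, feats))) for row in W]
    # Scale the update by the gate before adding it to the residual stream.
    h_next = [h + r * ui for h, r, ui in zip(h_prev, gate, u)]
    return h_next, gate


if __name__ == "__main__":
    h_prev = [1.0, 2.0]
    u = [0.5, -1.0]
    W = [[0.0] * 4 for _ in range(2)]  # all-zero weights give a gate of 0.5
    h_next, gate = review_residual(h_prev, u, W)
    print(gate, h_next)
```

With all-zero weights every gate value is exactly 0.5, so half of each update is committed. Under this sketch, a standard residual connection corresponds to the limit in which the gate saturates at one, and the second half of each row of W is what lets the gate respond to the proposed update itself.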


Topics