paper · arXiv · Trust 82 · Primary · Published 2d ago · Live · 21h ago

Staleness-Learning Rate Scaling Laws for Asynchronous RLHF

High-throughput RLHF systems often decouple rollout generation from policy optimization, leading to the use of stale rollouts during learner updates. In this work, we study the effect of such staleness in asynchronous GRPO. We make the behavior policy explicit in the GRPO surrogate objective and distinguish between the surrogate-gradient mapping used by the learner and the true total derivative of a distribution-dependent population objective. Under assumptions of local boundedness, distributional smoothness, and behavior-policy smoothness, we show that stale rollouts introduce a per-step surr

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

authored (incoming)

Jingwei Song, Haofeng Xu, Jie Xiao, Chengke Bao, Jingwei Shi, Pengbin Feng, Weixun Wang, Yuhang Han, Chuan Wu, Linfeng Zhang, Bill Shi

Topics

cs.AI