paper · arXiv · Trust 82 · Primary · Published 2d ago · Live · 21h ago

Staleness-Learning Rate Scaling Laws for Asynchronous RLHF

High-throughput RLHF systems often decouple rollout generation from policy optimization, leading to the use of stale rollouts during learner updates. In this work, we study the effect of such staleness in asynchronous GRPO. We make the behavior policy explicit in the GRPO surrogate objective and distinguish between the surrogate-gradient mapping used by the learner and the true total derivative of a distribution-dependent population objective. Under assumptions of local boundedness, distributional smoothness, and behavior-policy smoothness, we show that stale rollouts introduce a per-step surr
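
To make the role of the behavior policy concrete, here is a minimal sketch of a clipped GRPO-style surrogate in which the log-probabilities of the rollout-generating (possibly stale) policy are an explicit input. It assumes the commonly used clipped form with group-standardized rewards, sequence-level log-probabilities, and no KL term; the paper's exact objective may differ. The names `group_advantages`, `grpo_surrogate`, and the default `clip_eps=0.2` are hypothetical choices made for illustration.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-8):
    """Standardize rewards within one prompt's group of rollouts."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def grpo_surrogate(logp_current, logp_behavior, rewards, clip_eps=0.2):
    """Clipped surrogate averaged over one group (illustrative sketch only).

    logp_current:  log-prob of each rollout under the learner's current policy.
    logp_behavior: log-prob of each rollout under the policy that generated it,
                   which may lag the learner by several updates (staleness).
    """
    advantages = group_advantages(rewards)
    total = 0.0
    for lp_cur, lp_beh, adv in zip(logp_current, logp_behavior, advantages):
        # Importance ratio between the current policy and the behavior policy.
        ratio = math.exp(lp_cur - lp_beh)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic (min) combination of unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(rewards)


if __name__ == "__main__":
    rewards = [1.0, 0.0, 1.0, 0.0]
    fresh = [-1.0, -2.0, -1.5, -0.5]
    # On-policy case: behavior policy equals current policy, all ratios are 1.
    print(grpo_surrogate(fresh, fresh, rewards))
    # Stale case: behavior log-probs differ from the current policy's.
    stale = [-1.4, -1.7, -1.5, -0.9]
    print(grpo_surrogate(fresh, stale, rewards))
```

When the rollouts are fresh, every ratio equals 1 and the surrogate reduces to the mean of the standardized advantages. With stale rollouts the ratios drift away from 1, so clipping activates and the surrogate value changes even though the rewards are identical.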

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Why these links exist

  • Linked via arXiv author: Jingwei Song

  • Linked via arXiv author: Haofeng Xu

  • Linked via arXiv author: Jie Xiao

  • Linked via arXiv author: Chengke Bao

  • Linked via arXiv author: Jingwei Shi

  • Linked via arXiv author: Pengbin Feng

  • Linked via arXiv author: Weixun Wang

  • Linked via arXiv author: Yuhang Han

  • Linked via arXiv author: Chuan Wu

  • Linked via arXiv author: Linfeng Zhang

  • Linked via arXiv author: Bill Shi
