paper · arXiv · Trust 82 · Primary · Published 3d ago · Live · 2d ago
Which Tokens Matter? Adaptive Token Selection for RLVR with the Relative Surprisal Index
Reinforcement learning (RL) has become a powerful tool for propelling Large Language Models (LLMs) beyond imitation-based training towards more robust reasoning capabilities. Among existing approaches, RL with Verifiable Rewards (RLVR) has emerged as a pivotal paradigm for advancing LLM reasoning. Despite RLVR's empirical success, recent studies offer differing views on which tokens training should emphasize. One line of inquiry advocates prioritizing high-entropy token positions during training, while another cautions against allowing low-probability tokens to dominate gradient updates. Notably, although high-entro
