paper · arXiv · Trust 82 · Primary · Published 3d ago · Live · 2d ago

Which Tokens Matter? Adaptive Token Selection for RLVR with the Relative Surprisal Index

Reinforcement learning (RL) has become a powerful tool for propelling Large Language Models (LLMs) beyond imitation-based training towards more robust reasoning capabilities. Among existing approaches, RL with Verifiable Rewards (RLVR) has emerged as a pivotal paradigm for advancing LLM reasoning. Although RLVR has been empirically successful, recent studies offer conflicting guidance on which tokens training should emphasize. One line of inquiry advocates prioritizing high-entropy token positions during training, while another cautions against allowing low-probability tokens to dominate gradient updates. Notably, although high-entro
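The two signals contrasted in the abstract can be made concrete with a small sketch. It is an illustration only, not the paper's method: the abstract is cut off before the Relative Surprisal Index is defined, so the code stops at the two standard quantities involved, the entropy of the next-token distribution at a position and the surprisal (negative log-probability) of the token actually sampled. All function names and the toy logits are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_entropy(probs):
    """Shannon entropy (in nats) of the next-token distribution at one position."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def token_surprisal(probs, sampled_id):
    """Surprisal (in nats) of the token that was actually sampled: -log p."""
    return -math.log(probs[sampled_id])

def describe_position(logits, sampled_id):
    """Return entropy, surprisal, and probability for one token position."""
    probs = softmax(logits)
    return {
        "entropy": token_entropy(probs),
        "surprisal": token_surprisal(probs, sampled_id),
        "prob": probs[sampled_id],
    }

# Hypothetical 4-token vocabulary.
# Peaked distribution, but an unlikely token was sampled:
# the position has low entropy while the token has high surprisal.
confident = describe_position([8.0, 0.0, 0.0, 0.0], sampled_id=1)

# Uniform distribution: entropy and surprisal coincide at log(4).
uncertain = describe_position([0.0, 0.0, 0.0, 0.0], sampled_id=1)

for name, stats in (("confident", confident), ("uncertain", uncertain)):
    print(name, {k: round(v, 4) for k, v in stats.items()})
```

Entropy is a property of the position (the whole distribution), whereas surprisal is a property of the sampled token. The toy case shows that the two can disagree sharply, which is one way to see why selecting tokens by entropy and selecting them by probability are not the same rule.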

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Related to

glossary_term: RLHF

Covers

news: RL without TD learning

Implements (incoming)

repo: rllm-org/rllm

Related across the graph

news: RL without TD learning · glossary_term: RLHF · repo: rllm-org/rllm

Topics

cs.AI