repo · GitHub · Trust 82 · Primary · Published yesterday · Live · 21h ago
rllm-org/rllm
Democratizing Reinforcement Learning for LLMs
Lineage graph
Paper → model → repo connections mined from source citations (Tier-1 exact match).
Implements
paper · Reinforcement Learning without Ground-Truth Solutions can Improve LLMs
paper · Which Tokens Matter? Adaptive Token Selection for RLVR with the Relative Surprisal Index
paper · Triadic Werewolf: A Jester Role for Multi-Hop Theory of Mind in LLMs
paper · Tandem Reinforcement Learning with Verifiable Rewards
paper · Is One Layer Enough? Training A Single Transformer Layer Can Match Full-Parameter RL Training
Related across the graph
paper · Triadic Werewolf: A Jester Role for Multi-Hop Theory of Mind in LLMs
paper · Which Tokens Matter? Adaptive Token Selection for RLVR with the Relative Surprisal Index
paper · Tandem Reinforcement Learning with Verifiable Rewards
paper · Is One Layer Enough? Training A Single Transformer Layer Can Match Full-Parameter RL Training
paper · Reinforcement Learning without Ground-Truth Solutions can Improve LLMs
