Paper · arXiv · Trust 82 · Primary · Published 3d ago · Live · 2d ago
TRIAGE: Role-Typed Credit Assignment for Agentic Reinforcement Learning
Agentic reinforcement learning requires assigning credit to environment-facing actions such as searches, clicks, edits, navigation commands, and object interactions. Standard GRPO uses the final verifier outcome as a uniform advantage over all action tokens. This outcome signal is useful but structurally incomplete: it punishes useful exploration in failed rollouts and reinforces redundant or regressive actions in successful rollouts. We propose TRIAGE, a role-typed credit assignment framework that adds a semantic role axis to outcome credit. A structured judge classifies each segment as decis
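To make the baseline concrete, the following is a minimal sketch of the uniform outcome credit the abstract attributes to standard GRPO. It is not code from the paper. It assumes the usual group-relative normalization (reward minus the group mean, divided by the group standard deviation) and uses the population standard deviation for simplicity. The function name `grpo_uniform_advantages`, the `eps` parameter, and the toy rewards are hypothetical, chosen only for illustration.

```python
from statistics import mean, pstdev


def grpo_uniform_advantages(rewards, num_action_tokens, eps=1e-6):
    """Sketch of outcome-only credit (assumed form, not the paper's code).

    rewards: one final verifier outcome per rollout in the group.
    num_action_tokens: number of action tokens in each rollout.
    Returns one list of per-token advantages per rollout.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std over the group
    per_rollout = [(r - mu) / (sigma + eps) for r in rewards]
    # The same scalar is copied to every action token of a rollout,
    # regardless of what role that action played.
    return [[a] * n for a, n in zip(per_rollout, num_action_tokens)]


# Toy group: two successful and two failed rollouts (hypothetical values).
rewards = [1.0, 0.0, 0.0, 1.0]
num_action_tokens = [3, 5, 2, 4]
advantages = grpo_uniform_advantages(rewards, num_action_tokens)

for reward, token_advantages in zip(rewards, advantages):
    print(reward, [round(a, 3) for a in token_advantages])
```

Under these assumptions, every token in a failed rollout receives the same negative advantage, including any useful exploration, and every token in a successful rollout receives the same positive advantage, including redundant or regressive actions. This is the structural gap the abstract describes.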
